Filtering HTML Tags with Regular Expressions in C#

2021-12-03 15:24 WeihanLi C#

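A recent project required converting a fragment of HTML into plain text before returning it. Regular expressions are a practical choice for this task, and this article shows how to strip HTML tags with them in C#.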

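The project had the following requirement: take a fragment of HTML, convert it to plain text, and return the result. Regular expressions handle this well. The code is as follows: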

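    using System.Text.RegularExpressions;

    public static string Html2Text(string htmlStr)
    {
        if (String.IsNullOrEmpty(htmlStr))
        {
            return "";
        }

        string regEx_style = "<style[^>]*?>[\\s\\S]*?<\\/style>"; // matches a style element and its contents
        string regEx_script = "<script[^>]*?>[\\s\\S]*?<\\/script>"; // matches a script element and its contents
        string regEx_html = "<[^>]+>"; // matches any remaining HTML tag

        // Tag names are case-insensitive in HTML, so <STYLE> and <Script> are matched as well.
        htmlStr = Regex.Replace(htmlStr, regEx_style, "", RegexOptions.IgnoreCase); // remove CSS
        htmlStr = Regex.Replace(htmlStr, regEx_script, "", RegexOptions.IgnoreCase); // remove JavaScript
        htmlStr = Regex.Replace(htmlStr, regEx_html, ""); // remove HTML tags
        htmlStr = Regex.Replace(htmlStr, "\\s+", ""); // remove all whitespace: tabs, spaces, line breaks

        htmlStr = htmlStr.Replace("&nbsp;", ""); // remove non-breaking space entities
        htmlStr = htmlStr.Replace("&quot;", ""); // remove stray quotation marks, both as an entity
        htmlStr = htmlStr.Replace("\"", ""); // and as a literal character

        return htmlStr.Trim();
    }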
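
To see what the method actually returns, a small console program can help. The following is only a sketch: the class name `HtmlTextDemo` and the sample HTML are made up for illustration, and the method body is repeated so that the sample compiles on its own. It also assumes that the literals removed at the end are the entities `&nbsp;` and `&quot;` plus the plain `"` character.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the name is not part of the original article.
public static class HtmlTextDemo
{
    // Same steps as Html2Text in the article, repeated here to keep the sample self-contained.
    public static string Html2Text(string htmlStr)
    {
        if (string.IsNullOrEmpty(htmlStr))
        {
            return "";
        }

        // Remove style and script elements together with their contents.
        htmlStr = Regex.Replace(htmlStr, @"<style[^>]*?>[\s\S]*?</style>", "", RegexOptions.IgnoreCase);
        htmlStr = Regex.Replace(htmlStr, @"<script[^>]*?>[\s\S]*?</script>", "", RegexOptions.IgnoreCase);

        // Remove every remaining tag.
        htmlStr = Regex.Replace(htmlStr, "<[^>]+>", "");

        // Remove all whitespace; words separated only by spaces end up joined.
        htmlStr = Regex.Replace(htmlStr, @"\s+", "");

        // Assumption: these are the literals the article removes at the end.
        htmlStr = htmlStr.Replace("&nbsp;", "");
        htmlStr = htmlStr.Replace("&quot;", "");
        htmlStr = htmlStr.Replace("\"", "");

        return htmlStr.Trim();
    }

    public static void Main()
    {
        // Made-up sample input containing CSS, JavaScript, tags, a space and an entity.
        string html = "<style>p { color: red; }</style>"
                    + "<p class=\"intro\">Hello, <b>brave</b>&nbsp;world!</p>"
                    + "<script>alert(\"hi\");</script>";

        Console.WriteLine(Html2Text(html)); // prints "Hello,braveworld!"
    }
}
```

Note that the whitespace step removes every space, so space-separated words run together in the output. That is harmless for Chinese text but probably not what you want for English; replacing runs of whitespace with a single space instead of an empty string would be one way to adapt it.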