
Filtering HTML Characters with Regular Expressions in C#

2008-05-22 11:26

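In C# you can use regular expressions to filter HTML characters. For example, when validating user input, HTML markup has to be filtered out for security reasons.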

using System.Text.RegularExpressions;

// Regex.Replace returns a new string; the input is not modified in place.
string plainText = Regex.Replace(htmlcode, "<[^>]+>", "");

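How the pattern works: < means the match starts with "<".

[^>] is a negated character class: [^...] matches any single character except the ones listed after the ^, so here it matches anything other than ">".

Then comes the + sign, which means the preceding item must match at least once. Because of this, a bare "<>" in the string is not filtered, since it is not an HTML tag.

Finally, > means the HTML tag ends with ">".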
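The behaviour described above can be checked with a small console program. This is only a sketch: the class name HtmlTagFilter and the method name StripTags are made up for illustration and do not come from the original article.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the pattern discussed above.
public static class HtmlTagFilter
{
    // Removes every '<', followed by one or more non-'>' characters, followed by '>'.
    public static string StripTags(string input)
    {
        return Regex.Replace(input, "<[^>]+>", "");
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(HtmlTagFilter.StripTags("<b>bold</b> text")); // prints "bold text"
        Console.WriteLine(HtmlTagFilter.StripTags("a <> b"));           // prints "a <> b" (+ needs at least one character)
        Console.WriteLine(HtmlTagFilter.StripTags("1 < 2"));            // prints "1 < 2" (no closing '>')
    }
}
```

Note that the pattern only looks at the angle brackets, so ordinary text such as "1 < 2 and 3 > 2" is also treated as a tag and the part between the brackets is removed.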

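If a submitted form field contains HTML, ASP.NET request validation rejects the request with an error such as:

A potentially dangerous Request.Form value was detected from the client (Control_Message_SendBox1:dgrdSendBox:_ctl3:_ctl1="<div id="de" onclick...").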

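Solution: turn off request validation for the page by adding the attribute to the @ Page directive:

<%@ Page ValidateRequest="false" %>

Alternatively, turn it off for the whole application in web.config, inside <system.web>:

<pages validateRequest="false"/>

Once request validation is off, the input has to be filtered or encoded by your own code.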

Code for embedding a page:
<iframe frameborder="no" scrolling="no" width="100%" height="25" src="a.htm"
tabIndex="0">
</iframe>

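Replacement: in HTML, several consecutive ordinary spaces are rendered as a single space, so they have to be replaced in code. See the code below: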

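string Context = Content.Text;
Context = Context.Replace("&", "&amp;");  // escape & first so the entities added below are not escaped again
Context = Context.Replace("<", "&lt;");   // escape HTML markup
Context = Context.Replace(">", "&gt;");
Context = Context.Replace("\r", "<BR>");  // carriage return
Context = Context.Replace(" ", "&nbsp;"); // space
Context = Context.Replace("\t", "&nbsp;&nbsp;&nbsp;&nbsp;"); // horizontal tab (four non-breaking spaces is an arbitrary choice)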
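The same replacements can be wrapped in one reusable method. The following is a minimal sketch; the names DisplayEncoder and ToHtml are hypothetical, and the choice of four non-breaking spaces for a tab is an assumption, not something the article specifies.

```csharp
using System;

// Hypothetical helper: prepares plain text for display inside an HTML page.
public static class DisplayEncoder
{
    public static string ToHtml(string text)
    {
        return text
            .Replace("&", "&amp;")      // first, so entities added below are not escaped twice
            .Replace("<", "&lt;")       // neutralise markup
            .Replace(">", "&gt;")
            .Replace("\r\n", "<br>")    // Windows line break
            .Replace("\n", "<br>")      // bare line feed
            .Replace("\t", "&nbsp;&nbsp;&nbsp;&nbsp;") // tab width is an assumption
            .Replace(" ", "&nbsp;");    // keep runs of spaces visible
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(DisplayEncoder.ToHtml("<b>hi</b>\r\nnext")); // prints "&lt;b&gt;hi&lt;/b&gt;<br>next"
        Console.WriteLine(DisplayEncoder.ToHtml("1 < 2 & 3"));         // prints "1&nbsp;&lt;&nbsp;2&nbsp;&amp;&nbsp;3"
    }
}
```

The order of the calls matters: the line-break and space replacements run after the escaping steps, so the <br> tags and &nbsp; entities they insert are not escaped themselves. For the escaping part alone, the framework's HttpUtility.HtmlEncode in System.Web is an alternative.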

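I wrote a class for checking user input in ASP.NET.
It is not very well written. At the moment each check reports its result through a return value. Throwing an exception would be a better way to signal invalid input, because the caller would not have to test the return value of every function one by one: all the check calls could sit inside a single try {} block, with one catch handling any failure.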

code

using System;
using System.Text.RegularExpressions;

namespace InputSecurityCheck
{
    /// <summary>
    /// UserInputCheck is a class for checking the validity of user input.
    /// </summary>
    public class UserInputCheck
    {
        public UserInputCheck()
        {
            //
            // TODO: add constructor logic here
            //
        }

        /// <summary>
        /// Tests a string against a regular expression.
        /// </summary>
        /// <param name="uncheckedString">The string to check</param>
        /// <param name="pattern">The regular expression</param>
        /// <returns>
        /// true if the string matches,
        /// false otherwise
        /// </returns>
        public static bool CheckString(string uncheckedString, string pattern)
        {
            // RegexOptions.Multiline is deliberately not used: it would let ^ and $ match
            // at line breaks, so "abc\n123" would pass a whole-string check such as ^\d+$.
            Regex regex = new Regex(pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
            return regex.IsMatch(uncheckedString);
        }
        /// <summary>
        /// Checks whether a string consists of digits only.
        /// </summary>
        /// <param name="strUnChecked">The string to check</param>
        /// <returns>
        /// true if it does,
        /// false otherwise
        /// </returns>
        public static bool IsNumeric(string strUnChecked)
        {
            return CheckString(strUnChecked, @"^\d+$");
        }
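        /// <summary>
        /// Checks whether a string contains characters that may lead to SQL injection: ' " ; -
        /// </summary>
        /// <param name="strUnChecked">The string to check</param>
        /// <returns>
        /// true if an illegal character is present,
        /// false otherwise
        /// </returns>
        public static bool IsNonlicetChar(string strUnChecked)
        {
            // Search for any one of the characters anywhere in the string, so the result
            // does not depend on line breaks and an empty string is reported as clean.
            return CheckString(strUnChecked, @"[""';-]");
        }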
        /// <summary>
        /// Checks whether a string consists of English letters only.
        /// </summary>
        /// <param name="strUnChecked">The string to check</param>
        /// <returns>
        /// true if every character is an English letter,
        /// false if any other character is present
        /// </returns>
        public static bool IsEnglishChar(string strUnChecked)
        {
            return CheckString(strUnChecked, @"^[A-Za-z]+$");
        }
        /// <summary>
        /// Checks whether a string is a valid IP address (four dot-separated numbers from 0 to 255).
        /// </summary>
        /// <param name="strUnChecked">The string to check</param>
        /// <returns>
        /// true if valid,
        /// false if invalid
        /// </returns>
        public static bool IsIpAdderessFormat(string strUnChecked)
        {
            return CheckString(strUnChecked, @"^([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])$");
        }
        /// <summary>
        /// Checks whether a string contains only English letters, digits and underscores.
        /// </summary>
        /// <param name="strUnChecked">The string to check</param>
        /// <returns>
        /// true if it does,
        /// false otherwise
        /// </returns>
        public static bool IsCharNumberAndUnderLine(string strUnChecked)
        {
            return CheckString(strUnChecked, @"^[A-Za-z0-9_]+$");
        }
}
}
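The note before the class suggests throwing exceptions instead of returning true or false. A minimal sketch of that idea follows. The names InputValidationException, InputGuard, RequireNumeric and RequireNoSqlMetaChars are all hypothetical and not part of the class above; the patterns are the corrected forms of the ones used there.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical exception type used to report invalid input.
public class InputValidationException : Exception
{
    public InputValidationException(string message) : base(message) { }
}

// Hypothetical checks that throw instead of returning a bool.
public static class InputGuard
{
    public static void RequireNumeric(string value, string fieldName)
    {
        if (value == null || !Regex.IsMatch(value, @"^\d+\z"))
            throw new InputValidationException(fieldName + " must contain digits only.");
    }

    public static void RequireNoSqlMetaChars(string value, string fieldName)
    {
        // Rejects the characters ' " ; - anywhere in the value.
        if (value == null || Regex.IsMatch(value, @"[""';-]"))
            throw new InputValidationException(fieldName + " contains a disallowed character.");
    }
}

public static class Program
{
    public static void Main()
    {
        try
        {
            // All checks sit in one try block; the first failure jumps to the catch.
            InputGuard.RequireNumeric("12345", "age");
            InputGuard.RequireNoSqlMetaChars("O'Brien", "name");
            Console.WriteLine("All input accepted.");
        }
        catch (InputValidationException ex)
        {
            Console.WriteLine("Rejected: " + ex.Message); // prints "Rejected: name contains a disallowed character."
        }
    }
}
```

A dedicated exception type lets the caller catch validation failures separately from unrelated errors, and the message can carry the name of the offending field.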

// VBScript versions

1. Remove JavaScript code from the content

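Function ClearJSCode(originCode)

    Dim reg
    Set reg = New RegExp

    ' Match each <script ...> ... </script> block. [\s\S]*? is non-greedy and matches
    ' any character, so script bodies that contain "<" are removed too.
    reg.Pattern = "<script[^>]*>[\s\S]*?</script>"
    reg.IgnoreCase = True
    reg.Global = True

    ClearJSCode = reg.Replace(originCode, "")

End Function
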
2. Remove HTML code from the content

Function ClearHTMLCode(originCode)

    Dim reg
    Set reg = New RegExp

    ' Match anything between "<" and the next ">"
    reg.Pattern = "<[^>]*>"
    reg.IgnoreCase = True
    reg.Global = True

    ClearHTMLCode = reg.Replace(originCode, "")

End Function
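For comparison, the two VBScript functions can be sketched in C# as well. The names MarkupCleaner, RemoveScripts, RemoveTags and ToPlainText are invented for this example, and the script pattern uses a non-greedy [\s\S]*? so that script bodies containing "<" are also matched; this is one possible approach, not a complete sanitizer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical C# counterpart of ClearJSCode and ClearHTMLCode.
public static class MarkupCleaner
{
    // Removes whole <script>...</script> blocks, including their contents.
    public static string RemoveScripts(string html)
    {
        return Regex.Replace(html, @"<script\b[^>]*>[\s\S]*?</script\s*>", "", RegexOptions.IgnoreCase);
    }

    // Removes anything between '<' and the next '>'.
    public static string RemoveTags(string html)
    {
        return Regex.Replace(html, "<[^>]*>", "");
    }

    // Scripts go first, otherwise only the script tags would be removed and the script text would remain.
    public static string ToPlainText(string html)
    {
        return RemoveTags(RemoveScripts(html));
    }
}

public static class Program
{
    public static void Main()
    {
        string html = "<p>Hi</p><SCRIPT>if (a < b) alert(1);</SCRIPT>!";
        Console.WriteLine(MarkupCleaner.ToPlainText(html)); // prints "Hi!"
    }
}
```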
