
C#: Extracting a Specified Number of Chinese Characters from HTML Page Content

2016-08-11 20:42
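The method below strips the markup from an HTML page and returns a specified number of characters from the remaining text.

// Requires: using System.Text.RegularExpressions;
/// <summary>
/// Returns the specified number of characters from the content, with HTML markup removed.
/// Every character counts toward the limit, not only Chinese characters.
/// </summary>
/// <param name="content">Content of the notice or article (HTML)</param>
/// <param name="num">Number of characters to return</param>
/// <returns>The extracted plain text, ending in "..." when it was shortened</returns>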
public static string getProContent(string content, int num) {
    if (string.IsNullOrEmpty(content))
        return "";

    // Replace \r, \n and \t with spaces
    string result = content.Replace("\r", " ");
    result = result.Replace("\n", " ");
    result = result.Replace("\t", " ");

    // Remove everything between < and > (the HTML tags)
    result = Regex.Replace(result, @"<[^>]*>", string.Empty, RegexOptions.IgnoreCase);

    // Decode the common HTML entities, then drop any entity that is left
    result = Regex.Replace(result, @"&amp;", "&", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, @"&nbsp;", " ", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, @"&lt;", "<", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, @"&gt;", ">", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, @"&(.{2,6});", string.Empty, RegexOptions.IgnoreCase);

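    // Collapse runs of spaces into a single space
    result = Regex.Replace(result, @" ( )+", " ");
    // Note: every \r was already replaced by a space above, so the two
    // patterns below no longer match anything; they are kept from the original.
    result = Regex.Replace(result, "(\r)( )+(\r)", "\r\r");
    result = Regex.Replace(result, @"(\r\r)+", "\r\n");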

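    // Return the text unchanged when it already fits; "<=" avoids appending
    // "..." to a text that is exactly num characters long
    if (result.Length <= num) {
        return result;
    }
    return result.Substring(0, num) + "...";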
}
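
The method above truncates by total character count, so letters, digits and punctuation all count toward num. If only Chinese characters should be kept, as the title suggests, one possible variant is sketched below. The names HtmlTextDemo and TakeChineseChars are hypothetical, and the range U+4E00 to U+9FFF (the basic CJK Unified Ideographs block) is an assumption about what counts as a Chinese character; neither comes from the original post. The sketch also swaps the hand-written entity replacements for WebUtility.HtmlDecode from System.Net.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTextDemo
{
    // Hypothetical helper (not from the original post): returns up to num
    // Chinese characters found in the text of an HTML fragment.
    public static string TakeChineseChars(string html, int num)
    {
        if (string.IsNullOrEmpty(html) || num <= 0)
            return "";

        // Strip the tags, then decode entities such as &amp; or &#20013;.
        string text = Regex.Replace(html, @"<[^>]*>", string.Empty);
        text = WebUtility.HtmlDecode(text);

        // Keep only characters in the assumed range U+4E00..U+9FFF,
        // in document order, and stop after num of them.
        var chars = text.Where(c => c >= '\u4e00' && c <= '\u9fff').Take(num);
        return new string(chars.ToArray());
    }

    public static void Main()
    {
        string html = "<p>Hello, <b>世界</b>!&nbsp;你好 C# 2016</p>";
        string s = TakeChineseChars(html, 3); // s == "世界你"
        Console.WriteLine(s);
    }
}
```

Unlike getProContent, this variant discards non-Chinese characters entirely and does not append "...", so it suits cases where only the ideographs themselves are wanted rather than a readable summary.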