
Building a Simple Search Engine with Nutch + Tomcat on Ubuntu

2015-01-21 16:50
Setting up a simple search engine

My setup:

Nutch: 1.2

Tomcat: 7.0.57

[b]1 Nutch settings[/b]

Modify the Nutch configuration

1.1 Edit conf/nutch-site.xml

// Read metadata through ParseData.getMeta() rather than from the content metadata directly.
ParseData parseData = bean.getParseData(details);
Metadata metaData = parseData.getContentMeta();
String content = null;
// String contentType = (String) metaData.get(Metadata.CONTENT_TYPE);
String contentType = parseData.getMeta(Metadata.CONTENT_TYPE);
// The null check guards against pages that carry no content type.
if (contentType != null && contentType.startsWith("text/html")) {
  // FIXME : it's better to emit the original 'byte' sequence
  // with 'charset' set to the value of 'CharEncoding',
  // but I don't know how to emit 'byte sequence' in JSP.
  // out.getOutputStream().write(bean.getContent(details)) may work,
  // but I'm not sure.
  //String encoding = (String) metaData.get("CharEncodingForConversion");
  String encoding = parseData.getMeta("CharEncodingForConversion");
  if (encoding != null) {
    try {
      content = new String(bean.getContent(details), encoding);
    }
    catch (UnsupportedEncodingException e) {
      // fallback to windows-1252
      content = new String(bean.getContent(details), "windows-1252");
    }
  }
  else {
    // No encoding was detected: decode as GBK instead of the platform default.
    content = new String(bean.getContent(details), "GBK");
    //content = new String(bean.getContent(details));
  }
}
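
The fallback order used in the fragment above can be tried outside Tomcat. The following is a minimal standalone sketch, not part of Nutch: the class name CachedPageDecoder and the method decode are hypothetical, and it assumes the JDK in use ships the GBK and windows-1252 charsets (standard JDK builds do).

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper (not a Nutch class) that mirrors the JSP fallback order.
public class CachedPageDecoder {

    /**
     * Decodes raw page bytes:
     *  - with the detected encoding when it is present and supported,
     *  - with windows-1252 when the detected encoding is not supported,
     *  - with GBK when no encoding was detected at all.
     */
    public static String decode(byte[] raw, String encoding) {
        if (encoding == null) {
            return new String(raw, Charset.forName("GBK"));
        }
        try {
            return new String(raw, Charset.forName(encoding));
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return new String(raw, Charset.forName("windows-1252"));
        }
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is the two-character string for "Chinese (language)".
        byte[] gbkBytes = "\u4e2d\u6587".getBytes(Charset.forName("GBK"));
        System.out.println(decode(gbkBytes, null));               // no encoding detected
        System.out.println(decode(gbkBytes, "GBK"));              // encoding detected
        System.out.println(decode(gbkBytes, "no-such-charset"));  // unsupported, falls back
    }
}
```

The third call shows why the GBK default matters for Chinese pages: the same bytes decoded as windows-1252 come out as unrelated Latin characters.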



[b]3 Running the experiment[/b]

Restart Tomcat.

Open the search page in a browser: http://localhost:8080/nutch-1.2
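
As an alternative to opening a browser, a small program can confirm that the web application responds after the restart. This is only a sketch: the class name NutchWebappCheck and the method probe are hypothetical, and the default URL assumes the application is deployed under the context path /nutch-1.2 on port 8080, as in the address above.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical smoke check: asks the given URL for its HTTP status code.
public class NutchWebappCheck {

    /** Returns the HTTP status code for the URL, or -1 if the request cannot be made. */
    public static int probe(String url) {
        try {
            URLConnection raw = URI.create(url).toURL().openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                return -1; // not an http/https URL
            }
            HttpURLConnection conn = (HttpURLConnection) raw;
            conn.setConnectTimeout(3000); // milliseconds
            conn.setReadTimeout(3000);
            conn.setRequestMethod("GET");
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            // Covers malformed URLs, refused connections and timeouts.
            return -1;
        }
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : "http://localhost:8080/nutch-1.2";
        int status = probe(url);
        System.out.println(url + " -> " + (status == -1 ? "not reachable" : "HTTP " + status));
    }
}
```

A 200 status (possibly after a redirect to the context root) suggests Tomcat deployed the application; -1 usually means Tomcat is not running or is listening on another port.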