Configuring Nutch
2016-01-05 17:42
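(The nutch folder is already under /home.)
1. Edit the system environment variables
sudo gedit /etc/profile
// Append the following lines:
#set nutch
export PATH=/home/nutch/runtime/local/bin:$PATH
// Log in again (or run: source /etc/profile) for the change to take effect.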
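The profile edit in step 1 can also be scripted so that it is safe to run more than once. This is a minimal sketch, assuming a POSIX shell. The function name add_nutch_to_profile is hypothetical, and it defaults to the per-user ~/.profile rather than /etc/profile (an assumption, chosen so that no sudo is needed). Pass /etc/profile as the argument, as root, to match the step as written.

```shell
#!/bin/sh
# Directory that holds the nutch and crawl scripts (path taken from this post).
NUTCH_BIN="${NUTCH_BIN:-/home/nutch/runtime/local/bin}"

# Hypothetical helper: append the PATH line to a profile file only if it is
# not already there. $1 is the profile file; defaults to ~/.profile (assumption).
add_nutch_to_profile() {
    profile="${1:-$HOME/.profile}"
    # \$PATH stays literal so it is expanded when the profile is read, not now.
    line="export PATH=${NUTCH_BIN}:\$PATH"
    # -F: literal match, -x: whole line must match, -q: no output.
    if ! grep -qxF "$line" "$profile" 2>/dev/null; then
        printf '%s\n%s\n' '#set nutch' "$line" >> "$profile"
    fi
}
```

After calling add_nutch_to_profile, load the file into the current shell with . "$HOME/.profile" and check that command -v nutch prints a path inside the Nutch bin directory.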
2. Test (the ./nutch and ./crawl scripts in nutch/runtime/local/bin)
nutch
// Output:
Usage: nutch COMMAND
where COMMAND is one of:
 inject         inject new urls into the database
 hostinject     creates or updates an existing host table from a text file
 generate       generate new batches to fetch from crawl db
 fetch          fetch URLs marked during generate
 parse          parse URLs marked during fetch
 updatedb       update web table after parsing
 updatehostdb   update host table after parsing
 readdb         read/dump records from page database
 readhostdb     display entries from the hostDB
 elasticindex   run the elasticsearch indexer
 solrindex      run the solr indexer on parsed batches
 solrdedup      remove duplicates from solr
 parsechecker   check the parser for a given url
 indexchecker   check the indexing filters for a given url
 plugin         load a plugin and run one of its classes main()
 nutchserver    run a (local) Nutch server on a user defined port
 junit          runs the given JUnit test
 or
 CLASSNAME      run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
crawl
// Output:
Missing seedDir : crawl <seedDir> <crawlID> <solrURL> <numberOfRounds>
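The usage message above lists the four arguments that crawl expects. A minimal sketch of supplying them follows. Every value in it is a placeholder (the seed directory name, crawl ID, Solr URL, and round count are assumptions, not from this post), and the helper names make_seed_dir and run_crawl are hypothetical. The seed directory holds plain text files with one URL per line, and seed.txt is the conventional file name.

```shell
#!/bin/sh
# Placeholder values (assumptions); override them through the environment.
SEED_DIR="${SEED_DIR:-urls}"
CRAWL_ID="${CRAWL_ID:-testCrawl}"
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/}"
ROUNDS="${ROUNDS:-2}"

# Write the given URLs, one per line, into $SEED_DIR/seed.txt.
make_seed_dir() {
    mkdir -p "$SEED_DIR"
    printf '%s\n' "$@" > "$SEED_DIR/seed.txt"
}

# Run crawl if it is on PATH; otherwise only print the command line.
run_crawl() {
    if command -v crawl >/dev/null 2>&1; then
        crawl "$SEED_DIR" "$CRAWL_ID" "$SOLR_URL" "$ROUNDS"
    else
        echo "crawl $SEED_DIR $CRAWL_ID $SOLR_URL $ROUNDS"
    fi
}
```

For example, make_seed_dir http://nutch.apache.org/ followed by run_crawl starts a two-round crawl from that single seed, provided the storage backend and the Solr instance that the Nutch configuration points at are running.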