Configuring Nutch

2016-01-05 17:42
(The nutch folder is already located under /home.)

1. Modify the system environment variables

sudo gedit /etc/profile


//Add the following lines

#set nutch
export PATH=/home/nutch/runtime/local/bin:$PATH
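
Edits to /etc/profile are normally read only by new login shells, so the current terminal will not see the new PATH until you log in again or run "source /etc/profile". The following is a minimal sketch for checking the result in the current shell. It assumes the install path from the export line above; the variable name NUTCH_BIN is introduced here purely for illustration.

```shell
# Assumed install location, copied from the export line above.
NUTCH_BIN=/home/nutch/runtime/local/bin

# Prepend NUTCH_BIN to PATH only if it is not already there,
# so running this twice does not duplicate the entry.
case ":$PATH:" in
  *":$NUTCH_BIN:"*) ;;
  *) PATH="$NUTCH_BIN:$PATH" ;;
esac
export PATH

# Report whether the two scripts used in the next step can be found.
for cmd in nutch crawl; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd -> $(command -v "$cmd")"
  else
    echo "$cmd not found on PATH"
  fi
done
```

If either script is reported as not found, check that the directory exists and that the scripts in it are executable.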


2. Test (the nutch and crawl scripts in nutch/runtime/local/bin)

nutch


//The output is as follows:
Usage: nutch COMMAND
where COMMAND is one of:
inject         inject new urls into the database
hostinject     creates or updates an existing host table from a text file
generate       generate new batches to fetch from crawl db
fetch          fetch URLs marked during generate
parse          parse URLs marked during fetch
updatedb       update web table after parsing
updatehostdb   update host table after parsing
readdb         read/dump records from page database
readhostdb     display entries from the hostDB
elasticindex   run the elasticsearch indexer
solrindex      run the solr indexer on parsed batches
solrdedup      remove duplicates from solr
parsechecker   check the parser for a given url
indexchecker   check the indexing filters for a given url
plugin         load a plugin and run one of its classes main()
nutchserver    run a (local) Nutch server on a user defined port
junit          runs the given JUnit test
or
CLASSNAME      run the class named CLASSNAME
Most commands print help when invoked w/o parameters.


crawl


//The output is as follows:
Missing seedDir : crawl <seedDir> <crawlID> <solrURL> <numberOfRounds>
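
The message above lists the four arguments that crawl expects. As a rough sketch of how they might be filled in, the snippet below prepares a seed directory and prints a matching command line. Every value in it (the directory, the crawl ID, the Solr URL, the seed URL and the number of rounds) is a placeholder assumed for illustration and does not come from the original post. The seed file follows the usual Nutch convention of one URL per line.

```shell
# All values below are illustrative assumptions.
SEED_DIR="${TMPDIR:-/tmp}/nutch-seed-demo"   # hypothetical seed directory
CRAWL_ID="demo"                              # hypothetical crawl ID
SOLR_URL="http://localhost:8983/solr/"       # assumed local Solr address
ROUNDS=2                                     # assumed number of crawl rounds

# Create the seed directory with a one-URL seed list.
mkdir -p "$SEED_DIR"
printf '%s\n' "http://nutch.apache.org/" > "$SEED_DIR/seed.txt"

# Print the command instead of running it, since an actual crawl
# needs a working Nutch setup and a reachable Solr instance.
echo crawl "$SEED_DIR" "$CRAWL_ID" "$SOLR_URL" "$ROUNDS"
```

Once Nutch and Solr are both configured, the printed command can be run as-is, with the placeholder values replaced by real ones.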