Larbin learning (3): how to limit the scope of crawling
2010-04-07 14:00
1 In larbin.conf, enable noExternalLinks so that larbin does not crawl external sites and stays on the sites listed as startUrl.
# do you want to follow external links,
noExternalLinks
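To make the effect of this option concrete, here is a minimal sketch of the kind of check it implies: a link is followed only if its host belongs to one of the start URLs. This is not larbin's own code. The names hostOf, startHostsOf and shouldFollow are hypothetical, the example URLs are made up, and larbin's real test may differ (for instance, it may compare against the host of the page the link was found on).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not larbin code): return the host part of an absolute
// URL such as "http://example.com:8080/a", or "" if there is no "://".
// The comparison later on is case-sensitive, which is a simplification.
std::string hostOf(const std::string& url) {
    const std::string::size_type schemeEnd = url.find("://");
    if (schemeEnd == std::string::npos) return "";
    const std::string::size_type hostStart = schemeEnd + 3;
    const std::string::size_type hostEnd = url.find_first_of("/:?#", hostStart);
    if (hostEnd == std::string::npos) return url.substr(hostStart);
    return url.substr(hostStart, hostEnd - hostStart);
}

// Collect the hosts of all start URLs (the startUrl entries of larbin.conf).
std::set<std::string> startHostsOf(const std::vector<std::string>& startUrls) {
    std::set<std::string> hosts;
    for (const std::string& u : startUrls) {
        const std::string h = hostOf(u);
        assert(!h.empty() && "start URLs must be absolute");
        hosts.insert(h);
    }
    return hosts;
}

// With noExternalLinks set, follow a link only if its host is a start host.
// Without it, every link is followed.
bool shouldFollow(const std::set<std::string>& startHosts,
                  const std::string& link,
                  bool noExternalLinks) {
    if (!noExternalLinks) return true;
    const std::string host = hostOf(link);
    return !host.empty() && startHosts.count(host) > 0;
}
```

With start URLs on example.com, shouldFollow accepts http://example.com/a/b.html and rejects http://other.net/ as long as noExternalLinks is true.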
2 How to improve the crawling speed
[Quoted from] http://hi.baidu.com/hustwk/blog/item/fd3325dde12598dc8c1029ef.html
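1. In larbin.conf, set waitDuration (the number of seconds to wait between two requests to the same server) to 1. This gives up on being polite ^_^, but most sites can in fact tolerate a value of 1.
2. In types.h, change maxUrlsBySite to 254.
3. In main.cc, modify the code as follows: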
// see if we should read again urls in fifowait
if ((global::now % 30) == 0) {
    global::readPriorityWait = global::URLsPriorityWait->getLength();
    global::readWait = global::URLsDiskWait->getLength();
}
if ((global::now % 30) == 15) {
    global::readPriorityWait = 0;
    global::readWait = 0;
}
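The timing logic of the snippet above can be tried out in isolation. The sketch below is a stand-alone model, not larbin code: WaitBudget and updateWaitBudget are hypothetical names, and the queue lengths are passed in as plain numbers instead of coming from URLsPriorityWait and URLsDiskWait. It assumes, as the comment in the source suggests, that the two counters say how many URLs may be read back from the wait queues.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the two counters that live in larbin's `global` class.
struct WaitBudget {
    std::size_t readPriorityWait = 0;
    std::size_t readWait = 0;
};

// Model of the modified block in main.cc.
// At the start of each period (now % period == 0) the counters are set to
// the current queue lengths; half a period later they are reset to zero.
// At every other second the counters are left untouched.
void updateWaitBudget(WaitBudget& budget,
                      long long nowSeconds,
                      std::size_t priorityWaitLength,
                      std::size_t diskWaitLength,
                      int period = 30) {
    assert(period > 0 && period % 2 == 0);
    const long long phase = nowSeconds % period;
    if (phase == 0) {
        budget.readPriorityWait = priorityWaitLength;
        budget.readWait = diskWaitLength;
    } else if (phase == period / 2) {
        budget.readPriorityWait = 0;
        budget.readWait = 0;
    }
}
```

With the default period of 30, a call at second 60 copies the queue lengths into the counters, a call at second 61 changes nothing, and a call at second 75 clears them again.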