Crawling Beauty Photos with VSCrawler
2017-06-27 09:53
Main class: CrawlDemo.java
import com.google.common.io.Files;
import com.virjar.sipsoup.parse.XpathParser;
import com.virjar.vscrawler.core.VSCrawler;
import com.virjar.vscrawler.core.VSCrawlerBuilder;
import com.virjar.vscrawler.core.event.support.AutoEvent;
import com.virjar.vscrawler.core.event.support.AutoEventRegistry;
import com.virjar.vscrawler.core.event.systemevent.SeedEmptyEvent;
import com.virjar.vscrawler.core.net.session.CrawlerSession;
import com.virjar.vscrawler.core.processor.CrawlResult;
import com.virjar.vscrawler.core.processor.SeedProcessor;
import com.virjar.vscrawler.core.seed.Seed;
import com.virjar.vscrawler.core.util.PathResolver;
import org.apache.commons.lang3.StringUtils;
import org.jsoup.Jsoup;

import java.io.File;
import java.io.IOException;

public class CrawlDemo {

    public static void main(String[] args) throws IOException {
        VSCrawler vsCrawler = VSCrawlerBuilder.create()
                .addPipeline(new EmptyPipeline())
                .setProcessor(new SeedProcessor() {

                    // Download one image and save it to disk; retry the seed if the download fails.
                    private void handlePic(Seed seed, CrawlerSession crawlerSession) {
                        byte[] entity = crawlerSession.getCrawlerHttpClient().getEntity(seed.getData());
                        if (entity == null) {
                            seed.retry();
                            return;
                        }
                        try {
                            // The target file path is derived automatically from the site,
                            // the URL path, and the base directory.
                            Files.write(entity,
                                    new File(PathResolver.onlySource("E:/testpic", seed.getData())));
                        } catch (IOException e) {
                            e.printStackTrace();
                        }
                    }

                    public void process(Seed seed, CrawlerSession crawlerSession, CrawlResult crawlResult) {
                        if (StringUtils.endsWithIgnoreCase(seed.getData(), ".jpg")) {
                            // Image URL: download it.
                            handlePic(seed, crawlerSession);
                        } else {
                            // Gallery page: fetch the HTML; retry the seed if the request fails.
                            String s = crawlerSession.getCrawlerHttpClient().get(seed.getData());
                            if (s == null) {
                                seed.retry();
                                return;
                            }
                            // Extract the next-page link and the image links as new seeds.
                            // '下一页' is the site's "next page" link text and must stay as-is.
                            crawlResult.addStrSeeds(XpathParser
                                    .compileNoError(
                                            "/css('#pages a')::self()[contains(text(),'下一页')]/absUrl('href') | /css('.content')::center/img/@src")
                                    .evaluateToString(Jsoup.parse(s, seed.getData())));
                        }
                    }
                }).build();

        // Clear data from previous runs; otherwise the crawl resumes from where it stopped.
        vsCrawler.clearTask();

        vsCrawler.pushSeed("https://www.meitulu.com/item/2125.html");
        vsCrawler.pushSeed("https://www.meitulu.com/item/6892.html");
        vsCrawler.pushSeed("https://www.meitulu.com/item/2124.html");
        vsCrawler.pushSeed("https://www.meitulu.com/item/2120.html");
        vsCrawler.pushSeed("https://www.meitulu.com/item/2086.html");
        vsCrawler.pushSeed("https://www.meitulu.com/item/2066.html");

        vsCrawler.addCrawlerStartCallBack(new VSCrawler.CrawlerStartCallBack() {
            public void onCrawlerStart(final VSCrawler vsCrawler) {
                AutoEventRegistry.getInstance().registerEvent(ShutDownChecker.class);
                AutoEventRegistry.getInstance().registerObserver(new ShutDownChecker() {
                    public void checkShutDown() {
                        // Runs 15 s after a seed-empty event. If no worker is active and
                        // nothing has run for more than 10 s, stop the crawler.
                        if (vsCrawler.activeWorker() == 0
                                && (System.currentTimeMillis() - vsCrawler.getLastActiveTime()) > 10000) {
                            System.out.println("Trying to stop the crawler");
                            vsCrawler.stopCrawler();
                        }
                    }
                });
                AutoEventRegistry.getInstance().registerObserver(new SeedEmptyEvent() {
                    public void onSeedEmpty() {
                        // The seed queue is reported empty, so try to stop the crawler:
                        // send a delayed event that fires checkShutDown 15 s from now.
                        AutoEventRegistry.getInstance()
                                .createDelayEventSender(ShutDownChecker.class, 15000)
                                .delegate()
                                .checkShutDown();
                    }
                });
            }
        });

        // Start the crawler.
        vsCrawler.start();
    }

    interface ShutDownChecker {
        @AutoEvent
        void checkShutDown();
    }
}
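The stop condition inside checkShutDown() is easy to get wrong, so it can help to pull it out into a plain function and check it in isolation. The following is only a sketch and not part of VSCrawler: the class IdleShutdownSketch, the method shouldStop, and the constant name are hypothetical. Only the condition itself (zero active workers and more than 10 s since the last activity) comes from the code above.

```java
/** Stand-alone sketch of the idle check; names here are hypothetical, not VSCrawler API. */
final class IdleShutdownSketch {

    /** Same threshold as the 10000 ms literal in checkShutDown(). */
    static final long IDLE_THRESHOLD_MILLIS = 10_000L;

    /**
     * Returns true only when no worker is active AND the crawler has been
     * idle for strictly more than the threshold.
     */
    static boolean shouldStop(int activeWorkers, long nowMillis, long lastActiveMillis) {
        return activeWorkers == 0
                && (nowMillis - lastActiveMillis) > IDLE_THRESHOLD_MILLIS;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(shouldStop(0, now, now - 15_000L)); // idle for 15 s, no workers
        System.out.println(shouldStop(2, now, now - 15_000L)); // workers still busy
        System.out.println(shouldStop(0, now, now - 3_000L));  // active 3 s ago
    }
}
```

Note that the comparison is strict: a crawler that was last active exactly 10 s ago is not stopped yet.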
EmptyPipeline.java implements the Pipeline interface
import java.util.Collection;

import com.virjar.vscrawler.core.seed.Seed;
import com.virjar.vscrawler.core.serialize.Pipeline;

public class EmptyPipeline implements Pipeline {

    // Stores nothing; only logs that the seed has been processed.
    public void saveItem(Collection<String> itemJson, Seed seed) {
        System.out.println(seed.getData() + " processed");
    }
}
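EmptyPipeline only prints a line per seed. If you want totals at the end of a run, a pipeline can keep counters instead. The sketch below shows the idea without depending on VSCrawler: ItemSink is a hypothetical stand-in that mirrors the shape of saveItem above (with a plain String in place of Seed), and CountingSink and its accessor names are likewise made up for illustration. Atomic counters are used on the assumption that the method may be called from more than one worker thread.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.concurrent.atomic.AtomicLong;

/** Counts seeds and items instead of only printing them. Hypothetical, not VSCrawler API. */
final class CountingSink implements ItemSink {

    private final AtomicLong seeds = new AtomicLong();
    private final AtomicLong items = new AtomicLong();

    @Override
    public void saveItem(Collection<String> itemJson, String seedData) {
        seeds.incrementAndGet();
        // A null collection is treated as "no items" rather than an error.
        items.addAndGet(itemJson == null ? 0 : itemJson.size());
        System.out.println(seedData + " processed");
    }

    long seedCount() {
        return seeds.get();
    }

    long itemCount() {
        return items.get();
    }

    public static void main(String[] args) {
        CountingSink sink = new CountingSink();
        sink.saveItem(Arrays.asList("{\"a\":1}", "{\"b\":2}"), "https://example.com/page/1.html");
        sink.saveItem(Collections.<String>emptyList(), "https://example.com/img/1.jpg");
        System.out.println(sink.seedCount() + " seeds, " + sink.itemCount() + " items");
    }
}

/** Stand-in for the pipeline contract: same method shape as saveItem above, String instead of Seed. */
interface ItemSink {
    void saveItem(Collection<String> itemJson, String seedData);
}
```

To use this with the real crawler you would implement the library's Pipeline interface with the same body and pass the instance to addPipeline in place of EmptyPipeline.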
In a test run against https://www.meitulu.com, the crawler fetched nearly 600 images in under half a minute without being blocked. Not bad for throughput!