Scrapy

Latest revision as of 07:45, 28 January 2019 (Monday)

Scrapy is an open-source (BSD-licensed) web crawler written in Python.

@@ Line 2: / Line 2: @@
 ==Guides==
+==Projects==
+*[https://github.com/gnemoug/distribute_crawler distribute_crawler] A distributed web crawler built with Scrapy, [[Redis]], [[MongoDB]], and [http://graphiteapp.org/ Graphite].
+*[https://github.com/istresearch/scrapy-cluster Scrapy Cluster]
+*[https://github.com/rmax/scrapy-redis scrapy-redis]
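+*[https://github.com/LiuXingMing/SinaSpider SinaSpider] Sina Weibo crawler (Scrapy, Redis).
+*[https://github.com/LiuXingMing/QQSpider QQSpider] Qzone (QQ空间) crawler (journal entries, status updates, profile information).
+*[https://github.com/LiuXingMing/Tmall1212 Tmall1212] Tmall Double 12 (December 12 sale) crawler, with product data included.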
+==Documentation==
+*[http://docs.huihoo.com/infoq/qconshanghai/2015/%e5%85%ac%e6%9c%89%e4%ba%91%e6%9c%8d%e5%8a%a1%e4%b8%8e%e5%9f%ba%e7%a1%80%e8%ae%be%e6%96%bd%e5%bb%ba%e8%ae%be%e4%b8%93%e5%9c%ba/QCon%e4%b8%8a%e6%b5%b72015-%e4%ba%92%e8%81%94%e7%bd%91%e4%bf%a1%e6%81%af%e8%8e%b7%e5%8f%96%e6%8a%80%e6%9c%af%e5%ae%9e%e8%b7%b5%e4%b8%8e%e4%ba%91%e7%ab%af%e7%88%ac%e8%99%ab%e5%85%bb%e6%88%90%e8%ae%b0-%e8%b4%b9%e8%89%af%e5%ae%8f.pdf Internet Information Retrieval in Practice and Raising a Cloud-Based Crawler] (QCon Shanghai 2015 slides, in Chinese)
 ==Gallery==
+<gallery>
+image:scrapy-architecture.png|Architecture
+image:Scrapy-Cluster-Architecture.png|Cluster architecture
+</gallery>
 ==Links==
@@ Line 12: / Line 27: @@
 [[category:search engine]]
 [[category:web crawler]]
+[[category:python]]
+[[category:huihoo]]