
How to install Scrapy on Ubuntu

Posted by a user on 2022-04-23 14:43

2 answers

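Helpful user, answered 2022-05-03 05:33

Scrapy is a fast, high-level screen-scraping and web-crawling framework written in Python. It crawls websites and extracts structured data from their pages, and it serves a wide range of purposes, including data mining, monitoring, and automated testing. Official website: http://www.scrapy.org/
1. Install the following packages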

sudo apt-get install build-essential;
sudo apt-get install python-dev;
sudo apt-get install libxml2-dev;
sudo apt-get install libxslt1-dev;
sudo apt-get install python-setuptools;
2. Install Scrapy

sudo easy_install Scrapy;
wang@ubuntu:/usr/local/lib/python2.7/dist-packages$ sudo easy_install Scrapy
Searching for Scrapy
Best match: Scrapy 0.16.1
Processing Scrapy-0.16.1-py2.7.egg
Scrapy 0.16.1 is already the active version in easy-install.pth
Installing scrapy script to /usr/local/bin

Using /usr/local/lib/python2.7/dist-packages/Scrapy-0.16.1-py2.7.egg
Processing dependencies for Scrapy
Searching for lxml
Reading http://pypi.python.org/simple/lxml/
Reading http://codespeak.net/lxml
Best match: lxml 3.0.1
Downloading http://pypi.python.org/packages/source/l/lxml/lxml-3.0.1.tar.gz#md5=0f2b1a063ab3b6b0944cbc4a9a85dcfa
Processing lxml-3.0.1.tar.gz
Running lxml-3.0.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-qibAzL/lxml-3.0.1/egg-dist-tmp-mSvUVN
Building lxml version 3.0.1.
Building without Cython.
Using build configuration of libxslt 1.1.26
Building against libxml2/libxslt in the following directory: /usr/lib/x86_64-linux-gnu
warning: no files found matching '*.txt' under directory 'src/lxml/tests'
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree__getFilenameForFile’:
src/lxml/lxml.etree.c:26310:7: warning: variable ‘__pyx_clineno’ set but not used [-Wunused-but-set-variable]
src/lxml/lxml.etree.c:26309:15: warning: variable ‘__pyx_filename’ set but not used [-Wunused-but-set-variable]
src/lxml/lxml.etree.c:26308:7: warning: variable ‘__pyx_lineno’ set but not used [-Wunused-but-set-variable]
src/lxml/lxml.etree.c: In function ‘__pyx_pf_4lxml_5etree_4XSLT_18__call__’:
src/lxml/lxml.etree.c:132608:81: warning: passing argument 1 of ‘__pyx_f_4lxml_5etree_12_XSLTContext__copy’ from incompatible pointer type [enabled by default]
src/lxml/lxml.etree.c:130569:52: note: expected ‘struct __pyx_obj_4lxml_5etree__XSLTContext *’ but argument is of type ‘struct __pyx_obj_4lxml_5etree__BaseContext *’
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree__copyXSLT’:
src/lxml/lxml.etree.c:133997:79: warning: passing argument 1 of ‘__pyx_f_4lxml_5etree_12_XSLTContext__copy’ from incompatible pointer type [enabled by default]
src/lxml/lxml.etree.c:130569:52: note: expected ‘struct __pyx_obj_4lxml_5etree__XSLTContext *’ but argument is of type ‘struct __pyx_obj_4lxml_5etree__BaseContext *’
src/lxml/lxml.etree.c: At top level:
src/lxml/lxml.etree.c:12128:13: warning: ‘__pyx_f_4lxml_5etree_displayNode’ defined but not used [-Wunused-function]
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree_11_BaseParser__parseDocFromFile’:
src/lxml/lxml.etree.c:86715:3: warning: ‘__pyx_r’ may be used uninitialized in this function [-Wuninitialized]
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree_11_BaseParser__parseDoc’:
src/lxml/lxml.etree.c:86403:3: warning: ‘__pyx_r’ may be used uninitialized in this function [-Wuninitialized]
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree_11_BaseParser__parseUnicodeDoc’:
src/lxml/lxml.etree.c:86093:3: warning: ‘__pyx_r’ may be used uninitialized in this function [-Wuninitialized]
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree_11_BaseParser__parseDocFromFilelike’:
src/lxml/lxml.etree.c:86925:3: warning: ‘__pyx_r’ may be used uninitialized in this function [-Wuninitialized]
Adding lxml 3.0.1 to easy-install.pth file

Installed /usr/local/lib/python2.7/dist-packages/lxml-3.0.1-py2.7-linux-x86_64.egg
Searching for w3lib>=1.2
Reading http://pypi.python.org/simple/w3lib/
Reading http://github.com/scrapy/w3lib
Best match: w3lib 1.2
Downloading http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz#md5=f929d5973a9fda59587b09a72f185a9e
Processing w3lib-1.2.tar.gz
Running w3lib-1.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-ZAXTgy/w3lib-1.2/egg-dist-tmp-aU3vpc
zip_safe flag not set; analyzing archive contents...
Adding w3lib 1.2 to easy-install.pth file

Installed /usr/local/lib/python2.7/dist-packages/w3lib-1.2-py2.7.egg
Searching for Twisted>=8.0
Reading http://pypi.python.org/simple/Twisted/
Reading http://www.twistedmatrix.com
Reading http://twistedmatrix.com/products/download
Reading http://twistedmatrix.com/
Reading http://tmrc.mit.edu/mirror/twisted/Twisted/9.0/
Reading http://tmrc.mit.edu/mirror/twisted/Twisted/10.0/
Reading http://twistedmatrix.com/projects/core/
Reading http://tmrc.mit.edu/mirror/twisted/Twisted/8.2/
Reading http://tmrc.mit.edu/mirror/twisted/Twisted/8.1/
Best match: Twisted 12.2.0
Downloading http://pypi.python.org/packages/source/T/Twisted/Twisted-12.2.0.tar.bz2#md5=9a321b904d01efd695079f8484b37861
Processing Twisted-12.2.0.tar.bz2
Running Twisted-12.2.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-kw897y/Twisted-12.2.0/egg-dist-tmp-sZWFYb
In file included from /usr/include/python2.7/Python.h:8:0,
from twisted/internet/_sigchld.c:9:
/usr/include/python2.7/pyconfig.h:1161:0: warning: "_POSIX_C_SOURCE" redefined [enabled by default]
/usr/include/features.h:215:0: note: this is the location of the previous definition
twisted/internet/_sigchld.c: In function ‘got_signal’:
twisted/internet/_sigchld.c:15:13: warning: variable ‘ignored_result’ set but not used [-Wunused-but-set-variable]
Adding Twisted 12.2.0 to easy-install.pth file
Installing mailmail script to /usr/local/bin
Installing conch script to /usr/local/bin
Installing pyhtmlizer script to /usr/local/bin
Installing twistd script to /usr/local/bin
Installing lore script to /usr/local/bin
Installing tkconch script to /usr/local/bin
Installing tapconvert script to /usr/local/bin
Installing ckeygen script to /usr/local/bin
Installing tap2rpm script to /usr/local/bin
Installing manhole script to /usr/local/bin
Installing trial script to /usr/local/bin
Installing cftp script to /usr/local/bin
Installing tap2deb script to /usr/local/bin

Installed /usr/local/lib/python2.7/dist-packages/Twisted-12.2.0-py2.7-linux-x86_64.egg
Finished processing dependencies for Scrapy
This output indicates that the installation succeeded.
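
As a quick sanity check after the install, the packages named in the log (Scrapy, lxml, w3lib, Twisted) can be probed from Python. This is a minimal sketch, not part of the original answer: the helper name `check_modules` is hypothetical, and the import names are assumed to be the lowercase forms `scrapy`, `lxml`, `w3lib`, and `twisted`.

```python
import importlib

# Import names assumed for Scrapy and the dependencies listed in the log above.
MODULES = ("scrapy", "lxml", "w3lib", "twisted")


def check_modules(names=MODULES):
    """Map each module name to True if it can be imported, else False."""
    result = {}
    for name in names:
        try:
            importlib.import_module(name)
            result[name] = True
        except ImportError:
            result[name] = False
    return result


if __name__ == "__main__":
    for name, found in sorted(check_modules().items()):
        print("%s: %s" % (name, "ok" if found else "missing"))
```

Any module reported as missing suggests that the corresponding dependency did not finish installing.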

3. Test

scrapy shell http://ziki.cn
Get all a tags

hxs.select('//a').extract()
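
To show what the `//a` query selects without needing a running Scrapy shell, here is a small sketch that uses the standard library's ElementTree instead of Scrapy's selector. It is an illustration under assumptions: `SAMPLE_PAGE` and `extract_anchors` are made up for this example, ElementTree supports only a subset of XPath, and it needs well-formed markup, whereas Scrapy also handles ordinary HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; it must be well-formed XML for ElementTree.
SAMPLE_PAGE = """<html><body>
<a href="/first">First</a>
<p>Text with a <a href="/second">second link</a>.</p>
</body></html>"""


def extract_anchors(markup):
    """Return (href, text) for every <a> element, in document order."""
    root = ET.fromstring(markup)
    # './/a' is ElementTree's spelling of the '//a' query used in the shell:
    # it matches <a> elements at any depth below the root.
    return [(a.get("href"), a.text) for a in root.findall(".//a")]


if __name__ == "__main__":
    for href, text in extract_anchors(SAMPLE_PAGE):
        print("%s -> %s" % (text, href))
```

Both links are found even though the second one is nested inside a paragraph, which is the point of the leading `//` in the query.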
References

http://doc.scrapy.org/en/latest/intro/install.html
http://doc.scrapy.org/en/latest/intro/tutorial.html

Helpful user, answered 2022-05-03 06:51

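Scrapy is an open-source tool for extracting data from websites. The framework is written in Python and makes crawling fast, simple, and extensible. We created a virtual machine (VM) in VirtualBox and installed Ubuntu 14.04 LTS on it.
Installing Scrapy

Scrapy depends on Python, the Python development libraries, and pip. Ubuntu ships with a recent version of Python preinstalled, so before installing Scrapy we only need to install pip and the Python development libraries.
pip replaces easy_install as the tool for installing and managing Python packages. Figure 1 shows the pip installation.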
sudo apt-get install python-pip

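Figure 1: Installing pip
We must install the Python development libraries with the command below. If this package is missing, installing Scrapy fails with an error about the Python.h header file.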
sudo apt-get install python-dev

Figure 2: Python development libraries
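
Before running the Scrapy install, it is possible to check whether the Python.h header is already present. The sketch below is only an illustration: the function names are hypothetical, and it assumes that the interpreter's `sysconfig` include path is where python-dev places the header.

```python
import os
import sysconfig


def python_header_path():
    """Return the path where this interpreter expects to find Python.h."""
    return os.path.join(sysconfig.get_paths()["include"], "Python.h")


def has_python_headers():
    """Return True if Python.h exists at the expected location."""
    return os.path.isfile(python_header_path())


if __name__ == "__main__":
    status = "found" if has_python_headers() else "missing (install python-dev)"
    print("%s: %s" % (python_header_path(), status))
```

Run it with the same interpreter that will run Scrapy, since each Python version has its own include directory.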
Scrapy can be installed either from a deb package or from source. In Figure 3 we install it with pip, the Python package manager.
sudo pip install scrapy

Figure 3: Installing Scrapy
Figure 4 shows the successful installation, which takes some time to complete.

Figure 4: Scrapy installed successfully
Extracting data with Scrapy
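Basic tutorial
We will use Scrapy to extract store names (stores that sell cards) from fatwallet.com. First, we create a new Scrapy project named "store_name" with the command below; see Figure 5.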
$ sudo scrapy startproject store_name

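Figure 5: Creating a new Scrapy project
The command above creates a directory named "store_name" in the current path. Figure 6 shows the files and folders in the project's main directory.
$ sudo ls -lR store_name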
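
The same recursive listing can be produced from Python, which is handy when inspecting the generated project from a script. This is a minimal sketch: `list_tree` is a hypothetical helper, and it assumes it is run from the directory that contains "store_name".

```python
import os


def list_tree(root):
    """Return the sorted paths of all files under root, relative to root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            found.append(os.path.relpath(full_path, root))
    return sorted(found)


if __name__ == "__main__":
    # Prints nothing if the directory does not exist.
    for path in list_tree("store_name"):
        print(path)
```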