Scrapy at a glance
What is Scrapy?
Scrapy is a Python framework for crawling websites and web pages and extracting data from them.
Using Scrapy is easy but a little tricky: it requires some tricks and logic, and once you get the hang of it, it is easy to become fond of it.
Uses of Scrapy: automated testing, web scraping, data mining, and information/text processing.
A Moral Story:
A web designer built a website in 6 months and filled it with data that cost $10,000 or more and 2-3 years of effort. His business was doing well, and his website held millions of pieces of information.
As time passed, competition became tough and many new websites appeared in a short time. He was confused about how the other websites were getting data like his. Were they spending a lot of money to collect the information?
What do you think?
In my view, no: it is not necessary to spend money on data if you know how to use Scrapy. Data can be captured from other websites and stored in a database, CSV files, or JSON files.
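Scrapy has its own ways of saving what it collects (feed exports and item pipelines). Separately from Scrapy, the sketch below shows what storing scraped records in the three formats mentioned above can look like with only the Python standard library. The records, field names and file names are hypothetical, and the code assumes Python 3 (under Python 2.7 the csv module expects files opened in binary mode).

```python
import csv
import json
import sqlite3

# Hypothetical records standing in for data captured by a crawler.
records = [
    {"title": "First video", "url": "http://example.com/v/1"},
    {"title": "Second video", "url": "http://example.com/v/2"},
]
FIELDS = ["title", "url"]  # hypothetical field names


def save_csv(rows, path):
    """Write rows to a CSV file, starting with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def save_json(rows, path):
    """Write rows to a JSON file as a list of objects."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2)


def save_sqlite(rows, path):
    """Insert rows into an 'items' table in an SQLite database file."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits the transaction if no exception is raised
            conn.execute("CREATE TABLE IF NOT EXISTS items (title TEXT, url TEXT)")
            # Named placeholders are filled from each dict in rows.
            conn.executemany(
                "INSERT INTO items (title, url) VALUES (:title, :url)", rows
            )
    finally:
        conn.close()
```

For example, save_csv(records, "items.csv") writes a header line followed by one line per record. SQLite is used here only because it ships with Python; a server database would need its own driver.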
If you are interested, go on to the next page.
Installing Scrapy and other supporting files:
//Note: All the files are available as .exe installers for 32-bit Python 2.7, and the links given below may change at any time.//
This information applies only to the Windows operating system.
1. Install the 32-bit version of Python 2.7.
Python 2.7 32-bit is a stable version and has no version conflicts with the other Python packages used here, whose installers are 32-bit builds. Don't install the 64-bit version; it may cause problems later.
Link for the Python 2.7 32-bit Windows MSI file: http://www.python.org/ftp/python/2.7/python-2.7.msi
Go to the Python MSI file you downloaded and install it by double-clicking it.
2. Install easy_install so that you can install the other supporting Python packages without problems.
3. Add the C:\python27\Scripts and C:\python27 folders to the system path by adding those directories to the PATH environment variable from the Control Panel:
Start -> search for "variable" -> edit the system Path variable.
Append both folders to the system Path value (entries are separated by semicolons) and click OK on every dialog. Now start CMD by typing cmd into Run. Type python in CMD and press Enter. If you see the Python interactive shell, everything is set up correctly.
4. Install OpenSSL by following these steps:
o Download the Visual C++ 2008 redistributables for your Windows version and architecture.
o Download OpenSSL for your Windows version and architecture (the regular version, not the light one).
o Add the c:\openssl-win32\bin (or similar) directory to your PATH, the same way you added python27 in the third step.
5. Open CMD, type easy_install Scrapy and press Enter.
6. Install pywin32. Link: http://www.lfd.uci.edu/~gohlke/pythonlibs/#pywin32
7. Install zope.interface: https://pypi.python.org/simple/zope.interface/
Download the 32-bit MSI or EXE file from the list, whichever is easier for you.
8. Install lxml. 32-bit binary link: https://pypi.python.org/pypi/lxml/3.3.3
9. Install pyOpenSSL: https://launchpad.net/pyopenssl
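Once all the steps are done, the setup can also be checked from Python itself. The script below is a minimal sketch and not part of Scrapy: it reports whether the interpreter is 32- or 64-bit (step 1), which of the two folders from step 3 are missing from PATH, and which of the packages from the later steps can be imported. The helper names and the module list are illustrative choices; OpenSSL and win32api are the import names of pyOpenSSL and pywin32. It avoids Python 3-only syntax, so it should also run under Python 2.7.

```python
import importlib
import struct
import sys

# Folders that step 3 adds to PATH (the post's install location).
REQUIRED_PATH_DIRS = [r"C:\python27", r"C:\python27\Scripts"]

# Illustrative list of import names for the packages installed above.
REQUIRED_MODULES = ["scrapy", "lxml", "OpenSSL", "zope.interface", "win32api"]


def interpreter_bits():
    """Return 32 or 64: the size in bits of a C pointer in this interpreter."""
    return struct.calcsize("P") * 8


def missing_path_entries(path_value, required=REQUIRED_PATH_DIRS):
    """Return the folders in `required` that are absent from a Windows PATH string.

    Windows separates PATH entries with ';' and compares paths
    case-insensitively, so entries are lower-cased and a trailing
    backslash is ignored before comparing.
    """
    def norm(p):
        return p.strip().rstrip("\\").lower()

    present = set(norm(p) for p in path_value.split(";") if p.strip())
    return [d for d in required if norm(d) not in present]


def module_status(names=REQUIRED_MODULES):
    """Map each module name to True if it can be imported, else False."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    import os

    print("Python %d.%d, %d-bit" % (sys.version_info[0], sys.version_info[1], interpreter_bits()))
    print("Missing from PATH: %s" % missing_path_entries(os.environ.get("PATH", "")))
    for name, ok in sorted(module_status().items()):
        print("%-15s %s" % (name, "OK" if ok else "NOT FOUND"))
```

Save it under any name, for example check_install.py (a hypothetical name), and run python check_install.py in CMD.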
In the last chapter you learned how to install Scrapy and the other supporting files. In this chapter you will learn Scrapy itself. Scrapy organizes a project's files in a pattern similar to Django's, which is an efficient way to manage program source code and files. Now let's move to an example:
1. How to start?
Most people waste their time in the beginning due to lack of knowledge, so don't waste time: follow this procedure:
a. Run CMD, move to a directory for your code, and type: scrapy startproject project_name
Note: Some CMD commands for you:
1. cd.. to move up to the parent directory.
2. mkdir dir_name to create a new folder/directory.
3. cd dir_name to change into that directory.
I am here: C:\scrapy_book>
b. C:\scrapy_book> scrapy startproject scrap_youtube
c. After executing this command you will see a folder named scrap_youtube, and inside it one more scrap_youtube folder (the outer folder also holds the project's scrapy.cfg file):
C:\scrapy_book\scrap_youtube\scrap_youtube
d. The C:\scrapy_book\scrap_youtube\scrap_youtube folder contains 4 Python files and one directory/folder:
1. items.py
2. pipelines.py
3. settings.py
4. __init__.py
5. spiders (directory)
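To confirm that startproject produced this layout, a small check like the one below can be pointed at the inner folder. It is a sketch: the expected names are the ones listed above, written with the file names Scrapy generates (__init__.py, pipelines.py). Newer Scrapy versions also generate extra files such as middlewares.py; the check only looks for the listed names and ignores anything else.

```python
import os

# Names from the list above (the helper and these constants are illustrative).
EXPECTED_FILES = ["__init__.py", "items.py", "pipelines.py", "settings.py"]
EXPECTED_DIRS = ["spiders"]


def missing_project_parts(package_dir):
    """Return the expected files/folders that are missing from a project package."""
    missing = [name for name in EXPECTED_FILES
               if not os.path.isfile(os.path.join(package_dir, name))]
    missing += [name for name in EXPECTED_DIRS
                if not os.path.isdir(os.path.join(package_dir, name))]
    return missing


if __name__ == "__main__":
    # An empty list means every expected part is present.
    print(missing_project_parts(r"C:\scrapy_book\scrap_youtube\scrap_youtube"))
```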
If you are an experienced programmer with some exposure to Django, you will recognize these names. We will discuss these files after a sample program.
How and where to write the program?
You can type your program in Notepad and save it with a .py extension, or use IDLE, the Python editor that comes with Python 2.7.
Go to the Scrapy project folder and edit the file items.py like this:
And save it (Ctrl+S).
Now move to the spiders folder and create a file inside it named spider1.py.
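The post stops before showing the code for spider1.py. In a Scrapy spider, pulling fields out of a downloaded page is normally done with Scrapy's selectors inside the spider's parse callback. As a hedged illustration of that idea only, the sketch below does the same kind of extraction with Python 3's standard html.parser module instead of Scrapy. The markup and the video-title class are made up and do not reflect YouTube's real pages.

```python
from html.parser import HTMLParser


class ClassLinkParser(HTMLParser):
    """Collect title/url pairs from <a> tags that carry a given CSS class."""

    def __init__(self, link_class):
        super().__init__()
        self.link_class = link_class  # hypothetical class name to match
        self.results = []
        self._href = None  # href of the matching <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.link_class in classes:
            # Links without an href leave _href as None and are skipped.
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.results.append({"title": "".join(self._text).strip(), "url": self._href})
            self._href = None


def extract_links(html, link_class):
    """Return a list of {"title", "url"} dicts for matching links in html."""
    parser = ClassLinkParser(link_class)
    parser.feed(html)
    parser.close()
    return parser.results
```

For example, extract_links(page, "video-title") returns one dict per matching link, in page order.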