Creating a project
Before you start scraping, you will have to set up a new Scrapy project. Enter a directory where you’d like to store your code and then run:
scrapy startproject tutorial
This will create a tutorial directory with the following contents:
tutorial/
    scrapy.cfg
    tutorial/
        __init__.py
        items.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
            ...
- scrapy.cfg: the project’s configuration file.
- tutorial/: the project’s Python module; you’ll later import your code from here.
- tutorial/items.py: the project’s items file.
- tutorial/pipelines.py: the project’s pipelines file.
- tutorial/settings.py: the project’s settings file.
What is an Item?
Items are containers that will be loaded with the scraped data. They work like simple Python dicts but provide additional protection against populating undeclared fields.
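To make the "dict with declared fields" idea concrete, here is a minimal sketch in plain Python. It is a stand-in written for illustration, not Scrapy's actual implementation: the names StrictItem and PageItem and the error message are assumptions made up for this example.

```python
# Illustration only: a plain-Python stand-in, NOT Scrapy's Item class.
class StrictItem(dict):
    """Dict-like container that only accepts keys listed in `fields`."""

    fields = ()  # subclasses declare their allowed field names here

    def __init__(self, **values):
        super().__init__()
        for key, value in values.items():
            self[key] = value  # go through __setitem__ so the check applies

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError(f"{type(self).__name__} does not support field: {key}")
        super().__setitem__(key, value)


class PageItem(StrictItem):
    # Hypothetical field names chosen for this sketch
    fields = ("title", "link", "desc")


item = PageItem(title="Example page")
item["link"] = "http://example.com/"  # declared field: accepted

try:
    item["author"] = "someone"  # undeclared field: rejected
except KeyError:
    rejected = True
else:
    rejected = False
```

Declared fields behave like ordinary dict keys, while a typo in a field name fails immediately instead of silently creating a new key.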
Populating the items.py file:
from scrapy.item import Item, Field

class DmozItem(Item):
    title = Field()
    link = Field()
    desc = Field()