1. Scrapy items.py File

Creating a project

Before you start scraping, you will have to set up a new Scrapy project. Enter a directory where you’d like to store your code and then run:
 
scrapy startproject tutorial

This will create a tutorial directory with the following contents:
tutorial/
    scrapy.cfg
    tutorial/
        __init__.py
        items.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
            ...
These are basically:
  • scrapy.cfg: the project configuration file
  • tutorial/: the project’s Python module; you’ll later import your code from here.
  • tutorial/items.py: the project’s items file.
  • tutorial/pipelines.py: the project’s pipelines file.
  • tutorial/settings.py: the project’s settings file.
The items.py file contains the project’s item definitions.

What is an Item?

Items are containers that will be loaded with the scraped data. They work like simple Python dicts but provide additional protection against populating undeclared fields, so a misspelled field name raises an error instead of being stored silently.

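Populating the items.py file:

An item is declared by creating a class that subclasses Item and defining each of its attributes as a Field object: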

from scrapy.item import Item, Field

class DmozItem(Item):
    title = Field()
    link = Field()
    desc = Field()
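
To make the "protection against undeclared fields" idea concrete, here is a minimal plain-Python sketch. It is not Scrapy's implementation: StrictItem is a hypothetical stand-in class written only for illustration, and it reuses the title, link and desc field names from DmozItem above. It guards only item[key] = value assignments.

```python
class StrictItem(dict):
    """Hypothetical stand-in for an item: a dict that only accepts declared keys.

    This is an illustration of the idea, not Scrapy's actual Item class.
    """

    # Field names declared up front (same names as the DmozItem fields).
    fields = ("title", "link", "desc")

    def __setitem__(self, key, value):
        # Reject any key that was not declared instead of silently storing it.
        if key not in self.fields:
            raise KeyError(f"{type(self).__name__} does not support field: {key}")
        super().__setitem__(key, value)


item = StrictItem()
item["title"] = "Example title"      # declared field: accepted

try:
    item["titel"] = "Example title"  # typo in the field name: rejected
    typo_rejected = False
except KeyError:
    typo_rejected = True
```

With a plain dict, the misspelled key would have been stored without complaint and the mistake would only show up later as missing data. A real Scrapy item gives you this kind of check without any extra code once the fields are declared.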
