Python Requests + Python Selenium in action

When using Scrapy or Requests in Python for data scraping, we sometimes have to log in to a website before we can start scraping. Most of the time there are only a few parameters to send in the POST request.

But an ASP.NET website has far too many parameters, and replicating them all by hand is very annoying.
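To make that concrete, here is roughly what a hand-built ASP.NET login POST looks like. The hidden field names are standard WebForms fields, but the values below are made up for illustration:

```
# Illustrative sketch only: an ASP.NET login POST must echo back
# hidden framework fields scraped from the login page, not just credentials.
payload = {
    '__VIEWSTATE': '/wEPDwUKLTM4...',        # long, page-specific blob (made-up value)
    '__VIEWSTATEGENERATOR': 'C2EE9ABB',      # made-up value
    '__EVENTVALIDATION': '/wEdAAK...',       # made-up value
    'username': 'yourUserName',
    'password': 'yourPassword',
}
```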

The easy way to log in to a website is Selenium.

When you log in to a website, it sets your session state on the browser in the form of cookies.

So the trick is: log in using Selenium, then extract its cookies and set them on a Python Requests session.
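For reference, `driver.get_cookies()` returns a list of dictionaries, one per cookie; the output below is illustrative:

```
>>> driver.get_cookies()
[{'name': 'ASP.NET_SessionId', 'value': 'x4fz3k...', 'domain': 'www.example.com',
  'path': '/', 'httpOnly': True, 'secure': False}]
```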

Here is the concept in code:

```
from selenium import webdriver
from selenium.webdriver.common.by import By
from lxml import html
import requests

username = 'yourUserName'   # placeholder credentials
password = 'yourPassword'

def request(driver):
    # Copy the browser's cookies into a Requests session
    s = requests.Session()
    for cookie in driver.get_cookies():
        s.cookies.set(cookie['name'], cookie['value'])
    return s

def login():
    driver = webdriver.Firefox()
    driver.get("http://www.example.com")
    driver.find_element(By.ID, 'username').send_keys(username)
    driver.find_element(By.ID, 'password').send_keys(password)
    driver.find_element(By.ID, 'login').click()
    # (in practice, wait here until the login has actually completed)

    # Now move to other pages using Requests
    req = request(driver)
    response = req.get("myurl")
    print(response.status_code)
    htmlObj = html.fromstring(response.text)
    yourdata = htmlObj.xpath("your xpath")

login()
```

The above code just illustrates the concept; replace the placeholder URL, credentials, and XPath before running it on your machine.
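One practical detail worth adding: some servers also check that the User-Agent matches the browser that created the session. You can copy it from Selenium into the Requests session as well; a small sketch, assuming the `driver` and session `s` from above:

```
# Copy the browser's User-Agent so the server sees the same client
# for both Selenium and Requests (sketch; assumes driver and s from above).
user_agent = driver.execute_script("return navigator.userAgent;")
s.headers.update({'User-Agent': user_agent})
```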

Log in to a Website Using Scrapy

Scrapy is a great framework for scraping. It has a very good structure for large projects, and its object-oriented style helps remove duplicate code.

This post is for educational purposes only.

This code shows how to log in to a website using Scrapy:

```
from scrapy import Spider, FormRequest, Request

class LoginSpider(Spider):
    name = 'login'
    start_urls = ['loginpageurl']  # placeholder: page that serves the login form

    def parse(self, response):
        # Submit the login form; 'posturl' is a placeholder for the form's action URL
        yield FormRequest('posturl',
                          formdata={'username': 'yourUserName', 'password': 'yourPassword'},
                          callback=self.after_login)
    def after_login(self, response):
        # Logged in; continue crawling with the authenticated session
        yield Request("pageurl", callback=self.directory_page, meta={'page': 1})
```
After this, Scrapy itself takes care of the session and cookies.
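For ASP.NET-style forms with many hidden parameters, Scrapy's `FormRequest.from_response` is even more convenient: it reads the form out of the login page and fills in the hidden fields for you, so you only supply the visible ones. A sketch, assuming the same spider as above:

```
def parse(self, response):
    # from_response picks up __VIEWSTATE and the other hidden inputs
    # automatically; you only supply the visible fields.
    yield FormRequest.from_response(
        response,
        formdata={'username': 'yourUserName', 'password': 'yourPassword'},
        callback=self.after_login,
    )
```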