Python Requests + Python Selenium in action

When scraping data with Scrapy or Requests in Python, we sometimes have to log in to a website before we can start scraping. Most of the time the login POST request needs only a few parameters.

But ASP.NET websites tend to require many parameters in the login request, and building that request by hand is very annoying.
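To give a feel for the problem: ASP.NET Web Forms pages typically embed hidden inputs such as `__VIEWSTATE` and `__EVENTVALIDATION`, and a hand-built login POST has to send them back along with the credentials. The sketch below collects those hidden fields with the standard library's `html.parser`. The page fragment, field values, and the names `HiddenFieldCollector` and `hidden_fields` are made up for illustration; a real page usually has more and much longer fields.

```python
from html.parser import HTMLParser


class HiddenFieldCollector(HTMLParser):
    """Collect the name/value pair of every <input type="hidden">."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(page_source):
    """Return a dict of all hidden form fields found in the HTML."""
    collector = HiddenFieldCollector()
    collector.feed(page_source)
    return collector.fields


# Hypothetical login page fragment, shortened for illustration.
SAMPLE_PAGE = """
<form method="post" action="Login.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTIzNDU2Nzg5" />
  <input type="hidden" name="__EVENTVALIDATION" value="AAECAwQF" />
  <input type="text" name="username" />
  <input type="password" name="password" />
</form>
"""

# Everything a manual POST would have to send: hidden fields plus credentials.
payload = hidden_fields(SAMPLE_PAGE)
payload.update({"username": "your-username", "password": "your-password"})
print(sorted(payload))
```

Every one of these values changes from page load to page load, which is why letting a real browser submit the form is often less work.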

The easy way to log in to such a website is Selenium.

When you log in to a website, the site stores your session in the browser as cookies.

So the idea is to log in with Selenium, extract the browser's cookies, and set them on a Python Requests session.
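To make the hand-off concrete: Selenium's `driver.get_cookies()` returns a list of dictionaries, one per cookie, while the `cookies=` argument of `requests.get` expects a plain name-to-value mapping (or a cookie jar). Here is a minimal sketch of that conversion, using a made-up cookie list in place of a live browser; `cookies_to_dict` and the sample values are hypothetical.

```python
def cookies_to_dict(selenium_cookies):
    """Flatten Selenium's list of cookie dicts into {name: value}."""
    return {c["name"]: c["value"] for c in selenium_cookies}


# Hypothetical sample shaped like the return value of driver.get_cookies().
sample_cookies = [
    {"name": "ASP.NET_SessionId", "value": "abc123",
     "domain": "www.example.com", "path": "/"},
    {"name": "auth", "value": "xyz789",
     "domain": "www.example.com", "path": "/"},
]

cookie_dict = cookies_to_dict(sample_cookies)
print(cookie_dict)
```

With a real browser this would be used as `requests.get(url, cookies=cookies_to_dict(driver.get_cookies()))`. Passing the raw list straight to `cookies=` fails with a `TypeError`, because Requests tries to index it by cookie name.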

           

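from selenium import webdriver
from selenium.webdriver.common.by import By
from lxml import html
import requests

# Placeholders: replace with real credentials.
username = "your-username"
password = "your-password"


def request(driver):
    """Return a requests.Session carrying the browser's cookies."""
    s = requests.Session()
    for cookie in driver.get_cookies():
        s.cookies.set(cookie['name'], cookie['value'])
    return s


def login():
    driver = webdriver.Firefox()
    driver.get("http://www.example.com")
    # Element IDs are placeholders; use the ones from your login form.
    # find_element(By.ID, ...) is the Selenium 4 form of find_element_by_id.
    driver.find_element(By.ID, 'username').send_keys(username)
    driver.find_element(By.ID, 'password').send_keys(password)
    driver.find_element(By.ID, 'login').click()

    # Now move to other pages using requests
    req = request(driver)
    driver.quit()  # the browser is not needed once the cookies are copied
    response = req.get("myurl")
    print(response.status_code)
    htmlObj = html.fromstring(response.text)
    yourdata = htmlObj.xpath("your xpath")
    return yourdata


login()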

The code above only illustrates the concept. The URLs, element IDs, and XPath are placeholders, so it will not run on your machine as-is.
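The `request()` helper above copies only each cookie's name and value. If the site sets cookies for several domains or paths, it may help to carry those attributes over as well. One possible sketch builds standard-library `http.cookiejar.Cookie` objects from the Selenium dictionaries. `to_jar_cookie`, `copy_cookies`, and the sample data are hypothetical. The sketch is demonstrated on a plain `CookieJar`, and it should also work on a Requests session's jar, since that is a `CookieJar` subclass.

```python
from http.cookiejar import Cookie, CookieJar


def to_jar_cookie(c):
    """Build a cookiejar Cookie from one Selenium cookie dict."""
    domain = c.get("domain") or ""
    expiry = c.get("expiry")  # missing for session cookies
    return Cookie(
        version=0, name=c["name"], value=c["value"],
        port=None, port_specified=False,
        domain=domain, domain_specified=bool(domain),
        domain_initial_dot=domain.startswith("."),
        path=c.get("path") or "/", path_specified=True,
        secure=bool(c.get("secure")), expires=expiry,
        discard=expiry is None,
        comment=None, comment_url=None, rest={},
    )


def copy_cookies(selenium_cookies, jar):
    """Add every Selenium cookie to the given cookie jar and return it."""
    for c in selenium_cookies:
        jar.set_cookie(to_jar_cookie(c))
    return jar


# Hypothetical sample shaped like the return value of driver.get_cookies().
sample_cookies = [
    {"name": "ASP.NET_SessionId", "value": "abc123",
     "domain": "www.example.com", "path": "/", "secure": False},
    {"name": "prefs", "value": "dark", "domain": ".example.com",
     "path": "/account", "secure": True, "expiry": 4102444800},
]

jar = copy_cookies(sample_cookies, CookieJar())
for cookie in jar:
    print(cookie.name, cookie.domain, cookie.path)
```

With Requests, `copy_cookies(driver.get_cookies(), session.cookies)` would fill the session's own jar in place, so later `session.get(...)` calls send each cookie only to the matching domain and path.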

2 comments:

vladvamp said...

Hi, I got a TypeError when doing this in my own project.
It says:

Traceback (most recent call last):
File "/Users/fazrin_rahman/Desktop/KOST SCRAP/Code/SeleniumTest.py", line 62, in
response = requests.get(url, cookies=cookies)
File "/Users/fazrin_rahman/Library/Python/3.6/lib/python/site-packages/requests/api.py", line 70, in get
return request('get', url, params=params, **kwargs)
File "/Users/fazrin_rahman/Library/Python/3.6/lib/python/site-packages/requests/api.py", line 56, in request
return session.request(method=method, url=url, **kwargs)
File "/Users/fazrin_rahman/Library/Python/3.6/lib/python/site-packages/requests/sessions.py", line 461, in request
prep = self.prepare_request(req)
File "/Users/fazrin_rahman/Library/Python/3.6/lib/python/site-packages/requests/sessions.py", line 372, in prepare_request
cookies = cookiejar_from_dict(cookies)
File "/Users/fazrin_rahman/Library/Python/3.6/lib/python/site-packages/requests/cookies.py", line 516, in cookiejar_from_dict
cookiejar.set_cookie(create_cookie(name, cookie_dict[name]))
TypeError: list indices must be integers or slices, not dict

can you help me with it?

Manoj Kumar said...

Sorry, I was away from the blog but now available. If you have any questions then let me know.