When using Scrapy or Requests in Python for data scraping, we sometimes need to log in to a website before we can start scraping. Most of the time there are only a few parameters to send in the login POST request.
But an ASP.NET website has far too many parameters, and building that POST request by hand is very annoying.
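To get an idea of why, here is a rough sketch of what logging in to an ASP.NET WebForms site with plain requests can look like. The URL and the form field names here are only assumptions for illustration; a real site will have its own names and often even more hidden fields.

import requests
from lxml import html

s = requests.Session()

# First GET the login page to pick up ASP.NET's hidden state fields.
page = s.get("http://www.example.com/Login.aspx")
tree = html.fromstring(page.text)

payload = {
    # Hidden fields ASP.NET expects back on every POST.
    "__VIEWSTATE": tree.xpath("//input[@id='__VIEWSTATE']/@value")[0],
    "__VIEWSTATEGENERATOR": tree.xpath("//input[@id='__VIEWSTATEGENERATOR']/@value")[0],
    "__EVENTVALIDATION": tree.xpath("//input[@id='__EVENTVALIDATION']/@value")[0],
    # Credential and button fields; these names are placeholders and depend on the site.
    "txtUsername": "myuser",
    "txtPassword": "mypassword",
    "btnLogin": "Log In",
}

response = s.post("http://www.example.com/Login.aspx", data=payload)
print(response.status_code)

Keeping all of those hidden fields in sync by hand is exactly the work Selenium saves us.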
The easy way to log in to such a website is Selenium.
When you log in to a website, the server stores the session state in your browser as cookies.
So the idea is to log in with Selenium, extract its cookies, and set them on a Python requests session.
from selenium import webdriver
from lxml import html
import requests

# Placeholder credentials; replace with the real ones for your site.
username = "myuser"
password = "mypassword"

def request(driver):
    # Copy the cookies Selenium collected into a requests session.
    s = requests.Session()
    for cookie in driver.get_cookies():
        s.cookies.set(cookie['name'], cookie['value'])
    return s

def login():
    driver = webdriver.Firefox()
    driver.get("http://www.example.com")
    driver.find_element_by_id('username').send_keys(username)
    driver.find_element_by_id('password').send_keys(password)
    driver.find_element_by_id('login').click()
    # Now move to other pages using requests instead of the browser
    req = request(driver)
    response = req.get("myurl")  # placeholder: the page you actually want to scrape
    print(response.status_code)
    htmlObj = html.fromstring(response.text)
    yourdata = htmlObj.xpath("your xpath")  # placeholder: your own XPath
    return yourdata

login()
The above code only illustrates the concept; the URLs, element ids, and credentials are placeholders, so it will not run as-is on your machine.
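Two practical details are worth adding (these are my own assumptions, not part of the snippet above): wait until the login has actually completed before copying the cookies, and send the same User-Agent from requests that the browser used, since some servers tie the session to it. A minimal sketch:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
import requests

driver = webdriver.Firefox()
driver.get("http://www.example.com")
# ... fill in the login form and click the button as in the example above ...

# Wait up to 10 seconds for the post-login page; "dashboard" is just an assumed marker.
WebDriverWait(driver, 10).until(lambda d: "dashboard" in d.current_url)

s = requests.Session()
# Reuse the browser's User-Agent so the server sees a consistent client.
s.headers["User-Agent"] = driver.execute_script("return navigator.userAgent;")
for cookie in driver.get_cookies():
    s.cookies.set(cookie["name"], cookie["value"])

# The browser is no longer needed once the cookies are copied.
driver.quit()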