These frontend frameworks are complicated to deal with because they often use the newest features of the HTML5 API.
Headless Chrome with Python
You will need to install the selenium package:
pip install selenium
You also need the Chrome browser and ChromeDriver installed on your system.
On macOS, you can simply use brew:
brew install chromedriver
Taking a screenshot
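# chrome.py
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
# Run Chrome without a visible window.
# (Recent Selenium 4 releases removed the options.headless setter.)
options.add_argument("--headless")
options.add_argument("--window-size=1920,1200")

# Recent Selenium 4 releases take the driver path through a Service object
# instead of the executable_path argument.
service = Service(r'/usr/local/bin/chromedriver')
driver = webdriver.Chrome(service=service, options=options)
driver.get("https://www.nintendo.com/")
driver.save_screenshot('screenshot.png')
driver.quit()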
The code is straightforward. I only added the --window-size argument because the default window size was too small.
You should now have a nice screenshot of Nintendo's home page.
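If you take screenshots of several pages, it can help to wrap the steps above in a function. The following is only a sketch: the names take_screenshot and window_size_arg are hypothetical helpers, not part of Selenium, and it assumes chromedriver can be found on your PATH (as it is after the brew install above).

```python
def window_size_arg(width, height):
    """Build the Chrome --window-size flag from a width and height in pixels."""
    return f"--window-size={width},{height}"


def take_screenshot(url, path, width=1920, height=1200):
    """Open url in headless Chrome and save a screenshot to path.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Imported here so this module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    options.add_argument(window_size_arg(width, height))

    # Assumption: chromedriver is discoverable on the PATH.
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # save_screenshot returns True on success, False on an I/O error.
        return driver.save_screenshot(path)
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

Calling take_screenshot("https://www.nintendo.com/", "screenshot.png") should then behave like the script above. The try/finally is the main design choice here: it keeps a failed page load from leaving a Chrome process running in the background.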
Waiting for the page load
Most of the time, a page triggers lots of AJAX calls, and you have to wait for these calls to complete to get the fully rendered page.
A simple solution is to call time.sleep() for an arbitrary amount of time. The problem with this method is that you end up waiting either too long or not long enough, depending on your latency and internet connection speed.
The other solution is to use the WebDriverWait object from the Selenium API:
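from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

delay = 10  # maximum wait in seconds; example value, tune it for your pages

try:
    elem = WebDriverWait(driver, delay).until(
        EC.presence_of_element_located((By.NAME, 'chart')))
    print("Page is ready!")
except TimeoutException:
    print("Timeout")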
This is a great solution because it waits only as long as it takes for the element to be present on the page, up to the delay timeout, instead of a fixed amount of time.
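To see why this beats a fixed sleep, here is a minimal plain-Python sketch of the polling idea behind an explicit wait. It is not Selenium's actual implementation: wait_until and its parameters are hypothetical names chosen for illustration, and the 0.5 second default simply mirrors WebDriverWait's default polling interval.

```python
import time


def wait_until(condition, timeout, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without the condition being met.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Stop as soon as the condition holds; no extra waiting.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Sleep briefly before checking again.
        time.sleep(poll_interval)
```

The caller pays at most one extra polling interval after the condition becomes true, whereas a fixed time.sleep() always pays the full duration, or gives up too early on a slow page.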
As you can see, setting up Chrome in headless mode is really easy in Python. The most challenging part is to manage it in production. If you scrape lots of different websites, the resource usage will be volatile.
This is one of the reasons I started ScrapingNinja, so that developers can focus on extracting the data they want, not managing headless browsers and proxies!
This was my first post about scraping. I hope you enjoyed it!
If you did, please let me know, and I'll write more 😊
Post originally written by Kevin Sahin. Do not hesitate to check out his awesome blog about scraping 😊