Web Scraping with AutoHotkey 109a- Triggering an EventListener on a page


Ever been plugging along, scraping a page, and hit an element (drop-down, edit field, radio button, etc.) that will not update?  Chances are the page has an EventListener watching that element for a specific Event type.  We used to be able to reliably “click” an element or send .fireEvent(“onchange”) / .fireEvent(“onclick”); however, more and more pages are using this newer approach, where they build an Event Listener and monitor for events on a given element.

If you’re a non-coder like me, this was very problematic to deal with as the EventListener is located in a different place in the DOM.  In the below video I walk through how to spot the problem and offer up a couple of solutions (like Visual Event) that should greatly help. The second video below demonstrates using my updated AutoHotkey syntax writer.
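For readers who want to see the idea in code, here is a minimal sketch (not the code from the videos) of building an event and dispatching it at an element so the page's listener sees it. It assumes AutoHotkey v1.1 and an IE COM element on a page running in IE9+ document mode. The function name TriggerEvent and the element id in the usage comment are made up for illustration.

```autohotkey
; Hypothetical helper: dispatch a DOM event so handlers registered with
; addEventListener() run. Requires IE9+ document mode for createEvent().
TriggerEvent(element, eventName := "change") {
    doc := element.ownerDocument            ; document that owns the element
    evt := doc.createEvent("HTMLEvents")    ; generic DOM event object
    evt.initEvent(eventName, true, true)    ; type, bubbles, cancelable
    return element.dispatchEvent(evt)       ; 0 if a listener cancelled the event
}

; Usage (assumes pwb already points at an IE window and the page has an
; element with the made-up id "state"):
;   dropDown := pwb.document.getElementById("state")
;   dropDown.value := "TX"
;   TriggerEvent(dropDown, "change")
```

Because the event is created with bubbles set to true, it also reaches a listener attached to a parent element rather than to the element itself.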

Triggering an EventListener on a page


Updated AutoHotkey Syntax Writer

Here I demonstrate using my updated AutoHotkey Syntax writer to provide the needed information for the Events.

Web Scraping with AutoHotkey 109b- AHK syntax writer dealing with Events

AutoHotkey Webinar- Intro to Webservice / API calls

In this AutoHotkey Webinar we cover Webservice / API calls.

Video Hour 1: High-level:

  • What is a Webservice API call
  • Web Scraping vs. WebServices / API
  • List of some “fun” WebService / APIs as well as links to resources to tens-of-thousands!
  • Differences between a Get vs. Post request
  • High-level look at oAuth1 vs. oAuth2
  • Resources
    • HTTP: Intro to HyperText Transfer Protocol, Types of Requests, oAuth, Parsing XML, XPath, JSON
    • AutoHotkey resource: Syntax writer, WinHttpRequest, Msxml2 vs. WinHttpRequest, Parsing XML & JSON
  • Basic examples of API requests (compared using browser and WinHttpRequest)

Video Hour 2: Coding and Q&A

  • Played with several APIs
  • Discussed & demonstrated reverse-engineering API calls from a website
  • Passing key-value pairs (and using a function to keep it organized)
  • Reviewed additional APIs
  • Q&A
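The "key-value pairs in a function" idea from the list above could look something like the sketch below. It is not the function from the webinar: the names UriEncode and BuildQuery and the api.example.com endpoint are hypothetical, and AutoHotkey v1.1 (Unicode build) is assumed.

```autohotkey
; Percent-encode a string as UTF-8, leaving RFC 3986 unreserved characters alone.
UriEncode(str) {
    VarSetCapacity(buf, StrPut(str, "UTF-8"), 0)   ; room for the bytes plus terminator
    StrPut(str, &buf, "UTF-8")
    out := ""
    while (code := NumGet(buf, A_Index - 1, "UChar")) {   ; stops at the null byte
        ch := Chr(code)
        out .= RegExMatch(ch, "^[A-Za-z0-9\-._~]$") ? ch : ("%" . Format("{:02X}", code))
    }
    return out
}

; Turn an associative array into "key1=value1&key2=value2".
BuildQuery(params) {
    query := ""
    for key, value in params
        query .= (query = "" ? "" : "&") . UriEncode(key) . "=" . UriEncode(value)
    return query
}

params := {q: "pizza place", zip: "75019"}
url := "https://api.example.com/search?" . BuildQuery(params)   ; hypothetical endpoint
MsgBox % url
```

Keeping the parameters in one object means adding or removing a parameter is a one-line change. Note that AutoHotkey v1 enumerates string keys in alphabetical order, so the query string follows that order rather than the order you typed them.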

Script Highlight: Select text and “pretty” hyperlink text

The below script demonstrates how you can use AHK to automate highlighting text on a page and then, using the WinClip library, construct a “pretty” HTML link.  Here are links to GetActiveBrowserURL and WinClipAPI / WinClip.


What is a Webservice API? (Application Programming Interface?)


Examples of Webservice / API calls

  • Application / Software querying products for sale on Amazon.com
  • App on your phone getting latest Weather
  • Database pulling updated sales report
  • Using your Tablet to Select movies to watch on Netflix
  • DropBox application syncing files between your computer & cloud
  • Google places search

APIs are becoming increasingly available!

…Since 2005, we’ve seen APIs grow from a curiosity, to a trend, and now to the point where APIs are core to many businesses. APIs have provided tremendous value to countless organizations and developers, which is reflected in their continued growth.   Source: Programmable Web

growth in webservice APIs since 2005

 

 


Main Differences between WebService / API call & Web Browser

Webservice vs. web browser calls


When to use API vs. Web Scraping?

When to use webservice API call vs. web scraping


Some Webservices / APIs

Some of the Amazing APIs out there…

Programmable Web – list of thousands of APIs as well as a great, six-part, article reviewing APIs

Short-list of APIs open to public on Wikipedia; Google APIs (over 100 APIs @ Google); Yahoo APIs

Social Sites : Twitter, Pinterest, Google+, YouTube, Facebook, LinkedIn, reddit, StumbleUpon, WordPress, Instagram

Contact / Business Lookup: ClearBit, Pipl, FullContact, Dun & Bradstreet, Foursquare, Google Places, Yelp, GeoNames

Weather: Weather Underground, Yahoo Weather, World Air

Finance: Google Finance, Yahoo Finance, PayPal, Stripe, XigniteRealTime, Ally Invest

Government: A bunch of Government APIs, Cloud.Gov, US Census, 18F.gsa.gov

Additional: Wikipedia, Amazon Product, email to text, SmartSheet, SurveyGizmo, Bitly, SnapOCR, PasteBin, DropBox, Zoho, Zillow, Imgur, Mailgun, MailChimp, Microsoft: Graph, MS Office: Excel, Word, SharePoint, OneNote, Outlook, Yammer, PowerPoint


Break-down of a REST API request via COM object

  • Create COM object
  • Open Endpoint (w/ parameters & Authentication if GET request)
  • Set RequestHeader(s)
  • Send (w/ “payload & Authentication” for POST requests)
  • Get response (body or text)
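The five steps above map almost line-for-line onto the WinHttpRequest COM object. The sketch below assumes AutoHotkey v1.1. The httpbin.org URL is a public echo service used purely as a stand-in for whatever endpoint you are calling, and the query parameters are made up.

```autohotkey
; 1. Create COM object
http := ComObjCreate("WinHttp.WinHttpRequest.5.1")

; 2. Open endpoint: method, URL (with any query parameters), async = false
http.Open("GET", "https://httpbin.org/get?city=Dallas&format=json", false)

; 3. Set request header(s)
http.SetRequestHeader("Accept", "application/json")

; 4. Send (a POST request would pass its payload here, e.g. http.Send(body))
http.Send()

; 5. Get response (status code plus body text)
MsgBox % "Status: " http.Status "`n`n" http.ResponseText
```

For a POST request the same five steps apply; the differences are the method passed to Open and the payload passed to Send.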

Main differences of oAuth1 vs. oAuth2

Unfortunately, there is no “standard” implementation of oAuth1 or oAuth2; however, at a high level, here are some of the main conceptual differences:

oAuth1:

  • Need a Key & Token from Webservice API (typically different than your username and password)
  • You use your Key & Token in your API call
  • oAuth1 is less secure and, generally, being phased out
  • Even while it is being phased out, the “developers” (us) can often still use oAuth1 for development of the “app”

oAuth2:

  • Need a Key and Token (same as oAuth1); however, you use the Key & Token and some other parameters to perform a “handshake” which returns a secure token that typically times out in seconds, minutes, or hours
  • Your token is restricted to the level of your account (or what has been authorized)
  • The secure token is what is shared with your actual endpoint. This allows other sites (like LinkedIn, Facebook, etc.) to assist with your login without your username/password being passed to the endpoint

HTTP and AutoHotkey Resources

HTTP Protocol & General Tutorials

AutoHotkey specific



Let’s make some Webservice / API calls!

Weather Underground

  1. Dallas forecast in XML
  2. Dallas forecast in JSON
  3. Using WinHttpRequest
  4. Example with browser

Yahoo business

  1. Pizza restaurants in zip code 75019 in XML
  2. Pizza restaurants in zip code 75019 in JSON
  3. Using WinHttpRequest

Cross browser web scraping with AutoHotkey and Selenium

 

While AutoHotkey is an amazing tool for Web Scraping, many people complain about being limited to connecting with COM to IE.   In the below videos I walk through how you can use AutoHotkey and Selenium to automate web scraping in virtually any browser you wish.  🙂

What is Selenium & why should AutoHotkey users care?

Installing Selenium

In order to control Selenium with AutoHotkey you need to install SeleniumBasic.  The current version is 2.0.9.0 and can be downloaded here.  Selenium is now on version 3, and a new SeleniumBasic release that connects to Selenium 3 has been promised soon.  Make sure you download the WebDrivers for your browsers of choice.

If all of this sounds confusing, don’t feel bad.  It is ridiculous!  I found this post which documents/clarifies much of the confusion (although Selenium 3 is now out).

Please note several people reported that when installing SeleniumBasic it did not install in the Program Files location (i.e. here: C:\Program Files\SeleniumBasic or C:\Program Files (x86)\SeleniumBasic).  They also had problems getting Selenium to launch.   I recommend you make sure Selenium installs into one of the Program Files locations and also make sure you get the Selenium drivers installed.  After install I had the following files on my computer:

  • C:\Program Files\SeleniumBasic\operadriver.exe
  • C:\Program Files\SeleniumBasic\chromedriver.exe
  • C:\Program Files\SeleniumBasic\edgedriver.exe
  • C:\Program Files\SeleniumBasic\iedriver.exe

I also updated the Chrome driver here and am keeping my Chrome version up to date (currently on version 59.0.3071.71)

Here are some links you might want to review (but you’ll need to adapt them for your purposes)

Installing Selenium for use with AutoHotkey

 

Using AutoHotkey and Selenium across various browsers

In this video I show two ways I’ve learned to start up the Selenium WebDriver with AutoHotkey.
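For reference, here are two common ways to start a browser through SeleniumBasic's COM classes. They may or may not be the exact two shown in the video. The sketch assumes AutoHotkey v1.1, SeleniumBasic installed with a chromedriver.exe that matches your Chrome version, and uses autohotkey.com only as an example page.

```autohotkey
; Way 1: create the browser-specific driver directly
driver := ComObjCreate("Selenium.ChromeDriver")
driver.Get("https://www.autohotkey.com/")
MsgBox % "Way 1 loaded: " driver.Title
driver.Quit()

; Way 2: create the generic WebDriver and name the browser in Start()
driver := ComObjCreate("Selenium.WebDriver")
driver.Start("chrome", "https://www.autohotkey.com")   ; browser name, base URL
driver.Get("/")                                        ; path relative to the base URL
MsgBox % "Way 2 loaded: " driver.Title
driver.Quit()
```

The second form is handy when the browser should be chosen at run time, since the browser name is just a string.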

Tutorial showing how to start up and navigate with AutoHotkey and Selenium

Using AutoHotkey & Selenium- Starting the browsers & navigating to a page

Getting information from a page with Selenium and AutoHotkey

While there are a lot of similarities to data extraction with COM and IE, Selenium has quite a few differences as well. The below code is what I use in the following video.  It demonstrates some ways that you can extract data from a web page via Selenium and AutoHotkey.
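As a minimal sketch of this kind of extraction (a stand-in, not the listing from the video), the snippet below reads a few values from a page. It assumes AutoHotkey v1.1 with SeleniumBasic and a matching chromedriver, and uses example.com only because its markup is simple.

```autohotkey
driver := ComObjCreate("Selenium.ChromeDriver")
driver.Get("https://example.com/")

pageTitle := driver.Title                                          ; <title> of the page
heading   := driver.FindElementByCss("h1").Attribute("innerText")  ; text of the first <h1>
linkUrl   := driver.FindElementByTag("a").Attribute("href")        ; href of the first link
paraCount := driver.FindElementsByTag("p").Count                   ; number of <p> elements

MsgBox % "Title: " pageTitle "`nHeading: " heading "`nLink: " linkUrl "`nParagraphs: " paraCount
driver.Quit()
```

FindElementBy... returns a single element, while FindElementsBy... returns a collection, which is why only the last call has a Count.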

 

using AutoHotkey and Selenium : Getting information from a page

Setting information on a page

Selenium and AutoHotkey are pretty different in how you set information.  Selenium has a “sendkeys” method which seems to be pretty reliable at triggering events on a page.

  • Make sure you review this: Send keys to value.  Note: you need to add “driver”, e.g. .SendKeys(driver.Keys.ENTER)
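A short sketch of the SendKeys approach, assuming AutoHotkey v1.1 with SeleniumBasic and a matching chromedriver. The DuckDuckGo page and its field name "q" are assumptions about a third-party site that can change, and the selector in the Click comment is made up.

```autohotkey
driver := ComObjCreate("Selenium.ChromeDriver")
driver.Get("https://duckduckgo.com/")

searchBox := driver.FindElementByName("q")   ; assumption: the search field is named "q"
searchBox.SendKeys("AutoHotkey Selenium")    ; types the text, firing key events as it goes
searchBox.SendKeys(driver.Keys.Enter)        ; special keys come from driver.Keys

; Clicking works the same way once you have the element, e.g.:
;   driver.FindElementByCss("a.some-link").Click()   ; made-up selector

Sleep 3000      ; leave the results on screen for a moment
driver.Quit()
```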

Setting text & clicking items on a page with Selenium and AutoHotkey

Selenium and AutoHotkey: Setting text and clicking elements

 

Using your Chrome Profile (Avoiding the need to re-login to a site)

In this video I demonstrate how you can leverage your Chrome profile so you do not need to keep logging into a website with Chrome.
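One way to point Selenium at an existing profile is to pass Chrome's user-data-dir switch. The video may do this differently, so treat the sketch below as an assumption-laden example: it assumes AutoHotkey v1.1, SeleniumBasic with a matching chromedriver, and Chrome's default profile location. Close all Chrome windows first, since Chrome locks a profile that is in use.

```autohotkey
; Default Chrome profile folder for the current user; adjust if yours differs
EnvGet, localAppData, LOCALAPPDATA
profileDir := localAppData "\Google\Chrome\User Data"

driver := ComObjCreate("Selenium.ChromeDriver")
driver.AddArgument("--user-data-dir=" profileDir)   ; Chrome switch pointing at the profile
driver.Get("https://www.google.com/")               ; saved cookies and logins are now available
```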

 

Setting Selenium Chrome profile with AutoHotkey

Iterating over Objects with Selenium and AutoHotkey

In the below video I demonstrate some of the important differences when iterating over objects with Selenium and AutoHotkey.    A COM-based object does not have an enumerator, thus you cannot simply use a for-loop to iterate over it.   Selenium does have an enumerator; however, the objects are held in the Keys (not the values).
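The point about Keys is easier to see in code. The sketch below (AutoHotkey v1.1, SeleniumBasic with a matching chromedriver, autohotkey.com used only as an example page) loops over the same collection twice, once with a for-loop and once by index.

```autohotkey
driver := ComObjCreate("Selenium.ChromeDriver")
driver.Get("https://www.autohotkey.com/")
links := driver.FindElementsByTag("a")

; For-loop: with a COM enumerator the element arrives in the first ("key") variable
viaFor := ""
for element in links
    viaFor .= element.Attribute("href") "`n"

; Index loop: Item() on the collection is 1-based
viaIndex := ""
Loop % links.Count
    viaIndex .= links.Item(A_Index).Attribute("href") "`n"

MsgBox % "Same result from both loops? " (viaFor == viaIndex ? "yes" : "no")
driver.Quit()
```

If you write for key, value in links, the second variable holds a number describing the variant type rather than the element, which is why only the first variable is used here.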

 

Demo video showing how to iterate over objects in Selenium

Various Selenium methods for getting & setting data on a page

In this tutorial I walk through various ways to get/set data on a page. With Selenium you can use both CSS and XPath selectors, which work much like querySelector.

I also shared these two resources from Michael Sorens which present the same data grouped by Method and grouped by Tool.
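To make the CSS / XPath comparison concrete, here is a small sketch that fetches the same elements both ways. It assumes AutoHotkey v1.1 with SeleniumBasic and a matching chromedriver, and uses example.com only as a simple test page.

```autohotkey
driver := ComObjCreate("Selenium.ChromeDriver")
driver.Get("https://example.com/")

; Same heading, two selector languages
headingCss   := driver.FindElementByCss("h1").Attribute("innerText")
headingXPath := driver.FindElementByXPath("//h1").Attribute("innerText")

; Same link, selected by "has an href attribute"
linkCss   := driver.FindElementByCss("a[href]").Attribute("href")
linkXPath := driver.FindElementByXPath("//a[@href]").Attribute("href")

MsgBox % "CSS:   " headingCss " | " linkCss "`nXPath: " headingXPath " | " linkXPath
driver.Quit()
```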

 

Selenium & AutoHotkey- Various methods for Getting/Setting data

Various methods from Selenium & by using JavaScript Execution

I went through and documented some of the additional methods I used from Selenium & by injecting JavaScript. Check them out below as well as the video walking through the usage.
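As a small example of the JavaScript route (a sketch, not the documented list from the post), ExecuteScript runs a script in the page and hands back whatever the script returns. AutoHotkey v1.1, SeleniumBasic with a matching chromedriver, and example.com as the test page are assumed.

```autohotkey
driver := ComObjCreate("Selenium.ChromeDriver")
driver.Get("https://example.com/")

; Read values: whatever the script returns comes back to AutoHotkey
title := driver.ExecuteScript("return document.title;")
count := driver.ExecuteScript("return document.querySelectorAll('p').length;")

; Change the page: no return value needed
driver.ExecuteScript("document.querySelector('h1').style.color = 'red';")

MsgBox % "Title: " title "`nParagraph count: " count
driver.Quit()
```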

Selenium & AutoHotkey- Methods and JavaScript execution

Maneuvering Frames in Selenium with AutoHotkey

The below code provides some insights on how to navigate frames with AutoHotkey
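As a rough sketch of the pattern (not the listing from the video): switch into the frame, do the work, then switch back out. The helper name GetTextInFrame and the frame name and selector in the usage comment are hypothetical. AutoHotkey v1.1 and SeleniumBasic are assumed.

```autohotkey
; Hypothetical helper: read the text of an element that lives inside a frame.
GetTextInFrame(driver, frame, css) {
    driver.SwitchToFrame(frame)                  ; frame name, id, or index
    text := driver.FindElementByCss(css).Attribute("innerText")
    driver.SwitchToDefaultContent()              ; back to the top-level page
    return text
}

; Usage (made-up frame name and selector):
;   driver := ComObjCreate("Selenium.ChromeDriver")
;   driver.Get(urlOfPageWithFrames)
;   MsgBox % GetTextInFrame(driver, "contentFrame", "h1")
```

Switching back to the default content at the end keeps later FindElement calls from silently searching inside the frame.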

Navigating Frames with Selenium and AutoHotkey

Downloading files with Selenium and AutoHotkey

In this tutorial I demonstrate how I used AutoHotkey and Selenium to download a PDF file. The same process will work for other files that Chrome does not open automatically.

Downloading a PDF file with Selenium and AutoHotkey

Connecting to a current instance of Chrome

Thankfully tmplinshi has come up with a solution for connecting to an already-launched instance of Chrome.  Granted, you’ll need to launch Chrome with a command line parameter, but this is an easy tweak: just add it to your main Chrome shortcut and you’ll be able to connect to a currently running Chrome window!

Here’s what you need to do for prep-work:

  1. Make sure all running instances of Chrome are closed
  2. Create a shortcut to Chrome with this path: chrome.exe --remote-debugging-port=9222
  3. Launch Chrome from your new shortcut

Then you can use the below code to connect with it!  Check out the below video demonstrating how it works.
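A minimal sketch of the connection step, along the lines of the approach credited above, assuming AutoHotkey v1.1, SeleniumBasic with a matching chromedriver, and Chrome already started from the shortcut with port 9222:

```autohotkey
driver := ComObjCreate("Selenium.ChromeDriver")
driver.SetCapability("debuggerAddress", "127.0.0.1:9222")   ; must match the port in the shortcut
driver.Start()                                              ; attaches instead of opening a new window

MsgBox % "Connected to: " driver.Title "`n" driver.Url
```

Quit() is deliberately left out here, because calling it would close the Chrome window you attached to.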

 

Connecting to a running instance of Chrome with Selenium and AutoHotkey

Update to Web Scraping syntax writer for AutoHotkey

I made a few updates to my Web Scraping syntax writer.  This new version adds functionality to Get/Set attributes of an element, as well as to easily find all tables on a given page, extract their text, and dump it into a dynamic ListView.

Both can be pretty helpful when scraping data from the web.  For an extensive review of the tool, check out the videos on my main Web Scraping page.

Demonstration of updated Web Scraping syntax writer

Updates to Web Scraping / AutoHotkey syntax writing tool