Pharo-Chrome as a scraping solution

Most of the web pages I want to scrape use JavaScript to construct the
DOM, which makes Soup, XMLHTMLParser, etc. useless.
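
As a rough illustration of the problem (in Python rather than Pharo, and
unrelated to the Pharo-Chrome API), the sketch below feeds a made-up page to
a static parser from the Python standard library. The page content and the
TagCounter helper are assumptions invented for this example. The parser only
sees the markup the server sent, so the list items that the script would
create never show up:

```python
from html.parser import HTMLParser

# Hypothetical page: the server sends an empty list, and a script fills it in.
PAGE = """<html><body>
<ul id="items"></ul>
<script>
  const list = document.getElementById("items");
  for (const name of ["a", "b"]) {
    const li = document.createElement("li");
    li.textContent = name;
    list.appendChild(li);
  }
</script>
</body></html>"""


class TagCounter(HTMLParser):
    """Counts start tags found in the raw markup; it never runs scripts."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tags(html):
    """Parse the markup as text and return a tag -> count mapping."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


counts = count_tags(PAGE)
# The <li> elements would only exist after a browser runs the script.
print(counts.get("li", 0))
```

A real browser would end up with two list items in its DOM for this page,
which is the gap a browser-driven approach is meant to close.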

I’ve extended Torsten’s Pharo-Chrome library and use that to navigate
the DOM in a way similar to Soup:

https://github.com/akgrant43/Pharo-Chrome

This gets around the issue with JavaScript, since it waits for the
browser to load the page, run the JavaScript and construct the DOM.

HTH,
Alistair
