Scraping Wikipedia tables with Google Spreadsheets

Fitting in nicely with the discussion on pulling financial data into Google Spreadsheets, the OUseful blog recently demonstrated another Spreadsheet data import function, importHTML(), which allows you to easily link an external HTML table to your workbook.

The Google spreadsheet function =importHTML("","table",N) will scrape a table from an HMTL web page into a Google spreadsheet. The URL of the target web page, and the target table element both need to be in double quotes. The number N identifies the N'th table in the page (counting starts at 0) as the target table for data scraping.

The author goes on to show you how to pull a country population table from a Wikipedia entry into a spreadsheet, create a graph from it, publish the spreadsheet as a CSV, consume the CSV in Yahoo Pipes, export the Pipe output to KML, and import the KML into a Google Map. Whew!

The importHTML function will accept either "list" or "table" as the second parameter, which allows you to retrieve records from either UL/OL/DL lists or TABLE contents, respectively. If you want to retrieve something that's not table or list based, the importXML may also come in handy. With importXML, you can pull data from any XML or HTML file using an XPath query to target a specific tag or attribute. For more information on these import functions, consult the official documentation below.

Data Scraping Wikipedia with Google Spreadsheets
Google Docs Documentation: Functions For External Data

Previously:
HOWTO - track stocks in Google Spreadsheets

Posted by Jason Striegel | Oct 15, 2008 11:49 PM
Ajax, Data, Google, Google Maps, Life, Mapping, Yahoo! | Permalink | Comments (0) Bookmark and Share

Recent Entries

Comments

Newest comments listed first.

Leave a comment



Bloggers

Welcome to the Hacks Blog!

Brian Jepson.Brian Jepson


Jason Striegel.Jason Striegel


Philip Torrone.Phillip Torrone



See all of the books in the Hacks Series!
Advertise here.

Recent Posts

www.flickr.com
photos in Hacks More photos in Hacks