Scraping Wikipedia tables with Google Spreadsheets
Fitting in nicely with the discussion on pulling financial data into Google Spreadsheets, the OUseful blog recently demonstrated another Spreadsheet data import function, importHTML(), which allows you to easily link an external HTML table to your workbook.
The Google spreadsheet function =importHTML("URL","table",N) will scrape a table from an HTML web page into a Google spreadsheet. The URL of the target web page and the target table element both need to be in double quotes. The number N identifies the N'th table in the page (counting starts at 0) as the target table for data scraping.
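To make the "N'th table" idea concrete, here is a rough Python sketch of the same kind of selection done by hand. It is not how Google implements importHTML; the class name TableCollector, the helper scrape_table, and the sample page are all made up for illustration. The index is zero-based to match the description above, so check the current documentation for the convention the spreadsheet function itself uses. Nested tables are not handled.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> in a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>, in document order
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def scrape_table(html, n):
    """Return the n-th table (zero-based, an assumption of this sketch)."""
    collector = TableCollector()
    collector.feed(html)
    return collector.tables[n]


# Hypothetical page: a navigation table followed by the data table we want.
SAMPLE_HTML = """
<table><tr><th>Site</th></tr><tr><td>Navigation</td></tr></table>
<table>
  <tr><th>Country</th><th>Population</th></tr>
  <tr><td>Examplia</td><td>1000</td></tr>
  <tr><td>Samplestan</td><td>2500</td></tr>
</table>
"""

for row in scrape_table(SAMPLE_HTML, 1):
    print(row)
```

Pages often carry layout or navigation tables ahead of the one you want, which is why the index matters: in the sample above, index 0 would return the navigation table instead of the population data.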
The author goes on to show you how to pull a country population table from a Wikipedia entry into a spreadsheet, create a graph from it, publish the spreadsheet as a CSV, consume the CSV in Yahoo Pipes, export the Pipe output to KML, and import the KML into a Google Map. Whew!
The importHTML function will accept either "list" or "table" as the second parameter, which allows you to retrieve records from either UL/OL/DL lists or TABLE contents, respectively. If you want to retrieve something that's not table or list based, the importXML function may also come in handy. With importXML, you can pull data from any XML or HTML file using an XPath query to target a specific tag or attribute. For more information on these import functions, consult the official documentation below.
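If XPath is new to you, the following Python sketch shows the general idea of targeting a tag or an attribute with a path expression. It uses the standard library's ElementTree, which supports only a subset of XPath, so it illustrates the concept and is not a stand-in for importXML itself. The sample document and the helper name query_xml are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document, used only for illustration.
SAMPLE_XML = """
<countries>
  <country code="EX"><name>Examplia</name><population>1000</population></country>
  <country code="SM"><name>Samplestan</name><population>2500</population></country>
</countries>
""".strip()


def query_xml(xml_text, path):
    """Return every element matching an ElementTree path expression."""
    root = ET.fromstring(xml_text)
    return root.findall(path)


# Target a specific tag: every <name> inside a <country>.
names = [el.text for el in query_xml(SAMPLE_XML, ".//country/name")]

# Target an attribute: the code of every <country> that has one.
codes = [el.get("code") for el in query_xml(SAMPLE_XML, ".//country[@code]")]

print(names)
print(codes)
```

The same pattern carries over to the spreadsheet: you hand the function a document and a path, and it returns whatever matches.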
Data Scraping Wikipedia with Google Spreadsheets
Google Docs Documentation: Functions For External Data
Previously:
HOWTO - track stocks in Google Spreadsheets
Posted by Jason Striegel | Oct 15, 2008 11:49 PM
Ajax, Data, Google, Google Maps, Life, Mapping, Yahoo!