Wikipedia offline reader: put all of Wikipedia on your laptop

Wikipedia periodically publishes full data dumps of the encyclopedia's content. If you wanted to make your own copy of the Wikipedia site for offline viewing, you'd typically convert and import that content into MySQL using MediaWiki's importDump.php utility. The initial import process can take over a day. Building the indexes for searching articles takes even longer.
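You can skip the import step because the dump is one bz2-compressed XML file, and Python can read it as a stream. Here is a minimal sketch that walks page titles without any database. It assumes a local dump file and uses only the standard library's `bz2` and `xml.etree.ElementTree` modules. The function name `iter_titles` is hypothetical.

```python
import bz2
import xml.etree.ElementTree as ET

def iter_titles(source):
    """Yield page titles from a bz2-compressed MediaWiki XML dump, one at a time.

    `source` may be a filename or a binary file object (bz2.open accepts both).
    """
    with bz2.open(source, "rb") as f:
        for _event, elem in ET.iterparse(f, events=("end",)):
            # Dump tags are namespaced, e.g. "{http://www.mediawiki.org/xml/export-0.10/}title";
            # strip the namespace so the check works for any export schema version.
            tag = elem.tag.rsplit("}", 1)[-1]
            if tag == "title":
                yield elem.text
            elif tag == "page":
                elem.clear()  # drop the finished page's children to keep memory use low
```

Decompression happens on the fly, so this sketch never writes an expanded copy of the dump to disk.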
Thanassis Tsiodras came up with a better way of using the Wikipedia dump for offline reading:
Wouldn't it be perfect if we could use the Wikipedia "dump" data JUST as they arrive after the download? Without creating a much larger (space-wise) MySQL database? And also be able to search for parts of title names and get back lists of titles with "similarity percentages"?
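The post does not say how the reader computes its "similarity percentages." The sketch below substitutes the standard library's `difflib.SequenceMatcher` for that scoring. It keeps titles that contain every word of the query and ranks them by similarity ratio. The function name and the `limit` parameter are hypothetical.

```python
from difflib import SequenceMatcher

def search_titles(query, titles, limit=5):
    """Return (title, similarity %) pairs for titles containing every query word.

    Scoring with SequenceMatcher.ratio() is an assumed stand-in, not the
    original reader's method.
    """
    q = query.lower()
    words = q.split()
    hits = []
    for title in titles:
        lowered = title.lower()
        if all(w in lowered for w in words):  # partial title match
            score = SequenceMatcher(None, q, lowered).ratio()
            hits.append((title, round(score * 100)))
    hits.sort(key=lambda h: h[1], reverse=True)  # best matches first
    return hits[:limit]
```

For example, `search_titles("linux", ["Linux", "Linux kernel", "History of Linux", "Windows"])` ranks the exact match first and leaves out "Windows".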
The end result is a Wikipedia reader that indexes the entire dump in under 30 minutes, keeps all of the data in its original bz2-compressed format (split into smaller segments), and includes a lightweight web interface for searching and reading entries.
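Segmented storage is what makes this fast. Each segment is compressed independently, so reading one article means decompressing only the segment that holds it, not the whole dump. The in-memory sketch below shows the idea. It assumes the dump has already been split into `<page>` records. The segment size, the function names, and the title-to-segment index are all hypothetical and are not taken from the reader's actual code.

```python
import bz2
import re

TITLE_RE = re.compile(rb"<title>(.*?)</title>")

def build_segments(pages, per_segment=2):
    """Compress <page> byte strings into independent bz2 segments.

    Returns (segments, index), where index maps each title to its segment number.
    """
    segments, index = [], {}
    for start in range(0, len(pages), per_segment):
        chunk = pages[start:start + per_segment]
        seg_no = len(segments)
        segments.append(bz2.compress(b"".join(chunk)))
        for page in chunk:
            m = TITLE_RE.search(page)
            if m:
                index[m.group(1).decode("utf-8")] = seg_no
    return segments, index

def fetch_page(title, segments, index):
    """Decompress only the segment holding `title` and return that page's XML."""
    seg_no = index.get(title)
    if seg_no is None:
        return None
    data = bz2.decompress(segments[seg_no])
    # Pages are stored back to back; split on the closing tag to find ours.
    needle = f"<title>{title}</title>".encode("utf-8")
    for page in data.split(b"</page>"):
        if needle in page:
            return page + b"</page>"
    return None
```

The index is tiny compared with the compressed text. A real dump would use far larger segments than two pages, trading lookup cost against compression ratio.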
Building a (fast) Wikipedia offline reader - Link
Related:
WikipediaFS - a Linux MediaWiki file-system - Link
Posted by Jason Striegel | Aug 19, 2007 10:05 PM | Web
Comments
Posted by: jason_striegel on August 20, 2007 at 10:08 AM
Whoops! Thanks, naikrovek. It should be fixed now.
Posted by: prashanthellina on October 17, 2007 at 10:26 AM
Wikipedia dumps are a great idea. I am exploring interesting ways to use them. I am posting progress here http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/