Archive: MySQL

May 8, 2007

MySQL database migration: latin1 to utf8 conversion

Until version 4.1, MySQL tables were encoded with the latin1 character set. By default, the character set is now utf8. This is a good thing in terms of non-latin character support, but if you're upgrading from an older database you may run into a lot of character encoding problems.

I've recently migrated several older databases from 4.0 to MySQL 5 and converted a few tables from latin1 to utf8, and I've put together a few tricks that might help you through the same. So far, I've basically run into three main issues: converting a table from the latin1 charset to utf8, upgrading a whole database set that was properly exported with mysqldump, and (the worst) migrating an entire set of database's MYI and MYD files that weren't properly exported.

Converting a latin1 table to utf8
If you've already migrated successfully to the new MySQL version, but you have an older latin1 encoded table and you simply want to change it's character set to utf8, it's a fairly simple exercise. First, export the table with mysqldump:

mysqldump -u username -p database --default-character-set=latin1 table > tableoutput.sql

First make a back up of that file and just edit it to adjust two things. Find the line that contains "SET NAMES latin1" and change that to "SET NAMES utf8". Then, look at the table definition and change "DEFAULT CHARSET=latin1" to "DEFAULT CHARSET=utf8". Finally, reimport the table:

cat tableoutput.sql | mysql --default-character-set=utf8 -u username -p database

When it's imported back in, it will properly be created as a utf8 table and you shouldn't have any character encoding problems. You can do this for a whole database with the same technique and using a search and replace to switch latin1 to utf8.

Upgrading a whole database set that was properly exported with mysqldump
This is a pretty simple scenario. You've exported your entire database from the previous MySQL version with something like this:

mysqldump --default-character-set=latin1 -u username -p database > dboutput.sql

You can then import it either as a latin1 table, or convert it to utf8 as we did above. Just make sure to adjust the "SET NAMES" and "CHARACTER SET" values appropriately, and set the "--default-character-set" parameter to utf8 or latin1, as appropriate.

Note that the default character set is now utf8 unless you change it in the my.cnf file. I've been on systems where it's been set to latin1 for compatability reasons, however, so you can't just assume this. Make sure that all of your database code (in your php, perl, whatever) issues the "SET NAMES utf8" SQL statement before issuing any further SQL commands. This will ensure that what you set in INSERTs and receive in SELECTs isn't mis-encoded. If you are using a legacy latin1 table, make sure you do the same, but with the latin1 setting.

Migrating an entire set of database's MYI and MYD files that weren't properly exported
I was a bonehead the other day and didn't properly export one of my server's databases before upgrading it to MySQL 5. After upgrading, all of my databases were corrupted, including the mysql database, meaning I couldn't even log in. The mysql_upgrade command didn't fix things properly, and it was all because MySQL assumed my tables from that older version were encoded in utf8.

It was, like, a bummer.

I don't know if there is an easier way, but this is how I managed to transition all those MYI and MYD files to the new database, without reverting to the older version and exporting properly.

First, you need to change the mysql config so that when it loads your old databases it assumes they are latin1 encoded. To do this, edit the my.cnf file. Find all the lines that say "default-character-set=utf8" and change them all to say "default-character-set=latin1".

Start up mysqld and have it ignore permissions:

mysqld_safe --skip-grant-tables &

Now, run "mysql_upgrade". When it finishes, do a SELECT on the mysql.user table. You should see that usernames and encoded passwords haven't been truncated in half and replaced with gobbledeguk.

Kill and restart the mysql server. You should be able to log in, meaning the mysql users table is, in fact, uncorrupted. You'll find that all your other tables are fine as well. One thing that you might notice is that the mysql_upgrade script will have updated the mysql databases to use utf8. All of your other databases are still in latin1, however, so you'll need to convert them to utf8 (if you desire) using the instructions above. Also, if you don't require legacy support for older applications, you should probably go back to the my.cnf file and change the default character set back to utf8 for everything.

Your thoughts
From this point forward, everyone should be using utf8 across the board, but it seems like making the jump is more painful than it should be. If you know any good MySQL character encoding tips and tricks, help out your fellow hacker. Share them in comments!

Posted by Jason Striegel | May 8, 2007 08:56 PM
MySQL | Permalink | Comments (2) | TrackBack | Digg It | Tag w/del.icio.us

April 26, 2007

MySQL Storage Engine for Amazon S3

mysqls3_20070426.jpg
In a MySQL Conference presentation with the subheading "How to store a hundred billion BLOBs without buying a single disk," Mark Atwood described a new open source storage engine that he built which allows you to store data in Amazons S3 service using MySQL.

It doesn't (yet) allow you to create any generic table schema you desire. Instead, you create tables with a simple key/value pair structure. For the scenario where storing data in S3 might be most useful, however, this is actually the data structure that you'd most likely need. You can use your local mysql server to store your indexes and data relationships, then outsource large BLOB data such as images or videos to the slower remote S3 service, identifying each resource by a unique key.

It looks like it's pretty simple to use. Here are a few example SQL statements that make use of the MySQL S3 engine (lifted from Mark's slides).

Connecting to the AWS server
CREATE SERVER 'MyAWSAcct'
FOREIGN DATA WRAPPER 'AWS'
OPTIONS
(USER 'aws id string',
PASSWORD 'aws secret string');

Creating a table
CREATE TABLE 'bierce' (
'word' VARCHAR(255) NOT NULL PRIMARY KEY,
'defn' BLOB)
CHARSET=utf-8
ENGINE=AWSS3
CONNECTION='awss3 DevilDictionary $server MyAWSAcct';

Insert and Select

INSERT INTO bierce (word, defn) VALUES
('AUTHOR', 'One noted for confusing bitterness with humor.');

SELECT defn FROM bierce WHERE word='WIT';

One note on selects: make sure you use a WHERE clause to select a single value by key or you'll be transfering the whole table across the network. This is not only slow, but it will cost you in bandwidth (your own bandwidth costs plus the S3 bandwidth fee).

You can view Mark's full presentation online. There's a powerpoint slideshow, and a seperate text file containing the slide notes -Link.

The full source of the MySQL S3 plugin is available for download as well -Link.

Posted by Jason Striegel | Apr 26, 2007 07:37 PM
MySQL | Permalink | Comments (0) | TrackBack | Digg It | Tag w/del.icio.us

February 26, 2007

Translate SQL Syntax Between Databases

SQL::Translator is an interesting Perl module that, among other things, can convert database table definitions to and from several DB platforms. Essentially, this could allow you to write and maintain your table definition code for a single platform, say MySQL, and then use Translator to output table structure into Oracle, Sybase, or PostgreSQL dialects.

Manipulation of data, such as SELECT, INSERT, UPDATE and DELETE are not yet supported, so you're still on your own there if you're writing platform agnostic code. That said, this is an incredibly useful tool. Just consider this example that Chris Dolan posted on use Perl:

MySQL understands this syntax:

create table book (
id int,
author_id int,
FOREIGN KEY fk_author_id (author_id) REFERENCES author (id)
) TYPE=InnoDB;

but not this nicer syntax (it silently ignores the "references" clause):

create table book (
id int,
author_id int references author (id),
) TYPE=InnoDB;

Perl to the rescue! I can write my schema in the latter syntax and use SQL::Translator to rewrite into the supported syntax.

References:

Posted by Jason Striegel | Feb 26, 2007 12:14 AM
Data, MySQL, Perl | Permalink | Comments (0) | TrackBack | Digg It | Tag w/del.icio.us

February 3, 2007

Simple Zip Code Geocoding

mpls_20070203.jpg

The ability to geocode, or translate into latitude and longitude, postal codes is a fairly useful hack to have in your programming toolbox. Quick and dirty zip geocoding allows you to do some neat things fairly efficiently and with a minimal amount of code. Though it's U.S. centric, it allows you to add location-based functionality to your apps without requiring any real personal information to be transfered or stored.

If your application only needs to convert a zip code (or any address) into a lat/lon coordinate, say for simple mapping purposes, the easiest solution is to use the Google Maps Geocoding API. In addition to the client-side javascript functionality, you can directly query the geocoding system from php using an http request like this:

http://maps.google.com/maps/geo?q=12345&output=xml&key=yourkeyhere

Just change 12345 to the zip (or any address) that you are looking up, and "yourkeyhere" should be your Google Map API key, which you can obtain here. Developer.com has a good PHP example for making use of the returned XML in your server-side code.

Often times, it's useful to be able to do zip lookups based on a geographic region. Maybe you want a list of all zip codes within a certain radius or bounding box. Applications for this could include clustering map items that are near eachother, or searching a database for items that are nearest to a given location. For this, it's really nice to have a MySQL table that contains zip codes along with their lat/lon coordinates. Fortunately, several people have compiled this sort of information from public domain data, and you can even download a full MySQL table dump here, for free.

At this point, it's a pretty simple matter to query the database for location-based information. For instance, let's say you have a web site with a guestbook that allows guests to leave their name and zip. You could easily whip up an application that tells your guests how many other guests are in their area by using a basic bounding box with a query like this:

SELECT guest.name from guest, zipcode
WHERE guest.zip = zipcode.zip
AND zipcode.lat < [maxlat] AND zipcode.lat > [minlat]
AND zipcode.lng < [maxlng] AND zipcode.lng > [minlng]

These are just a few ideas, but hopefully this should be enough to get you started. If you have some good ideas for other geocoding applications (or any mapping/gis hacks in general), please give us a shout in the comments.

Posted by Jason Striegel | Feb 3, 2007 09:42 PM
Ajax, Google Maps, Mapping, MySQL | Permalink | Comments (3) | TrackBack | Digg It | Tag w/del.icio.us

Bloggers

Welcome to the Hacks Blog!

Brian Jepson.Brian Jepson


Jason Striegel.Jason Striegel


Philip Torrone.Phillip Torrone



See all of the books in the Hacks Series!
Advertise here.

Recent Posts

www.flickr.com
photos in Hacks More photos in Hacks