Saturday, May 09, 2009

MySQL ODBC Hell


You can roll your eyes and pass on this one. I don't care.

Like most people, I started using databases after I ran out of rows in Excel. But I passed on Microsft Access. Never cared for it. Can't stand it.

Now, I'm a database abuser, still stuck in that old Excel mindset. What I call a "database" is what most knowledgeable people would call a "Big Fucking Table" (or "BFT"). I cram everything I can into a table and use SQL queries to get what I want out of it. It's a very brute force approach.

The Proxy List is a good example of this. 1.75 million rows of junk gathered over the past year to make a crappy Web page. I do basically the same thing at work and lately I ran into a brick wall with a MySQL BFT that pulls data from multiple MSDE BFTs.

MSDE is the Lite version of Microsoft SQL Server (MSSQL).

The point of this exercise is to mash together a bunch of Little Big Fucking Tables (LBFTs) into one Really Big Fucking Table (RBFT). The actual, real life solution would be to just make one RBFT in the first place, using MSSQL instead of MSDE. The reality is I can't justify the licensing costs for MSSQL so I have to use MySQL instead.

Plus there is a lot of cruft in the LBFTs that I don't need in the RBFT (I don't want a Really Really Big Fucking Table, or RRBFT, because I don't have the disk space). I can ignore all that by writing an SQL query.

So I made a Distributed Transaction Services (DTS) package that used the MySQL ODBC driver to pump the data between the LBFTs and my RBFT.

It worked fine for over a month, or about five and a half million rows. Then it stopped, complaining about running out of memory.

Naturally, I Googled the error.

I found that a lot of people were running up against this problem. I tried every fix posted in every link I could find - usually twiddling settings in the MySQL ODBC driver - but nothing worked.

So in desperation I fired up Wireshark to see what was going on over the network between the three boxes - the MSDE server with the data, the MSSQL server with the DTS package, and the MySQL server with the RBFT - in this process, and I found the answer.

And it was pure stupidity.

In the first step, the ODBC driver pulled the entire RBFT over the network from the MySQL box to the MSSQL server with the DTS package. It would then run out of memory and die.

Well, DUH!

The actual first step of the DTS package is "Get the data from the MSDE server". The second step is "Put the data on the MySQL server". At least, this is how the DTS package is designed.

How it works is something completely different.

With that discovery, I added a temporary, empty BFT to the MySQL database, changed the DTS package to use that instead of the RBFT, and scheduled a cron job to move the temporary BFT to the RBFT. After the move, all the rows in the temp BFT are deleted, cleaning it out for the next run.

No more ODBC "out of memory" issues. For now.

I'm not sure who is to blame here. After all, it is the ODBC driver that ran out of memory. That is a fact. But it is very tempting to point the finger at Microsoft, although they're just the middleman in this transaction. Did the DTS coordinator ask the ODBC driver for the entire table? Or does the ODBC driver always need a full copy of the data it's going to work on, whether you use it in a DTS package or as a data source for an Excel spreadsheet? Is this an issue with ODBC drivers in general?

I don't know. I do know I'm not interested in doing the research to find out the answers to these questions. I just want this shit to work.

When I Googled the problem, very few people were reporting any success with any of the posted solutions.

And nobody ever said "use a temporary table, dumbass".

No comments:

Post a Comment