Interesting Problem with MySQL & SQLAlchemy

2012-06-12 14:31

While at work a few days ago, I had an interesting albeit weird problem which started with the following cryptic error message:

ERROR 1005: can’t create table `qtn_formdisplay_product` (errno: 150)

It was produced by a local MySQL server running on my development machine when I tried to rebuild test database to accommodate for some model changes happening in the codebase. As you might have noticed, it’s not terribly informative, with the errno number as the only useful tidbit. A cursory glance at top search result for this message said that the most probable cause was a malformed FOREIGN KEY constraint inside the offending CREATE TABLE query.

Upon reading this, I blinked several times; something here was definitely off. The query wasn’t of course written by hand – if it was, we could at least consider an actual mistake to be a problem here. But no, it came from ORM – and not just an ORM, but the best ORM known to mankind. While obviously nothing is perfect, I would think it’s extremely unlikely that I found a serious bug in a widely used library just by doing something as innocent as creating a table with foreign key. I’m not that good, after all ;)

Well, except that it could totally be such a bug. The before mentioned search results also pointed to MySQL issue tracker where it was suggested that the error might happen after trying to create foreign key constraint with duplicate name. Supposedly, this could “corrupt” the parent table and no new FOREIGN KEYs could reference it anymore, yielding the errno 150 if we attempted to create one. While it could not explain the behavior I observed (the parent table was freshly created), it raised some doubts whether MySQL itself may be to blame here.

These were exacerbated when one of my colleagues tried out the same procedure, and it worked for him just fine. He turned out to use newer version of MySQL, though: 5.5 versus 5.1. This appeared to support the hypothesis about a possible bug in MySQL but it didn’t seem to help one bit to get the thing running on the older version.

However, it was an important clue that something relevant changed in between, that had an influence on the whole issue. It was not really any particular bugfix or new feature: it was a change of defaults.

Engine-o-logy

As you might know, the MySQL database management system is in this peculiar place where it uses two quite distinct storage engines at the same time – potentially in the same database, too. The older and simpler MyISAM is a winner for read speed but it lacks support for transactional update of stored data. The newer InnoDB, on the other hand, offers both ACID transactions and foreign key constraints.

Those two engines can coexist in a single database, handling their own tables separately. Which engine a particular table belongs to is decided at its creation time (in CREATE TABLE statement), and there is a default engine to be used if no explicit setting is specified.

And this default changed in version 5.5: from MyISAM previously to InnoDB now.

MyOwnDefault

But, frankly, this change shouldn’t have mattered at all. Any reasonable application that uses MySQL is storing its data inside InnoDB tables. Sure, they might be some specific performance needs which are better addressed with MyISAM (or a NoSQL database *wink*), but it doesn’t change the fact that InnoDB is a de facto default.

Of course, prior to 5.5, it’s a “default” that you actually have to specify in your application’s code. Here’s how it is done in SQLAlchemy using declarative style:

  1. from sqlalchemy.ext.declarative import declarative_base
  2.  
  3. class _Base(object):
  4.     """Base class for SQLAlchemy model classes."""
  5.     __table_args__ = {'mysql_engine': 'InnoDB'}
  6.  
  7. Base = declarative_base(cls=_Base)

As you inherit from declarative Base, the __table_args__ are propagated accordingly. This way, every ORM class will be backed with MySQL table that correctly uses the InnoDB engine. Hence a model class such as this:

  1. from sqlalchemy import Column, Integer, Text
  2.  
  3. class Foo(Base):
  4.     __tablename__ = 'foos'
  5.     id = Column(Integer, primary_key=True)
  6.     stuff = Column(Text)

shall result in emitting the following DDL query when creating tables:

  1. CREATE TABLE `foos` (
  2.     id INTEGER NOT NULL AUTO_INCREMENT,
  3.     stuff TEXT,
  4.     PRIMARY KEY (id)
  5. ) ENGINE=InnoDB;

The important part is of course ENGINE=InnoDB – a direct translation of the Base.__table_args__ dictionary.

Unintentional override

Yet, with all this code in right place, I found that one table is still created with MySQL-default storage engine. And surely enough, it was one of those referenced by FOREIGN KEY constraint in the infamous CREATE TABLE query which resulted in errno 150. Essentially, then, the error was somewhat accurate: there was indeed a problem with FOREIGN KEY.

How did that happen, though? The __table_args__ from Base clearly stated that 'mysql_engine' should be 'InnoDB', and I haven’t changed this setting… right?

Oh wait, I actually did:

  1. from sqlalchemy.schema import UniqueConstraint
  2.  
  3. class FormDisplay(Base):
  4.     __table_args__ = (UniqueConstraint('form_id', 'event_id'), {})
  5.     # ... column definition ...

I needed a multi-column UNIQUE constraint, and the only way to specify it is through arguments to underlying Table object – which is exactly what __table_args__ attribute contains. It can be a dictionary, like in Base, but if we need positional arguments (such as constraints), it has to be a tuple. A tuple with a dictionary at the end, because we can still supply keyword arguments there; arguments like, you know, the 'mysql_engine'.

It would help then if that dictionary wasn’t empty, right? :) Let’s just fix it, then:

  1. __table_args__ = (UniqueConstraint('form_id', 'event_id'), Base.__table_args__)

And that’s it, problem solved.

I’m not entirely comfortable with the fact that SQLAlchemy doesn’t do that by default, although I can understand the reasoning behind this decision. It’s desired that any model class can completely specify their __table_args__, but this also means it has to do that. If it needs to reuse the values from base class, it must do so explicitly.

It’s a trade-off I can live with, personally, even if it leads to amusing bugs such as the one I just described. And if someone really doesn’t like it, well… There aren’t many problems in Python that cannot be a solved with carefully placed metaclass ;-)

Tags: , , , ,
Author: Xion, posted under Computer Science & IT »


One comment for post “Interesting Problem with MySQL & SQLAlchemy”.
  1. Anonymous:
    March 17th, 2013 o 2:07

    Thanks for the hint on Base.__table_args__

Comments are disabled.
 


© 2017 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with QuickLaTeX.com.