
Wednesday, January 1, 2014

Unit testing SQLAlchemy apps, part 2: the universal method

The problem

In an earlier post I presented a method for unit testing SQLAlchemy applications cleanly, without leaving any trace of the tables created during the run of the test suite. However, the problem with this method was that it required the target RDBMS to support both transactional DDL and nested transactions. This requirement left two prominent vendors in the cold: MySQL and Oracle.

What didn't work

In search of a solution, I was reminded of a feature called temporary tables. This seemed like a perfect fit for the problem! Most RDBMS's can create both "local" and "global" temporary tables. Local temporary tables are only visible to the connection that created them, while global temporary tables are structurally visible to all connections, but the data still remains specific to each connection. Unfortunately, Oracle only supports global temporary tables, which are unusable for unit testing since they stick around after the test suite has finished. MySQL, however, does support local temporary tables, so I modified the original unit testing code to create all the tables as temporary. But that didn't work either, because foreign keys between temporary tables apparently aren't allowed.
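For reference, marking tables as temporary is a small change in SQLAlchemy via the `prefixes` argument of `Table` — roughly what my modification amounted to. A minimal sketch (the table here is hypothetical):

```python
from sqlalchemy import Column, Integer, MetaData, Table
from sqlalchemy.schema import CreateTable

metadata = MetaData()

# A hypothetical table, marked temporary via the "prefixes" argument
users = Table(
    'users', metadata,
    Column('id', Integer, primary_key=True),
    prefixes=['TEMPORARY'],
)

# The generated DDL now reads CREATE TEMPORARY TABLE ...
print(str(CreateTable(users)))
```

This works for creating the tables themselves; it's the foreign keys between such tables that MySQL refused.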

Since temporary tables didn't provide a working solution, I had to look into other alternatives. The most obvious one would of course be to reflect the metadata at the beginning and use metadata.drop_all(). This approach, however, has one subtle issue: circular relationships. If the tables are linked in a cyclic relationship, like A → B → C → A, then these tables can't be dropped without first dropping the foreign key constraints. But then I heard of the DROP TABLE ... CASCADE command, which supposedly drops the foreign keys along the way. This got my hopes up, even though SQLAlchemy didn't support it directly. Those hopes, however, died quickly when I looked at the MySQL documentation and saw this:
RESTRICT and CASCADE are permitted to make porting easier. In MySQL 5.6, they do nothing.

The solution

With this, I had to admit defeat and settle for the lowest common denominator. As such, the revised testing process goes as follows:
  1. Reflect the metadata from the database reserved for unit testing
  2. Drop all foreign key constraints using the reflected metadata
  3. Drop all tables using the reflected metadata
  4. Create all tables from the current model
  5. Add any base data (fixtures) and commit the session
  6. Prevent your application and framework from committing or closing the session
  7. Run the tests, rolling back the transaction at the end of each test
Step 1 is necessary because there's no telling how much your model has changed between test runs, so simply running drop_all() with your current model's metadata is not guaranteed to do the right thing.
Step 2 is necessary because of potential circular foreign key dependencies preventing some tables from being dropped (see the previous section for a deeper explanation).
Step 6 is necessary for two reasons:
  1. Allowing commit() would break test isolation by leaking database changes to other tests
  2. Allowing the session to close would mean that any changes made between requests within a single test would be rolled back
Finally, a couple of reminders:
  • Remember to point the connection URI to a dedicated testing database so you don't lose development (not to mention production) data
  • When testing on MySQL, use InnoDB tables since MyISAM doesn't support transactions
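As a concrete example of the first reminder, a dedicated test configuration might look something like this (the database name and credentials are of course placeholders, assuming a Flask-SQLAlchemy style config):

```python
# Hypothetical test settings -- the important part is that the database
# is NOT your development or production database
SQLALCHEMY_DATABASE_URI = 'mysql://user:password@localhost/myapp_test'
```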

Putting it to practice with Flask and py.test

I developed this testing method to run unit tests on a new, mid-sized Flask-based web app of mine that had to use MySQL as its data store for compatibility with an older version. I've also recently migrated from nose to the wonderful py.test testing framework.

The following code is a generic testing example, adapted from my application's test suite. It works with a single-db configuration, but could be adapted to a multi-db configuration. I've tested it against MySQL 5.5, PostgreSQL 9.1 and SQLite 2.6.0.

It should be mentioned that I experimented with the faster method from my earlier post (based on nested transactions), but my test suite ran only 0.6 seconds faster with it, and the approach required a whole different code path in several fixtures, so I decided to drop it.

from sqlalchemy.schema import MetaData, DropConstraint
import pytest

from yourapp import db, create_app


@pytest.fixture(scope='session')
def app(request):
    return create_app()


@pytest.fixture(scope='session', autouse=True)
def setup_db(request, app):
    # Clear out any existing tables
    metadata = MetaData(db.engine)
    metadata.reflect()
    for table in metadata.tables.values():
        for fk in table.foreign_keys:
            db.engine.execute(DropConstraint(fk.constraint))
    metadata.drop_all()

    # Create the tables based on the current model
    db.create_all()

    # Add base data here
    # ...
    db.session.flush()
    db.session.expunge_all()
    db.session.commit()


@pytest.fixture(autouse=True)
def dbsession(request, monkeypatch):
    # Roll back at the end of every test; the real remove() is captured
    # here before it gets monkeypatched below
    request.addfinalizer(db.session.remove)

    # Prevent the session from closing (make it a no-op) and
    # committing (redirect to flush() instead)
    monkeypatch.setattr(db.session, 'commit', db.session.flush)
    monkeypatch.setattr(db.session, 'remove', lambda: None)


def test_example(app):
    # Foo stands in for one of your application's model classes
    with app.test_client() as client:
        response = client.get('/some/path')

    assert response.status_code == 200
    assert db.session.query(Foo).count() == 5

Sunday, August 25, 2013

Unit testing SQLAlchemy apps

I help out a lot on the #sqlalchemy channel on Freenode IRC. When people ask questions there, one of the issues that comes up often is unit testing applications that use SQLAlchemy. Almost all developers instinctively use SQLite for testing due to its simplicity and the possibility of using an in-memory database that leaves no garbage behind to be cleaned up. It seems like a clean and easy solution, yes? Well, my answer is that if you don't plan on deploying with SQLite support, don't test on SQLite! Always use the production RDBMS for testing. Going to deploy with PostgreSQL? Then test with PostgreSQL! Why?

There are at least two good reasons to test on the same RDBMS you deploy on. The first and foremost is that SQLite is vastly different from other RDBMS's. For one, it does not really enforce column types, so code that erroneously inputs data of the wrong type won't cause errors when it should. There are also many semantic differences in how embedded and typical client-server RDBMS's work, so you may run into bugs that only occur in production while all the tests pass just fine. The second reason is that SQLite's deliberately modest design, which lets it fit into small memory spaces, also means it lacks more advanced database features like window functions and recursive queries. This shortcoming prevents you from taking full advantage of the features of your chosen RDBMS.
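The type enforcement difference is easy to demonstrate with nothing but the stdlib sqlite3 module: SQLite happily stores a string in an INTEGER column, where PostgreSQL would raise an error:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE items (price INTEGER)')

# SQLite accepts this without complaint; PostgreSQL would reject it
conn.execute("INSERT INTO items (price) VALUES ('not a number')")

row = conn.execute('SELECT price, typeof(price) FROM items').fetchone()
print(row)  # -> ('not a number', 'text')
```

The value is silently stored as text, so a bug that a stricter RDBMS would catch at insert time sails through the test suite unnoticed.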

If I managed to convince you, then you'll probably be asking how testing should be done on RDBMS's other than SQLite. The answer boils down to whether your RDBMS supports two crucial features: nested transactions and transactional DDL. Nested transactions are savepoints within a transaction, to which you can roll back without losing any changes made before the savepoint. Transactional DDL means that in addition to normal data modification (INSERT, UPDATE, DELETE), schema changes are also transactional. That means they can be rolled back, which is a very nice thing to have when unit testing. According to the article linked to above, the following RDBMS's support transactional DDL: PostgreSQL, Sybase, Microsoft SQL Server, DB2, Informix and Firebird. Most notably, MySQL and Oracle do not support it.
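As it happens, SQLite itself supports both savepoints and transactional DDL, so the mechanism can be sketched with nothing but the stdlib sqlite3 module:

```python
import sqlite3

# isolation_level=None disables sqlite3's implicit transaction handling
conn = sqlite3.connect(':memory:', isolation_level=None)

# Transactional DDL: the CREATE TABLE below is part of the transaction
conn.execute('BEGIN')
conn.execute('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)')
conn.execute("INSERT INTO users (name) VALUES ('fixture row')")

# Nested transaction (savepoint): changes made after it can be undone
# without losing the fixture row
conn.execute('SAVEPOINT test_case')
conn.execute("INSERT INTO users (name) VALUES ('made by a test')")
conn.execute('ROLLBACK TO SAVEPOINT test_case')

count = conn.execute('SELECT COUNT(*) FROM users').fetchone()[0]
print(count)  # -> 1: the test's insert is gone, the fixture remains

# Rolling back the outer transaction also removes the table itself
conn.execute('ROLLBACK')
tables = conn.execute('SELECT COUNT(*) FROM sqlite_master').fetchone()[0]
print(tables)  # -> 0
```

The testing scheme below does exactly this, with SQLAlchemy managing the transactions and savepoints on the target RDBMS.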

If your RDBMS does support the aforementioned two features, then you can conveniently test your SQLAlchemy apps in the following manner:
  1. Make sure you have an empty database for unit testing
  2. Create the engine, create one connection and start a transaction in it
  3. Create the tables
  4. Optionally, insert test fixtures
  5. For every test, repeat:
    1. Create a savepoint
    2. Run the test
    3. Roll back to the savepoint
  6. Roll back the transaction
This way all the tests are isolated from each other. Every test gets the same "blank slate" as far as the database state is concerned. After the tests are done, the database will look just as empty as it did before the test suite was run, even if the tests were interrupted.

So how do you actually accomplish this in practice? I'll give you an example using PostgreSQL and nose; it should be adaptable to other equally capable test runners. The following code should be placed in the __init__.py of the root package of your tests.

from sqlalchemy.engine import create_engine
from sqlalchemy.orm.session import Session

from your.package import Base  # This is your declarative base class


def setup_module():
    global transaction, connection, engine

    # Connect to the database and create the schema within a transaction
    engine = create_engine('postgresql:///yourdb')
    connection = engine.connect()
    transaction = connection.begin()
    Base.metadata.create_all(connection)

    # If you want to insert fixtures to the DB, do it here


def teardown_module():
    # Roll back the top level transaction and disconnect from the database
    transaction.rollback()
    connection.close()
    engine.dispose()
 

class DatabaseTest(object):
    def setup(self):
        self.__transaction = connection.begin_nested()
        self.session = Session(connection)

    def teardown(self):
        self.session.close()
        self.__transaction.rollback()

To take advantage of this setup, your test class should inherit from DatabaseTest. If you override either setup() or teardown(), make sure you remember to call the superclass implementations too.
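As a usage sketch, a test class building on DatabaseTest might look like the one below. This version is made self-contained with a hypothetical User model and an in-memory SQLite database standing in for the module-level setup above, purely for illustration:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):  # a hypothetical model, for illustration only
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

# Stand-ins for the setup_module() above; SQLite is used here only to
# keep the sketch self-contained
engine = create_engine('sqlite://')
connection = engine.connect()
transaction = connection.begin()
Base.metadata.create_all(connection)

class DatabaseTest(object):
    def setup(self):
        self.__transaction = connection.begin_nested()
        self.session = Session(connection)

    def teardown(self):
        self.session.close()
        self.__transaction.rollback()

class TestUser(DatabaseTest):
    def setup(self):
        super(TestUser, self).setup()  # don't forget the superclass call
        self.session.add(User(name='alice'))
        self.session.flush()

    def test_count(self):
        assert self.session.query(User).count() == 1

# Simulate what the test runner would do
t = TestUser()
t.setup()
t.test_count()
t.teardown()
```

After teardown() the savepoint rollback has discarded everything the test inserted, so the next test starts from the same blank slate.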

If you want to use this testing scheme with your web framework or other application framework, be aware that the framework's SQLAlchemy extension/middleware/whatever must be able to receive the connectable (Engine or Connection) as a Python object, not just as a connection URL.