TIL: Running Python code as a migration operation in Django

2023-08-21

When I started learning Django, I was confused with the difference between makemigrations and migrate commands. I got it after a while, but I never truly had to work with complex migrations or write custome ones. I recently had to write a custom migration to change a ForeignKey field to a OneToOneField and I used the RunSQL special operation which allows to run raw sql queries. It allowed me to better understand migrations and gave me the confidence to push my changes in production.

Now I wanted to refactor an unnecessary many-to-many model into a one-to-one field in another model. The models looked like the following code:

from accounts import models as account_models


# the model I want to get rid of
class HomeAddress(BaseModel):
    home = models.ForeignKey(Home, on_delete=models.CASCADE)
    address = models.ForeignKey(
        account_models.Address,
        on_delete=models.CASCADE,
    )
    class Meta:
        verbose_name_plural = "home addresses"


class Home(BaseModel):
    name = models.CharField(max_length=50, default="Home")
    address = models.OneToOneField(  # the field I want to replace the model with
        account_models.Address,
        on_delete=models.PROTECT,
    )

The challenge was to copy the Address refered in each HomeAddress row to the refered Home. I wasn't sure how I would solve this and dreaded the raw SQL query, not wanting to mess up the production data. I asked ChatGPT for some inspiration and it suggested using the RunPython special operation. You first declare a function that takes two arguments; the first being the name of the app containing the historical models that matches the operation and the second is an instance of SchemaEditor the class that turns operations into SQL. The function code describes the changes that will be applied.

A second function is declared that accepts the same two arguments and its code should undo what has been done by the first function. This makes the migration reversible otherwise the changes are permanent. The two functions are passed to the RunPython function called inside the operations list of the Migration class. For database backends that do not support Data Definition Language (DDL) transactions (CREATE, ALTER, etc.), RunPython will run its content inside a transaction.

For databases that do support DDL like PostgreSQL, no other transactions are added besides the one generated by a migration. A migration with both schema changes and RunPython operation will raise an exception stating the changes cannot be applied because it has pending trigger events.

What I ended up doing is generating a migration to add the Home.address field, then generated an empty migration and wrote the following code:

from django.db import migrations

from accounts import models as account_models


def replace_home_with_address(apps, schema_editor):
    home_address_model = apps.get_model('my_app', 'HomeAddress')

    for home_address in home_address_model.objects.all():
        address = home_address.address
        address = account_models.Address.objects.create(
	    address_line_1=f"123 Fake Street",
	    address_line_2="Building 1",
	    city="Test City",
	    zip_code="12345",
	    country_id=home_address.address.country.id,
	    subdivision_id=home_address.address.subdivision.id,
        )
        home_address.home.address_id = address.id
        home_address.home.save()


class Migration(migrations.Migration):

    dependencies = [
        ('my_app', '0002_auto_20230821_1308'),
    ]

    operations = [
        migrations.RunPython(replace_home_with_address),
    ]

So here, I loop over the HomeAddress rows and for each one of them, I put the HomeAddress.address reference into HomeAddress.home.address and save. Note that for readability purpose, I didn't include the query prefetching code.