How to add index to a big table of your Rails app
When your application is successful, some of the tables can grow pretty big — I’m looking at you, users table. If you’re curious enough, you periodically check how your database performs. If any slow query pops up in the metrics, there’s a great chance that some index is missing.
State of the DB engines
While most modern database engines can create indexes in a non-blocking manner, it’s good to get familiar with all the exceptions to this rule. I highly recommend reading the documentation of PostgreSQL and MySQL. I’m not sure about SQLite here, as the documentation doesn’t clearly state it. However, a quick chat with an LLM may give you some insights.
What’s the problem then?
As you already know, the CREATE INDEX statement will be handled by the database in a non-blocking manner if the appropriate algorithm is used. This means that no reads, writes, or updates will be blocked.
Typically, for a Rails application, you’ll run a migration via a Ruby process during deployment, using all the goodies from the ActiveRecord::Migration class and its surroundings.
Let’s imagine a database schema like this:
ActiveRecord::Schema[7.1].define(version: 2024_06_13_121701) do
  create_table "users", force: :cascade do |t|
    t.string "email"
    t.boolean "active"
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
  end
end
Imagine you want to quickly find all those users who didn’t activate their account:
User.where(active: false)
If you have enough users, speaking of tens or hundreds of millions, doing a full table scan could simply kill the database performance. A full table scan happens when the database has no index to use and needs to check every row to see whether it meets the criteria.
I will stick with PostgreSQL in the examples if not stated otherwise.
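Before adding anything, you can ask the database for the query plan yourself. Here’s a quick check from bin/rails console, assuming PostgreSQL; the exact output depends on your data and versions, but without an index you should spot a sequential scan:
# Print the query plan for the slow query
puts User.where(active: false).explain
# Without an index the plan will contain something like:
#   Seq Scan on users ...   <- every single row gets checked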
Obvious solution
Let’s add the index then:
➜ trololo git:(master) bin/rails g migration AddIndexOnActiveUsers
invoke active_record
create db/migrate/20240613121751_add_index_on_active_users.rb
Implementation:
# PostgreSQL
class AddIndexOnActiveUsers < ActiveRecord::Migration[7.1]
  disable_ddl_transaction!

  def change
    add_index :users, :active, algorithm: :concurrently
  end
end

# MySQL
class AddIndexOnActiveUsers < ActiveRecord::Migration[7.1]
  def change
    add_index :users, :active, algorithm: :inplace
  end
end
Let’s run it:
➜ trololo git:(master) bin/rails db:migrate
== 20240613121751 AddIndexOnActiveUsers: migrating ============================
-- add_index(:users, :active)
-> 0.0013s
== 20240613121751 AddIndexOnActiveUsers: migrated (0.0013s) ===================
Ok, it was quite fast, right? That’s correct for your dev machine, but not necessarily for the production setup. Now multiply this time of 0.0013s by 6 or 7 orders of magnitude if your table is big enough:
➜ trololo git:(master) bin/rails db:migrate
== 20240613121751 AddIndexOnActiveUsers: migrating ============================
-- add_index(:users, :active)
-> 13000.4928s
== 20240613121751 AddIndexOnActiveUsers: migrated (13000.4928s) ================
I’ll do the maths for you: 13000.4928s means 3 hours 36 minutes 40.49 seconds. But it can be even longer — don’t ask me how I found out about this.
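By the way, if you wonder whether such a long index build is actually making progress, PostgreSQL 12+ exposes the pg_stat_progress_create_index view. Here’s a minimal sketch you could run in a second bin/rails console session while the build is running (the selected columns are just an example):
# Peek at the index build progress (PostgreSQL 12+ only)
rows = ActiveRecord::Base.connection.select_all(<<~SQL)
  SELECT relid::regclass AS table_name, phase,
         blocks_done, blocks_total,
         tuples_done, tuples_total
  FROM pg_stat_progress_create_index
SQL
rows.to_a.each { |row| puts row }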
While the migration will finish eventually, blocking any other deployments to your application during this time may be unacceptable for various reasons:
- you cannot release any other change to production until the migration completes
- something else goes wrong and you need to rapidly deploy a hotfix, but you can’t since the deployment is blocked by the long running migration™
- the process manager on a deployment machine may expect output within, e.g., 5 minutes. A long running migration will get killed in such a scenario.
What to do then?
Simply skip the migration body for RAILS_ENV=production:
class AddIndexOnActiveUsers < ActiveRecord::Migration[7.1]
  disable_ddl_transaction!

  def change
    add_index :users, :active, algorithm: :concurrently, if_not_exists: true unless Rails.env.production?
  end
end
The migration will be executed on the production environment, but will have no effect on your database. schema.rb or structure.sql (depending on what you use) will be aligned. A corresponding entry will also appear in the schema_migrations table. All the developers will have the index added on their local databases, and the test environment will be aligned too.
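If you want to double-check that the skipped migration is still recorded on production, the schema_migrations table is trivial to inspect. A tiny sketch from bin/rails console (the version is the one from the example above):
# The migration version is present even though its body was skipped
ActiveRecord::Base.connection.select_values(
  "SELECT version FROM schema_migrations ORDER BY version DESC LIMIT 5"
)
# => ["20240613121751", ...]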
But hey, where’s my index on production?! you might ask. And that’s a pretty valid question. What you’ll need is a way to run a Rails runner or Rake task on production. Depending on the capabilities you have, you might choose to:
- Run it within bin/rails console:
# PostgreSQL
ActiveRecord::Migration.add_index :users, :active, algorithm: :concurrently, if_not_exists: true
# MySQL
ActiveRecord::Migration.add_index :users, :active, algorithm: :inplace, if_not_exists: true
- Do the same via bin/rails runner:
# PostgreSQL
bin/rails r "ActiveRecord::Migration.add_index :users, :active, algorithm: :concurrently, if_not_exists: true"
# MySQL
bin/rails r "ActiveRecord::Migration.add_index :users, :active, algorithm: :inplace, if_not_exists: true"
- Last, but not least, implement a Rake task. It has the advantage that it has to be committed to the repository, so you don’t lose the history of what happened:
# PostgreSQL
namespace :indexes do
  task add_index_on_active_users: :environment do
    ActiveRecord::Migration.add_index :users, :active, algorithm: :concurrently, if_not_exists: true
  end
end

# MySQL
namespace :indexes do
  task add_index_on_active_users: :environment do
    ActiveRecord::Migration.add_index :users, :active, algorithm: :inplace, if_not_exists: true
  end
end
Execute it with bin/rails indexes:add_index_on_active_users.
For the last option, it’s also easy to enhance it with logging, so you can identify the execution in Grafana, Datadog or any other tool you use for your logs.
namespace :indexes do
  task add_index_on_active_users: :environment do
    Rails.logger.info("task indexes:add_index_on_active_users started")
    ActiveRecord::Migration.add_index :users, :active, algorithm: :concurrently, if_not_exists: true
    Rails.logger.info("task indexes:add_index_on_active_users finished")
  end
end
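Whichever of the three options you pick, it’s worth verifying afterwards that the index actually landed. A minimal check from bin/rails console (just an illustration, not part of the task itself):
# Returns true once the index on users.active exists
ActiveRecord::Base.connection.index_exists?(:users, :active)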
Tiny details
If you’re attentive enough, you probably spotted the if_not_exists: true flag. We like idempotence, and that’s the reason. If anyone runs this task again, nothing will happen. If you prefer to see ActiveRecord::StatementInvalid instead, feel free to skip it.
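The same idea applies if you ever need to drop the index by hand. A hedged sketch for PostgreSQL: if_exists: true makes re-runs a no-op, and DROP INDEX CONCURRENTLY has the same no-transaction restriction as the concurrent create:
# Idempotent, non-blocking removal of the index (PostgreSQL)
ActiveRecord::Migration.remove_index :users, :active, algorithm: :concurrently, if_exists: true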
As mentioned in the preface, to use the appropriate algorithm for index creation, algorithm: :concurrently for PostgreSQL or algorithm: :inplace for MySQL has to be specified.
There’s another quirk for PostgreSQL, straight from its documentation:
Another difference is that a regular CREATE INDEX command can be performed within a transaction block, but CREATE INDEX CONCURRENTLY cannot.
Each Rails migration execution is wrapped within a transaction. To disable this behavior, you need to use the disable_ddl_transaction! method within your migration. Otherwise, the following error will pop up:
bin/rails aborted!
StandardError: An error has occurred, this and all later migrations canceled: (StandardError)
PG::ActiveSqlTransaction: ERROR: CREATE INDEX CONCURRENTLY cannot run inside a transaction block
/Users/fidel/code/fidel/trololo/db/migrate/20240613121751_add_index_on_active_users.rb:5:in `change'
Caused by:
ActiveRecord::StatementInvalid: PG::ActiveSqlTransaction: ERROR: CREATE INDEX CONCURRENTLY cannot run inside a transaction block (ActiveRecord::StatementInvalid)
/Users/fidel/code/fidel/trololo/db/migrate/20240613121751_add_index_on_active_users.rb:5:in `change'
Caused by:
PG::ActiveSqlTransaction: ERROR: CREATE INDEX CONCURRENTLY cannot run inside a transaction block (PG::ActiveSqlTransaction)
/Users/fidel/code/fidel/trololo/db/migrate/20240613121751_add_index_on_active_users.rb:5:in `change'
Tasks: TOP => db:migrate
(See full trace by running task with --trace)
However, it’s not a problem for our custom script or Rake task run on the production environment.