Why you should avoid nested STI | ActiveRecord, Rails 6

Younes Serraj, Jun 2, 2020

Nested Single Table Inheritance doesn’t work well. Here’s what you must know to make it work or work around it.


Some context for illustration

I recently stumbled across the following scenario.

Initial specifications: a project owner creates a project and donors can contribute any amount of money to that project.

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
end

class User < ApplicationRecord
  # ...
end

class User::ProjectOwner < User
  # ...
end

class User::Donor < User
  # ...
end

class Project < ApplicationRecord
  # ...
end

class Contribution < ApplicationRecord
  # ...
end

Later, a little change was made to the specifications: a donor may either be a natural person (an individual human) or a legal person (a corporation or any other kind of legal entity).

Since both are donors and will share some significant amount of logic, it seems obvious that they are both a specialization of User::Donor, hence:

class User::Donor::Natural < User::Donor
  # ...
end

class User::Donor::Legal < User::Donor
  # ...
end

So far, this is classic OOP and we rely on ActiveRecord’s STI mechanism to do its magic (.find type inference and so forth).

Spoiler alert: it doesn’t work.



STI doesn’t play well with lazy code loading

This part is not specific to (nested) STI or to ActiveRecord but it’s worth knowing.

Given a recordless database (working on a new project):

User.count
# => 0

User.descendants
# => []

This is unexpected. I thought User.descendants would give me an array of all subclasses of User ([User::ProjectOwner, User::Donor, User::Donor::Natural, User::Donor::Legal]) but I have none of that. Why?

You don’t expect a constant to exist unless it has been defined, do you? Well, unless you load the file that defines it, it won’t exist.

Here is roughly how it goes:

Me: …start a rails console…

Me: User.descendants
Me: #=> []

Me: puts "Did you know: you can clap for this article up to 50 times ;)" if User::Donor < User
Code loader: Oh, this `User::Donor` const does not exist yet, let me infer which file is supposed to define it and try to load it for you.
Code loader: Ok I found it and loaded it, you can proceed
Me: #=> "Did you know: you can clap for this article up to 50 times ;)"

Me: User.descendants
Me: #=> [User::Donor]

Me: puts "Another Brick In The Wall" if User::Pink < User
Code loader: Oh, this `User::Pink` const does not exist yet, let me infer which file is supposed to define it and try to load it for you.
Code loader: Sorry, this `User::Pink` is nowhere to be found, I hope you know how to rescue from NameError.
Me: #=> NameError (uninitialized constant #<Class:0x00007fb42cb92ef8>::Pink)

Now you see why lazy loading doesn’t play nice with Single Table Inheritance: unless you’ve already accessed every single one of your STI subclasses const names to preload them, they won’t be known to your app.
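The dialogue above can be reproduced without Rails. What follows is a minimal plain-Ruby sketch, not how the Rails autoloader is implemented: `const_missing` stands in for the code loader, and the `DEFINITIONS` registry (a hypothetical name) stands in for the files on disk.

```ruby
# Plain-Ruby simulation of lazy constant loading. No Rails involved:
# const_missing plays the code loader, DEFINITIONS plays the files on disk.
class User
  @descendants = []

  # Hypothetical registry: constant name => code that defines the class.
  DEFINITIONS = {
    Donor: -> { Class.new(User) }
  }.freeze

  # Always read the list stored on User, even when called on a subclass.
  def self.descendants
    User.instance_variable_get(:@descendants)
  end

  # Ruby calls this hook each time a class inherits from User.
  def self.inherited(subclass)
    super
    User.descendants << subclass
  end

  # Ruby calls this hook when User::Something is referenced but undefined.
  def self.const_missing(name)
    definition = DEFINITIONS[name]
    return super unless definition # unknown constant: raises NameError

    const_set(name, definition.call)
  end
end

before_load = User.descendants.dup # nothing referenced yet, nothing loaded

donor_class = User::Donor # first reference triggers const_missing

after_load = User.descendants.dup

pink_error =
  begin
    User::Pink
  rescue NameError => e
    e.class
  end

puts before_load.inspect # => []
puts after_load.inspect  # => [User::Donor]
puts pink_error          # => NameError
```

The point of the sketch is only the ordering: the list of descendants grows when a constant is first referenced, not when the program starts.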

It’s not that STI doesn’t work at all, it’s just mildly frustrating because oftentimes we need to enumerate the STI hierarchy and there’s no easy, out-of-the-box way to do it.

Ruby on Rails’ guide mentions this issue and suggests an (incomplete) solution: https://guides.rubyonrails.org/autoloading_and_reloading_constants.html#single-table-inheritance

TL;DR: use a concern that collects all types from inheritance_column and force-preloads them.

Why it’s incomplete: a subtype that has no record yet won’t be preloaded, which means there are things you won’t be able to do. For instance, you can’t rely on User.descendants to generate select options because recordless types won’t be listed in your options.

Another (really not recommended) solution would be to preload all your app’s classes. It’s killing a fly with a hammer.

My solution is based on the concern suggested by Rails’ guide but instead of collecting types from inheritance_column, I use an array that contains all of the STI’s subclasses. This way I can enumerate the hierarchy at will, whether or not a subclass has records. I agree that it’s not 100% SOLID-compliant but it’s a trade-off I’m willing to make.
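As a rough illustration of what such a hard-coded list makes possible, here is a plain-Ruby sketch that builds select options from it. `USER_TYPES` and `type_options` are hypothetical names, and the label formatting is an assumption (in a Rails app you would more likely rely on `demodulize`, `humanize` or I18n).

```ruby
# Hypothetical hard-coded list of every STI subclass name,
# whether or not the table holds records of that type.
USER_TYPES = [
  "User::ProjectOwner",
  "User::Donor",
  "User::Donor::Natural",
  "User::Donor::Legal"
].freeze

# Builds [label, value] pairs, the shape Rails' select helpers accept.
def type_options(type_names)
  type_names.map do |name|
    # Drop the leading "User" namespace: "User::Donor::Legal" -> "Donor Legal"
    label = name.split("::").drop(1).join(" ")
    [label, name]
  end
end

options = type_options(USER_TYPES)
puts options.inspect
```

Because the list is static, a type with no record yet still appears in the options.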

That being said, let’s talk about the main topic of this article.



STI + lazy loading + nested models = unpredictable behavior

Single Table Inheritance is made for one base class and any number of subclasses you want as long as they all directly inherit from the base class.

Take a look at the two following samples. The first one works perfectly fine while the second will give you headaches.

# Working example

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
end

class User < ApplicationRecord
end

class User::ProjectOwner < User
  has_many :projects
end

class User::Donor < User
  has_many :contributions
end

class Project < ApplicationRecord
  belongs_to :project_owner, class_name: 'User::ProjectOwner', foreign_key: 'user_id'
end

class Contribution < ApplicationRecord
  belongs_to :project
  belongs_to :donor, class_name: 'User::Donor', foreign_key: 'user_id'
end

# Not working example

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
end

class User < ApplicationRecord
end

class User::ProjectOwner < User
  has_many :projects
end

class User::Donor < User
  has_many :contributions
end

class User::Donor::Natural < User::Donor
end

class User::Donor::Legal < User::Donor
end

class Project < ApplicationRecord
  belongs_to :project_owner, class_name: 'User::ProjectOwner', foreign_key: 'user_id'
end

class Contribution < ApplicationRecord
  belongs_to :project
  belongs_to :donor, class_name: 'User::Donor', foreign_key: 'user_id'
end

Why does the first one work in a predictable manner and not the second? Find out yourself by paying attention to the SQL queries:

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
end

class User < ApplicationRecord
end

class User::ProjectOwner < User
  has_many :projects
end

class User::Donor < User
  has_many :contributions
end

class Project < ApplicationRecord
  belongs_to :project_owner, class_name: 'User::ProjectOwner', foreign_key: 'user_id'
end

class Contribution < ApplicationRecord
  belongs_to :project
  belongs_to :donor, class_name: 'User::Donor', foreign_key: 'user_id'
end

# ...open a rails console...

project_owner = User::ProjectOwner.create
# => User::ProjectOwner(id: 1)

project = Project.create(project_owner: project_owner)
# => Project(id: 1, user_id: 1)

donor = User::Donor.create
# => User::Donor(id: 1)

contribution = Contribution.create(donor: donor, project: project, amount: 100)
# => Contribution(id: 1, user_id: 1, project_id: 1, amount: 100)

# ...CLOSE the current rails console...

# ...OPEN a NEW rails console...

Contribution.last.donor
  Contribution Load (0.5ms)  SELECT "contributions".* FROM "contributions" ORDER BY "contributions"."id" DESC LIMIT $1  [["LIMIT", 1]]
  User::Donor Load (0.3ms)  SELECT "users".* FROM "users" WHERE "users"."type" = $1 AND "users"."id" = $2 LIMIT $3  [["type", "User::Donor"], ["id", 1], ["LIMIT", 1]]
# => User::Donor(id: 1)

Now with a nested STI (base class, mid-level subclass and leaf-level subclasses):

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
end

class User < ApplicationRecord
end

class User::ProjectOwner < User
  has_many :projects
end

class User::Donor < User
  has_many :contributions
end

class User::Donor::Natural < User::Donor
end

class User::Donor::Legal < User::Donor
end

class Project < ApplicationRecord
  belongs_to :project_owner, class_name: 'User::ProjectOwner', foreign_key: 'user_id'
end

class Contribution < ApplicationRecord
  belongs_to :project
  belongs_to :donor, class_name: 'User::Donor', foreign_key: 'user_id'
end

# ...open a rails console...

project_owner = User::ProjectOwner.create
# => User::ProjectOwner(id: 1)

project = Project.create(project_owner: project_owner)
# => Project(id: 1, user_id: 1)

donor = User::Donor::Natural.create
# => User::Donor::Natural(id: 1)

contribution = Contribution.create(donor: donor, project: project, amount: 100)
# => Contribution(id: 1, user_id: 1, project_id: 1, amount: 100)

# ...CLOSE the current rails console...

# ...OPEN a NEW rails console...

Contribution.last.donor
  Contribution Load (0.5ms)  SELECT "contributions".* FROM "contributions" ORDER BY "contributions"."id" DESC LIMIT $1  [["LIMIT", 1]]
  User::Donor Load (0.3ms)  SELECT "users".* FROM "users" WHERE "users"."type" = $1 AND "users"."id" = $2 LIMIT $3  [["type", "User::Donor"], ["id", 1], ["LIMIT", 1]]
# => nil

See? The SQL query that looks up the donor associated with the contribution filters on the type User::Donor. Since my donor is a User::Donor::Natural, the record is not found. ActiveRecord isn’t aware that User::Donor::Natural is a subclass of User::Donor in the context of an STI unless that class has been loaded first.

irb(main):001:0> User.all.pluck :id
   (0.9ms)  SELECT "users"."id" FROM "users"
=> [2, 1]
irb(main):002:0> User.exists?(1)
  User Exists? (0.3ms)  SELECT 1 AS one FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]
=> true
irb(main):003:0> User::Donor.exists?(1)
  User::Donor Exists? (0.7ms)  SELECT 1 AS one FROM "users" WHERE "users"."type" = $1 AND "users"."id" = $2 LIMIT $3  [["type", "User::Donor"], ["id", 1], ["LIMIT", 1]]
=> false
irb(main):004:0> User::Donor::Natural.exists?(1)
  User::Donor::Natural Exists? (1.3ms)  SELECT 1 AS one FROM "users" WHERE "users"."type" = $1 AND "users"."id" = $2 LIMIT $3  [["type", "User::Donor::Natural"], ["id", 1], ["LIMIT", 1]]
=> true
irb(main):005:0> User::Donor.exists?(1)
  User::Donor Exists? (2.1ms)  SELECT 1 AS one FROM "users" WHERE "users"."type" IN ($1, $2) AND "users"."id" = $3 LIMIT $4  [["type", "User::Donor"], ["type", "User::Donor::Natural"], ["id", 1], ["LIMIT", 1]]
=> true

This is not okay to me. I would rather not choose an architecture whose behavior depends on which classes happen to be loaded.

ActiveRecord could’ve been designed to produce the following SQL statement:

SELECT * FROM users WHERE ("users"."type" = 'User::Donor' OR "users"."type" LIKE 'User::Donor::%') AND "users"."id" = 1

Which would allow me to:

  • Request User.all and retrieve records of type: User, User::ProjectOwner, User::Donor, User::Donor::Natural, User::Donor::Legal

  • Request User::Donor.all and retrieve records of type: User::Donor, User::Donor::Natural, User::Donor::Legal without code preloading

  • Request User::Donor::Natural.all and retrieve records of type: User::Donor::Natural

  • Request User::Donor::Legal.all and retrieve records of type: User::Donor::Legal
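To make the intent of that hypothetical condition concrete, here is the same predicate written in plain Ruby and applied to type strings. `in_hierarchy?` is an illustrative name, not an ActiveRecord method. It relies on one assumption: namespacing mirrors inheritance, which is true of the models in this article only by convention.

```ruby
# Plain-Ruby equivalent of: type = 'X' OR type LIKE 'X::%'
# in_hierarchy? is a hypothetical helper, not part of ActiveRecord.
def in_hierarchy?(stored_type, class_name)
  stored_type == class_name || stored_type.start_with?("#{class_name}::")
end

# Values as they would be stored in the users.type column.
stored_types = [
  "User",
  "User::ProjectOwner",
  "User::Donor",
  "User::Donor::Natural",
  "User::Donor::Legal"
]

all_types     = stored_types.select { |type| in_hierarchy?(type, "User") }
donor_types   = stored_types.select { |type| in_hierarchy?(type, "User::Donor") }
natural_types = stored_types.select { |type| in_hierarchy?(type, "User::Donor::Natural") }

puts all_types.inspect
puts donor_types.inspect   # => ["User::Donor", "User::Donor::Natural", "User::Donor::Legal"]
puts natural_types.inspect # => ["User::Donor::Natural"]
```

Since the comparison is done on strings, it gives the same answer whether or not any of these classes has been loaded.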

But it behaves otherwise:

SELECT * FROM users WHERE "users"."type" = 'User::Donor' AND "users"."id" = 1

Only once User::Donor’s subclasses are preloaded does User::Donor.all retrieve records of type User::Donor, User::Donor::Natural and User::Donor::Legal:

SELECT * FROM users WHERE "users"."type" IN ($1, $2, $3) AND "users"."id" = 1 [["type", "User::Donor"], ["type", "User::Donor::Natural"], ["type", "User::Donor::Legal"]]

One can put the blame on lazy code loading but I don’t. I agree that enumerating subclasses and lazy code loading cannot work hand in hand out of the box. But since we can’t get predictable, stable behavior from a mid-level model, it would be better to have AR’s documentation explicitly discourage nested STIs.

I’d rather not have a feature than one I can’t rely on.


Why does it work fine from the base class of a regular STI and not from a mid-level one?

The answer is found in the source code of ActiveRecord.

When accessing the relation, ActiveRecord adds a type condition if needed:

# https://github.com/rails/rails/blob/6bc7c478ba469ad4b033125d6798d48f36d6be3e/activerecord/lib/active_record/core.rb#L306

def relation
  relation = Relation.create(self)

  if finder_needs_type_condition? && !ignore_default_scope?
    relation.where!(type_condition)
    relation.create_with!(inheritance_column.to_s => sti_name)
  else
    relation
  end
end

To determine whether the type condition is needed, it does a couple of checks regarding the distance between the current class and ActiveRecord::Base as well as the presence of an inheritance column.

# https://github.com/rails/rails/blob/6bc7c478ba469ad4b033125d6798d48f36d6be3e/activerecord/lib/active_record/inheritance.rb#L74

# Returns +true+ if this does not need STI type condition. Returns
# +false+ if STI type condition needs to be applied.
def descends_from_active_record?
  if self == Base
    false
  elsif superclass.abstract_class?
    superclass.descends_from_active_record?
  else
    superclass == Base || !columns_hash.include?(inheritance_column)
  end
end

def finder_needs_type_condition? #:nodoc:
  # This is like this because benchmarking justifies the strange :false stuff
  :true == (@finder_needs_type_condition ||= descends_from_active_record? ? :false : :true)
end

The type condition is built as follows:

# https://github.com/rails/rails/blob/6bc7c478ba469ad4b033125d6798d48f36d6be3e/activerecord/lib/active_record/inheritance.rb#L262

def type_condition(table = arel_table)
  sti_column = arel_attribute(inheritance_column, table)
  sti_names  = ([self] + descendants).map(&:sti_name)

  predicate_builder.build(sti_column, sti_names)
end

To sum up:

  • When requesting from the base class (in my example: User), no type condition is added. Since it’s listing all records of the table, it gives access to all records whose class is or inherits from User. Perfect.

  • When requesting from a leaf subclass, the exact type must be matched for the record to be found. Logical.

  • When requesting from a mid-level subclass such as User::Donor (neither the base class User nor a leaf User::Donor::Natural), it depends. As expected, records of type User::Donor are loaded. On the other hand, records whose class inherits from User::Donor will be selected only if their class is preloaded.
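The three cases above can be reproduced with a plain-Ruby sketch. It is a simplified stand-in for ActiveRecord, not its implementation: `Record` plays the role of `ActiveRecord::Base`, the `inherited` hook plays the descendants tracker, and `type_condition` returns a SQL-like string instead of an Arel node.

```ruby
# Simplified stand-in for ActiveRecord's STI type condition.
# Assumption: Record plays ActiveRecord::Base; the real code builds Arel nodes.
class Record
  def self.direct_subclasses
    @direct_subclasses ||= []
  end

  # Ruby calls this on the parent each time a subclass is defined (loaded).
  def self.inherited(subclass)
    super
    direct_subclasses << subclass
  end

  # Only classes that have been defined so far can appear here.
  def self.descendants
    direct_subclasses.flat_map { |klass| [klass] + klass.descendants }
  end

  # The STI base class (a direct child of Record) needs no type filter.
  def self.finder_needs_type_condition?
    superclass != Record
  end

  def self.type_condition
    return nil unless finder_needs_type_condition?

    names = ([self] + descendants).map { |klass| "'#{klass.name}'" }
    if names.size == 1
      format('"type" = %s', names.first)
    else
      format('"type" IN (%s)', names.join(', '))
    end
  end
end

class User < Record; end
class User::Donor < User; end

base_condition = User.type_condition
mid_before     = User::Donor.type_condition

# "Loading" a leaf class changes what the mid-level class queries for.
class User::Donor::Natural < User::Donor; end

mid_after      = User::Donor.type_condition
leaf_condition = User::Donor::Natural.type_condition

puts base_condition.inspect # => nil
puts mid_before             # => "type" = 'User::Donor'
puts mid_after              # => "type" IN ('User::Donor', 'User::Donor::Natural')
puts leaf_condition         # => "type" = 'User::Donor::Natural'
```

The mid-level class is the only one whose condition changes between the two calls, which is exactly the instability described above.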


Is there a workaround?

There always is.

We could consider patching ActiveRecord to use LIKE in the SQL query as an alternative to the current strict string comparison. Problem: I didn’t run any benchmark, but a LIKE condition is likely to slow down reads. Though it would work, it requires a lot of work to patch ActiveRecord and, frankly, we’re not even sure the Rails core team would merge such a patch.

Another workaround would be to override the default scope of User::Donor to make it use a LIKE statement as described above. I’m not a huge fan of default scopes because the day always comes when we need to use .unscope and voilà it doesn’t work anymore. It’s not a sustainable solution IMO.

Yet another solution could be to preload subclasses, for instance with the solution discussed earlier. I guess it’s an acceptable one.

One more solution is to roll back to a simpler architecture that leaves no room for behavior changes: no mid-level subclasses, no preloading required. How do I not repeat myself for the common code shared by User::Donor::Natural and User::Donor::Legal, you ask?

Using concerns.

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
end

class User < ApplicationRecord
  scope :donors, -> { where(type: ['User::DonorNatural', 'User::DonorLegal']) }
  scope :project_owners, -> { where(type: 'User::ProjectOwner') }
end

class User::ProjectOwner < User
end

class User::DonorNatural < User
  include User::DonorConcern
end

class User::DonorLegal < User
  include User::DonorConcern
end

module User::DonorConcern
  extend ActiveSupport::Concern

  included do
    has_many :contributions, foreign_key: 'user_id', inverse_of: :donor
  end
end

class Project < ApplicationRecord
  belongs_to :project_owner, class_name: 'User::ProjectOwner', foreign_key: 'user_id'
end

class Contribution < ApplicationRecord
  belongs_to :project
  belongs_to :donor, class_name: 'User', foreign_key: 'user_id', inverse_of: :contributions
end

This code is intentionally oversimplified (no validations whatsoever) to make this article easier to read, so there is still room for improvement. My goal is to give you the essential information so that you can choose your own favorite solution in an informed way.


My favorite solutions

When possible, I’d rather have a simpler architecture (no intermediate layers). The less complex it is, the fewer headaches I have.

When I must have this intermediate layer, I’ll preload all subclasses of my STI to avoid any behavior randomness. And I mean all subclasses of my STI, not just the ones having records in the database.

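module UserStiPreloadConcern
  # With eager loading on, every class is already loaded at boot,
  # so the concern only has work to do when it is off.
  unless Rails.application.config.eager_load
    extend ActiveSupport::Concern

    included do
      # Class-level flag shared by the whole hierarchy.
      cattr_accessor :preloaded, instance_accessor: false
    end

    class_methods do
      # type_condition calls descendants, so preloading here ensures
      # queries see the whole hierarchy.
      def descendants
        preload_sti unless preloaded
        super
      end

      def preload_sti
        # Hard-coded list of every STI subclass, with or without records.
        user_subclasses = [
          "User::ProjectOwner",
          "User::Donor",
          "User::Donor::Natural",
          "User::Donor::Legal"
        ]

        user_subclasses.each do |type|
          # Referencing the constant makes the autoloader load its file.
          type.constantize
        end

        self.preloaded = true
      end
    end
  end
end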

Thanks for reading!


Capsens' blog

Capsens is an agency specialized in the development of fintech solutions. We love startups, scrum methodology, Ruby and React.