This is probably the most visited post I’ve ever written, and it still gets hits now, hence the repost. There are probably other or better ways of doing this now.
UPDATE (23rd April 2012): I’ve just been pointed to the Apartment gem which looks like it could be very useful for anyone doing this (although I haven’t tried it myself). Thanks for the heads up, @johnjoseph_code.
First steps: reducing memory usage of multiple applications
For the last 12 months or so, we’ve been deploying each new account as a separate application instance, virtually hosted via a subdomain per application. This was a great (and easy) way of sandboxing data, but memory usage has been climbing dramatically as the number of instances has grown. I’ve spent the last three weeks looking in detail at what we can do to reduce it.
We’re actually running fairly low-load applications: our big drivers are security, uptime, and speed – in that order. Our server stack hasn’t changed and won’t be changing in the next few months – our cloud network is currently a few networked application servers, a database server, and an API server. Just for sheer sanity purposes, we want to keep the number of application servers down – and that’s not even counting the running cost. Our goal has been about 15-20 applications per 2-3GB of RAM.
We started out on an Apache/Passenger stack, which seemed to be averaging out about 150mb per instance – meaning we could fit perhaps 15 applications on a server. The bigger issue, though, was that at times of high load, it’d burst up to about 600mb, taking the server down.
The first real shot at fixing this was to move to an nginx stack running Unicorn to serve the Rails applications. There are plenty of examples of this stack in use – GitHub, for instance – so I thought it would be worth a shot. We ended up seeing usage of around 95mb for the master process, with the same again for each worker process. But to make this actually usable we needed 3 or 4 worker processes per application – even settling on 3 workers, that’s pretty much 400mb gone per instance. It just wasn’t going to hold up.
A question: are we tackling the wrong problem?
So we went back to the drawing board and worked out what separate application instances were actually giving us: easy data sandboxing, but that came entirely from the separate databases – the one thing we’re not willing to concede. As for the application code – well, it’s all shared, surely? There’s just no reason to load the entire Rails environment separately for each account.
We needed to hang onto the same access method as we have at the moment – a separate subdomain per school – and I wondered if it would be possible to switch the database being loaded by Rails based on the hostname being accessed.
Connecting to the database based on the hostname
It turns out this is actually really easy, but there are a few… caveats… to forcing Rails to work like this.
It starts out with a fairly simple call to ActiveRecord::Base.establish_connection(). This has to be made in ApplicationController because it’s the only place we can access the request object.
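The original snippet didn’t survive the repost, so here’s a sketch of the idea – the `app_` naming scheme and the connection details are my illustration, not the original code:

```ruby
# Derive the tenant database name from the subdomain.
# (Illustrative naming scheme -- not the original code.)
def database_for(host)
  "app_#{host.split('.').first}"   # "stmarys.example.com" -> "app_stmarys"
end

# In ApplicationController -- the only place the request object is available:
#
#   before_filter do
#     ActiveRecord::Base.establish_connection(
#       :adapter  => "mysql2",
#       :host     => "localhost",
#       :username => "app",
#       :password => "secret",
#       :database => database_for(request.host)
#     )
#   end
```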
The first issue with this is that Rails goes through its standard initialisation routine – so it will expect to find a database it can connect to for your default development or production environment in database.yml. This means you have to have a dummy database – with the same structure as your actual application databases – made available to Rails.
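As an illustration (the post doesn’t include the actual file; the adapter and credentials here are placeholders), the dummy default in database.yml might look like:

```yaml
# database.yml -- Rails still needs a bootable default connection, so point
# it at a dummy database whose schema matches the real tenant databases.
production:
  adapter: mysql2
  host: localhost
  database: app_dummy    # placeholder name -- same structure, no real data
  username: app
  password: secret
```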
The caching problem
This works without a hitch in the development environment, but in production, when Rails is set to config.cache_classes = true, things can get a little weirder. I’ve been seeing a very intermittent bug when trying to save a model – any model:
undefined method `name' for nil:NilClass
arel (2.0.10) lib/arel/visitors/to_sql.rb:56:in `visit_Arel_Nodes_InsertStatement'
This is actually being thrown by Arel. My biggest fear – that caching classes would mean data leaking between applications – just isn’t happening, through many, many tests. However, Arel caches table definitions – field names, for example – and sometimes seems to get confused (returning a nil column name), even if the “default” and “active” database schemas are identical.
There are two methods available to clear this cache before the connection is established. In Rails 3.1.x, ActiveRecord::Base.clear_cache! has been added. I’m still running a 3.0.10 codebase for this application, so we ended up going with ActiveRecord::Base.connection_pool.clear_reloadable_connections!, called at the top of the filter before the connection is established.
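The full snippet was also lost from the repost; here’s a sketch of the resulting filter. The connection details and naming are illustrative, and the `Fake*` classes exist only so the sketch runs outside Rails – in the real ApplicationController, `base` is ActiveRecord::Base and the method runs as a before_filter:

```ruby
# Fake stand-ins for ActiveRecord, purely so the call order is visible
# without a running Rails app. These are NOT real Rails classes.
class FakeConnectionPool
  def clear_reloadable_connections!
    $calls << :cleared_cache
  end
end

class FakeActiveRecordBase
  def self.connection_pool
    FakeConnectionPool.new
  end

  def self.establish_connection(config)
    $calls << [:connected, config[:database]]
  end
end

$calls = []

# The filter body: clear the cached table metadata *first*, then connect.
# (Rails 3.1.x adds ActiveRecord::Base.clear_cache! for the same job; on
# 3.0.10 the connection pool method is what's available.)
def switch_tenant_database(base, host)
  base.connection_pool.clear_reloadable_connections!
  base.establish_connection(
    :adapter  => "mysql2",
    :host     => "localhost",
    :username => "app",
    :password => "secret",
    :database => "app_#{host.split('.').first}"  # illustrative naming scheme
  )
end

switch_tenant_database(FakeActiveRecordBase, "stmarys.example.com")
```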
I haven’t managed to throw the load at it to see if this holds up yet, because it does seem to be a very intermittent bug. I’ll update this post over the next few days when I find out if this has fixed it.
UPDATE 19/10/11: This doesn’t fix it – I haven’t had a chance to check the Rails 3.1 method yet, but so far on 3.0.10 the only fix I’ve found is to set config.cache_classes to false in production. Also interesting – I’m unable to reproduce the bug in the development environment, even with a production configuration.
Deploying and migrations
This is another one of those gotchas – how do you run migrations against an environment that doesn’t exist? I first looked at hijacking Rails’ migrations to loop over a list of databases on every run, but because Rails is connected to the default database, once that’s fully migrated it won’t even load the migration files.
The solution I found is by no means pretty: it requires creating a separate environment in your database.yml for each database. Migrations can then be run using rake db:migrate RAILS_ENV=some_database_connection. This also creates a new log file for each “environment”, but that seems a small price to pay.
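Sketching what that ends up looking like (the tenant names and credentials are illustrative, not from the original post):

```yaml
production:          # the dummy default Rails boots against
  adapter: mysql2
  database: app_dummy
  username: app
  password: secret

stmarys:             # rake db:migrate RAILS_ENV=stmarys
  adapter: mysql2
  database: app_stmarys
  username: app
  password: secret

hilltop:             # rake db:migrate RAILS_ENV=hilltop
  adapter: mysql2
  database: app_hilltop
  username: app
  password: secret
```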
At the moment, my Capistrano deploy.rb simply overrides the migrations task to connect to each database in turn and run the migrations against it. It’ll be a short edit to read every environment in from database.yml – ignoring development, production, and test – and loop over that instead of a hard-coded array, but this works for now.
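The faux-code didn’t survive the repost either; here’s a sketch of what the override might look like, assuming a hard-coded list of databases (names illustrative), plus the database.yml-reading helper the post alludes to:

```ruby
require "yaml"

# In Capistrano's deploy.rb, the override would loop over the databases:
#
#   namespace :deploy do
#     task :migrate, :roles => :db, :only => { :primary => true } do
#       %w[stmarys hilltop oakfield].each do |db|
#         run "cd #{current_path} && bundle exec rake db:migrate RAILS_ENV=#{db}"
#       end
#     end
#   end

# Reading the environments from database.yml instead of a hard-coded array:
# every top-level key is a connection, minus the standard Rails environments.
def migration_environments(yaml_text)
  YAML.load(yaml_text).keys - %w[development test production]
end
```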
So where from here?
This solution seems to work, assuming the table-structure caching issue doesn’t rear its head again, and it’s serving our purposes very well. Memory usage (now running nginx and Passenger) is down to about 150mb for a single application that happily serves 15 installations’ worth of concurrent users – a serious saving in hosting resources compared to what we were seeing.
But the biggest issue for me is just how much this feels like going against how Rails works. It’s obviously a specialised use-case, but from speaking to people at MagRails yesterday, I’m not the only person who’s come across this. I had a good discussion with Aaron Patterson yesterday about seeing if there’s anything that can be done at a Rails core level to support this, too.
If anyone else is doing this, or has any feedback, I’d really appreciate hearing thoughts!