FiveRuns Blog

On Rails production performance and monitoring

Introducing DataFabric

One of the lingering issues we’ve had to deal with over the last year in the Manage service is ActiveRecord’s reluctance to talk to more than one database. It’s really quite stubborn in that regard, and while there are a few solutions out there, none of them worked quite like we wanted. Some required intrusive application-level code changes and others didn’t offer the features we needed, so we bit the bullet and wrote our own.

Specifically, we needed two features to scale our MySQL database: application-level sharding and master/slave replication. Sharding is the process of splitting a dataset across many independent databases. This often happens based on geographical region (e.g. Craigslist) or user account (e.g. Flickr). Replication provides a near-real-time copy of a database which can be used for fault tolerance and to reduce load on the master node. Combined, you get a scalable database solution which does not require huge hardware to scale to huge volumes. DataFabric extends ActiveRecord’s standard connection handling to provide these two features.
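To make the two ideas concrete, here is a minimal plain-Ruby sketch of how sharding and replication combine when choosing a database node. It is an illustration only: ToyRouter, node_for, and the node names are hypothetical and are not part of DataFabric or ActiveRecord.

```ruby
# Hypothetical illustration; none of these names come from DataFabric.
class ToyRouter
  # shards maps a shard key to its master node and its slave nodes, e.g.
  #   { "austin" => { :master => "austin-master", :slaves => ["austin-slave-1"] } }
  def initialize(shards)
    @shards = shards
  end

  # Sharding: the shard key selects an independent database.
  # Replication: writes go to that shard's master, reads to one of its slaves.
  def node_for(shard_key, operation)
    shard = @shards.fetch(shard_key) do
      raise ArgumentError, "unknown shard: #{shard_key}"
    end
    return shard[:master] if operation == :write

    slaves = shard[:slaves]
    # Fall back to the master when a shard has no slaves configured.
    slaves.empty? ? shard[:master] : slaves.sample
  end
end
```

Each shard scales independently, and within a shard the slaves absorb read traffic so the master only has to handle writes.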

To install, invoke the usual magic:

gem install data_fabric

Add DataFabric to your Rails 2.1 gems listing in config/environment.rb:

config.gem "data_fabric"

Annotate your sharded models with your desired sharding and replication setup:

class Auction < ActiveRecord::Base
  data_fabric :shard_by => :city, :replicated => true
end

Let’s assume we are sharding based on the city associated with the request (i.e. the Craigslist model). You’ll need to add the necessary database connections to your config/database.yml for each city based on the naming convention DataFabric uses. See the README for details.

Finally your application controller will activate the city’s shard for each request:

class ApplicationController < ActionController::Base
  around_filter :select_shard

  private
  def select_shard(&block)
    DataFabric.activate_shard(:city => @current_city, &block)
  end
end

Now, whenever you do anything with the Auction model, it will only affect the current shard. Auction.find(:all) will find all auctions within that shard. The converse is also true: you can’t do anything with the Auction model until you set a shard. Note that you can just set the replicated flag without the shard_by flag; DataFabric will act just like Rick Olson’s Masochism plugin.
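The block-scoped behavior described above can be sketched in plain Ruby. This is not DataFabric’s source: ToyFabric, NoShardError, and the thread-local key are hypothetical names, and storing the active shard in a thread-local is an assumption made for the illustration.

```ruby
# Hypothetical sketch of block-scoped shard activation; not DataFabric's code.
module ToyFabric
  class NoShardError < StandardError; end

  # Make the given shard active for the duration of the block, then
  # restore whatever was active before, even if the block raises.
  def self.activate_shard(shard)
    previous = Thread.current[:toy_fabric_shard]
    Thread.current[:toy_fabric_shard] = shard
    yield
  ensure
    Thread.current[:toy_fabric_shard] = previous
  end

  # Return the active shard, or fail loudly if none has been set.
  def self.current_shard
    shard = Thread.current[:toy_fabric_shard]
    raise NoShardError, "no shard has been activated" if shard.nil?
    shard
  end
end

# Inside the block a shard is active; outside it, lookups raise.
ToyFabric.activate_shard(:city => "austin") do
  ToyFabric.current_shard # the hash passed to activate_shard
end
```

Wrapping each request in such a block, as the around_filter does, means a request cannot accidentally leave its shard selected for the next request handled by the same thread.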

We’re releasing DataFabric on GitHub for others to use as they see fit. Take a look at the README there for technical details and code samples. We’ve used it successfully with ActiveRecord 2.0.2 and 2.1. Some areas can be painful to deal with, notably migrations and fixtures, but we have both working in production here, so the problems can be overcome. :-) I’ll give you a hint: the example application might help.

Good luck and let us know what you think!

Continued Discussion

1 response to this entry

Have you tried this plugin with thinking_sphinx? Masochism is supposed to have issues. I don’t want to use the sharding features yet, just send reads to the slave. I use ts though and need it to work, thanks.

Erik Landerholm said:

on June 20, 2009 at 11:50 PM
