FiveRuns Blog

On Rails production performance and monitoring

Introducing DataFabric

One of the lingering issues we’ve had to deal with over the last year in the Manage service is ActiveRecord’s reluctance to talk to more than one database. It’s really quite stubborn in that regard and while there are a few solutions out there, none of them worked quite like we wanted. Some required intrusive application-level code changes and others didn’t offer the features we needed, so we bit the bullet and wrote our own.

Specifically, we needed two features to scale our MySQL database: application-level sharding and master/slave replication. Sharding is the process of splitting a dataset across many independent databases. This often happens based on geographical region (e.g. Craigslist) or user account (e.g. Flickr). Replication provides a near-real-time copy of a database which can be used for fault tolerance and to reduce load on the master node. Combined, you get a scalable database solution which does not require huge hardware to scale to huge volumes. DataFabric extends ActiveRecord’s standard connection handling to provide these two features.
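To make the two ideas concrete, here is a minimal plain-Ruby sketch of how a shard key and a read/write distinction could pick a database host. It is not DataFabric's implementation; the `SHARDS` table, the host names, and the `host_for` method are all made up for illustration.

```ruby
# Hypothetical illustration only; not DataFabric's code.
# Each shard is an independent database with one master and some replicas.
SHARDS = {
  "austin" => { :master => "db-austin-master", :slaves => ["db-austin-slave1", "db-austin-slave2"] },
  "dallas" => { :master => "db-dallas-master", :slaves => ["db-dallas-slave1"] }
}

# The shard key (city) chooses the database; the kind of operation
# chooses the master (writes) or one of the replicas (reads).
def host_for(city, operation)
  shard = SHARDS.fetch(city) { raise ArgumentError, "unknown shard: #{city}" }
  return shard[:master] if operation == :write
  # Replicas are near-real-time copies, so reads may be slightly stale.
  shard[:slaves].sample
end

puts host_for("austin", :write)  # db-austin-master
puts host_for("dallas", :read)   # db-dallas-slave1 (the only replica)
```

Sharding spreads the data set across hosts, while replication spreads the read load within each shard.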

To install, invoke the usual magic:

gem install data_fabric

Add DataFabric to your Rails 2.1 gems listing in config/environment.rb:

config.gem "data_fabric"

Annotate your sharded models with your desired sharding and replication setup:

class Auction < ActiveRecord::Base
  # Shard by city and send reads to a replica
  data_fabric :shard_by => :city, :replicated => true
end

Let’s assume we are sharding based on the city associated with the request (i.e. the Craigslist model). You’ll need to add the necessary database connections to your config/database.yml for each city based on the naming convention DataFabric uses. See the README for details.
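The README is the authority on the exact connection names. To show the general idea of one database.yml entry per city and role, here is a sketch that assumes a hypothetical pattern of shard name, shard value, environment and role joined with underscores. The `connection_name` helper is invented for this example and is not part of DataFabric.

```ruby
# Hypothetical naming pattern, for illustration only; check the README
# for the names DataFabric actually expects in config/database.yml.
def connection_name(shard_by, shard_value, environment, role = nil)
  # compact drops the role when the model is not replicated
  [shard_by, shard_value, environment, role].compact.join("_")
end

# One entry per city and per role would be needed in database.yml.
%w[austin dallas].each do |city|
  %w[master slave].each do |role|
    puts connection_name(:city, city, "production", role)
  end
end
```

Under that assumed pattern, two cities with replication would need four entries per environment.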

Finally your application controller will activate the city’s shard for each request:

class ApplicationController < ActionController::Base
  around_filter :select_shard

  private
  def select_shard(&block)
    DataFabric.activate_shard(:city => @current_city, &block)
  end
end

Now, whenever you do anything with the Auction model, it will only affect the current shard. Auction.find(:all) will find all auctions within that shard. The converse is also true: you can’t do anything with the Auction model until you set a shard. Note that you can just set the replicated flag without the shard_by flag; DataFabric will act just like Rick Olson’s Masochism plugin.
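The behaviour described above (a shard that is active only inside a block, and an error when none is set) can be sketched in plain Ruby. This is an assumption about the general technique, not DataFabric's source; `ShardContext`, `NoActiveShard`, `activate` and `current!` are hypothetical names.

```ruby
# Hypothetical sketch; module and method names are invented for illustration.
module ShardContext
  class NoActiveShard < StandardError; end

  # Activate a shard for the duration of the block, then restore the
  # previous value even if the block raises.
  def self.activate(city)
    previous = Thread.current[:active_city]
    Thread.current[:active_city] = city
    yield
  ensure
    Thread.current[:active_city] = previous
  end

  # A sharded model would call this before touching the database.
  def self.current!
    Thread.current[:active_city] || raise(NoActiveShard, "no shard activated")
  end
end

ShardContext.activate("austin") do
  puts ShardContext.current!   # austin
end

begin
  ShardContext.current!
rescue ShardContext::NoActiveShard => e
  puts e.message               # no shard activated
end
```

Scoping the shard to a block is what makes an around filter a natural fit: the shard is set before the action runs and cleared afterwards, so it cannot leak into the next request.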

We’re releasing DataFabric on GitHub for others to use as they see fit. Take a look at the README on GitHub for technical details and code samples. We’ve used it successfully with ActiveRecord 2.0.2 and 2.1. There are some areas which can be painful to deal with, notably migrations and fixtures, but we have both working in production here so you can overcome. :-) I’ll give you a hint: the example application might help.

Good luck and let us know what you think!

Continued Discussion

14 responses to this entry

Great! I’m browsing source code now…

Jacek Becela said:

on July 09, 2008 at 06:31 PM

Great idea. I’m going to dev something up with this.

Cody said:

on July 09, 2008 at 07:38 PM

Beautiful.

ActsAsFlinn said:

on July 09, 2008 at 10:49 PM

Hey, this looks brilliant! This looks like it will work perfectly with a data-heavy app I’m building. Great work! Can’t wait to try it.

Mike Subelsky said:

on July 10, 2008 at 09:23 AM

Finally! I’m so glad someone did this. Perfect timing, as we are just now working on the sharding strategy for OtherInbox.com.

Joshua Baer said:

on July 10, 2008 at 09:24 AM

Thanks guys. It’s been in the making for a few months now; what a relief to have it out in the wild and appreciated!

Mike Perham said:

on July 10, 2008 at 11:35 AM

Is it possible for it to randomly pull data from one server for when you are not sharding?

Also is it possible to specify a list as in.. Try this database, if it doesn’t work try that one and down the line.

Thanks.

malcontent said:

on July 10, 2008 at 05:15 PM

If you are not sharding, the system will pull data from the single database you have defined. Use this in our code:

data_fabric :shard_by => :metrics, :replicated => true if production

So in development and test, there’s no special database configuration at all. You would just have your typical development database with all the tables.

Mike Perham said:

on July 12, 2008 at 11:22 AM

That should be “[We] use this…”

And production is a method which just does “RAILS_ENV == 'production'”

Mike Perham said:

on July 12, 2008 at 11:25 AM

To take a stab at Mike’s question re selecting a random shard … I haven’t tried this directly, but surely it’s just a case of implementing a method that simply returns a random shard from a specified list?

David said:

on July 14, 2008 at 02:41 AM

Have you looked at implementing the Flickr sharding model where rows with foreign keys in a different shard are replicated in both shards so you never have to do joins between databases?

http://highscalability.com/flickr-architecture

Sam said:

on July 21, 2008 at 04:51 PM

If you were to implement this with PostgreSQL, which replication method would you recommend? Anybody used this plugin with replicated Pg?

Here’s a list of Pg replication methods http://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling

Thanks! Chirag

Chirag Patel said:

on July 28, 2008 at 01:19 AM

Very nice!!

acedayVem said:

on August 02, 2008 at 05:26 PM

Hi, sounds interesting. If you have the urge, could you blog about any advantages over SQLRelay? Admittedly, an SQLRelay adapter would need to be written…

Cheers

mark said:

on August 07, 2008 at 03:31 AM
