There hasn’t been a lot of visible change in the RM-Manage service over the last few months for one reason: we’re focused on solving a few major pain points. One of those pain points is scaling our usage of ActiveRecord an order of magnitude beyond where we are today. At some point a single database runs out of headroom, and it doesn’t help that our access pattern is write-heavy. The FiveRuns client collects metric data for the various software moving parts of your server (MySQL, Apache, Linux, Rails, etc.) and uploads that data to our service every 5 minutes. Now multiply that by thousands of machines!
We spent a good amount of time over the last few months evaluating alternatives to an RDBMS for metric storage, including RRDtool and BerkeleyDB. I want to give you a rundown of my thoughts on them so you can understand why we decided to stick with MySQL and ActiveRecord.
RRDtool is lightweight, extremely fast and perfect for when you want to graph a sliding window of metric values over time. The problem comes when you consider the other aspects of our service: context is critical for our Application Browser. If you look at the screenshots in a previous blog post I prepared on Solving Rails Performance Bottlenecks with RM-Manage, you’ll see that we show how your Rails application uses Models within a Controller Action. This context is essential to understanding how your application works, but RRDtool has no concept of context. Building that concept, along with the ability to query metric data with arbitrary contextual parameters, would be no small task. Additionally, RRDtool is not change-friendly. If we wanted to add a new metric, we’d need to build a layer on top of RRDtool that provided context, versioning and migration.
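To make the "sliding window" idea concrete, here is a minimal Ruby sketch of a fixed-size round-robin series. It is only an illustration of the general technique, not RRDtool's file format or API, and the class and method names are made up for this example. Notice that each slot holds a bare number: there is nowhere to record which Controller Action or Model a sample belongs to.

```ruby
# Illustrative fixed-size round-robin store (hypothetical names;
# this is NOT RRDtool's actual format or API).
class RoundRobinSeries
  def initialize(slots)
    @slots  = slots
    @values = Array.new(slots) # pre-allocated, never grows
    @count  = 0                # total samples recorded so far
  end

  # Record the next sample, overwriting the oldest once the window is full.
  def record(value)
    @values[@count % @slots] = value
    @count += 1
  end

  # Samples currently in the window, oldest first.
  def window
    return @values.first(@count) if @count < @slots

    @values.rotate(@count % @slots)
  end
end

series = RoundRobinSeries.new(3)
[10, 20, 30, 40].each { |v| series.record(v) }
series.window # => [20, 30, 40]
```

Storage stays constant no matter how long the series runs, which is what makes this design so fast. Attaching context to each sample, or changing what a slot holds, would mean building another layer on top.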
BerkeleyDB is essentially a toolkit for building your own custom datastore. There’s no query language – you build an API to access your datastore and implement the “query” directly in code by joining together and iterating through BDB’s tree and hash structures. I spent two weeks learning BDB and implementing a metricstore which could power the Application Browser functionality (as pictured in the link above). Like RRDtool, migrations and contextual data would be difficult to maintain. Unlike a DBMS, BDB has no concept of schema so you can’t just add a column – a row is just a blob of bytes. If we wanted flexible and efficient storage, we would need to build a notion of schema on top of BDB. Additionally the RM-Manage service allows you to build custom rules to detect anomalies and generate events. These rules query the metricstore and so we’d need to build some sort of dynamic query facility on top of BDB.
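As a rough sketch of what "implementing the query in code" looks like, the Ruby below uses a plain Hash as a stand-in for a BDB hash structure. It does not use the BerkeleyDB API at all. The key scheme, record layout and method names are assumptions made up for illustration, not what our metricstore actually used.

```ruby
# A plain Hash stands in for a BDB hash/tree structure (illustration only).
store = {}

# Hypothetical record layout: 32-bit unsigned timestamp, then a
# double-precision duration in milliseconds. Each "row" is an opaque byte string.
RECORD_LAYOUT = "NG"

def put(store, key, timestamp, duration_ms)
  store[key] = [timestamp, duration_ms].pack(RECORD_LAYOUT)
end

# The "query": average duration for all keys with a given prefix,
# implemented by iterating and decoding every matching blob by hand.
def average_duration(store, prefix)
  durations = store.select { |key, _| key.start_with?(prefix) }
                   .map { |_, blob| blob.unpack(RECORD_LAYOUT).last }
  return nil if durations.empty?

  durations.sum / durations.size
end

put(store, "PostsController#index:1", 1_200_000_000, 12.5)
put(store, "PostsController#index:2", 1_200_000_300, 17.5)
put(store, "UsersController#show:1",  1_200_000_000, 40.0)

average_duration(store, "PostsController#index") # => 15.0
```

Adding a third field means changing `RECORD_LAYOUT`, and every blob already written with the old layout then decodes incorrectly unless you also write versioning and migration code. An RDBMS gives you that for free with a schema change.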
While I evaluated BDB, my coworker Brian was attacking the ActiveRecord/MySQL version of the metricstore. We competed over a two-week development cycle, each building the 5-6 operations needed by the metricstore, and found some interesting results. Nothing beats BDB when it comes to blasting huge volumes of data into the database: it was approximately an order of magnitude faster at inserts. But MySQL was quite space-efficient, querying was about the same speed and deletes were actually much faster than in BDB.
In the end, we decided that the traditional RDBMS offered too many advantages to pass up. As a small company, we cannot afford to ignore the flexibility and ease of use it offers. Once we decided to stick with MySQL, we needed to figure out how to scale it to fit our anticipated needs over the next year. Stay tuned for my next blog post, where I will go into detail about the architecture we came up with and some of the tasty libraries I wrote to fit Ruby and Rails into that architecture.














