Welcome to the Rails TakeFive interview series, where we have a virtual sit-down with leading developers from the community and bring you their insights into all things Ruby and Rails.
This week, we're bringing back a popular theme and hosting a Campfire chat between FiveRuns developer, Rubyist and Raconteur Adam Keys and Glenn Vanderburg, noted Rubyist, speaker, author, and the Chief Scientist at Relevance.
In this chat, Glenn and Adam talk about testing, the Ruby Community, framework philosophies, VM implementations, and more. Let's get started!
| Adam K. | I've got some Ruby press-junket style questions to open up with. For the Glenn Vanderburg fan-club out there somewhere. |
| Adam K. | What is your preferred flavor of testing? Test/Unit, RSpec, Shoulda, etc.? |
| Glenn V. | I'm comfortable with all three of those, although I'm eager to start a new project and get more experience with Shoulda. I remain a little wary of RSpec ... it's a great framework, and seems to really help people new to TDD to understand how the process works. However, it's too big and complex for me to feel comfortable relying on it in such a foundational role in the process. I prefer my testing tools to be "so simple there are obviously no deficiencies," to use Hoare's phrase. |
| Glenn V. | That said, I like what the RSpec guys have been doing recently. Cucumber is nice, and I'm glad that RSpec now plays nicely with test/unit-style assertions. I find the test structuring facilities of RSpec to be quite useful, but often I find that assertions are better for expressing the conditions than "should". |
| Adam K. | Yeah, Cucumber is looking neat. We're playing with it a little here as we're adding more acceptance tests. |
| Adam K. | What is your preferred approach to testing? Do you emphasize unit tests, story tests, acceptance tests or do you prefer some hybrid approach? |
| Glenn V. | A hybrid approach, but with most of the emphasis on unit tests. I try not to be dogmatic about test coverage at every level or things like that. One of my colleagues at Relevance, Jason Rudolph, has been blogging and speaking recently about "How to fail with 100% test coverage," and I really think that's an important point. Testing and test coverage are about managing risk, and managing risk is always a cost/benefit proposition. Unit tests (especially if written in a TDD style) give you a huge benefit for essentially no cost. As you move up the testing chain, though, tests get more expensive and also (in most cases) provide less benefit. |
| Glenn V. | That's not an excuse to not do acceptance tests and things like that. But I like to try to focus that on where the areas of complexity are, or where the errors seem to be appearing. If you let real, known errors drive where you build more costly tests, you know for certain that they're worth the cost. |
| Glenn V. | In web applications, acceptance-level testing is always a problem. We're always looking for better ways to test JavaScript, for example. But at the moment, there doesn't appear to be a good way -- just ways that have different assortments of strengths and weaknesses. |
| Adam K. | That's a great idea. I was listening to the Agile Toolkit podcast recently and the interviewee was saying that he found the value of unit tests falling away as he adopted a very different process that de-emphasizes traditional sprint planning. But it seems that would make it harder to refactor your code, if you lose the confidence that unit tests give you. |
| Adam K. | Testing JavaScript is a tricky topic. What's your take on the current state of the art there? |
| Glenn V. | Both of those points are true, but your response gets at the heart of the value of unit tests, a value that many people overlook: they are for way more than just guaranteeing correctness. Arguably, acceptance and integration tests validate the same logic that unit tests do. But you only get the design benefit from unit tests, and only unit tests are detailed enough to really help you through redesigns and refactoring. |
| Glenn V. | The current state of JavaScript testing is that there are a whole bunch of tools available, but they're all severely limited in different ways. That means that, using any one tool, you can only really test part of your system's JavaScript code if you're doing anything reasonably sophisticated in JavaScript. Of course, with a higher-level acceptance testing tool like Selenium, I suppose you can test pretty much everything (because it just verifies changes at the DOM level), but that's an acceptance test rather than a unit test, with all of the disadvantages I've already mentioned. |
| Adam K. | So when it comes to JavaScript testing, do you take a unit testing or acceptance testing approach? |
| Glenn V. | JavaScript as a *language* is very testable. The core problem is that JavaScript is all about manipulating the browser, and so testing outside the browser is essentially cheating. But the browser gets in the way. |
| Glenn V. | I've found that each project requires hard thinking about the best approach to JavaScript testing, which is part of the problem. Depending on how sophisticated your JavaScript functionality is, you may be able to do effective unit testing. But it usually has to be a hybrid, and it usually costs the project way more time and effort than testing other parts of the code. |
| Adam K. | I've struggled with that cost as well. I wonder if the challenge is conceptual or technical? |
| Adam K. | Conceptual in that something like Rails says "stick these kinds of tests in here and those kinds of tests over there", whereas with JavaScript it's a bit more of an unexplored frontier. |
| Glenn V. | There's definitely something to that. I love the idea of the Rails JavaScript helpers; there are some parts of the JavaScript in my system that I can trust in the same way I trust that ActiveRecord just works for most things. But we're really still figuring out the best way of using JavaScript in real applications, and the helpers can't cover everything. (That's the same reason I don't think things like GWT are the answer ... it's still way too early in the game to begin limiting our options in the browser with tools like that.) |
| Adam K. | With JavaScript going through its second or third renaissance right now, maybe we can hope for a JavaScript testing renaissance to piggyback along with it. |
| Adam K. | Shifting subjects a little bit, let's talk about the Ruby community and web applications. |
| Adam K. | The community seems to have arrived at a similar point to where Java was about five years ago. Rails has a great community and momentum and other frameworks are growing out of their niches. What can we as a community learn from how things shook out in Java-land? |
| Adam K. | (I should mention that Glenn wrote one of the first serious tomes for Java and is pretty familiar with more than one Java framework. He's been around the block, so to speak.) |
| Glenn V. | I've been thinking about that a lot. For the past few years, Ruby has felt a lot like Java did in the early days ... excitement and rapid growth, and now we're close to the point where Java started becoming very enterprise-ish and overly complicated. But I'm not very worried about Ruby taking the same path. Part of the problem was that Java was being pushed and steered by a big corporate entity with its own agendas. That's not the case with Ruby. Sure, there are big (and small) companies who have a lot riding on Ruby's (and Rails') success and continued growth. But none of them are to Ruby what Sun was to Java, and I think we're too smart as a community to let someone try to take on that role. Sure, offer support and new facilities, we'll all welcome that, *especially* if there's competition. But I don't think the core of the Ruby community will ever be a tightly orchestrated event like JavaOne, or a process like the JCP. And that's very good for us. |
| Glenn V. | I'm just a *little* worried about how big the Ruby core library has become in 1.9. The growth rate is similar to that of the Java core libraries, which we deride as bloat. But most of the extra complexity is things like the character set support and complex numbers, and those aren't being driven by enterprise agendas. Those smell to me like things that were added because the community wanted them -- perhaps just a small corner of the community, but the community nonetheless. |
| Adam K. | What about philosophies? For example, Rails is opinionated, Merb is agnostic, Mack is distributed and Sinatra is slim. ActiveRecord is just enough to make a database tolerable, DataMapper is a full-blown ORM and Sequel is a thin layer over SQL. Is there a good way to deal with a diffusion of ideas like that? |
| Glenn V. | I think the way we're dealing with it is just fine. To some degree the market has spoken loudly, and the answer is "opinionated, 90%, modular, with plugins". There's a small percentage of the community that's really dissatisfied with some aspect of Rails, and that's a good thing ... but most still think it's a huge advance, and it'll take a lot more than 20% better along some axis to supplant it. But the dissenters are really valuable! It's good to have alternatives, and those ideas are being adopted by Rails where it makes sense. Certainly many ideas from Merb are already making their way into Rails, and ActiveRecord may soon see better SQL handling similar to what we see in Sequel. (And, for the record: Sequel is way more than just a thin layer over SQL, unfortunately. :-) |
| Adam K. | I look forward to the cross-pollination of ideas. It never hurt nobody! |
| Adam K. | OK, let's talk about virtual machines that support dynamic languages. You did a fantastic talk at RubyConf on how it's quite possible to make dynamic language runtimes competitive in terms of speed. |
| Adam K. | For those who couldn't attend your talk, how does one make a dynamic language VM fast, in a nutshell? |
| Glenn V. | Thanks! Glad you enjoyed it. |
| Glenn V. | Here's the short, unhelpful answer: optimize a bunch of stuff. :-) Which sounds flip and snide, but that's kind of what YARV and Rubinius are doing in the short term. Preprocess source into bytecodes, optimize method dispatch (especially by caching the results of the last dispatch at each call site), etc. |
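As a rough illustration of the call-site caching Glenn mentions, here is a minimal sketch in plain Ruby. `CallSite` is a hypothetical name invented for this example; real VMs do this inside the interpreter or in generated machine code, and they also handle singleton methods and method redefinition, which this toy ignores.

```ruby
# Toy model of one call site with a monomorphic inline cache.
# CallSite is a hypothetical class for illustration only.
class CallSite
  attr_reader :hits, :misses

  def initialize(method_name)
    @method_name = method_name
    @cached_class = nil   # class seen at the last dispatch
    @cached_method = nil  # method found for that class
    @hits = 0
    @misses = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    if klass.equal?(@cached_class)
      # Fast path: same receiver class as last time, skip the lookup.
      @hits += 1
    else
      # Slow path: do the full lookup and remember the result.
      @misses += 1
      @cached_class = klass
      @cached_method = klass.instance_method(@method_name)
    end
    @cached_method.bind(receiver).call(*args)
  end
end

site = CallSite.new(:upcase)
results = %w[a b c].map { |s| site.call(s) }
```

In this run the first call misses and fills the cache, and the next two calls reuse the cached method because the receiver class has not changed.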
| Glenn V. | The big, relatively easy thing is to improve garbage collection. Garbage collection can be really fast, lots faster than manual memory management if you do a good job of it. A good generational garbage collector (like, for example, the one in the HotSpot JVM that JRuby gets to take advantage of) can make object allocation super-fast, and collection of short-lived objects essentially free. Since nearly all objects in real systems are short-lived, that's huge. |
| Glenn V. | The holy grail, so to speak, is a full, dynamically optimizing VM with what's called "type feedback". Again, this is what HotSpot does, but it's optimized for Java's characteristics, so I think there's a limit to how much it can help JRuby. (I'd love to be wrong about that!) And I suspect GemStone Smalltalk does similar things, and since Ruby and Smalltalk have such similar execution models, MagLev can gain a big boost. But I'd love to see a good optimizing VM designed especially for Ruby, and I hope that's what Rubinius and/or YARV will turn into. And Rubinius seems to have the best foundation for it. |
| Adam K. | Indeed. For anyone who couldn't make it to RubyConf, I'd highly recommend checking out the Rubinius and YARV presentations after watching yours. |
| Adam K. | What was the most surprising thing you found in researching VMs for dynamic languages? |
| Glenn V. | A dynamic VM gathers statistics about the program at runtime and does aggressive optimizations based on those statistics, aggressively inlining methods, optimizing the large methods that result, and compiling to native machine code on the fly. In a language as dynamic as Ruby, such optimizations are necessarily speculative -- any number of changes might invalidate the assumptions made during the optimization. But the VM can keep track of that and fall back to the bytecode version if necessary. But nearly all such changes will happen while the system is starting, way before the optimizations are done, so the advantages gained from the super-optimized code will far outweigh the cost of the occasional optimized method that must be thrown away. |
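The "speculate, then fall back" idea can be sketched in a few lines of Ruby. Everything here is hypothetical and simplified: `Point` and `SpeculativeDouble` are made-up names, the "optimized" path is just a hand-inlined instance variable read, and the guard is re-checked on every call, whereas a real VM would typically register the assumption and discard the compiled code when it is invalidated.

```ruby
# Hypothetical class whose accessor we will speculate about.
class Point
  def initialize(x)
    @x = x
  end

  def x
    @x
  end
end

# Stand-in for an optimized version of `point.x * 2` with #x inlined.
class SpeculativeDouble
  attr_reader :deopts

  def initialize
    # The assumption the optimization depends on: Point#x is this method.
    @assumed_method = Point.instance_method(:x)
    @valid = true
    @deopts = 0
  end

  def call(point)
    if @valid && point.instance_of?(Point) &&
       Point.instance_method(:x) == @assumed_method
      # Fast path: the accessor call has been inlined away.
      point.instance_variable_get(:@x) * 2
    else
      # Assumption broken: throw the optimized path away for good
      # and fall back to ordinary dynamic dispatch.
      @deopts += 1 if @valid
      @valid = false
      point.x * 2
    end
  end
end

doubler = SpeculativeDouble.new
point = Point.new(21)
before = doubler.call(point)

# Someone redefines Point#x at runtime, invalidating the assumption.
Point.class_eval do
  def x
    @x + 100
  end
end
after = doubler.call(point)
```

The first call takes the inlined path; after `Point#x` is redefined, the guard fails once, the sketch records a single deoptimization, and later calls use normal dispatch so the new definition is respected.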
| Glenn V. | The most surprising thing was the huge impact of inlining. I'd heard a lot of talks about this kind of thing in the Java world, and given a few of them, but when discussing inlining, I'd only ever seen toy examples. |
| Glenn V. | So ... |
| Glenn V. | for this talk I decided to see if I could show something real. I started with a simple Rails index action generated from scaffolding, and started manually doing inlining. I set some rules: I would inline any method I encountered that was less than 10 lines long, as long as I could convince myself that the compiler could know the correct method implementation to choose for inlining. And when I encountered calls to Ruby core classes, I used Rubinius' Ruby implementation of the core library, rather than the MRI C versions. |
| Glenn V. | Well, I almost instantly got to the point where essentially every method call was on an object that was created in the same scope. It didn't take long for the initial six-line method to balloon to over 180 lines. And that's just the point at which I got bored and gave up ... I'm confident if I'd continued the process it would have grown to over 500 lines, and that's a conservative estimate. |
| Glenn V. | An optimizing compiler has problems with little tiny methods, but a big block like that gives it a lot to work with. There was a lot of dead code that could be eliminated, a lot of common subexpressions, loops that could be unrolled, etc. I was amazed at how effective it was. |
| Adam K. | I thought that was a great example of how Ruby really *could* get a lot faster. How do you think the Ruby implementation space will pan out in the next 12-18 months? |
| Glenn V. | A couple things are no-brainers, I'd say. I think MagLev will ship in that time, and it'll be blazingly fast for people who are willing to pay for speed. I also think JRuby will make some real performance gains; the JRuby team know what needs to be done, and for the first time Sun is planning on changing the JVM in ways that are aimed specifically at dynamic language performance. (I sincerely hope Sun's current troubles don't hurt that effort.) |
| Glenn V. | YARV will continue to get more stable, but not a lot faster. As far as I can tell, the YARV implementation is too closely tied to the existing C implementation of the core classes, which will get in the way. |
| Glenn V. | The interesting one is Rubinius. There's been a perception of trouble with Rubinius lately, because progress seemed to stall while Evan rewrote the core VM in C++. (And "We're gonna start over and rewrite!" is a classic warning sign for a project.) But I'm convinced the rewrite was the correct thing to do, and they're almost back to the point where they were before the rewrite. I think they'll start making rapid progress. I don't think Rubinius will be the fastest VM in the next 18 months, but I think it might be competitive by then, and with a lot of headroom for further improvement. |
| Adam K. | Let's dive into the statement that being "too closely tied to the existing C implementation" would reduce the opportunities to make YARV faster. Intuitively, porting code to C makes it faster. What is different about optimizing VMs? |
| Glenn V. | A lot of things about good dynamic VMs are counterintuitive. For example, you'd think by generating more garbage, you'd make more work for the garbage collector. But modern garbage collectors almost never touch garbage; they only deal with live objects. So by trying to avoid making garbage, you might be keeping more objects live for longer times, and *that's* what makes the GC's life difficult. |
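To make the "collectors only deal with live objects" point concrete, here is a toy model of a copying young-generation collection written in Ruby. It is only a sketch under simplifying assumptions: `Obj` and `scavenge` are hypothetical names, "copying" is modeled by appending to an array, and no real memory is managed.

```ruby
# Toy object with a name and a list of outgoing references.
Obj = Struct.new(:name, :refs)

# Trace from the roots and "copy" every reachable object to a
# survivor list. Unreachable objects are never visited at all.
def scavenge(roots)
  survivors = []
  visited = {}.compare_by_identity # identity, not Struct equality
  worklist = roots.dup
  until worklist.empty?
    obj = worklist.pop
    next if visited[obj]
    visited[obj] = true
    survivors << obj
    worklist.concat(obj.refs)
  end
  survivors
end

live_child = Obj.new("child", [])
live_root = Obj.new("root", [live_child])
garbage = Array.new(10_000) { |i| Obj.new("garbage#{i}", []) }
nursery = [live_root, live_child] + garbage

survivors = scavenge([live_root])
```

The nursery holds 10,002 objects, but the trace visits only the two reachable ones; the garbage is reclaimed simply by discarding the old nursery, so the work done scales with the number of live objects rather than with the amount of garbage.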
| Glenn V. | To understand why C code can be slower, think back to my inlining example. The inlining process can't see inside the C functions, so inlining has to stop there. From one standpoint, that might not seem like a problem; that code's already pretty fast. But function-call overhead is still costly, and modern pipelined CPUs work best with long code blocks. Furthermore, since those functions can't be inlined with the rest of the code, they can't be optimized together. And that really limits the options for optimization. Here's an example from my talk: |
| Adam K. | That's intriguing. So how would someone interested in making some of these ideas reality in Ruby get started? |
| Glenn V. | Here's the original method: |
| Glenn V. | View paste |
| Glenn V. | after a little inlining, you get to this: |
| Glenn V. | View paste |
| Glenn V. | (That's just the first line of the original, after inlining find.) |
| Glenn V. | If the VM can inline the creation of [:all], and args.first, and if it knows about VM primitives, it can eventually eliminate that whole case statement and get down to this: |
| Glenn V. | View paste |
| Glenn V. | And if it keeps going, it can optimize away all uses of that args array, which means it can avoid ever creating it. That's a lot of optimization. But to do that, especially in a language as dynamic as Ruby, it has to be able to see inside the core class methods. |
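The pastes from the chat are not reproduced here, so the following is not the code from the talk. It is a hypothetical, hand-written sketch of the same kind of transformation: a finder that dispatches on `args.first`, the same call after manual inlining, and the version left once the `case` and the `args` array have been optimized away. `RECORDS`, `find`, and the `index_*` methods are invented stand-ins, not ActiveRecord code.

```ruby
# Hypothetical data and finder standing in for a real model class.
RECORDS = [{ id: 1, name: "a" }, { id: 2, name: "b" }].freeze

def find(*args)
  case args.first
  when :first then RECORDS.first
  when :all   then RECORDS
  else RECORDS.select { |r| args.include?(r[:id]) }
  end
end

# Step 0: the original caller, with an opaque call to find.
def index_original
  find(:all)
end

# Step 1: find inlined into the caller. The splat array is now an
# ordinary local whose contents are visible to the optimizer.
def index_inlined
  args = [:all]
  case args.first
  when :first then RECORDS.first
  when :all   then RECORDS
  else RECORDS.select { |r| args.include?(r[:id]) }
  end
end

# Step 2: args.first is known to be :all, so the other branches are
# dead code and the args array never needs to be allocated.
def index_optimized
  RECORDS
end
```

All three methods return the same records. The step from the first to the last is only possible because the body of `find` is visible to whoever is doing the optimizing, which is the point about core methods needing to be inspectable.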
| Adam K. | And core classes written in C are opaque to the runtime? |
| Glenn V. | Yes, they are. |
| Adam K. | Gotcha. |
| Glenn V. | How to get started? Well, there are a lot of interesting papers about the Self project, where these techniques originated. But we know now that not all of those techniques are good ideas; they have stiff costs in memory usage, for one thing. |
| Glenn V. | The first commercial project to use these ideas was a Smalltalk implementation called Strongtalk, and that eventually turned into Java HotSpot. The source to both systems is now available, but unfortunately they're very complicated and difficult to understand from the source, I think. |
| Glenn V. | Perhaps the best strategy, after reading the Self papers, is to take advantage of a rare opportunity. Three separate open-source efforts are competing to build the fastest JavaScript VM, and we get to watch. |
| Adam K. | (Hence the aforementioned JavaScript renaissance!) |
| Glenn V. | Google's V8, Mozilla's TraceMonkey, and Apple's SquirrelFish VMs are all using dynamic language VM ideas of various stripes. It's fun to watch, and systems being actively developed in the open are always easier to understand than mature systems whose code just gets dumped onto the world after the fact. So one great way to learn would be to start following one or more of those projects, and maybe even get involved. Once you've gained a little knowledge, you might be able to see how you could contribute to JRuby or IronRuby to help them play more effectively with the JVM and DLR, or you may be able to go deeper and write (for example) a native code generator for Rubinius. |
| Adam K. | Where would one get started with one of these projects? Lurking on the mailing lists, perusing the code, looking in the bug tracker? |
| Glenn V. | All of the above! The same way you get started with any open-source project. One good strategy is to start by improving the test suite. |
| Adam K. | Duly noted. So, to wrap things up, the folks at Relevance are working on a new app called RunCodeRun. It's hosted continuous integration for open source projects. What's the story there? |
| Glenn V. | This all sounds like rocket science, but the basic ideas are fairly simple ... it's just putting it all together that gets complicated. The lead developer of Google's V8 project, Lars Bak, told me that they went from a standing start to the level of performance they had on release day in 3 months, with a team of 3 developers. Of course, they all had experience on similar projects, and after that there was a lot of work left to integrate with the DOM and fix compatibility bugs. But the basics were not too hard. |
| Glenn V. | We realized that continuous integration is a big hole in how a lot of teams implement agile development, and part of the reason is that there wasn't a hosted solution. So we decided to provide one. Hosting CI is a challenge -- we're taking other people's code and running it on our machines, for one thing, and dependency management is also hard. But after investigating it, we felt up to the challenge, especially if we aimed at a niche to begin with. |
| Glenn V. | RunCodeRun is up today at http://runcoderun.com/, and it supports open-source Ruby and Rails projects hosted on GitHub. It's still in beta, but let us know if you want to try it out. We're preparing to support private projects as well, and plan to launch that support soon. We're really pleased by the level of interest; a lot of people are very excited about the possibility of a managed, hosted continuous integration server for their projects. |
| Glenn V. | Everyone at Relevance has contributed, but Rob Sanheim has been the lead on the project, and he's done an awesome job. |
| Adam K. | It's a really great idea; I'm hoping to see lots of projects on it. Avoiding the fits-and-starts the Rails project has gone through with CI will prove very valuable to the community. |
| Adam K. | Anything else exciting we should look for from Relevance? |
| Glenn V. | I agree completely. Source control, issue tracking, and other parts of the process have been easy for a while now, because of good hosted services. We want to add CI to that list. |
| Glenn V. | Well, we're always doing interesting things on projects, and we try to speak frequently about them at conferences. We do project work, of course, and we also stay fairly busy doing security audits and code audits for existing projects. Finally, we're quite active in open-source development as well. |
| Adam K. | Awesome. Thanks for your time! |
| Glenn V. | My pleasure! |
Glenn Vanderburg has more than 20 years of experience developing software across a wide range of domains, and using a variety of tools and technologies. Glenn is always searching for ways to improve the state of software development, and was an early adopter and proponent of Ruby, Rails, and agile practices. He is also Chief Scientist at Relevance.