Ruby on Rails Performance Series: Intro and all about YAML

Lately I’ve been doing a lot of API type work in Rails — both client and server. It mostly uses Rails’ default XML generation and consumption.

Along the way I’ve discovered that the default performance of XML and related things on Ruby and Rails is, shall we say, less than amazing. On the bright side, there are a number of ways to improve matters.

So, I’m going to do a series on improving the performance of Rails’ API-related stack.

I’ll start off with serialized columns in ActiveRecord.

Serialized columns are a way to store arbitrary information in a single TEXT column in your database. They probably shouldn’t be used too often, but they are useful at times.

YAML is the storage format for these serialized columns. They are defined like this:

serialize :some_column, Hash

Easy enough. The column some_column will be a native Ruby hash and yet will be serialized into a single TEXT column in the database. As is so often the case with Rails, it just works.
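
To make the round trip concrete, here is a plain-Ruby sketch of roughly what happens to the hash on its way into and out of the column. It is not Rails’ internal code, and the `options` data is made up for illustration; it only shows that the stored value is YAML text and that loading it gives back an equal Hash.

```ruby
require "yaml"

# Hypothetical data standing in for the contents of a serialized column.
options = { "width" => 640, "height" => 480 }

# Roughly what ends up in the TEXT column: a YAML document as a String.
stored_text = options.to_yaml

# Roughly what happens on read: the YAML text is parsed back into a Hash.
restored = YAML.load(stored_text)

puts stored_text
puts restored == options
```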

The serialization format is YAML. If you use Rails’ built-in to_xml methods to turn your ActiveRecord object into XML, it will also use YAML to serialize that particular column into a string and then pass that string to the XML generator.

This is at least consistent and works quite well. It’s just slow.

The core problem is that YAML does not perform particularly well. There are others online who have documented this in sufficient detail so I won’t do so here.

If your use of objects with serialized columns, whether directly against the DB or via an API, is low volume, the performance doesn’t particularly matter. At higher levels of activity, though, it may be a hidden performance drain.

In my tests, converting a Hash to YAML takes between 4 and 5 times as long as converting YAML back to a Hash.
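
If you want to check the dump-versus-load ratio on your own setup, a minimal timing sketch might look like the following. The sample hash, the iteration count, and the `elapsed_seconds` helper are all assumptions made for illustration, and the numbers will vary with Ruby version and YAML library, so treat the output as a rough indication only.

```ruby
require "yaml"

# Hypothetical sample data: 20 string keys mapped to numeric values.
sample = (1..20).to_h { |i| ["option_#{i}", i * 1.5] }
yaml_text = sample.to_yaml
iterations = 1_000

# Hypothetical helper: wall-clock seconds taken by the given block.
def elapsed_seconds
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

dump_seconds = elapsed_seconds { iterations.times { sample.to_yaml } }
load_seconds = elapsed_seconds { iterations.times { YAML.load(yaml_text) } }

puts format("Hash -> YAML: %.4f s", dump_seconds)
puts format("YAML -> Hash: %.4f s", load_seconds)
```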

In the case I’ve been working on, the database records are read often but written rarely. However, API traffic is much higher and performance there is critical.

Because performance at the DB layer isn’t a big bottleneck given actual usage patterns, the decision was to leave the database records alone and continue to use the default serialization mechanism there.

However, using YAML inside the API’s XML output had to go. Some API responses have 100 or more individual instances of the hashes that are currently being serialized with YAML. The next question was, what does it get changed to?

There are a few options: a) JSON, b) Marshal, c) nested XML, and d) some other custom serialization.

Each has pros and cons. Interestingly, all are faster than YAML. JSON is around 3x faster but doesn’t have quite the richness of object mapping. Marshal is really fast (about 9x over YAML) but has an almost binary-like output and isn’t very portable. Nested XML may or may not map into something desirable on the client side and may not be much faster. Custom formats may also be limited in object representation and take the most work to implement, both client and server.
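
A quick way to compare the three built-in candidates yourself is to time each serializer on the same data. This is only a sketch: the sample hash, the iteration count, and the `seconds_for` helper are hypothetical, and the ratios you see may differ from the ones quoted above depending on your Ruby version and the libraries installed.

```ruby
require "yaml"
require "json"

# Hypothetical sample data: 20 string keys mapped to integers.
sample = (1..20).to_h { |i| ["option_#{i}", i] }
iterations = 1_000

# Hypothetical helper: wall-clock seconds to run the block `iterations` times.
def seconds_for(iterations)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

timings = {
  "YAML"    => seconds_for(iterations) { sample.to_yaml },
  "JSON"    => seconds_for(iterations) { sample.to_json },
  "Marshal" => seconds_for(iterations) { Marshal.dump(sample) }
}

timings.each { |name, seconds| puts format("%-8s %.4f s", name, seconds) }
```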

If you had a simple array of strings, a custom format like “one,two,three” would be fairly straightforward. Complex objects, like entire referenced models, would make this unworkable though.
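
As a small illustration of that kind of custom format, the sketch below joins and splits an array of strings on commas. The variable names are made up for the example, and the last lines show one way such a format breaks down: a value that itself contains the delimiter.

```ruby
# A simple array of strings round-trips cleanly through a comma-joined string.
tags = ["one", "two", "three"]
encoded = tags.join(",")        # => "one,two,three"
decoded = encoded.split(",")    # => ["one", "two", "three"]

# The format has no escaping, so a value containing a comma gets split apart.
tricky = ["a,b", "c"]
mangled = tricky.join(",").split(",")   # => ["a", "b", "c"]

puts decoded == tags
puts mangled == tricky
```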

So what’d I end up using? In this case, JSON. It’s much faster, easy to work with, and the data that needs to be transmitted is in a very simple format: all key to numeric value pairs.
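
For data of that shape, the JSON round trip is lossless, which the following sketch shows using Ruby’s standard json library and a made-up `options` hash. It also shows one example of the thinner object mapping mentioned earlier: symbol keys come back as strings.

```ruby
require "json"

# Hypothetical options hash: string keys mapped to numeric values.
options = { "max_items" => 25, "ratio" => 0.75 }

json_text = options.to_json      # => "{\"max_items\":25,\"ratio\":0.75}"
decoded = JSON.parse(json_text)  # => {"max_items"=>25, "ratio"=>0.75}

# JSON has no symbol type, so symbol keys are returned as strings.
symbol_keyed = { max_items: 25 }
reparsed = JSON.parse(symbol_keyed.to_json)   # => {"max_items"=>25}

puts decoded == options
puts reparsed == symbol_keyed
```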

Unfortunately, this can be a bit messy in both the client and server (although not difficult).

On the server, the attribute needs to be converted to JSON before to_xml sees it. The decision was to create an alternate version of the attribute (“options”) that was already in JSON.

# JSON version of the serialized "options" attribute, for use by to_xml
def options_json
  options.to_json
end

Then you’ll need to pass a couple of options to to_xml:

to_xml(:methods => :options_json, :except => :options)

This tells to_xml to skip the standard options rendering and include your custom method.

On the client side, in your model class that inherits from ActiveResource::Base, add:

# Decode the JSON once and cache it; fall back to an empty hash if absent
def options
  @options ||= ActiveSupport::JSON.decode(attributes["options_json"] || "{}")
end
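
To see both halves working together without a Rails app, here is a plain-Ruby sketch. `ServerRecord` and `ClientRecord` are hypothetical stand-ins for the ActiveRecord and ActiveResource models, the XML transport step is skipped entirely, and the standard library’s `JSON.parse` is swapped in for `ActiveSupport::JSON.decode`.

```ruby
require "json"

# Hypothetical stand-in for the server-side ActiveRecord model.
class ServerRecord
  attr_reader :options

  def initialize(options)
    @options = options
  end

  # Same idea as options_json above: expose the hash as a JSON string.
  def options_json
    options.to_json
  end
end

# Hypothetical stand-in for the client-side ActiveResource model;
# `attributes` mimics the hash of values parsed from the API response.
class ClientRecord
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end

  # Decode once and memoize; fall back to an empty hash if the field is absent.
  def options
    @options ||= JSON.parse(attributes["options_json"] || "{}")
  end
end

server = ServerRecord.new({ "retries" => 3, "timeout" => 2.5 })
client = ClientRecord.new({ "options_json" => server.options_json })

p client.options                # => {"retries"=>3, "timeout"=>2.5}
p ClientRecord.new({}).options  # => {}
```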

As noted above, some API responses here contained dozens of objects that needed to be serialized. Moving from YAML to JSON cut the time for those API calls by about 25% (from roughly 4 seconds to 3 seconds).

tags: rails, ruby, activeresource, performance