iprog.com

Benchmarking GeoIP gems for Ruby

On a recent project, I once again needed GeoIP (by the lovely people at MaxMind). I decided to take some time to benchmark a handful of GeoIP gems, both for GeoIP v1 (legacy) and the newer GeoIP v2.

I decided to benchmark the extraction of 3 pieces of data, using the GeoLite City database:

Entries are labeled according to the installed gem name. Gems with “1” use the v1 GeoIP database. Those with “2” use the v2 database.

Gems labeled with “rb” are pure Ruby. Those with “c” have C extensions.

3 of the tested gems do not return a timezone. Those 3 libraries turned out to be the 3 fastest, strongly suggesting that retrieving a more complete set of data from the GeoIP dataset does come with increased overhead. These 3 gems are labeled with “-tz” in the results below.

Lastly, the geoip gem has the option of preloading the entire GeoIP database into memory. It was benchmarked without and with preloading.
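The tradeoff behind preloading can be illustrated without any GeoIP gem at all. The following is a minimal sketch, not the geoip gem's actual implementation: `DiskReader`, `PreloadedReader`, and the fixed-width record layout are hypothetical names and assumptions made up for this example. One reader goes to the file on every lookup, while the other reads the whole file into memory once and slices it.

```ruby
require "tempfile"

# Hypothetical reader that goes to the file on every lookup.
class DiskReader
  def initialize(path, record_size)
    @file = File.open(path, "rb")
    @record_size = record_size
  end

  def fetch(index)
    @file.seek(index * @record_size)
    @file.read(@record_size)
  end

  def close
    @file.close
  end
end

# Hypothetical reader that loads the whole file once, then slices in memory.
class PreloadedReader
  def initialize(path, record_size)
    @data = File.binread(path)
    @record_size = record_size
  end

  def fetch(index)
    @data.byteslice(index * @record_size, @record_size)
  end
end

# Build a tiny stand-in "database" of three 8-byte records.
record_size = 8
file = Tempfile.new("records")
file.binmode
file.write(%w[AAAAAAAA BBBBBBBB CCCCCCCC].join)
file.flush

disk   = DiskReader.new(file.path, record_size)
memory = PreloadedReader.new(file.path, record_size)

from_disk   = disk.fetch(1)
from_memory = memory.fetch(1)
disk.close
```

Both readers return the same record; the preloaded one trades memory (the full file size) for the absence of per-lookup I/O calls.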

Without further ado, here are the results, based on 100,000 iterations (ordered by real time):

                                user     system      total        real
geoip-c (1;c) -tz           0.520000   0.010000   0.530000 (  0.533592)
geoip2_compat (2;c) -tz     0.520000   0.020000   0.540000 (  0.534711)
mmdb (2;c) -tz              0.700000   0.010000   0.710000 (  0.725789)
hive_geoip2 (2;c)           3.470000   0.010000   3.480000 (  3.495051)
geoip (1;rb;preload)        3.580000   0.000000   3.580000 (  3.583469)
geoip (1;rb)                4.780000   1.740000   6.520000 (  6.542248)
maxmind_geoip2 (2;c)       12.420000   4.670000  17.090000 ( 17.092035)
maxminddb (2;rb)           27.730000   0.030000  27.760000 ( 27.781552)
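For readers who want to reproduce a table in this format, here is a rough sketch of a harness built on Ruby's standard `Benchmark` module. It is not the published benchmark code: the two lambdas are hypothetical stand-ins that do no GeoIP work at all, and the sample IP address is an arbitrary choice. Each lambda would be replaced with a call into one of the gems being compared.

```ruby
require "benchmark"

# Hypothetical stand-ins; swap each lambda for a real gem lookup.
candidates = {
  "stub-a" => ->(ip) { ip.length },
  "stub-b" => ->(ip) { ip.split(".").map(&:to_i).sum }
}

# Runs every candidate for the given number of iterations and
# returns an array of Benchmark::Tms results, one per candidate.
def run_benchmark(candidates, iterations:, ip:)
  width = candidates.keys.map(&:length).max
  Benchmark.bm(width) do |bm|
    candidates.each do |label, lookup|
      bm.report(label) { iterations.times { lookup.call(ip) } }
    end
  end
end

results = run_benchmark(candidates, iterations: 100_000, ip: "8.8.8.8")

# Order the labels by wall-clock ("real") time, fastest first.
ranked = results.sort_by(&:real).map(&:label)
puts ranked.join(", ")
```

`Benchmark.bm` prints the user, system, total, and real columns itself; sorting the returned results by `real` gives the ordering used for the table.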

As noted earlier, the 3 fastest libraries (geoip2_compat, geoip-c, and mmdb) all have C extensions and all skip timezone loading.

The next fastest, and also the overall fastest to return a full response from the database, is hive_geoip2. The API for this gem is rather basic, but it’s arguably a very worthwhile tradeoff given its high performance.

Right behind it is the preload flavor of the pure Ruby geoip gem, which is quite impressive. Equally impressive is that the non-preload flavor of the same gem comes in right behind, taking a little under twice as long to run.

maxmind_geoip2 is just slow. Digging into the code, it appears to open and close the database file descriptor on every lookup, resulting in a substantial performance penalty.
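To get a feel for why reopening a file on every lookup is costly, the pattern can be imitated with plain Ruby file I/O. This is only a sketch of the general idea and not the gem's code: the temporary file, the 16-byte read, and the iteration count are all assumptions chosen for illustration.

```ruby
require "benchmark"
require "tempfile"

# A throwaway 4 KB file standing in for a database.
db = Tempfile.new("fake-db")
db.binmode
db.write("x" * 4096)
db.flush
path = db.path

# Open, read, and close the file on every single lookup.
reopen_lookup = lambda do |offset|
  File.open(path, "rb") do |f|
    f.seek(offset)
    f.read(16)
  end
end

# Open the file once and reuse the same handle for every lookup.
handle = File.open(path, "rb")
reuse_lookup = lambda do |offset|
  handle.seek(offset)
  handle.read(16)
end

# Both strategies must return identical data.
same_data = reopen_lookup.call(100) == reuse_lookup.call(100)

n = 20_000
Benchmark.bm(8) do |bm|
  bm.report("reopen") { n.times { |i| reopen_lookup.call(i % 4000) } }
  bm.report("reuse")  { n.times { |i| reuse_lookup.call(i % 4000) } }
end

handle.close
```

The exact numbers depend on the operating system and filesystem, but the reopen variant pays for an extra open and close system call per lookup, which tends to show up in the system column.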

maxminddb has a friendly API to work with, but I wonder if that’s perhaps the cause of its slower performance. That said, unless you’re processing a rather high rate of GeoIP lookups, this one is still worth a look.

If you want a pure Ruby solution, geoip for GeoIP v1 and maxminddb for GeoIP v2 are both viable for modest volumes of lookups.

For higher volumes, definitely use a gem with a C extension. If you need timezones, then hive_geoip2 is your best bet. If not, then you have several choices. Do remember that you’ll also need the underlying C library and headers installed before installing the gem. That’ll be libgeoip for v1 and libmaxminddb for v2.

I’ve published the benchmark code if you’re interested. There are a couple of gem namespace conflicts. To run the benchmarks, set PASS==1 first and install the matching gems. Then set PASS==2 and install those gems, being careful to uninstall geoip. If you want to return to pass 1, uninstall geoip-c.

tags: ruby, geoip, benchmark