Profiling Rails Applications with Flamegraphs 🔥

How we used Flame Graphs to profile our Rails backend code in our development environment and thus fixed slow, suboptimal code faster.

The Context

As part of our 2022 Q4 readiness goals, we were working on optimising one of our endpoints which had started spiking in latency and consequently leading to response times of > 4 seconds.

This increased latency resulted in the real world impact of our riders intermittently receiving 503 errors when they wanted to switch zones in Deliveroo’s rider app.


The rider app view when riders tried to switch zones
503 errors in the Rider API due to a slow Rails API

Our initial investigations did not point to any obvious bottlenecks, such as the database or the EC2 instance slowing down due to increased traffic, etc.

Neither was the data being queried via the API too large and nor was the code in question new. (It was untouched for the past 5 years).

For some reason the response time for this API was quite big and a large chunk of time spent was in Rails.


Datadog flame graph view for a slow request. A chunk of the time was spent in Rails application code. Datadog narrowed down the problem to time spent in the Rails App

Thus, there was a need to further isolate different bits of the code to see where the problem lay. Thankfully Datadog had a neat little feature which helped - Spans.

We deployed our code with a few Datadog spans and narrowed the issue down to the below piece of non-performant code.

serializer = CollectionSerializer.new(collection, index_params, resource_name: :zone)
output = serializer.serializable_hash
render json: output

Datadog was unable to provide more fine tuned insights. It only helped us conclude that most of the time was spent in this call : serializer.serializable_hash. That serialiser in question was responsible for packaging our object in a HATEOS compatible hash.


Datadog time breakdown Adding Datadog spans in the code and deploying them helped, but it was a slow process

Since adding further Datadog spans was a slow and frictional process, we resorted to Flame Graphs and local Rspec testing to recreate and isolate the problem.

The Problem

We wanted a quick way to stress test a piece of Rails application code in our local development environment and get feedback on our improvements without relying on production code changes, deployments or Datadog.

The Solution

ruby-prof-flamegraph - A ruby gem which helps analyse bottlenecks in our code and helps generate an easy to use Flamegraph.

Datadog time breakdown The Flamegraph generated by ruby-prof-flamegraph

Usage :

Simply add the gem to your Gemfile

group :development do
  gem 'ruby-prof-flamegraph'
end

Wrap the code you want to analyse in this block below

profile = RubyProf.profile do
  hash = collection1.serializable_hash
end

The block generates a fold stack file which can be fed into Flamegraph to generate an interactive SVG file to visualise the graph.

File.open("stack-file", "w+") do |file|
  RubyProf::FlameGraphPrinter.new(profile).print(file)
end

Using Flamegraph to generate the SVG graph

cat stack-file |  ~/GitHub/FlameGraph/flamegraph.pl --countname=ms --width=1600 > flame-graph-testing.svg

The Impact

The flame graph analysis helped us conclude that our serialiser was spending too much time initialising redundant objects. We removed the generation of redundant objects and achieved a 60% reduction in latency.

Even though the data being queried and packaged into the hash was < 1000 records, the process of packaging was not optimal and led to the initialisation of an exponential number of objects which caused the slowdown.

Datadog time breakdown
A visible reduction of latency could be seen since the fix was deployed, thanks to quick Flame Graph profiling