~/posts/when-ruby-graphql-requests-are-slow

When Ruby GraphQL requests get slow

GraphQL-Ruby is unusually good at hiding allocation problems behind code that looks harmless. You can fix the obvious N+1 issues and other database gotchas and, after profiling, still see ugly p99 spikes. Often the root of a slow query isn’t the query at all: the request allocates a pile of short-lived objects that force Ruby’s GC to stop and clean up at exactly the wrong time.

The real problem

A single field in GraphQL-Ruby can touch some or all of the following:

  1. AST objects
  2. Authorization checks
  3. Field wrappers
  4. Application models
  5. Presenters
  6. JSON hashes

All before anything leaves the process.

None of that has to be individually slow to become expensive as a whole. The real problem is the object churn in aggregate. GraphQL-Ruby wraps a lot of structure around your business logic, and your application code adds another layer on top. A list field that returns 500 nodes can mean instantiating 500 Active Record objects, then 500 GraphQL objects, then a pile of nested arrays and strings as the child fields resolve.

The difficult part with debugging is that short-lived garbage doesn’t leave a nice signature you can point to. If each resolver is just a little wasteful, no single method dominates the time profile. The request just slowly creates enough heap pressure that the tail gets ugly. Each line item seems fine, but the whole request shape is not.
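You can get a first feel for this churn without any gems. Here is a rough sketch using GC.stat to count objects allocated inside a block; the helper name and the two workloads are made up for illustration:

```ruby
# Count objects allocated while a block runs, using GC.stat.
# Crude compared to a real profiler, but dependency-free and cheap.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# A resolver-ish workload: at least one fresh string per "node".
churny = allocations_during { Array.new(500) { |i| "node-#{i}".upcase } }

# The same loop shape without per-node object building.
frugal = allocations_during { Array.new(500) { |i| i * 2 } }
```

Neither number is exact, since the VM allocates a little on its own, but the ratio between two code paths is usually telling.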

Profiling the schema

The question you want to answer isn’t “which method was hottest” but “which code path allocated the most stuff while this exact GraphQL operation ran.”

MemoryProfiler is a great tool for this. It gives you allocation counts and retained-memory summaries in a form that’s easy to make sense of. It isn’t something you want to leave on for normal traffic, though: allocation profiling has serious overhead. The good news is that you can gate it behind an explicit opt-in and an authorization check, so diagnostic data only comes back on a response when someone who should see it asks for it.

Here’s an example for a GraphQL controller. It wraps execution only when an admin asks for it, runs MemoryProfiler.report around the real schema call, and appends a small summary to extensions.

class GraphqlController < ApplicationController
  def execute
    result = if should_profile_allocations?
      profile_allocations
    else
      AppSchema.execute(
        params[:query],
        variables: ensure_hash(params[:variables]),
        context: graphql_context,
        operation_name: params[:operationName]
      )
    end
    render json: result
  end

  private

  def should_profile_allocations?
    current_user.admin? && params[:profile_allocations].present?
  end

  def profile_allocations
    query = params[:query]
    variables = ensure_hash(params[:variables])
    context = graphql_context
    operation_name = params[:operationName]
    graphql_result = nil

    report = MemoryProfiler.report do
      graphql_result = AppSchema.execute(
        query,
        variables: variables,
        context: context,
        operation_name: operation_name
      )
    end

    # Schema.execute returns a GraphQL::Query::Result with string keys;
    # convert it to a plain hash before merging in the profile summary.
    response = graphql_result.to_h
    response["extensions"] ||= {}
    response["extensions"]["allocation_profile"] = {
      total_allocated: report.total_allocated,
      total_retained: report.total_retained,
      allocated_memory: report.total_allocated_memsize,
      retained_memory: report.total_retained_memsize,
      top_locations: report.allocated_memory_by_location.first(10),
      top_classes: report.allocated_objects_by_class.first(10)
    }
    response
  end
end

The full report can expose internals you don’t want floating around outside a debugging session, so keep the payload small. A summary in extensions is usually enough to point you at the hottest locations and object classes.

Making sense of the report

Start with total allocated objects and total allocated memory for the request. Then look at retained memory, but treat it as a signal, not the source of truth. Usually the first win is not subtle. One file, serializer, or resolver path is often responsible for an unnecessary amount of churn.

Here’s what a report for a slow request might look like:

memory_profile_report.txt
Total allocated: 146.82 MB (1482611 objects)
Total retained: 9.74 MB (37412 objects)
Allocated memory by location:
app/graphql/types/user_type.rb:41 46.18 MB
graphql/execution/interpreter/runtime/graphql_result.rb:180 12.44 MB
graphql/execution/interpreter/runtime.rb:200 11.97 MB
graphql/execution/interpreter/runtime.rb:201 9.86 MB
graphql/execution/interpreter/runtime.rb:753 9.86 MB
app/graphql/types/user_type.rb:36 8.12 MB
app/graphql/types/user_type.rb:28 5.44 MB
graphql/schema/field.rb:897 2.21 MB
Allocated objects by class:
String 1041228
Array 163904
Hash 119887
GraphQL::Execution::Interpreter::Runtime::GraphQLResultArray 10201
GraphQL::Execution::Interpreter::Runtime::GraphQLResultHash 10201
UserType 10000

The numbers themselves are less important than how you read them:

  1. The first thing I care about is the shape: huge allocated memory but much smaller retained memory. That indicates churn. The request is making Ruby work hard even if it isn’t leaking the whole time.

  2. app/graphql/types/user_type.rb:41 is at the top of the list and probably where the request is reshaping data or building strings per node.

  3. app/graphql/types/user_type.rb:28 being high too is a hint that repeated formatting work is part of the problem.

  4. The GraphQL runtime lines are real, but I would start with the app’s own code first. Framework internals matter less than the allocation patterns you control.

  5. String dominating the class list is another clue. If strings are flagged, look for formatting, reshaping, JSON-like hashes, and repeated value building before you go hunting in the database.

Scenario | What the flamegraph shows | What allocations show | Fix
.all.map(&:name) on many records | Nothing stands out, time is spread thin | Tons of Active Record model objects | Use pluck(:name) to skip the model layer
Resolver builds a hash, then GraphQL reshapes it again | Time split across serializer and resolver code | High object counts in your serializer files | Pick one place to shape the output
Per-node strftime or string formatting | Tiny repeated calls, easy to overlook | Flood of String allocations | Cache or precompute the formatted value

A bad resolver can look simple and still be the problem

The worst offenders are fields that only need a couple of scalar values but load a pile of models anyway. It usually starts as convenience, calling .all or some broad relation, and then child methods reach into each model to build the final response shape.

# before
field :tag_names, [String], null: false

def tag_names
  object.tags.visible.order(:name).all.map(&:name)
end

# after
field :tag_names, [String], null: false

def tag_names
  object.tags.visible.order(:name).pluck(:name)
end

Both of these versions hit the database once. That’s what makes this easy to miss. The query count doesn’t change, and a flamegraph may not show anything dramatic. But the first version instantiates every Tag model, plus whatever those models drag in through attribute wrappers and related associations. Then it throws almost all of it away. The second version asks the database for exactly the values the GraphQL field needs and skips the model layer entirely.
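The difference is easy to demonstrate without a database. A stand-in sketch, where Tag is a plain Struct rather than a real Active Record model, that mirrors what map(&:name) versus pluck(:name) does with the same rows:

```ruby
# A fake "model" row: id, name, plus the baggage a real model would carry.
Tag = Struct.new(:id, :name, :created_at, :metadata)

rows = Array.new(1_000) { |i| [i, "tag-#{i}", nil, {}] }

# Before: materialize a full object per row, read one field, discard the rest.
names_via_models = rows.map { |r| Tag.new(*r) }.map(&:name)

# After: take only the column you need, which is what pluck(:name) does in SQL.
names_via_pluck = rows.map { |r| r[1] }
```

Both produce the same array of names; the first also builds and throws away a thousand structs, and a real Active Record model allocates far more per instance than a Struct does.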

Strings are another offender. Suppose a field formats timestamps for every node in a list.

field :created_at_label, String, null: false

def created_at_label
  object.created_at.utc.strftime("%Y-%m-%d %H:%M:%S UTC")
end

That looks harmless until it runs a few thousand times in one response and the report starts showing a flood of String allocations from formatting code. If the output format is fixed, cache it, precompute it, or move the work somewhere that doesn’t run once per field resolution.
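One way to do that is a small per-request cache keyed by the underlying timestamp, so equal instants are formatted once. A sketch, with a made-up class name; in a real app this might hang off the GraphQL context:

```ruby
# Hypothetical per-request cache: format each distinct timestamp once,
# then hand out the same frozen string to every node that shares it.
class TimestampLabelCache
  def initialize
    @labels = {}
  end

  def label_for(time)
    # Key by epoch seconds so equal instants share one cached string.
    @labels[time.to_i] ||= time.utc.strftime("%Y-%m-%d %H:%M:%S UTC").freeze
  end
end

cache = TimestampLabelCache.new
t = Time.utc(2024, 5, 1, 12, 30, 0)
first  = cache.label_for(t)
second = cache.label_for(t)
```

Here first and second are the same object, so a list of nodes created in the same batch costs one formatted string instead of thousands.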

For list fields, the first question is whether you need full models at all. If the field only returns ids, names, or counts, pluck or a narrower select will skip the model layer entirely.

For serialization-heavy paths, look out for duplicate transformations. A resolver that builds a hash, then later a GraphQL object that re-derives just a part of that hash, is a code smell. Pick one layer for shaping output and keep it there.

If the flamegraph looks flat and the latency is still bad, stop shaving CPU time and start counting objects. The answer is usually not that any one piece of code is slow. It’s that the request as a whole creates too much garbage for Ruby to clean up quietly.