~/posts/when-ruby-graphql-requests-are-slow
When Ruby GraphQL requests get slow
Ruby GraphQL is unusually good at hiding allocation problems behind code that looks harmless. You can fix the obvious N+1 issues and other database gotchas and, after profiling, still see ugly p99 spikes. Often the root of a slow query isn’t the query at all: the request allocates a pile of short-lived objects that force Ruby’s GC to stop and clean up at exactly the wrong time.
The real problem
A single field in GraphQL-Ruby can touch some or all of the following:
- AST objects
- Authorization checks
- Field wrappers
- Application models
- Presenters
- JSON hashes

All before anything leaves the process.
None of that has to be individually slow to become expensive as a whole. The real problem is the object churn in aggregate. GraphQL-Ruby wraps a lot of structure around your business logic, and your application code adds another layer on top. If you have some list field that returns 500 nodes, that could also mean you’re instantiating 500 ActiveRecord objects, then 500 GraphQL objects, then finally a bunch of nested arrays and strings while resolving the child fields as well.
The difficult part with debugging is that short-lived garbage doesn’t leave a nice signature you can point to. If each resolver is just a little wasteful, no single method dominates the time profile. The request just slowly creates enough heap pressure that the tail gets ugly. Each line item seems fine, but the whole request shape is not.
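To make the aggregate effect concrete, here’s a minimal, dependency-free sketch (plain Ruby, no GraphQL) that counts how many objects a block allocates. The helper name `allocated_objects` and the fake node shape are illustrative, not from GraphQL-Ruby:

```ruby
# Rough allocation counter: delta of GC.stat(:total_allocated_objects)
# across a block. Coarser than MemoryProfiler, but dependency-free.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Simulate resolving a 500-node list field: each "node" builds a wrapper
# hash, an interpolated string, and a small array, like a resolver might.
churn = allocated_objects do
  500.times.map do |i|
    { id: i, label: "user-#{i}", tags: ["a", "b"] }
  end
end

puts "objects allocated for one list field: #{churn}"
```

Even this toy version allocates several objects per node, and none of them survive the request. That’s the shape of the problem: thousands of tiny, individually reasonable allocations.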
Profiling the Schema
The question you want to answer isn’t “which method was hottest” but “which code path allocated the most stuff while this exact GraphQL operation ran.”
MemoryProfiler is a great tool for this. It gives you allocation counts and retained-memory summaries in a form you can make sense of easily. This isn’t something you want to leave on for normal traffic, though: allocation profiling has serious overhead. The good news is that you can gate it behind explicit opt-in and authorization checks so diagnostic data only comes back on a response when you ask for it.
Here’s an example for a GraphQL controller. It wraps execution only when an admin asks for it, runs MemoryProfiler.report around the real schema call, and appends a small summary to extensions.
```ruby
class GraphqlController < ApplicationController
  def execute
    result =
      if should_profile_allocations?
        profile_allocations
      else
        AppSchema.execute(
          params[:query],
          variables: ensure_hash(params[:variables]),
          context: graphql_context,
          operation_name: params[:operationName]
        )
      end

    render json: result
  end

  private

  def should_profile_allocations?
    current_user.admin? && params[:profile_allocations].present?
  end

  def profile_allocations
    query = params[:query]
    variables = ensure_hash(params[:variables])
    context = graphql_context
    operation_name = params[:operationName]

    graphql_result = nil

    report = MemoryProfiler.report do
      graphql_result = AppSchema.execute(
        query,
        variables: variables,
        context: context,
        operation_name: operation_name
      )
    end

    graphql_result[:extensions] ||= {}
    graphql_result[:extensions][:allocation_profile] = {
      total_allocated: report.total_allocated,
      total_retained: report.total_retained,
      allocated_memory: report.total_allocated_memsize,
      retained_memory: report.total_retained_memsize,
      top_locations: report.allocated_memory_by_location.first(10),
      top_classes: report.allocated_objects_by_class.first(10)
    }

    graphql_result
  end
end
```

The full report can expose internals you don’t want floating around outside a debugging session, so keep the payload small. A summary in extensions is usually enough to point you at the hottest locations and object classes.
Making sense of the report
Start with total allocated objects and total allocated memory for the request. Then look at retained memory, but treat it as a signal, not the source of truth. Usually the first win is not subtle. One file, serializer, or resolver path is often responsible for an unnecessary amount of churn.
Here’s what a report for a slow request might look like:
```
Total allocated: 146.82 MB (1482611 objects)
Total retained: 9.74 MB (37412 objects)

Allocated memory by location:
  app/graphql/types/user_type.rb:41                              46.18 MB
  graphql/execution/interpreter/runtime/graphql_result.rb:180    12.44 MB
  graphql/execution/interpreter/runtime.rb:200                   11.97 MB
  graphql/execution/interpreter/runtime.rb:201                    9.86 MB
  graphql/execution/interpreter/runtime.rb:753                    9.86 MB
  app/graphql/types/user_type.rb:36                               8.12 MB
  app/graphql/types/user_type.rb:28                               5.44 MB
  graphql/schema/field.rb:897                                     2.21 MB

Allocated objects by class:
  String                                                         1041228
  Array                                                           163904
  Hash                                                            119887
  GraphQL::Execution::Interpreter::Runtime::GraphQLResultArray     10201
  GraphQL::Execution::Interpreter::Runtime::GraphQLResultHash      10201
  UserType                                                         10000
```

The numbers themselves are less important than how you read them:
- The first thing I care about is the shape: huge allocated memory but much smaller retained memory. That indicates churn. The request is making Ruby work hard even if it isn’t leaking the whole time.
- `app/graphql/types/user_type.rb:41` is at the top of the list and is probably where the request is reshaping data or building strings per node.
- `app/graphql/types/user_type.rb:28` being high too is a hint that repeated formatting work is part of the problem.
- The GraphQL runtime lines are real, but I would start with the app’s own code first. Framework internals matter less than the allocation patterns you control.
- `String` dominating the class list is another clue. If strings are flagged, look for formatting, reshaping, JSON-like hashes, and repeated value building before you go hunting in the database.
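If MemoryProfiler isn’t available in an environment, Ruby’s stdlib `objspace` extension can produce a rough by-location view of the same data. This is a hedged sketch, not MemoryProfiler’s implementation; the helper name `top_allocation_sites` is mine, and objects collected before the heap scan are missed:

```ruby
require "objspace"

# Group allocations made inside the block by file:line using the stdlib
# objspace extension. Tracing runs only during the block; the heap scan
# happens afterwards so our own bookkeeping strings are untraced.
def top_allocation_sites(limit = 5)
  GC.disable # keep short-lived objects alive long enough to count them
  ObjectSpace.trace_object_allocations do
    yield
  end
  counts = Hash.new(0)
  ObjectSpace.each_object do |obj|
    file = ObjectSpace.allocation_sourcefile(obj)
    next unless file # skip objects allocated outside the traced block
    counts["#{file}:#{ObjectSpace.allocation_sourceline(obj)}"] += 1
  end
  counts.sort_by { |_, n| -n }.first(limit)
ensure
  GC.enable
  ObjectSpace.trace_object_allocations_clear
end

top_allocation_sites { 1_000.times.map { |i| "node-#{i}" } }.each do |site, count|
  puts format("%-60s %d", site, count)
end
```

The hot line in the output plays the same role as `allocated_memory_by_location` in the MemoryProfiler report: it tells you where to start reading.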
| Scenario | What the flamegraph shows | What allocations show | Fix |
|---|---|---|---|
| `.all.map(&:name)` on many records | Nothing stands out, time is spread thin | Tons of Active Record model objects | Use `pluck(:name)` to skip the model layer |
| Resolver builds a hash, then GraphQL reshapes it again | Time split across serializer and resolver code | High object counts in your serializer files | Pick one place to shape the output |
| Per-node `strftime` or string formatting | Tiny repeated calls, easy to overlook | Flood of `String` allocations | Cache or precompute the formatted value |
A bad resolver can look simple and still be the problem
The worst offenders are fields that only need a couple of scalar values but load a pile of models anyway. It usually happens out of convenience: the resolver uses `.all` or some broad relation, and then child methods reach into each model for the final response shape.
```ruby
# before
field :tag_names, [String], null: false

def tag_names
  object.tags.visible.order(:name).all.map(&:name)
end
```

```ruby
# after
field :tag_names, [String], null: false

def tag_names
  object.tags.visible.order(:name).pluck(:name)
end
```

Both of these versions hit the database once. That’s what makes this easy to miss. The query count doesn’t change, and a flamegraph may not show anything dramatic. But the first version instantiates every Tag model, plus whatever those models drag in through attribute wrappers and related associations. Then it throws almost all of it away. The second version asks the database for exactly the values the GraphQL field needs and skips the model layer entirely.
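The same effect is easy to demonstrate without Rails. This sketch compares allocations when you build a full object per row versus extracting just the field you need, which is roughly what `pluck` buys you; the `Tag` struct and row data here are stand-ins, not ActiveRecord:

```ruby
# Stand-in for a model with several attributes per row.
Tag = Struct.new(:id, :name, :color, :created_at)

def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

rows = Array.new(1_000) { |i| [i, "tag-#{i}", "red", "2024-01-01"] }

# "map(&:name) on models": build a full Tag per row, then keep one field.
with_models = allocated_objects { rows.map { |r| Tag.new(*r) }.map(&:name) }

# "pluck"-style: take the one field directly, no per-row object.
names_only = allocated_objects { rows.map { |r| r[1] } }

puts "with models: #{with_models}, names only: #{names_only}"
```

The gap only widens with real models, where each instance also drags in attribute wrappers and association caches.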
Strings are another offender. Suppose a field formats timestamps for every node in a list.
```ruby
field :created_at_label, String, null: false

def created_at_label
  object.created_at.utc.strftime("%Y-%m-%d %H:%M:%S UTC")
end
```

That looks harmless until it runs a few thousand times in one response and the report starts showing a flood of String allocations from formatting code. If the output format is fixed, then cache it or precompute it, or move the work to a place that doesn’t repeat per field resolution.
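One cheap version of that fix, assuming the format is fixed and many nodes share the same timestamp: memoize the label by value. This is a sketch; the cache constant and method name are illustrative, and in a real app you’d scope the cache to the request (for example, in `context`) rather than a global:

```ruby
# Memoize formatted labels per timestamp so repeated values are
# formatted once instead of once per node. The default-proc hash
# computes and stores the label on first lookup.
TIMESTAMP_LABELS = Hash.new do |cache, time|
  cache[time] = time.utc.strftime("%Y-%m-%d %H:%M:%S UTC")
end

def created_at_label(created_at)
  TIMESTAMP_LABELS[created_at]
end

puts created_at_label(Time.at(0)) # 1970-01-01 00:00:00 UTC
```

A process-global constant like this never evicts entries, which is fine for a demo but a slow leak in production; a per-request hash has the same win without the growth.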
For list fields, the first question is whether you need full models at all. If the field only returns ids, names, or counts, pluck or a narrower select will skip the model layer entirely.
For serialization-heavy paths, look out for duplicate transformations. A resolver that builds a hash, then later a GraphQL object that re-derives just a part of that hash, is a code smell. Pick one layer for shaping output and keep it there.
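A minimal sketch of what “pick one layer” means in practice. The names here (`UserRow`, `resolver_hash`, `user_payload`) are illustrative, not from the post:

```ruby
# Stand-in for a model row.
UserRow = Struct.new(:id, :name, :email)

# Duplicated shaping: the resolver builds one hash per node, then a
# serializer rebuilds part of it, allocating a second hash per node.
def resolver_hash(user)
  { id: user.id, name: user.name, email: user.email }
end

def serializer(hash)
  { "id" => hash[:id], "display_name" => hash[:name] }
end

# Single shaping layer: go straight from the model to the response shape,
# one hash per node instead of two.
def user_payload(user)
  { "id" => user.id, "display_name" => user.name }
end

user = UserRow.new(1, "Ada", "ada@example.com")
puts serializer(resolver_hash(user)) == user_payload(user)
```

Same output, half the intermediate hashes, and one obvious place to look when the shape needs to change.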
If the flamegraph looks flat and the latency is still bad, stop shaving CPU time and start counting objects. The answer is usually not that any one piece of code is slow. It’s that the request as a whole creates too much garbage for Ruby to clean up quietly.