Stream large API responses: compression, exports, and limits
Learn how to stream large API responses with gzip/brotli, safe exports, and sane limits so big reports download reliably without crashing your app.

Why large API responses crash apps
Big responses usually fail in a familiar way: the report works on your laptop with a small database, then starts timing out, crashing, or returning a partial file in production. The data is bigger, the network is slower, and the server is handling real traffic at the same time.
Most apps break because they build the entire response in memory before sending anything. A “download report” endpoint queries a lot of rows, formats them into JSON or CSV, stores the whole result in a buffer, then finally writes it to the client. That spikes memory, triggers garbage collection, and slows down everything else.
The same failure points show up again and again:
- Memory spikes from buffering the full payload (sometimes more than once during retries)
- Timeouts at the app server, reverse proxy/load balancer, or client because the response takes too long
- Slow clients that read data slowly, keeping connections open and tying up workers
- Retries that double the load when the system is already struggling
- Huge JSON responses that are expensive to serialize and expensive for clients to parse
The goal isn’t “make the biggest file possible.” It’s to deliver large responses reliably so one heavy export doesn’t degrade the whole app for everyone else.
You can often fix this without rewriting the feature. Most teams get stability by combining three ideas: compress when it helps, stream data in chunks instead of buffering, and enforce limits (size, time, and rows) that match what users actually need.
If you inherited an AI-generated app where exports crash or authentication breaks mid-download, this is usually fixable with targeted changes to a few common culprits: buffering, unbounded queries, and missing backpressure. FixMyMess (fixmymess.ai) often sees these “works in dev, breaks in prod” exports and helps turn them into production-safe downloads.
Compression, streaming, and limits: what each one solves
Big responses usually fail for one of three reasons: the payload is too large to move quickly, too large to hold in memory, or too expensive to generate. Compression, streaming, and limits each target a different problem.
Compression makes the payload smaller on the wire. The server sends fewer bytes and the client downloads faster. This helps most when the content is text-heavy (JSON, CSV). It helps less when the data is already compressed (images, PDFs, ZIP files). Compression also doesn’t fix the core mistake of building a 200MB string in memory first.
Streaming changes how you deliver the response. Instead of building the whole export and then sending it, you send it in small chunks as you produce it. This is the main tool when you need to send millions of rows without running out of RAM. Streaming keeps memory flat, but it doesn’t automatically make the response smaller or cheaper to compute.
Exports are a special case. Sending a 200MB JSON response isn’t the same thing as offering a download. A download can be streamed as a file-like response (CSV/JSONL) and handled as it arrives. A massive JSON API response often forces clients to parse everything at once, which can freeze the UI or crash mobile apps.
Limits are the safety net. They stop worst-case requests from taking down your app when someone selects “all time” and “all customers.” Good limits usually include a maximum number of rows or bytes, a max request time, and rate limits on export endpoints. Many teams also set sensible defaults like a limited date range, and require an async/background job for very large reports.
Teams fixing broken AI-generated exports usually need all three: compression for speed, streaming for memory safety, and limits so one request can’t hurt everyone.
Choosing gzip or brotli without guesswork
Compression means the server packs the response into fewer bytes before sending it. The client unpacks it automatically. For large JSON payloads and exports, this can be the difference between a quick download and a request that times out.
Gzip is the older, widely supported option. Brotli is newer and often shrinks text a bit more, especially JSON and HTML. Both work best on text. Neither helps much if your response is already compressed.
How the client and server decide
Clients tell the server what they can decode using the Accept-Encoding request header (for example: br, gzip). The server should pick one of those encodings and set Content-Encoding in the response. If the header is missing, send the normal uncompressed response.
A practical rule: pick the best encoding the client says it supports, with a safe fallback.
- If br is accepted, use Brotli for text responses.
- Otherwise, if gzip is accepted, use gzip.
- If neither is accepted, send plain bytes.
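That fallback rule can be sketched in a few lines. This is a simplified sketch (it ignores q-values, and the choose_encoding name is illustrative); production code should prefer the framework's own content-negotiation support where available:

```python
from typing import Optional

def choose_encoding(accept_encoding: Optional[str]) -> str:
    """Return 'br', 'gzip', or 'identity' based on the Accept-Encoding header."""
    if not accept_encoding:
        return "identity"  # no header: send uncompressed bytes
    # Split "gzip;q=0.8, br" into bare encoding tokens, dropping q-values
    accepted = {part.strip().split(";")[0] for part in accept_encoding.split(",")}
    if "br" in accepted:
        return "br"        # Brotli: best for text when the client supports it
    if "gzip" in accepted:
        return "gzip"      # safe, widely supported fallback
    return "identity"
```

Whichever encoding you pick here is also the value you must echo back in the Content-Encoding response header.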
When gzip is safer and when brotli makes sense
Choose gzip as the default when you need maximum compatibility (older clients, unusual proxies, or mixed environments). Choose Brotli when most traffic is modern browsers or your own controlled clients, and you care about saving a bit more bandwidth.
Keep the CPU tradeoff in mind. Smaller responses can cost more server work. Brotli often uses more CPU than gzip at similar settings. If your server is already busy generating reports, compression can push it over the edge. A common approach is gzip for most API JSON, Brotli for browser-facing endpoints, and lower compression levels for very large downloads.
Also skip compression for formats that are already compressed (ZIP files, PDFs, PNG/JPEG images, many video/audio types). You waste CPU and sometimes make the file bigger.
If you’re inheriting an AI-generated backend, a good “get stable fast” path is gzip for text responses plus clear size limits. Add Brotli only where you can prove it helps.
How to add compression safely
Compression is often the quickest win, but it can create confusing bugs if headers are wrong or the same data gets compressed twice.
Start with a simple rule: only compress when it helps. If the response is small, compression can waste CPU and add latency. A practical threshold is around 1-2 KB for JSON and 4-8 KB for CSV or plain text. Below that, send it as-is.
Compression works best for text-based content like JSON, CSV, HTML, and logs. It usually does little for images, PDFs, or already-compressed files (ZIP). For those, skip compression.
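The two rules above (minimum size, compressible content type) can be combined into one gate. The thresholds and the should_compress name are illustrative defaults, not a standard:

```python
# Content types that compress well vs. formats that are already compressed.
COMPRESSIBLE_TYPES = ("application/json", "text/csv", "text/html", "text/plain")
ALREADY_COMPRESSED = ("application/zip", "application/pdf", "image/", "video/", "audio/")

def should_compress(content_type: str, size_bytes: int) -> bool:
    if content_type.startswith(ALREADY_COMPRESSED):
        return False  # ZIP/PDF/media: compression wastes CPU, may grow the file
    if content_type.startswith("application/json") and size_bytes < 1024:
        return False  # tiny JSON: overhead outweighs savings
    if content_type.startswith(("text/csv", "text/plain")) and size_bytes < 4096:
        return False  # small text: send as-is
    return content_type.startswith(COMPRESSIBLE_TYPES)
```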
Set headers so browsers, proxies, and caches behave:
- Content-Encoding: gzip or br, to tell the client what you used
- Vary: Accept-Encoding, so caches don’t mix compressed and uncompressed versions
- A correct Content-Type (for example application/json or text/csv) so clients parse it correctly
- If you stream, avoid setting a Content-Length you can’t guarantee
Avoid double compression. This happens when your app compresses responses and a reverse proxy or framework middleware compresses again. Pick one place to do it, then confirm by checking response headers and doing a quick sanity test to ensure the bytes match the Content-Encoding.
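One quick sanity test for the gzip case: check that the body carries the gzip magic bytes exactly once. This is a heuristic sketch (a legitimate payload could in theory start with those bytes), not a definitive detector:

```python
import gzip

def check_single_gzip(body: bytes) -> bool:
    """True if body looks like gzip data whose decompressed form is NOT itself gzip."""
    if body[:2] != b"\x1f\x8b":       # gzip streams start with magic bytes 1f 8b
        return False
    inner = gzip.decompress(body)
    return inner[:2] != b"\x1f\x8b"   # inner gzip header means double compression
```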
Test it like a mini benchmark: compare response size, total time, and CPU before and after. Also test slow networks and “cancel mid-download,” since miswired compression often shows up as broken downloads or hanging requests.
Streaming exports so they do not blow up memory
The safest way to handle big exports is to never build the whole file in memory. Generate one row (or a small batch) and send it to the client immediately. Done right, the server does steady work and the download grows over time.
For exports, file-like formats are usually easier than “one huge JSON array” because you can write them line by line. CSV works well for spreadsheet workflows. NDJSON (one JSON object per line) works well for machine processing and log-style data.
When you stream large responses, slow connections matter. If a user downloads over a weak mobile link, the server must not buffer the entire export while waiting to send it. Use backpressure-aware writing (most web frameworks support this) so you only produce data as fast as it can be delivered.
Long downloads also need friendly timeouts. Keep the connection alive with periodic output, and set server/proxy timeouts high enough for expected report sizes. If you have a reverse proxy in front, confirm it allows long-running responses or it will cut off the export halfway through.
Streaming changes error handling. Once you start sending the file, you can’t switch to a neat JSON error response. Plan for this before you ship:
- Validate inputs and permissions before sending the first byte.
- Write a header early (CSV columns, or a metadata line for NDJSON).
- If something fails mid-stream, log it, stop cleanly, and make it obvious the file is incomplete.
A common “works in dev” bug is building arrays with hundreds of thousands of rows. Switching to streaming exports usually removes the memory spike immediately and keeps the app responsive while the download runs.
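The paged export loop can be sketched as a generator. Here fetch_page(offset, limit) is a hypothetical data-access function standing in for your real query; the point is that only one page of rows is ever in memory:

```python
import csv
import io

PAGE_SIZE = 1000  # illustrative page size

def stream_csv(fetch_page, columns):
    """Yield CSV text chunk by chunk: header first, then one page of rows at a time."""
    buf = io.StringIO()
    csv.writer(buf).writerow(columns)     # header goes out before any rows
    yield buf.getvalue()
    offset = 0
    while True:
        rows = fetch_page(offset, PAGE_SIZE)
        if not rows:
            break                         # no more data: end the stream
        buf = io.StringIO()
        csv.writer(buf).writerows(rows)   # format only this page
        yield buf.getvalue()
        offset += PAGE_SIZE
```

Frameworks such as Flask or Starlette can wrap a generator like this in a streaming response, which lets the server pull chunks only as fast as the client consumes them.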
Step by step: implement a safe streaming download
Treat a download like a live pipe, not a big object you build in memory and return at the end.
Start by picking the export format based on how people use it. CSV is great for spreadsheets. NDJSON is better when another system will read it. A zipped bundle can help when you’re sending multiple files, but don’t reach for ZIP just to hide performance problems.
Next, make the work incremental. Instead of one huge query, read rows in pages (or via a cursor) and loop until there’s no more data. Your app should only hold a small slice at a time.
A simple sequence that prevents most “report killed the server” failures:
- Set headers up front (type + a download filename) and start the response.
- Fetch data in pages and convert each page into output lines.
- Write chunks and flush often (don’t build one giant string).
- Compress on the fly when it helps (streaming gzip is widely supported).
- Stop work when the client disconnects or the user cancels.
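The compress-as-you-write step can be sketched with the standard library's streaming zlib interface (wbits=31 selects the gzip container); this is a sketch, and the level is an illustrative tradeoff between CPU and size:

```python
import zlib

def gzip_stream(chunks):
    """Compress an iterable of byte chunks on the fly, yielding gzip-encoded bytes."""
    comp = zlib.compressobj(level=6, wbits=31)  # wbits=31: gzip header + trailer
    for chunk in chunks:
        data = comp.compress(chunk)
        if data:                 # the compressor may buffer small inputs
            yield data
    yield comp.flush()           # emit the trailer so the gzip file is valid
```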
Compression is the bonus, not the foundation. It reduces bandwidth, but streaming is what keeps memory flat. For CSV and NDJSON, gzip usually gives a big win, as long as you compress as you write rather than after generating the whole file.
Validate with a realistically large dataset. A 1,000 row test can look perfect while a 5 million row export quietly runs out of memory, times out, or produces a truncated file.
Example: a “Monthly Transactions” export fails in production because it loads all rows, then joins them into a single CSV string. Switching to a paged loop plus chunked writes fixes it without changing what the user receives.
Enforce size and time limits that users can live with
If you want large responses without random crashes, you need limits that protect the server and still feel fair to users. The trick is making limits predictable, visible in behavior, and paired with an obvious next step.
Start with two hard caps: maximum rows and maximum bytes. Row caps keep slow queries from running forever. Byte caps prevent “successful” responses that overload proxies or buffers. When an export hits a cap, return a clear message that says what happened and what to change (for example, “Export capped at 100,000 rows. Narrow the date range or add a filter.”).
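A row cap with a clear message can be enforced inside the export loop itself. MAX_ROWS and the ExportCapped exception are illustrative names:

```python
MAX_ROWS = 100_000  # illustrative hard cap

class ExportCapped(Exception):
    """Raised when an export exceeds the row cap."""

def capped_rows(rows):
    """Pass rows through until the cap is hit, then stop with an actionable message."""
    for count, row in enumerate(rows, start=1):
        if count > MAX_ROWS:
            raise ExportCapped(
                f"Export capped at {MAX_ROWS:,} rows. "
                "Narrow the date range or add a filter."
            )
        yield row
```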
Put guardrails on the query itself so the database does less work. Common guardrails include a default date range (31 or 90 days), requiring at least one narrowing filter for “all customers” style reports, and a maximum page size even if the client asks for more. If you allow sorting or filtering, keep an allow-list and make sure the database can support it.
Time limits should exist at multiple layers: database statement timeout, server request timeout, and an application-level deadline for generating exports. When you cut off work, do it cleanly. Return a specific error that tells users how to succeed next time, not a generic 500.
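The application-level deadline can be sketched as a wrapper that checks the clock between pages, so work stops cleanly at a page boundary rather than mid-row (names here are illustrative):

```python
import time

class ExportTimeout(Exception):
    """Raised when an export exceeds its time budget."""

def with_deadline(pages, max_seconds):
    """Yield pages until the deadline passes, then stop with a clear error."""
    deadline = time.monotonic() + max_seconds
    for page in pages:
        if time.monotonic() > deadline:
            raise ExportTimeout(
                "Export exceeded the time limit. Try a narrower date range."
            )
        yield page
```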
Rate limiting is the other half of “livable” limits. One user repeatedly downloading a huge report should not starve everyone else. Throttle expensive endpoints by user and by organization, and consider separate limits for interactive requests vs exports.
Finally, log near-limit requests (rows, bytes, time, filters used) and alert when they cluster. If many users hit a 90-day cap, that’s a signal to add a summary report or an async export option.
Security checks for exports and big responses
Big exports fail in two ways: they crash the app, or they quietly leak data. Treat exports like a separate feature with its own security rules.
Start with authorization. A common bug is an export endpoint that checks “is the user logged in?” but forgets “are they allowed to see these rows?” Reuse the same permission checks as the on-screen report, and apply them on the server before writing any data.
CSV exports have a special risk: CSV injection. If a user-controlled field starts with characters like =, +, -, or @, opening the file in a spreadsheet can run a formula. The fix is simple: escape or prefix risky values (for example, add a leading apostrophe) for exported cells that come from users.
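The prefixing fix takes only a few lines; some guidance also treats a leading tab or carriage return as risky, so this sketch includes them:

```python
# Leading characters a spreadsheet may interpret as the start of a formula.
FORMULA_PREFIXES = ("=", "+", "-", "@", "\t", "\r")

def sanitize_cell(value: str) -> str:
    """Neutralize user-controlled cells so spreadsheets treat them as text."""
    if value.startswith(FORMULA_PREFIXES):
        return "'" + value   # leading apostrophe forces text interpretation
    return value
```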
Export failures can also spill secrets. When a job times out, it’s tempting to log the full query, headers, or request body. That can expose API keys, auth tokens, or personal data in logs. Prefer an internal export ID plus a short error code, and keep sensitive values out of stack traces.
Flexible report filters are another trap. “Sort by any column” or “filter with a raw query string” can turn into SQL injection if you build SQL with string concatenation. Use parameterized queries and allow-lists for sortable and filterable fields.
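A sketch of the allow-list plus bound-parameter approach; the table, columns, and the ? placeholder style are illustrative (placeholder syntax varies by database driver):

```python
# Only these columns may appear in ORDER BY; anything else is rejected.
SORTABLE = {"created_at", "amount", "customer_id"}

def build_export_query(sort_by: str) -> str:
    """Build an export query with an allow-listed sort column and a bound filter."""
    if sort_by not in SORTABLE:
        raise ValueError(f"Cannot sort by {sort_by!r}")
    # The column name comes from the allow-list, never from raw user input;
    # the date filter stays a bound parameter (?) supplied at execution time.
    return f"SELECT * FROM transactions WHERE created_at >= ? ORDER BY {sort_by}"
```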
Also protect long-running exports from abuse. A small set of guardrails goes a long way:
- Re-check the user token (or session) when the export starts, not only when it was queued
- Throttle exports per user/workspace
- Put a hard cap on rows or time and return a clear message when the limit is hit
- Record who exported what and when for audit
These checks are easy to miss when you’re focused on “make it download.” The goal is a download that’s reliable and safe in production.
Common mistakes that cause broken downloads
Broken downloads usually happen because the server tries to be “helpful” in the wrong place. A fast test with small data looks fine, then a real report hits production and the app freezes, times out, or returns a file that won’t open.
One easy trap is compression everywhere. Compressing a 2 KB JSON response can cost more CPU than it saves, especially under load. Compression shines when responses are large and repetitive (exports, long lists, logs). For small responses, skip it or enforce a minimum size.
Another common failure is building the whole export in memory before sending it. It feels simpler to create one big string or buffer, but it scales badly. A 200 MB CSV can become much larger in memory during formatting, and a few users doing this at once can crash the process.
Other mistakes that show up often:
- Calling something “streaming” while still generating the full CSV first, then writing it
- Streaming JSON in a way that produces invalid JSON (missing brackets, trailing commas, partial objects)
- Ignoring timeouts from the client, reverse proxy, or load balancer (the export runs, but the connection is already dead)
- Testing only on a fast local network with small data, then shipping without a slow-network or real-size test
- Hitting limits (size/time) with no clear message, so users just see a failed download
Streaming JSON deserves special care. If you need strict JSON, stream a well-formed array and manage commas carefully. If clients can accept it, choose a format built for streaming like JSON Lines/NDJSON.
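Comma management for a strict streamed JSON array can be sketched like this (no full list is ever built in memory):

```python
import json

def stream_json_array(items):
    """Yield text chunks that concatenate into one valid JSON array."""
    yield "["
    first = True
    for item in items:
        if not first:
            yield ","            # comma goes *before* every item except the first
        yield json.dumps(item)
        first = False
    yield "]"
```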
When limits are hit, tell the user what happened and what to do next (narrow filters, smaller date range, or request an async export).
A realistic example: fixing a failing report export
A founder clicks “Monthly sales report” and waits. The browser tab spins, the app becomes slow for everyone, and after a minute the download fails. On the server, the report endpoint is building the entire CSV in memory before sending anything. One big month (or a few extra columns) pushes memory over the edge and the process restarts.
The fix wasn’t “make the server bigger.” It was changing how the export is produced and delivered so it handles large datasets predictably.
What changed:
- The server writes CSV rows as it reads them from the database, instead of collecting them into a huge string.
- Gzip compression is enabled for the download so the file is smaller over the network.
- A hard cap is added (for example, 31 days at a time) with a clear error message if the user asks for more.
- A timeout and max row limit are enforced so one request can’t hog the system.
The user experience improves right away. The download starts within a second or two because the server can send headers and the first bytes immediately. The file often finishes faster because it’s smaller, and failures drop because the server is no longer trying to hold everything in memory. If the user needs a longer range, the UI can guide them to run multiple exports.
For the team, the biggest win is stability. Memory stays flat, CPU spikes are lower, and support tickets like “report froze the app” disappear. This is the kind of work FixMyMess often does when AI-generated prototypes break in production: move exports to streaming, add safe compression, and put limits in place so a single export can’t take down the app.
Quick checklist before you ship
Test the worst case, not the happy path. Pick the largest report your users can realistically request and run it end to end the same way they will (same filters, same roles, and ideally a real device). This is where “works on my machine” downloads usually fall apart.
Checklist:
- Run the biggest export and confirm it finishes successfully (no 500s, no partial files, no “network error” after a couple minutes).
- Watch server memory during the export. It should stay mostly flat. A slow, steady climb usually means you’re buffering instead of streaming.
- Make sure limits are visible where users choose the report: max date range, row caps, and any timeouts.
- Verify authorization with real roles (admin, standard user, restricted roles). Confirm you can’t export data you can’t view.
- Check logs after a large run: response size, whether compression was used, time spent generating, and whether the request hit a limit.
If you inherited an AI-generated export that keeps crashing, the fastest fix is usually a short audit to find where buffering happens, then add streaming and hard limits.
Next steps if your app is already crashing
If the app crashes when users run big reports, treat it like an incident: stop the bleeding first, then improve the experience. Tuning compression before you have guardrails usually wastes time.
A sensible order of work:
- Add hard limits (max rows, max bytes, max time) and return a clear error when hit.
- Switch exports to streaming so the server never loads the whole file in memory.
- Tune compression after the basics are stable, and only where it helps.
Once limits are in place, you can support large downloads without taking down the process. The key change is avoiding buffering: don’t build the full JSON/CSV in an array or string, and don’t log full payloads on errors.
Make a quick worst-case test plan before touching production:
- The largest report users actually run (or the biggest table in prod)
- A slow client network (throttled) while downloading
- Two or three concurrent exports from different users
- A canceled download halfway through
- A request that hits the limit (verify the message and that the server stays healthy)
If your codebase was generated by tools like Lovable, Bolt, v0, Cursor, or Replit, these bugs often hide in a few places: auth wrappers that retry forever, error handling that dumps full responses into logs, and helper utilities that call things like toString() or json() too early (forcing full buffering).
A fast remediation usually looks like diagnosis (find where memory spikes and where buffering happens), targeted fixes (limits + streaming + safer errors), verification (load tests and export integrity checks), and deployment preparation (timeouts, worker sizing, and monitoring). If you want a second set of eyes, FixMyMess at fixmymess.ai can run a free code audit to pinpoint crash points in exports, security gaps, and performance issues, then help ship a working fix in 48-72 hours.