Each layer is independently scalable and independently replaceable. The boundaries between them are where most "export feature" bugs actually live.
User clicks Export → API enqueues a job → returns instantly with a job ID. The HTTP request is done in < 50ms regardless of how big the export is. Use BullMQ on Redis (the modern successor to Bull). It gives you priorities, retries with exponential backoff, concurrency caps per worker, dead-letter handling, and Prometheus-friendly metrics out of the box.
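Here's a minimal sketch of those retry and priority knobs made concrete, assuming a local Redis; `exportQueue` is the queue the endpoint below adds jobs to, and the payload fields are placeholders:

```js
import { Queue } from 'bullmq';

const exportQueue = new Queue('exports', {
  connection: { host: 'localhost', port: 6379 },
  defaultJobOptions: {
    attempts: 5,                                   // retry failed jobs up to 5 times
    backoff: { type: 'exponential', delay: 1000 }, // 1s, 2s, 4s, ... between attempts
    removeOnComplete: { age: 86400 },              // keep completed jobs for 24h
  },
});

// Per-job priority: in BullMQ, a lower number means higher priority.
await exportQueue.add('csv', { userId, query, format: 'csv' }, { priority: 1 });
```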
Worker pulls rows from a DB cursor (Postgres: pg-cursor, Mongo: native cursor), pipes them through a CSV/JSONL transform (csv-stringify in object mode), pipes through gzip (zlib.createGzip()), pipes into an S3 multipart upload (@aws-sdk/lib-storage Upload class). Every layer is a Node stream — backpressure handles itself.
When the upload completes, generate an S3 presigned URL (24-hour expiry typical) and notify the user via the channel they expect — email (Resend / Postmark / SES), webhook, in-app notification, or websocket. User downloads from S3 directly. Your API never sees the bytes.
Concrete versions of each layer. Swap any piece for an equivalent in your stack — the architecture survives.
```js
// 1. API endpoint — instant ACK, returns job ID
app.post('/exports', async (req, res) => {
  const job = await exportQueue.add('csv', {
    userId: req.user.id,
    query: req.body.query,
    format: 'csv',
  });
  res.json({ jobId: job.id, status: 'queued' });
});

// 2. Worker (separate process) — streams the export
import { once } from 'events';
import { PassThrough } from 'stream';
import { createGzip } from 'zlib';
import { Worker } from 'bullmq';
import { Upload } from '@aws-sdk/lib-storage';
import { GetObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';
import Cursor from 'pg-cursor';
import { stringify } from 'csv-stringify';

new Worker('exports', async (job) => {
  const client = await pool.connect();
  try {
    const cursor = client.query(new Cursor(job.data.query));
    const body = new PassThrough();
    const upload = new Upload({
      client: s3,
      params: {
        Bucket: 'exports',
        Key: `${job.id}.csv.gz`,   // stable key per job ID → retries overwrite cleanly
        Body: body,
        ContentType: 'text/csv',
        ContentEncoding: 'gzip',
      },
      partSize: 5 * 1024 * 1024,   // 5 MB multipart chunks (S3 minimum)
    });
    const uploadPromise = upload.done();

    const csv = stringify({ header: true });
    csv.pipe(createGzip()).pipe(body);

    let rowsRead = 0;
    while (true) {
      const rows = await cursor.read(1000);
      if (!rows.length) break;
      for (const r of rows) {
        if (!csv.write(r)) await once(csv, 'drain'); // respect backpressure
      }
      rowsRead += rows.length;
      await job.updateProgress(rowsRead);
    }
    csv.end();            // flush CSV → gzip → S3
    await uploadPromise;  // resolves once the multipart upload completes
    await cursor.close();

    // 3. Deliver presigned URL (24-hour expiry)
    const url = await getSignedUrl(
      s3,
      new GetObjectCommand({ Bucket: 'exports', Key: `${job.id}.csv.gz` }),
      { expiresIn: 86400 }
    );
    await sendEmail(job.data.userId, url);
  } finally {
    client.release();     // return the connection even if the export throws
  }
}, { connection: redis, concurrency: 4 });
```
If you're already on AWS and don't want to manage Redis, the architecture is identical — every layer maps cleanly to a native AWS service. Same 3 layers, less infrastructure to babysit.
| Layer | BullMQ/Redis stack | AWS-native equivalent |
|---|---|---|
| Job queue | BullMQ on Redis | Amazon SQS (standard or FIFO) |
| Worker process | Always-on Node process (EC2 / ECS / Fargate) | AWS Lambda triggered by SQS (zero infra, pay per message) |
| Retries + DLQ | Manual via BullMQ retry options | SQS DLQ built-in (set maxReceiveCount) |
| Job state | Redis hash per job | DynamoDB table keyed by job ID (or just SQS message attributes) |
| Multi-stage orchestration | Chained BullMQ jobs (manual) | AWS Step Functions — declarative state machine, automatic retry/branching |
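A minimal sketch of the DLQ row above: attaching a redrive policy with the AWS SDK. It assumes the dead-letter queue already exists, and the env var names are placeholders.

```js
import { SQSClient, SetQueueAttributesCommand } from '@aws-sdk/client-sqs';

const sqs = new SQSClient({ region: 'us-east-1' });

// Redrive policy: after 5 failed receives, SQS moves the message
// to the dead-letter queue automatically — no retry code to write.
await sqs.send(new SetQueueAttributesCommand({
  QueueUrl: process.env.EXPORT_QUEUE_URL,
  Attributes: {
    RedrivePolicy: JSON.stringify({
      deadLetterTargetArn: process.env.EXPORT_DLQ_ARN, // pre-created DLQ
      maxReceiveCount: '5',
    }),
  },
}));
```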
```js
// 1. API endpoint — enqueues to SQS
import { randomUUID } from 'crypto';
import { SQSClient, SendMessageCommand } from '@aws-sdk/client-sqs';

const sqs = new SQSClient({ region: 'us-east-1' });

app.post('/exports', async (req, res) => {
  const jobId = randomUUID();
  await sqs.send(new SendMessageCommand({
    QueueUrl: process.env.EXPORT_QUEUE_URL,
    MessageBody: JSON.stringify({ jobId, userId: req.user.id, query: req.body.query }),
  }));
  res.json({ jobId, status: 'queued' });
});

// 2. Lambda handler — triggered by SQS, streams the export
import { once } from 'events';
import { PassThrough } from 'stream';
import { createGzip } from 'zlib';
import { Upload } from '@aws-sdk/lib-storage';
import { GetObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';
import Cursor from 'pg-cursor';
import { stringify } from 'csv-stringify';

export const handler = async (event) => {
  for (const record of event.Records) {
    const { jobId, userId, query } = JSON.parse(record.body);
    const client = await pool.connect();
    try {
      const cursor = client.query(new Cursor(query));
      const body = new PassThrough();
      const upload = new Upload({
        client: s3,
        params: {
          Bucket: process.env.EXPORT_BUCKET,
          Key: `${jobId}.csv.gz`,
          Body: body,
          ContentType: 'text/csv',
          ContentEncoding: 'gzip',
        },
      });
      const uploadPromise = upload.done();

      const csv = stringify({ header: true });
      csv.pipe(createGzip()).pipe(body);

      while (true) {
        const rows = await cursor.read(1000);
        if (!rows.length) break;
        for (const r of rows) {
          if (!csv.write(r)) await once(csv, 'drain'); // respect backpressure
        }
      }
      csv.end();
      await uploadPromise;
      await cursor.close();

      // 3. Presigned URL → SES email
      const url = await getSignedUrl(
        s3,
        new GetObjectCommand({ Bucket: process.env.EXPORT_BUCKET, Key: `${jobId}.csv.gz` }),
        { expiresIn: 86400 }
      );
      await sendSesEmail(userId, url);
    } finally {
      client.release();
    }
  }
};
```
When to pick which stack:

- **BullMQ + Redis:** you already run Redis, want a job-monitor UI like Bull Board, need fine-grained concurrency control, or run multi-cloud / on-prem.
- **SQS + Lambda:** you're already on AWS, don't want Redis ops, jobs finish in under 15 minutes, and you want pay-per-use pricing.
Each of these anti-patterns looks reasonable until row count crosses a threshold; then everything breaks at once. Worth a checklist before you ship.

- **Building the payload with `JSON.stringify`.** Works perfectly at 10k rows. OOMs your worker at 1M. Fix: stream from a database cursor instead.
- **`OFFSET 1000000 LIMIT 1000` pagination.** Every page rescans all earlier rows, O(n²) total. The last page on a 10M-row table can take minutes. Fix: keyset pagination, `WHERE id > $lastId ORDER BY id LIMIT 1000`. Constant time per page (see the sketch after this list). Or just use a server-side cursor and skip pagination entirely.
- **Running exports in the API process.** One concurrent export pegs the event loop for everyone else's requests. p99 latency goes nonlinear. Fix: a separate worker process.
- **Serving downloads through the API.** Load balancer timeouts, broken retries, exhausted egress, a frontend that hangs for 12 minutes. Fix: presigned URLs, so users download from S3 directly.
- **Non-idempotent retries.** Worker crashes 80% of the way through. Job retries — restarts from row 1 with no record of progress, double-writes the partial file. Fix: stable S3 key per job ID plus checkpointed progress (covered in the FAQ below).
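A minimal sketch of that keyset loop. The `orders` table and `id` column are hypothetical; `pool` is the pg pool and `csv` the stringify stream from the worker above.

```js
import { once } from 'events';

// Keyset pagination: each query seeks straight to the next id range,
// so page N costs the same as page 1.
let lastId = 0;
while (true) {
  const { rows } = await pool.query(
    'SELECT * FROM orders WHERE id > $1 ORDER BY id LIMIT 1000',
    [lastId]
  );
  if (!rows.length) break;
  for (const r of rows) {
    if (!csv.write(r)) await once(csv, 'drain'); // respect backpressure
  }
  lastId = rows[rows.length - 1].id; // next page starts after the last seen id
}
```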
The capability layer is solved. Every cloud has S3-equivalent object storage, every language has streaming primitives, every queue library on Redis handles distributed work. The pieces are abundant.
The translation gap is wiring those pieces together correctly the first time. Most teams ship a synchronous export endpoint, hit a memory ceiling, then bolt on a queue. Then hit OFFSET slowness, bolt on cursors. Then hit egress costs, bolt on S3. Each retrofit is 2-4 weeks of work that the original architecture would have absorbed for free.
The 3-layer pattern is the operator-translation — turning "Node.js can do this" into "here's the wiring that works in production from day one." Worth getting right before row counts force the rewrite.
Capability is abundant. The differentiation is wiring.
Three layers, each doing one job — and the export problem is permanently solved.
The questions readers send most often after reading the architecture. Honest answers, both stacks covered.
**Should I use BullMQ + Redis or SQS + Lambda?**
If you're already running Redis, want a job-monitor UI like Bull Board, need fine-grained concurrency control, or are multi-cloud / on-prem — use BullMQ. If you're already on AWS, don't want Redis ops, jobs typically complete in under 15 minutes, and want pay-per-use cost — use SQS + Lambda. SQS + Lambda is the default for AWS-native shops because it has zero infrastructure to babysit and SQS DLQ is built in.
**Why does my CSV export run out of memory?**
Almost certainly because you're loading the full result set into memory then JSON.stringify-ing it. Works perfectly at 10K rows, OOMs at 1M. The fix is to stream from a database cursor (pg-cursor for Postgres, native cursor for MongoDB) through a CSV transform (csv-stringify in object mode), through gzip (zlib.createGzip), into an S3 multipart upload. Memory stays flat at 50-100MB regardless of row count because every layer is a Node stream and backpressure handles itself.
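The same pipeline can also be expressed with `stream.promises.pipeline`, which wires up backpressure and error propagation automatically. A minimal sketch, assuming `pool` and `s3` exist; `runExport` is a hypothetical wrapper, not part of any library:

```js
import { Readable, PassThrough } from 'stream';
import { pipeline } from 'stream/promises';
import { createGzip } from 'zlib';
import { Upload } from '@aws-sdk/lib-storage';
import Cursor from 'pg-cursor';
import { stringify } from 'csv-stringify';

// Adapt the batched cursor into an async iterable of rows.
async function* readRows(cursor) {
  while (true) {
    const batch = await cursor.read(1000);
    if (!batch.length) return;
    yield* batch;
  }
}

export async function runExport(query, bucket, key) {
  const client = await pool.connect();
  try {
    const cursor = client.query(new Cursor(query));
    const body = new PassThrough();
    const upload = new Upload({
      client: s3,
      params: { Bucket: bucket, Key: key, Body: body },
    });
    // pipeline() handles backpressure and tears everything down on error.
    await Promise.all([
      pipeline(Readable.from(readRows(cursor)), stringify({ header: true }), createGzip(), body),
      upload.done(),
    ]);
    await cursor.close();
  } finally {
    client.release();
  }
}
```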
**What about Lambda's 15-minute execution cap?**
Lambda has a hard 15-minute execution cap. For exports that may exceed that, route them to AWS Batch (containerized worker, no time limit, runs to completion) or break the export into stages with AWS Step Functions for declarative orchestration. For most "export this report" use cases, 15 minutes is plenty — 1M rows of CSV gzipped to S3 typically completes in 2-4 minutes from a well-indexed query. If your exports regularly approach the cap, the database query is usually the bottleneck (missing index, OFFSET pagination), not Lambda.
**Why is OFFSET pagination so slow for exports?**
OFFSET pagination at scale is O(n²) — every page rescans all earlier rows. The last page on a 10M-row table can take minutes to return. The fix is keyset pagination: WHERE id > $lastId ORDER BY id LIMIT 1000. Constant time per page regardless of how deep into the dataset you are. Even better for exports: skip pagination entirely and use a server-side cursor that streams rows continuously.
**Can the API process run exports itself?**
No. One concurrent export will peg the event loop for everyone else's requests, and your p99 latency goes nonlinear. Always separate the worker process from the API. Horizontal scaling becomes a matter of running more worker containers (or in AWS-native land, Lambda autoscales for you). The decoupling also means you can cap export concurrency without affecting API throughput, and a worker crash doesn't take down the API.
**How do I make export retries idempotent?**
Use a stable S3 object key per job ID — the same job retrying writes to the same S3 location, so retries overwrite cleanly instead of producing duplicate partial files. Persist progress to job state (BullMQ job.updateProgress, or DynamoDB for SQS+Lambda). For multi-hour exports, write checkpoints (last_processed_id) so a crashed worker can resume from the checkpoint instead of restarting from row 1.
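A sketch of the checkpoint bookkeeping only; resuming the output stream itself is out of scope here. It assumes BullMQ persists job progress on the job record across retries, and `pool`, `redis`, and the `orders` table are placeholders:

```js
import { Worker } from 'bullmq';

new Worker('exports', async (job) => {
  // On a retry, pick up where the last attempt checkpointed.
  let lastId = (typeof job.progress === 'object' && job.progress?.lastId) || 0;
  while (true) {
    const { rows } = await pool.query(
      'SELECT * FROM orders WHERE id > $1 ORDER BY id LIMIT 1000',
      [lastId]
    );
    if (!rows.length) break;
    // ...write rows into the CSV → gzip → S3 stream as in the worker above...
    lastId = rows[rows.length - 1].id;
    await job.updateProgress({ lastId }); // checkpoint after each batch
  }
}, { connection: redis });
```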
**What does the AWS-native stack cost?**
SQS + Lambda + S3 is the cheapest at-scale stack on AWS for export workloads under 15 minutes. SQS is pay-per-message (fractions of a cent per export). Lambda is pay-per-invocation + execution-time. S3 standard storage is cheap, and presigned URLs let users download directly without proxying through your infrastructure (which would burn data-egress costs). For a startup running a few hundred exports per day, the entire stack typically costs under $10/month.
**Why not serve the download through my API?**
Three reasons. (1) Load balancer timeouts — gigabyte downloads through your API will exceed the 30-60 second timeout most LBs enforce. (2) Broken retries — if the download fails partway, the user has to re-trigger the entire export instead of just resuming the download. (3) Egress costs — proxying gigabytes through your API doubles your data-transfer bill. S3 presigned URLs (24-hour expiry typical) let the user download directly from S3, which was built for exactly this pattern.
If you're sketching the queue + streaming + delivery pieces and want a second set of eyes on a specific shape — text the row count, DB, and current bottleneck. Most export problems collapse to two or three architectural decisions. No funnel, no SOW.
Text PJ · 858-461-8054

Don't see what you were looking for?
Text PJ a sentence about what you actually need — I'll build you a free custom shareable on the house. No email, no funnel, no SOW.
📲 Text PJ — free shareable