Building a Job Queue in Go with PostgreSQL
I built Forge — a background job queue in Go backed by PostgreSQL. This post covers the interesting engineering decisions behind it.
Why PostgreSQL and not Redis?
Most job queues default to Redis. It's fast, simple, and has good pub/sub primitives. But Redis is a cache first — durability is an afterthought. If your Redis instance restarts without persistence configured, your queue is gone.
PostgreSQL gives you ACID guarantees for free. A job enqueued is a job that survives a crash. For a job queue where reliability matters, that's the right trade-off.
The dequeue problem
The hardest part of building a job queue isn't enqueuing — it's safely dequeuing under concurrency. If you have 5 workers polling the same table, you need to guarantee that two workers never pick up the same job.
The naive approach is:
SELECT * FROM jobs WHERE status = 'PENDING' LIMIT 1;
UPDATE jobs SET status = 'PROCESSING' WHERE id = $1;
This has a race condition. Two workers can both SELECT the same row before either runs the UPDATE.
The fix is FOR UPDATE SKIP LOCKED:
SELECT * FROM jobs
WHERE status = 'PENDING'
  AND scheduled_at <= $1
ORDER BY priority DESC, scheduled_at ASC
LIMIT 1
FOR UPDATE SKIP LOCKED;
FOR UPDATE SKIP LOCKED tells Postgres: lock this row for me, and if it's already locked by another transaction, skip it and move on. That alone makes concurrent dequeue safe: no two workers can claim the same job, and no advisory locks are needed. One subtlety: the row lock only lasts for the transaction, so the status flip to PROCESSING has to happen in the same transaction as the SELECT.
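The cleanest way to get that atomicity is to fold the claim and the status change into one statement. Here's a minimal sketch, assuming a Queue wrapping database/sql and the column names from the queries above (started_at and the exact Job fields are illustrative):

// Queue and Job mirror the shapes used in this post; the exact
// column list is an assumption for illustration.
type Queue struct{ db *sql.DB }

type Job struct {
	ID       string
	Type     string
	Payload  []byte
	Attempts int
}

// Dequeue claims the highest-priority runnable job and marks it
// PROCESSING in a single atomic statement, so the caller needs no
// explicit transaction: the row lock and the status change commit together.
func (q *Queue) Dequeue(ctx context.Context) (*Job, error) {
	const stmt = `
		UPDATE jobs SET status = 'PROCESSING', started_at = now()
		WHERE id = (
			SELECT id FROM jobs
			WHERE status = 'PENDING' AND scheduled_at <= now()
			ORDER BY priority DESC, scheduled_at ASC
			LIMIT 1
			FOR UPDATE SKIP LOCKED
		)
		RETURNING id, type, payload, attempts`
	var j Job
	err := q.db.QueryRowContext(ctx, stmt).Scan(&j.ID, &j.Type, &j.Payload, &j.Attempts)
	if err == sql.ErrNoRows {
		return nil, nil // nothing runnable right now
	}
	return &j, err
}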
Retry with exponential backoff
When a job fails, it shouldn't be retried immediately. Hammering a downstream service that's already struggling makes things worse. Forge uses exponential backoff:
func (j *Job) NextRetryDelay() time.Duration {
	base := 5 * time.Second
	shift := j.Attempts
	if shift > 10 {
		shift = 10 // cap the delay at base * 2^10, about 85 minutes
	}
	return base * time.Duration(1<<shift)
}
Attempt 1 retries after 10s, attempt 2 after 20s, attempt 3 after 40s, and so on — capped at ~85 minutes. Once a job exhausts its MaxRetries, it moves to the Dead Letter Queue instead of silently disappearing.
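When a handler returns an error, the scheduler either pushes the job back with its new delay or parks it. A sketch of that decision, reusing the Queue and Job types from the dequeue sketch and assuming a MaxRetries field and a last_error column (both illustrative):

// handleFailure is called when a handler returns an error. It either
// reschedules the job with backoff or moves it to the dead letter queue.
func (q *Queue) handleFailure(ctx context.Context, j *Job, jobErr error) error {
	j.Attempts++
	if j.Attempts >= j.MaxRetries {
		// Out of retries: park the job in the DLQ with its last error.
		_, err := q.db.ExecContext(ctx,
			`UPDATE jobs SET status = 'DEAD', last_error = $1 WHERE id = $2`,
			jobErr.Error(), j.ID)
		return err
	}
	// Otherwise re-enqueue with an exponential delay; the dequeue
	// query ignores the job until scheduled_at passes.
	_, err := q.db.ExecContext(ctx,
		`UPDATE jobs SET status = 'PENDING', attempts = $1,
		 scheduled_at = $2, last_error = $3 WHERE id = $4`,
		j.Attempts, time.Now().Add(j.NextRetryDelay()), jobErr.Error(), j.ID)
	return err
}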
Dead Letter Queue
Failed jobs don't get deleted. They land in the DLQ where you can inspect them, see the error, and retry manually. The dashboard shows all dead jobs with their error messages and attempt counts — one click to retry individually or in bulk.
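Retrying from the DLQ is just a state transition back to PENDING. A minimal sketch of what the retry endpoint might run (resetting attempts to zero is an assumption about Forge's behaviour):

// RetryDead re-enqueues a dead job; the normal dequeue path picks it
// up again on the next poll.
func (q *Queue) RetryDead(ctx context.Context, id string) error {
	_, err := q.db.ExecContext(ctx,
		`UPDATE jobs SET status = 'PENDING', attempts = 0, scheduled_at = now()
		 WHERE id = $1 AND status = 'DEAD'`, id)
	return err
}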
Worker pool
The worker pool is straightforward — N goroutines polling for jobs:
func (p *Pool) Start() {
	for i := 0; i < p.concurrency; i++ {
		p.wg.Add(1)
		go p.worker(i) // each worker polls and processes jobs until shutdown
	}
}
Graceful shutdown on SIGTERM: the cancel context propagates to all workers, they finish their current job, and exit cleanly. No jobs get abandoned mid-processing.
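Here's a sketch of both halves, reusing the Queue from the dequeue sketch. The worker loop checks the context only between jobs, and main wires SIGTERM to cancellation with signal.NotifyContext from os/signal (the poll interval, NewPool, process, and Wait are illustrative names, not Forge's real API):

// worker polls until the context is cancelled. The select checks
// ctx.Done() only between jobs, so a job that has started runs to
// completion before the goroutine exits.
func (p *Pool) worker(id int) {
	defer p.wg.Done()
	ticker := time.NewTicker(100 * time.Millisecond) // poll interval is illustrative
	defer ticker.Stop()
	for {
		select {
		case <-p.ctx.Done():
			return // shutdown requested and no job in flight
		case <-ticker.C:
			job, err := p.queue.Dequeue(p.ctx)
			if err != nil || job == nil {
				continue // empty queue or transient error; poll again
			}
			p.process(job) // dispatch to the handler registered for job.Type
		}
	}
}

func main() {
	// Cancel the context when SIGTERM (or Ctrl-C) arrives.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, os.Interrupt)
	defer stop()

	pool := NewPool(ctx, 5) // hypothetical constructor: context + concurrency
	pool.Start()

	<-ctx.Done()
	pool.Wait() // hypothetical wrapper around wg.Wait(): blocks until in-flight jobs finish
}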
The API
A simple REST API over the queue:
- POST /jobs — enqueue a job
- GET /jobs/:id — check job status
- GET /dlq — list dead jobs
- POST /dlq/:id/retry — retry a dead job
- GET /stats — queue statistics
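Enqueuing from another service is a plain HTTP call. A sketch, with the request body shape assumed for illustration:

// enqueueEmail POSTs a job to the queue. The JSON schema and the
// expected 201 response are assumptions, not Forge's documented API.
func enqueueEmail() error {
	body := strings.NewReader(`{"type": "email", "payload": {"to": "user@example.com"}}`)
	resp, err := http.Post("http://localhost:8080/jobs", "application/json", body)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusCreated {
		return fmt.Errorf("enqueue failed: %s", resp.Status)
	}
	return nil
}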
Jobs are typed — you register a handler per job type and the worker routes accordingly:
pool.Register("email", handleEmail)
pool.Register("webhook", handleWebhook)
What I'd add next
- Queue isolation — currently workers poll all queues. Dedicated workers per queue would let you prioritise critical work.
- Scheduled jobs — the schema already has scheduled_at; the API just needs a way to expose it.
- Prometheus metrics — queue depth, processing latency, error rate per job type.
The full source is on GitHub.