Why We Built Cobalt
We open-sourced the deployment platform now running Blue.

The deployment platform is the most leveraged piece of software in our stack. Every release passes through it. Every secret is decrypted by it. Every customer request is routed by it. For most of Blue’s life, we ran on someone else’s.
Today we’re shipping our own. Cobalt is an open-source deployment platform — single Go binary, Docker Swarm orchestration, automatic HTTPS via Caddy, rqlite for the control plane. MIT-licensed. In production now, running the infrastructure that 19,000 organizations rely on.
Why we built it
We ship hundreds of pull requests a month. That cadence is how three engineers compete with venture-funded teams. It only works if deploys are boring.
Ours weren’t.
Intermittently, a deploy would land in a state where Caddy’s in-memory routing pointed at a Docker Swarm service that hadn’t actually come up healthy. The result was a window of 502s until somebody noticed. The fix was reproducible enough that we’d turned it into an AI agent skill, deploy-proxy-fix, that we’d run from a laptop, over SSH, in about five minutes.
A skill that fixes your deploy bug is a useful skill. But its existence meant we couldn’t approve and merge pull requests from a phone, because the moment something went sideways we needed a laptop. For a team of three trying to ship from anywhere, that’s a structural problem, not a bug.
Around the same time, we found 150+ stale Docker networks accumulated on the production host — leaked across deploys, never cleaned up, eventually consuming IP space. Another patch. Another runbook. Each one rational in isolation. Together, a signal.
These weren’t bugs to fix. They were design-level decisions to make differently.
Then there was money. The deployment platform we were running offered a paid tier to prioritize our pull requests. Reasonable for them. Wrong for us. We just shipped one-time pricing for Blue — $99, $299, $999, paid once, no renewal. We are structurally the cheapest serious option in our category. Every recurring vendor cost we take on eventually shows up in a customer’s bill. Owning our deployment layer is a cost decision as much as a technical one.
How it works
Cobalt began as a fork of Disco. We kept Disco’s disco.json + Dockerfile model and rebuilt the engine underneath it. The shape stays familiar: drop a cobalt.json file in your repo, push to GitHub, and it deploys. The differences are below the surface.
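To make the shape concrete, here is a hypothetical sketch, in Go, of the kind of structure the daemon might parse a cobalt.json into. The field names are illustrative guesses in the disco.json lineage, not Cobalt’s documented schema; check the repository for the actual format.

// Hypothetical sketch only: roughly the kind of struct a daemon could
// unmarshal a cobalt.json into. Field names are illustrative guesses,
// not Cobalt's documented schema.
package config

type Service struct {
    Image string `json:"image,omitempty"` // prebuilt image, or empty to build from the repo's Dockerfile
    Port  int    `json:"port,omitempty"`  // container port Caddy routes HTTPS traffic to
}

type Config struct {
    Version  string             `json:"version"`
    Services map[string]Service `json:"services"`
}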
┌─────────────────────────────────────────────────────────────┐
│                        COBALT DAEMON                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   CLI (You) ──────────► HTTP API ──────────► Deploy         │
│                            │                   │            │
│                            ▼                   ▼            │
│                       ┌──────────┐       ┌──────────┐       │
│                       │  Store   │       │  Build   │       │
│                       │ (rqlite) │       │   Kit    │       │
│                       └──────────┘       └──────────┘       │
│                            │                   │            │
│                            ▼                   ▼            │
│                  ┌──────────────────────────────┐           │
│                  │        Caddy + Docker        │           │
│                  └──────────────────────────────┘           │
└─────────────────────────────────────────────────────────────┘

Four things matter.
rqlite for the control plane. Most single-host deployment tools use SQLite for their state. rqlite is a distributed SQLite built on Raft — same SQL surface, same files, but replicated. Today Cobalt runs single-node; rqlite is there so the day we need multi-node, we don’t have to migrate the data layer first.
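To make “same SQL surface” concrete, here is a minimal sketch of talking to rqlite from Go over its HTTP API (writes go to /db/execute, reads to /db/query, default port 4001). The table and statements are illustrative, not Cobalt’s actual schema.

// Minimal sketch: plain SQL against rqlite's HTTP API (default port 4001).
// The table and statements are illustrative, not Cobalt's schema.
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "net/url"
)

func main() {
    // Writes are POSTed to /db/execute as a JSON array of statements.
    stmts, _ := json.Marshal([]string{
        "CREATE TABLE IF NOT EXISTS projects (id TEXT PRIMARY KEY, name TEXT)",
        `INSERT OR REPLACE INTO projects (id, name) VALUES ('p_123', 'blue-api')`,
    })
    resp, err := http.Post("http://localhost:4001/db/execute", "application/json", bytes.NewReader(stmts))
    if err != nil {
        panic(err)
    }
    resp.Body.Close()

    // Reads go to /db/query; the response is plain JSON rows.
    q := url.QueryEscape("SELECT id, name FROM projects")
    resp, err = http.Get("http://localhost:4001/db/query?q=" + q)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    body, _ := io.ReadAll(resp.Body)
    fmt.Println(string(body))
}

Because the daemon only ever speaks SQL to that endpoint, moving from one node to a Raft cluster later is a connection change rather than a data-layer migration.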
Stable project IDs, not project names. Every Docker label, every Caddy @id, every internal lookup in Cobalt is keyed on a project’s stable ID, not its display name. The label is cobalt.project.id={id}; the Caddy route is cobalt-project-{id}. Renaming a project becomes a metadata change instead of a re-keying operation across the whole host. In a system that has accumulated infrastructure objects over a year of production, this is the difference between idempotent and not.
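A small sketch of what keying on the stable ID looks like in practice. Only the label format and the Caddy route id come from this post; the struct and helper names are ours for illustration, not Cobalt’s source.

// Sketch of ID-keyed naming. Only the label format (cobalt.project.id={id})
// and the Caddy route id (cobalt-project-{id}) come from the post; the
// helpers below are illustrative.
package project

import "fmt"

type Project struct {
    ID   string // stable, never changes after creation
    Name string // display name, free to change
}

// DockerLabel is attached to the project's Docker objects, so lookups
// never depend on the display name.
func DockerLabel(p Project) string {
    return fmt.Sprintf("cobalt.project.id=%s", p.ID)
}

// CaddyRouteID keys the project's route in Caddy's config via @id.
func CaddyRouteID(p Project) string {
    return fmt.Sprintf("cobalt-project-%s", p.ID)
}

// Rename touches metadata only; no Docker object or Caddy route is re-keyed.
func Rename(p *Project, newName string) {
    p.Name = newName
}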
Reconciler-based Caddy state. The 502 class of bug we’d been patching was a reconciler problem — the in-memory Caddy config and the on-disk source of truth drifted apart under certain failure paths. Cobalt reconciles Caddy state from rqlite on every deploy and swaps upstreams via @id-keyed PATCH that’s atomic from Caddy’s perspective. The bug class is gone, not patched.
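Here is roughly what an @id-keyed swap looks like against Caddy’s admin API, which listens on localhost:2019 by default and addresses any config object tagged with @id under /id/. The nested path to the upstreams below is an assumption about route shape, not Cobalt’s actual config layout.

// Sketch: swap a route's upstreams through Caddy's admin API by @id.
// The admin endpoint (localhost:2019) and /id/ addressing are Caddy's;
// the nested path to "upstreams" is an assumption about route shape.
package caddy

import (
    "bytes"
    "fmt"
    "net/http"
)

// SwapUpstream points the route tagged cobalt-project-{id} at a new container
// address in a single PATCH, so Caddy never serves a half-updated route.
func SwapUpstream(projectID, newAddr string) error {
    // Assumed shape: the route's first handler is a reverse_proxy.
    path := fmt.Sprintf("http://localhost:2019/id/cobalt-project-%s/handle/0/upstreams", projectID)
    body := bytes.NewBufferString(fmt.Sprintf(`[{"dial": %q}]`, newAddr))

    req, err := http.NewRequest(http.MethodPatch, path, body)
    if err != nil {
        return err
    }
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode >= 300 {
        return fmt.Errorf("caddy admin API returned %s", resp.Status)
    }
    return nil
}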
One binary, one runtime. Cobalt is a single Go binary. The CLI and the daemon are the same executable. Where comparable tools split a Python daemon and a Node.js CLI — two runtimes, two dependency trees, two version trains — Cobalt is one statically-linked file you scp to a server. Deploying Cobalt itself is cobalt server and a systemd unit.
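A sketch of what single-binary dispatch looks like: one main that runs the daemon when invoked as cobalt server and otherwise acts as a CLI client of the daemon’s HTTP API. The structure and port here are illustrative, not Cobalt’s source.

// Sketch of single-binary dispatch: the same executable is the daemon when
// invoked as "cobalt server" and the CLI client otherwise. The structure
// and port are illustrative.
package main

import (
    "fmt"
    "net/http"
    "os"
)

func main() {
    if len(os.Args) > 1 && os.Args[1] == "server" {
        runDaemon() // long-running: HTTP API, deploys, Caddy/Docker reconciliation
        return
    }
    runCLI(os.Args[1:]) // short-lived: talk to the daemon's HTTP API and exit
}

func runDaemon() {
    http.HandleFunc("/api/deploys", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintln(w, "ok") // placeholder for a real deploy handler
    })
    if err := http.ListenAndServe(":8080", nil); err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
}

func runCLI(args []string) {
    // A real CLI would dispatch on args; here we just ping the daemon.
    resp, err := http.Get("http://localhost:8080/api/deploys")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    defer resp.Body.Close()
    fmt.Println("daemon says:", resp.Status)
}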
The code is MIT-licensed. Read it, fork it, run it.
What it changes for customers
When we say “we control our stack,” it sounds like a slogan. Here’s what it actually means for teams running their work on Blue.
The 502 class of incident is designed out. Not patched, designed out. Same for the network leak. We’re not promising no failures — we’re saying the failure modes we already knew about have been addressed at the architecture layer, not the runbook layer.
We ship more, from anywhere. Cobalt deploys are faster than what they replaced, and crucially, they’re recoverable without SSH. The constraint that anchored us to laptops during ship hours is gone. More features, more fixes, more often.
Price stays low because cost stays low. Cobalt is the infrastructure half of the same bet behind our business model. The more of the stack we own, the more of the cost curve we control. Cloud bills go up. Managed-service tiers go up. Code we run ourselves does not. One-time pricing only works when there’s no recurring cost stack underneath that would force the asterisk later.
Open by default. Cobalt is on GitHub. The deployment layer is one of the highest-leverage pieces of software in any company’s stack — it touches every release, every secret, every domain. We don’t think it should be a black box for us, and we don’t think it should be one for anyone else running it.
What’s next
Cobalt is in production. It’s orchestrating Blue right now — every API call, every deploy, every customer request flowing through blue.cc.
Next on the roadmap: managed Postgres, automated backups to S3-compatible object storage, ephemeral preview environments per pull request, and multi-node Swarm. Each one is a piece of cost or operational complexity we’d otherwise be renting from someone else and passing on to you.
The case for owning your infrastructure used to end at the hardware. We don’t think it does anymore.
— Manny