Engineering · April 24, 2026

How We Ship Every Day

Three engineers. No staging environment. 946 commits to production in 30 days. How we made shipping safe enough to do constantly.


In the last 30 days, we pushed 946 commits to production across Blue’s core repositories — that’s one code change going to customers every 46 minutes, averaged across the full month. We have three engineers, 19,000+ companies running their operations on the product, no staging environment, and no feature flag service. Every line of code we write is one git push away from customers using it.

This post is about how we made that safe. Not how we made it fast — shipping fast is easy if you don’t care about the consequences. We care a lot about the consequences. What follows is the set of guardrails that let us ship at this cadence without burning the platform down.

Why shipping frequency matters

A bug reported to a small team on Monday morning can ship that afternoon. The same bug reported to a large engineering organization ships in the next release window — often two weeks later, sometimes a month. Multiply that delta across every interaction a user has with the product, and after a year the higher-cadence team has absorbed dramatically more real-world friction: obscure edge cases fixed, small frustrations smoothed, wording clarified, flows shortened.

The compounding effect is the whole argument. It’s not that any single change is more impactful. It’s that when the cost of shipping approaches zero, the threshold for “is this worth fixing?” also approaches zero — and a thousand small improvements are what distinguishes a product people tolerate from one they recommend.

The interesting question isn’t whether high shipping frequency is valuable. It obviously is. The interesting question is: what does it take to do this safely?

The pipeline

Blue runs on dedicated servers at Hetzner, not on AWS. We wrote about why we left the cloud. The deployment pipeline is Disco — an open-source tool built on top of Docker Swarm.

Here’s what happens when I push a commit to main on the frontend repo:

  1. GitHub webhook fires.
  2. The Disco daemon running on our production server receives the push event via the GitHub App.
  3. Disco pulls the commit and builds a Docker image using the repo’s Dockerfile.
  4. Docker Swarm rolls the new service version — new container comes up, health checks pass, old container drains.
  5. Caddy reverse-proxies traffic to whichever container is healthy.

Total time from git push to production: 3-5 minutes.

The Disco configuration is literally this file, disco.json, in the frontend repo:

{
  "version": "1.0",
  "services": {
    "web": {
      "port": 3000
    }
  }
}

Eight lines. That’s it. No pipeline YAML with a hundred steps. No Jenkinsfile. No Kubernetes manifests. No Helm charts. Swarm handles the rolling update, Caddy handles routing, Disco orchestrates the build. The backend GraphQL API uses the same config with one additional flag for Docker networking. Multiple deploys per day, using a file that fits on a Post-it.

There is no branch strategy to describe. We work on main. There’s no release branch, no sprint cut, no dev branch that gets promoted to prod. Pull requests exist — we use them for code review — but they land in main the same day, usually within hours of being opened.

How the codebase is organized

The pipeline is only half of the story; what feeds into it matters just as much. The shape of the codebase is a big part of why daily shipping is sustainable at our size.

Blue isn’t a single monorepo. Each major service — the GraphQL API, the frontend rewrite, the legacy frontend, the forms app, the file service, the collaboration server, the CLI, the infrastructure definitions, the Python SDK — lives in its own Git repository. That separation fits the shape of the team: services have independent release cycles, independent dependencies, and independent owners-of-the-moment.

What’s less common is the meta-repo sitting above them. We keep a separate repo called blue/ that’s dedicated entirely to orchestration. It contains no production code. What it contains is the connective tissue that makes the rest of the platform feel like one thing:

  • scripts/setup.sh — clones every service repo as a sibling directory inside the meta-repo, idempotently. Safe to run on a fresh machine or to refresh an existing checkout. A new engineer goes from nothing to a full platform checkout with git clone blue && ./scripts/setup.sh. (A sketch of what such a script looks like follows this list.)
  • compose.yml — a three-line file using Docker Compose’s include: directive to stitch together each service’s own compose file (see the second sketch after this list). One command from the meta-repo — docker compose up — brings up the entire stack: API, frontend, MySQL 8, Redis, Stripe CLI in listen mode, Firebase emulator, LocalStack for S3. No per-engineer README, no bespoke scripts, no “works on my machine” mystery.
  • docs/ — around 30 markdown files covering platform-wide concerns that don’t belong to any single service: the deployment pipeline, the records architecture, the permissions model, the frontend cache strategy, the testing approach, the style guide, the infrastructure overview. Cross-cutting knowledge lives here in one place, not scattered across per-repo READMEs that go stale.
  • plans/ — our active and completed initiatives as markdown. Every non-trivial piece of engineering work on Blue begins as a plan file: context, tradeoffs, approach, open questions. When it ships, the file moves to plans/completed/. We have 69 active plans right now and 55 in completed. There is no Jira instance, no Linear workspace, no Notion database behind this. Planning and shipping live in the same tool. git log plans/ is our decision history.
  • CLAUDE.md — an orientation document at the root, written for both human engineers and AI coding agents. It describes what each repo is, where docs and plans live, and how pieces fit together.
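
For flavor, here is roughly what an idempotent setup script like ours looks like. This is a sketch, not our actual file: the repo list and remote URL are illustrative.

#!/usr/bin/env bash
# Clone (or update) every service repo inside the meta-repo, idempotently.
set -euo pipefail

REPOS=(api app-next forms files collab cli infra sdk-python)  # illustrative list

for repo in "${REPOS[@]}"; do
  if [ -d "$repo/.git" ]; then
    # Already cloned: just fast-forward to the latest main.
    git -C "$repo" pull --ff-only
  else
    git clone "git@github.com:blue/$repo.git" "$repo"
  fi
done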
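
And the meta-repo compose.yml is barely a file at all. The include: directive is standard Docker Compose (v2.20 and later); the paths below are illustrative:

include:
  - api/compose.yml
  - app-next/compose.yml
  - forms/compose.yml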

Secrets live next to the code, but encrypted. Each service repo’s environment files (.env.development, .env.production, etc.) are encrypted at rest with git-crypt. They travel through git like any other file, but without a GPG key they’re opaque. Onboarding a new engineer to secrets is one git-crypt add-gpg-user command — not a dance of 1Password shares, Slack DMs, and .env.example files that drift from reality.
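
For anyone unfamiliar with git-crypt, the mechanics are small. This is standard git-crypt usage, sketched rather than copied from our repos:

# .gitattributes: mark the env files as encrypted at rest
.env.* filter=git-crypt diff=git-crypt

# one-time setup per repo
git-crypt init

# onboard an engineer by GPG key ID or email (re-encrypts the repo key for them)
git-crypt add-gpg-user alice@example.com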

The practical effect of all of this is that every engineer on the team has an identical local setup. Same directory structure, same services running, same compose file, same docs, same plans. When someone joins or comes back from a break, they’re not reconstructing a personal environment from memory; they’re running the same script everyone else runs.

It also means every piece of context — production code, documentation, architectural decisions, active plans, deployment configs, even encrypted secrets — is reachable with a single git grep from one working directory. A change that spans the API and the frontend (most of them do) doesn’t require jumping between tools. You’re already sitting at the root of everything.

None of this is groundbreaking on its own. Monorepo-ish layouts, Docker Compose includes, docs-in-git, plans-in-git — they’re all well-trodden ideas. The compounding effect is what matters. Each piece removes a specific type of friction — environment drift, tool-switching, onboarding time, stale documentation, secret sprawl. Take them all away and the cost of making a change drops to the point where making changes casually becomes reasonable. That’s the precondition for shipping every day.

No staging environment

This is the part that sounds the most reckless. We don’t have a staging environment.

A lot of teams will read that and assume we’re cowboys. The reasoning usually goes: you need staging to catch bugs before they hit users. But staging environments have a well-known failure mode — they drift from production. Staging runs on different data, different traffic patterns, sometimes different versions of services. Bugs that only surface under production load don’t get caught in staging anyway, and the bugs that do get caught were usually bugs a reasonable test or type-check would have caught.

More importantly, staging adds a human-scale delay to every change: deploy to staging, open the app, click around, file a ticket, fix the thing, deploy to staging again. That cycle — which feels like caution — is the exact thing that kills shipping velocity. It’s what turns three-engineer-speed into two-hundred-engineer-speed.

We went the other direction. Production is the only environment. What keeps that safe isn’t hope — it’s a stack of guardrails that each do a specific job.

The guardrails

TypeScript, end to end. Both our frontend (app-next, Vue 3) and backend (api, Node.js 22) are TypeScript in strict mode. GraphQL types are code-generated from the schema. Prisma types are code-generated from the database. A meaningful share of the bugs a staging environment would catch are caught by tsc on my laptop before I push.
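
To make that concrete, here is the shape of it: a simplified resolver, not lifted from our codebase (the project field and model are illustrative), showing how the generated types close the loop.

// QueryResolvers is generated by GraphQL Code Generator from the schema;
// the Prisma client's types are generated from the database schema.
import type { QueryResolvers } from "./generated/graphql";
import { prisma } from "./db";

export const Query: QueryResolvers = {
  // If the schema and this resolver's return shape ever disagree,
  // tsc fails on my laptop, before the commit is pushed, let alone deployed.
  project: (_parent, { id }) =>
    prisma.project.findUniqueOrThrow({ where: { id } }),
};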

A large, real-database test suite. Our API repo has 320 test files, including 150 integration tests that run against a real MySQL 8 database in Docker. Our frontend has another 128 test files. We don’t mock databases, Redis, or queues — we’ve been burned before by mocked tests passing while production broke. Test services spin up with docker compose -f compose.test.yml up -d, and CI runs the full suite on every PR that touches the affected code. Writing tests is not an afterthought for us — it’s the thing that makes the rest of this model viable.
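
A typical integration test is deliberately boring. Something like this sketch (assuming a Vitest-style runner; the model and fields are illustrative), pointed at the real MySQL container:

import { test, expect, afterAll } from "vitest";
import { PrismaClient } from "@prisma/client";

// Connects to the real MySQL 8 instance from compose.test.yml. No mocks.
const prisma = new PrismaClient();

test("a created record round-trips through the real database", async () => {
  const created = await prisma.record.create({ data: { title: "smoke" } });
  const found = await prisma.record.findUnique({ where: { id: created.id } });
  expect(found?.title).toBe("smoke");
});

afterAll(() => prisma.$disconnect());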

Infrastructure as code. Every piece of our production infrastructure is defined in Ansible playbooks, checked into Git. Not “mostly” — everything. The MySQL configuration, ProxySQL routing rules, Replication Manager failover logic, the XtraBackup scripts, the cron schedules, the fencing logic that prevents split-brain, the Loki log shipper, the Prometheus exporters — all of it lives in ansible/playbooks/. Nobody SSHes into a server and edits a config file. A change is a git commit and a playbook run. If both database servers disappeared tomorrow, we could provision two new Hetzner boxes and run the playbooks; the entire stack would be back in under an hour, configured identically. We wrote more about this in our database backups post.
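
A playbook task in this style looks roughly like the sketch below. The host group, paths, and handler names are illustrative, not our actual playbooks:

# Sketch of a playbook in ansible/playbooks/; config is a template in git,
# never a file someone hand-edited on the server.
- hosts: database_servers
  become: true
  tasks:
    - name: Render MySQL configuration from the versioned template
      ansible.builtin.template:
        src: templates/my.cnf.j2
        dest: /etc/mysql/my.cnf
      notify: restart mysql
  handlers:
    - name: restart mysql
      ansible.builtin.service:
        name: mysql
        state: restarted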

Observability across every service. We see what’s happening in production in real time. Dozzle gives us Docker logs across every container as a searchable web UI — no SSH required. A full Grafana stack sits behind it: Loki for log aggregation, Prometheus for metrics, Promtail shipping from every server, Node Exporter on each box, 30-day retention on both logs and metrics. If something looks off after a deploy, we know within minutes, not the next business day. Telegram alerts fire for backup failures, replication lag, and failover events, straight into the team channel — the things we’d otherwise miss.

Docker Swarm rolling updates. When a new version deploys, Swarm brings up the new container, runs health checks, routes traffic, then drains the old one. If the new container fails its health check, traffic keeps flowing to the old one. Nothing goes down. A broken build is a deploy that silently doesn’t ship — not an outage.
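
Disco drives this for us, but the underlying Swarm semantics fit in a few lines of standard compose configuration. Roughly this, with the health endpoint and timings as illustrative values:

services:
  web:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 10s
      timeout: 5s
      retries: 3
    deploy:
      update_config:
        order: start-first        # bring the new container up before draining the old
        failure_action: rollback  # a failed update never replaces the healthy version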

Instant rollback. A bad deploy is one Disco command away from being reverted. We’ve done it a handful of times in the last year. Mean time to recovery when a regression does hit production is measured in minutes, not hours.

We use our own product. Blue runs on Blue. Every commit to main is a commit I’m about to use myself, five minutes from now. That’s a better incentive than any QA checklist.

Each of those guardrails does one specific thing. Together, they replace what staging would have caught — and they do it without adding a two-hour delay between finishing a change and shipping it.

The daily texture

Here’s what shipping daily actually looks like from the inside.

On April 2, we pushed 77 commits to production — a pricing-page rework, a database migration, and a handful of fixes, all in one day. Even quiet days look like this: on April 18, a Saturday, we shipped 8. This morning before lunch: 14. Over the last 90 days, there has not been a single calendar day on which we deployed zero commits.

Across the last 30 days, we averaged around seven distinct deploy events per day. That’s not seven commits — that’s seven separate build-and-rollout cycles. Features, fixes, content changes, migrations, refactors, all rolling into production while customers are using the app.

To put it in absolute terms: the Blue platform codebase is around 1.85 million lines across 15,173 commits. We shipped 6% of our total commit history in the last 30 days alone. The frontend rewrite (app-next) turned five months old this week and has accumulated 3,275 commits in that time — an average of about 22 commits a day, every day, from a team of three.

What this means for customers

Most of the benefit of daily shipping is invisible from the outside. A bug reported against a product with a quarterly release cycle disappears into a queue; by the time the fix lands, the customer has either built a workaround or churned. A bug reported against Blue is typically fixed the same day. A feature request that would normally enter a roadmap backlog often ships within days or weeks.

The second-order effect is the one that matters. When shipping is nearly free, the bar for “is this worth fixing?” drops to whatever a reasonable engineer decides in the moment. Hundreds of small improvements that a slower-shipping team would defer indefinitely get made, quietly, over the course of a year. That’s the real product difference.

Trade-offs and limits

This model works for us, but it isn’t the right fit for every team. A few honest observations on where it breaks down.

It requires a small team where every engineer is fully trusted to reason about consequences. The moment you have engineers whose commits you wouldn’t personally review before a deploy, you need scaffolding we don’t have — a review gate, a staging environment, a cautious release schedule. That scaffolding isn’t a sign of dysfunction; it’s a reasonable response to scale.

It requires a product where most changes are low-risk relative to a rollback. Frontend tweaks, bug fixes, content updates, incremental feature work — these are safe to ship continuously because their blast radius is contained and rollback is cheap. Irreversible operations — schema migrations that drop data, changes to billing logic, anything involving money movement — are handled with more care: extra review, rollout in phases, and shipping at low-traffic hours, even in our setup.

It requires investing up front in the guardrails. None of the tooling above is exotic, but it takes time to wire up and maintain. A team that wants to ship daily without this stack is not shipping daily; it’s gambling daily.

Closing

When we talk internally about why the three of us can build a product that competes with much larger teams, the answer isn’t that we’re more productive per engineer. It’s that we’ve structured the work so that what each of us produces reaches users faster. The difference between “the fix ships this afternoon” and “the fix ships next quarter” compounds across every interaction.

The investment that makes this possible is mostly removing things, not adding them. No staging, no release train, no feature flag service, no change review board. What’s left is a short path between a decision and a deployed change, and a stack of guardrails that make that path safe to travel.

None of this is novel. Trunk-based development, continuous deployment, infrastructure as code — the practices are well-documented and the tooling is mostly open source. The interesting part isn’t the techniques. It’s the discipline to keep the path short as the team grows, and the honesty to admit when it needs to stop being short.

— Manny