Dogma is cheap. Engineering is expensive.
It's a story you've heard before: a team shipped a prototype to prove a product idea. It worked, and users loved it. The prototype was pushed to production with a few TODOs and optimism. Six months later, every change touched that service, incidents clustered around it, and junior engineers avoided it. The team moved "fast" for one week and paid for it every week after.
This isn't an argument against speed; it's an argument against speed without foundations and a management culture that pushes risk onto engineers, users, and the future. "Move fast and break things" made sense at a certain point in a uniquely placed and outrageously resourced company's history. For most teams, the cheat code is compounding velocity, which comes from moving deliberately and fixing relentlessly.
Who benefits?
"Move fast" is management-friendly. Growth charts go up, experiments ship, and the roadmap looks impressive to investors.
Who pays?
- Engineers on call at 3 a.m.
- Future teams trudging through winding code paths.
- Users, through performance regressions and growing distrust of updates.
- The product roadmap, as unexpected rework adds time and complexity to feature estimates.
Speed without foundations is a high-interest loan. The interest rate shows up in every sprint: slower PR reviews, more incidents, and longer cycle times.
The case for moving slowly (so you can go faster)
"Slow" isn't literal; it's friction-aware. It means you should:
- Reduce blast radius with isolation and feature flags.
- Shorten feedback loops with CI, tests, and observability.
- Eliminate recurring toil before adding more feature debt.
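Those bullets can be made concrete in a few lines. Here's a minimal sketch of a flag check with a default-off kill switch; the in-memory store and flag names are illustrative, not a specific library:

```python
# Minimal feature-flag check with a default-off kill switch.
# The flag store and names are illustrative assumptions.
FLAGS = {
    "new-checkout": {"enabled": True, "kill_switch": False},
}

def is_enabled(flag: str) -> bool:
    """A missing flag or a tripped kill switch both mean 'off'.

    Failing closed keeps the blast radius small.
    """
    cfg = FLAGS.get(flag)
    if cfg is None or cfg["kill_switch"]:
        return False
    return cfg["enabled"]

# During an incident, trip the kill switch: one config change, no deploy.
FLAGS["new-checkout"]["kill_switch"] = True
```

The design choice worth copying is the fail-closed default: anything unknown or tripped resolves to "off", so a bad rollout degrades to the old behaviour rather than an outage.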
The payoff isn't philosophical; it's practical: you spend less time recovering and more time building the right thing. You just have to put in the groundwork first to create a foundation where more product experiments can run.
Constant low-quality speed burns out engineers and wears on users, so empower your team to do their best work by fostering a healthy environment to experiment in. Collect the data - including engineers' personal reflections - and iterate on it.
A simple decision framework
Use this matrix before you start a sprint:
                        BLAST RADIUS
                  low                   high
SPEED  +---------------------+----------------------+
  low  | polish later        | harden first         |
       | (safe sandboxes)    | (APIs, auth, data)   |
       +---------------------+----------------------+
 high  | prototype fast      | abort or isolate     |
       | (flags, throwaway)  | (split systems,      |
       |                     |  gates)              |
       +---------------------+----------------------+
Legend
- Speed = how quickly you’re trying to deliver (time pressure, shortcuts, iteration pace).
- Blast radius = the potential scope of damage if the change goes wrong (small sandbox vs. core systems).
- Interpretation:
- Low speed + low blast: safe to polish later, but timebox it.
- High speed + low blast: prototype quickly, ideally throw away.
- Low speed + high blast: harden first, prioritise contracts and safety.
- High speed + high blast: avoid, or isolate tightly (flags, separate systems).
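The matrix reads naturally as a lookup table. A tiny sketch, with the quadrant labels lifted straight from the diagram above:

```python
# The speed / blast-radius matrix as a lookup table.
# Quadrant labels come from the diagram; the function is illustrative.
MATRIX = {
    ("low", "low"):   "polish later",
    ("low", "high"):  "harden first",
    ("high", "low"):  "prototype fast",
    ("high", "high"): "abort or isolate",
}

def recommend(speed: str, blast_radius: str) -> str:
    """Map a (speed, blast radius) pair to the matrix's recommendation."""
    return MATRIX[(speed, blast_radius)]
```

For example, `recommend("high", "high")` returns "abort or isolate", the quadrant teams most often talk themselves out of.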
Five questions to ask before you cut corners:
- How reversible is this decision?
- What’s the blast radius if we’re wrong?
- What signals will tell us quickly that we’re wrong?
- What does “done” mean beyond “it works on my machine”?
- If this becomes permanent (it will), what hurts first?
Foundations that make speed safe
Think of these as the flooring under your product engineering:
- Contracts & versioned interfaces
  - Typed boundaries, consumer-driven tests, schema migrations with roll-forward/rollback.
- Fast, boring CI/CD
  - Lint, unit, contract, and a small number of high-signal E2E tests.
  - PR checks finish in minutes, not hours. (Slow feedback is a tax on quality.)
- Observability from day one
  - Structured logs, traces, SLOs, and dashboards that answer: Is it broken? Why?
  - Feature flags with per-tenant rollouts and kill switches.
- Operational hygiene
  - Runbooks, autoscaling, rate limits, idempotency, backpressure.
  - “Definition of Done” includes instrumentation and rollback.
These aren’t gold-plating. They’re how you ship quickly without living in incident review calls.
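As one concrete example, the per-tenant rollouts mentioned above can be as small as a stable hash bucketed against a percentage. A sketch, assuming string tenant IDs and no particular flag provider:

```python
import hashlib

def in_rollout(tenant_id: str, flag: str, percent: int) -> bool:
    """Deterministically bucket a tenant into [0, 100).

    The same tenant always gets the same answer for the same flag,
    so a 10% rollout doesn't flicker between requests.
    """
    digest = hashlib.sha256(f"{flag}:{tenant_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent

# percent=0 doubles as a kill switch; percent=100 is fully rolled out.
```

Hashing the flag name together with the tenant ID means different flags roll out to different tenant subsets, rather than always hitting the same unlucky 10%.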
Where moving fast does make sense
- Hypothesis tests: Narrow questions with cheap failure (“Do users click X?”).
- Internal tools: Low-stakes prototypes that don’t face customers.
- Exploration spikes: Hard timebox (48–72 hours), explicit decision to keep/discard.
Guardrails for fast work:
- Do it behind flags with a default-off kill switch.
- Mark code as throwaway in the repo structure (e.g., /spikes/…) and exclude it from CI.
- Create a decision memo at the end: Keep? Rewrite? Delete? No memo → auto-delete.
Throwing code away is valid (so actually do it)
Most “temporary” code becomes permanent because deletion feels risky. Make deletion the default:
- Timebox spikes and schedule the removal as a first-class task.
- If you keep it, pay the acceptance fee: tests, docs, observability, ownership.
LLMs: fast scaffolding, slow foundations
Generated code is great for ideas and scaffolding, not as a substitute for human intent or guardrails.
Anti-patterns
- Paste-driven development: code appears without design.
- Unknown unknowns: plausible code that hides edge cases.
- Dependency drift: generated snippets lock in libraries you didn’t choose.
Policy that works
- Human-in-the-loop review: no exceptions.
- Test-before-trust: new surfaces must arrive with tests, not “we’ll add them later.”
- Label AI PRs and track rework vs. manual PRs. If AI PRs create more rework, adjust usage.
- Security & licensing checks by default.
Generated code isn’t free; maintenance is where the bill shows up.
Product engineering on solid ground
The modern product engineer is a system thinker. They balance customer outcomes with platform realities:
- Translate product goals into shaped work with clear edges and constraints.
- Refuse one-way-door shortcuts on core systems (auth, billing, data contracts).
- Insist on operational acceptance criteria: metrics, alerts, runbooks.
Empowerment means the right to say “Not like this.” Provide an alternative, but hold the line.
Tactics that increase velocity by “moving slowly”
- Fix Fridays / Stability Sprints: recurring time to retire sharp edges.
- Error budgets tied to roadmaps: breach the SLO → fewer launches until healthy.
- 20% debt quota: every cycle reserves time for de-risking (tests, docs, infra).
- Change size discipline: smaller PRs, faster reviews, fewer surprises.
- Golden paths: paved examples for common tasks (service template, CI config, observability scaffolding).
- Incident learning that sticks: every postmortem yields a check or template change—not just a doc.
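The "error budgets tied to roadmaps" tactic is plain arithmetic: the SLO implies a budget of allowed failures, and launches pause once it's spent. A sketch with illustrative numbers and function names:

```python
def error_budget_remaining(slo: float, total: int, failed: int) -> float:
    """Fraction of the error budget left, clamped to [0, 1].

    slo is e.g. 0.999 (99.9%); the budget is the allowed failures,
    (1 - slo) * total.
    """
    allowed = (1 - slo) * total
    if allowed == 0:
        return 0.0
    return max(0.0, 1 - failed / allowed)

def can_launch(slo: float, total: int, failed: int) -> bool:
    # Budget exhausted -> fewer launches until the service is healthy.
    return error_budget_remaining(slo, total, failed) > 0.0
```

At a 99.9% SLO over a million requests, the budget is roughly 1,000 failures: 100 failures leaves plenty of room to launch, 1,200 means the roadmap waits.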
Metrics to watch (DORA + a few practicals)
- Lead time for changes
- Change failure rate
- MTTR
- Deployment frequency
- Escaped defects per release
- Trend direction matters more than absolutes; use these metrics to justify the “fix” work.
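Most of these fall out of timestamps you already record. A sketch computing median lead time and change failure rate from deploy records (the record shape and sample data are assumptions):

```python
from datetime import datetime, timedelta
from statistics import median

# Each deploy: commit time, deploy time, and whether it caused a failure.
deploys = [
    {"committed": datetime(2024, 5, 1, 9),  "deployed": datetime(2024, 5, 1, 15), "failed": False},
    {"committed": datetime(2024, 5, 2, 10), "deployed": datetime(2024, 5, 3, 10), "failed": True},
    {"committed": datetime(2024, 5, 4, 8),  "deployed": datetime(2024, 5, 4, 12), "failed": False},
]

def median_lead_time(deploys) -> timedelta:
    """Median commit-to-deploy interval (DORA's lead time for changes)."""
    seconds = [(d["deployed"] - d["committed"]).total_seconds() for d in deploys]
    return timedelta(seconds=median(seconds))

def change_failure_rate(deploys) -> float:
    """Fraction of deploys that caused a failure in production."""
    return sum(d["failed"] for d in deploys) / len(deploys)
```

Start with whatever your deploy tooling already logs; the point is the trend line from sprint to sprint, not the precision of any one number.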
A script for pushing back (and aligning)
Use this when the room wants speed and you smell future pain:
“We can ship this in two ways. Path A gets it out this week with a high blast radius and no rollback. Path B adds contracts and a flag; it’s two days slower but reversible. Given the risk, I recommend Path B. If we choose A, here’s the explicit risk we’re accepting and the rollback plan I’ll need approved.”
Make risk legible. Leaders can accept risk; they shouldn’t accept surprise.
Memorable truths
- Slow is smooth, smooth is fast.
- You don’t outrun tech debt; you refinance it.
- If it’s not observable, it’s not done.
- Prototype like it’s disposable, design like it will live forever.
- Fixing things isn’t the cost of speed; it’s the engine of speed.
Try this for two weeks
- Tag one high-blast-radius area.
- Add missing contracts/tests/flags until you can roll forward or back safely.
- Measure PR lead time and incident count before/after.
- Write a three-paragraph note on what got easier.
If your team doesn’t feel faster afterward, I’ll be surprised.
What about my startup, before we find product–market fit?
Sure, this seems neat for companies at scale-up size or larger, but if you’re pre-PMF, “move slowly” probably sounds ludicrous. Your survival depends on speed: testing ideas, shipping experiments, finding users before the runway ends. That’s true, but speed isn’t the whole story.
On one hand:
- Blast radius still matters. Move fast on the edges (landing pages, sign-up flows), but be careful with the core (auth, billing, customer data). A security breach or broken billing system can kill trust before you even get traction.
- Shortcuts compound faster in startups. You don’t have a platform team to clean up later, so a single bad shortcut can stall you just when you should be accelerating.
- You can rent, not build. Lean on hosted services and boring tech for foundations (auth, feature flags, observability). It’s faster and safer.
On the other hand:
- Premature architecture kills momentum. If you spend three months on Kubernetes before you have users, you may not survive to see the benefits.
- Over-engineering experiments wastes time. Don’t add enterprise-grade observability to a feature you might throw away next week.
- Runway is short. A perfect platform without product–market fit is just an expensive hobby.
The balance:
- Prototype fast, but delete aggressively.
- If it touches customer data or money, build it like it will live for five years.
- If it’s an experiment, cut corners ruthlessly but be honest and tear it down if it doesn’t stick the landing.
Startups don’t need “move slowly and fix things” or “move fast and break things.” They need:
Move fast on the surface, move carefully at the core.
The point
“Move fast and break things” was a growth hack that turned into dogma. Dogmas are bad engineering. The teams that win long-term don’t worship speed; they compound it by moving thoughtfully now, so they can move effortlessly later.
Move deliberately and be empowered to fix things now. Then watch how fast you really go.