112 tests green, and still broken in production

Every test passed. The local end-to-end run passed. The deploy was clean. Then the first real request never reached the code.

The build was as green as builds get. The full unit suite passed, all 112 tests. A local end-to-end run passed. The deploy ran without error. By every signal I had, the feature worked.

The first real request hit production and bounced. It never reached my code at all. The auth layer at the edge was path-scoped to the old routes. It had never been told about the new ones, so it silently redirected them to a login page. The request died at the perimeter, in a layer none of my tests touched.

i. The failure lived where the tests could not look

This is the uncomfortable part. There was no bug in the code I wrote. The code was correct, and it was correctly tested. The failure lived in the gap between the tested system and the deployed system: the edge configuration that only exists in production, that the tests have no view of, that the local run quietly faked.

Local development had a dev bypass for the auth layer. That bypass exists for a good reason: it lets you test without standing up the whole authentication stack on your laptop. But the bypass did not just make local testing easier. It hid an entire class of failure, because the layer that broke in production was the exact layer the bypass switched off locally. I had tested a system that, in the one respect that mattered, was not the system I deployed.
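A minimal sketch of that divergence, as a toy model rather than the actual edge configuration. The prefixes, the new route, and the `route_request` function are all hypothetical names invented for illustration; the only thing taken from the incident is the shape of the behavior: with the bypass on, every path reaches the handler, and with it off, a path outside the scoped prefixes is redirected to login.

```python
from typing import Sequence

# Hypothetical prefixes the edge auth layer was scoped to (the "old routes").
EDGE_SCOPED_PREFIXES = ("/api/v1/orders", "/api/v1/users")


def route_request(path: str, dev_bypass: bool,
                  scoped_prefixes: Sequence[str] = EDGE_SCOPED_PREFIXES) -> str:
    """Return where a request ends up in this toy model."""
    if dev_bypass:
        # Local development: the auth layer is switched off entirely,
        # so every path reaches application code.
        return "handler"
    if any(path == p or path.startswith(p + "/") for p in scoped_prefixes):
        # Production, known route: the edge forwards the request.
        return "handler"
    # Production, route the edge was never told about: bounced at the perimeter.
    return "redirect:/login"


NEW_ROUTE = "/api/v2/reports"  # hypothetical new route

print(route_request(NEW_ROUTE, dev_bypass=True))   # handler
print(route_request(NEW_ROUTE, dev_bypass=False))  # redirect:/login
```

The same path gives two different answers, and only the first one is ever observed on a laptop.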

On bypasses: A bypass that makes local testing easy also hides the config that only exists in prod.

ii. Green tests prove the wrong thing here

It is tempting to read 112 green tests as proof the feature works. They are not. They are proof the logic is correct given the inputs the tests provide. They say nothing about whether the request reaches the logic in the deployed environment, because reaching the logic is a property of the edge config, the routing, and the auth scoping, and none of those is what unit tests exercise.

So a green suite and a working production feature are two different claims. The suite earns the first. Only a request that travels the real production path, through the real edge, into the real code, earns the second. I had spent all my verification budget on the first claim and none on the second, and the second was the one that mattered to a user.
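To make the first claim concrete, here is a hedged illustration of what a unit test typically exercises. `summarize_report` is a hypothetical stand-in for the feature logic, not code from the real project. Note what the test never touches: there is no URL, no edge, and no auth layer anywhere in it.

```python
import unittest


def summarize_report(rows):
    """Hypothetical feature logic: count rows and total their 'amount' field."""
    return {"count": len(rows), "total": sum(r["amount"] for r in rows)}


class SummarizeReportTest(unittest.TestCase):
    def test_totals_amounts(self):
        # The test calls the function directly, so it proves the logic
        # and nothing about whether a production request can reach it.
        result = summarize_report([{"amount": 2}, {"amount": 3}])
        self.assertEqual(result, {"count": 2, "total": 5})
```

Saved in a test module and run with `python -m unittest`, this passes whether or not the route that fronts `summarize_report` is reachable in production.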

iii. The deploy smoke is its own gate

The fix in process terms is simple to state and easy to skip: the deploy smoke test is a gate, not a formality. After the deploy, before you call it done, you send a real request along the real path that a user would take, in production, with no bypass, and you confirm it lands where it should. Not a local run. Not a test against a mock. The actual deployed surface, hit the way the world will hit it.

That single check would have caught this in seconds. The first real request bounced; a deliberate first real request, run as a gate, would have bounced in front of me instead of in front of a user, and the path-scoped auth would have been an obvious five-minute fix instead of a live outage.
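One way such a gate could look, sketched with only the Python standard library. Several things here are assumptions rather than details from the incident: the login path (`/login`), the expectation that a healthy request returns 200, and the optional `headers` argument for whatever credentials a real user would carry. `http.client` is used because it does not follow redirects on its own, so a bounce to the login page stays visible instead of being quietly followed.

```python
import http.client
from typing import Mapping, Optional, Tuple
from urllib.parse import urlsplit

LOGIN_PATH = "/login"  # assumption: where the edge sends rejected traffic


def fetch_once(url: str, headers: Optional[Mapping[str, str]] = None,
               timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send one GET without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target, headers=dict(headers or {}))
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def smoke_verdict(status: int, location: Optional[str]) -> Tuple[bool, str]:
    """Decide whether the request reached the application."""
    if 300 <= status < 400:
        where = location or "<no Location header>"
        if location and urlsplit(location).path.startswith(LOGIN_PATH):
            return False, f"bounced to login at the edge: {status} -> {where}"
        return False, f"unexpected redirect: {status} -> {where}"
    if status != 200:
        return False, f"unexpected status: {status}"
    return True, "request reached the application"


def run_gate(url: str, headers: Optional[Mapping[str, str]] = None) -> int:
    """Return a process exit code: 0 if the smoke passes, 1 otherwise."""
    ok, reason = smoke_verdict(*fetch_once(url, headers))
    print(("PASS: " if ok else "FAIL: ") + reason)
    return 0 if ok else 1
```

In a deploy pipeline, the last step would call `run_gate` with the production URL of the new route and exit with its return value, so a redirect to login fails the deploy rather than passing unnoticed.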

Green tests prove the logic. Only a real request on the real path proves the feature.

iv. Test the system you deploy, not the one you mock

The general lesson is about the seam between what you test and what you ship. Every convenience that makes local testing easier (a bypass, a mock, a stub, a fixture) also widens that seam, because it makes the tested system diverge from the deployed one. The conveniences are worth it, but they come with a debt: somewhere, you have to test the system as it actually runs, with the bypasses off.

That place is the deploy smoke. Treat it as a real gate with real teeth. Send the real request. Watch it complete in production. Then, and only then, call the feature done. Green tests are necessary. They are not the same as working, and the gap between them is exactly the part a user sees first.

Drafted with Bishop, my AI partner.
Words picked, edited, and approved by me. Model provenance: Claude Code (Claude Opus and Sonnet)