When I was coding at IBM, we had pretty clear quality metrics that had to be met before a product went out the door. We had to execute all of our tests and pass 95% of them, for instance. No, not 100%, because good developers ought to write tests even if they know the current code won’t be able to pass them – that’s far better than not writing the test at all, and someone at IBM got that. We also couldn’t ship with any P1 defects, and all P2 defects had to have a “disposition” – a workaround, or at least clear documentation of alternatives. We were, after all, IBM.
I remember one product cycle where things were particularly tight. Maybe they’re all “particularly tight.” In this case anyhow, some teams had fallen far behind, to the point that our team was being brought in to do triage and QA on their code as well. It was a stressful time for the product managers, for the whole department.
We were also not meeting our quality goals. There were significant P1s that still didn’t have fixes, and our pass rate on tests was mid-80s. We were asked to “focus.”
Whether it was the encouragement to “focus” per se, or just competent, dedicated people trying to do their jobs, we made some headway. Tests-passed got into the high 80s; not many P1s got fixed, but a couple more P2s had workarounds written. Not enough, but better. Still, we were about to run out of time. That’s when we got an email.
“We test our code to make sure that the intended functionality succeeds,” it started (or words to that effect). “Obviously, it wouldn’t make sense to test functionality we never expected to have. If we were releasing a word processor, and wanted to get inline spellcheck in, but just couldn’t do it, well then it would hardly be sensible to wring our hands about failing the inline spellcheck tests, would it?”
Oh…kaaaaay… we thought, all of us together.
“So if there are tests failing that we know we can’t fix in time, then that’s functionality we don’t intend to ship. So it doesn’t make sense to include those in our tests.”
With those tests removed, of course, our pass rate went way up. Ahem.
There was still the matter of the wayward P1s and P2s, but every developer in the room knows how those were fixed. One morning we all came in to a bunch of bugmail saying that our P2s were now, coincidentally and en masse, P3s; our P1s were all either P2s or P3s depending on how plausibly a workaround could be written.
And the product shipped. And customers complained. And tech sales wept. And a year after shipping we had no active, deployed reference customers. And we did that thing where we taught our customers not to trust our X.0 software, to wait for at least two service packs before trusting us. I hate doing that thing.
This isn’t about me throwing stones at IBM; it’s about underscoring how hard metrics are to get right, and how prone people are to gaming them when their incentives are misaligned. I bet the product managers got congratulated for shipping Another On-Time Release. I’m sure, too, that the blame for the market failures was spread broadly enough to be much less impactful, so it’s hardly surprising that PMs would act this way. I know that’s not a novel insight, but I’ve always held on to that story as one of my own favourite examples.
The Mozilla community has amazed and impressed me with its active awareness of, and resistance to, these kinds of games, but it’s a never-ending battle. We, too, will second-guess our decision to mark some feature as P1 when we get down to it, or our decision to mark some bug as blocking. But I feel like there’s a cultural difference in game-awareness that’s important: those decisions generally seem to have “Are we gaming things here?” as part of the discussion. Can anyone tell me how we get there? IBM is not full of idiots, nor of self-serving cynics. If someone can tell me how to bottle that awareness, and cultivate it in software companies, and make it stick, I’ll write the book and give you a cut.

I’m about to go on at 



