Twelve Percent Faster, and Other Numbers I Don't Yet Believe
Four data sets, three methodologies, and no one who matters can yet give you a single defensible figure for AI's effect on knowledge-work productivity. The measurement culture is not up to the tooling — and the vendors will close that gap last.
Last Friday I sat next to a junior engineer at Real AI for an hour while she rewrote a parser. She had Copilot on, then off, then on again, and at the end she said the thing every honest practitioner has said in 2026: I don't know if I was actually faster.
That sentence is the entire problem.
The numbers came in waves this month
Two weeks ago Microsoft dropped its 2026 Work Trend Index. Sixty-six percent of AI users say AI freed them up for higher-value work. Fifty-eight percent say they are producing work they could not have produced a year earlier. Among the most active users, whom Microsoft calls Frontier Professionals, that second figure climbs to eighty percent. The same release reports a fifteen-fold year-over-year increase in active agents on Microsoft 365.
A week before that, McKinsey released carry-over data from its 2025 State of AI survey, and with it the more uncomfortable counterpoint: ninety-four percent of respondents say they have not yet seen significant value from AI investment. Only six percent, the so-called high performers, attribute five percent or more of EBIT to AI use. Those firms are nearly three times as likely as everyone else to have redesigned individual workflows from the ground up.
The Stanford team's AI Index 2026 splits the difference: fourteen to fifteen percent productivity gain in customer support, twenty-six percent in software development, fifty percent in marketing output, in studies where the workflow was instrumented.
Anthropic's Economic Index update for March 2026 added a third register: in roughly forty-nine percent of jobs, at least a quarter of the tasks are now being done through Claude, judged by the real conversations its API saw. That is a different kind of measurement entirely: usage-tape evidence, not survey response.
Four sets of numbers. Three different methodologies. One topic. Stop and notice that nobody who matters is yet able to give you a single defensible figure for AI's effect on the productivity of knowledge work in 2026. I will not pretend otherwise.
The honest measurement problem
When my father, an engineer who spent forty years drilling tube wells across the plains, was asked how much faster a new rig was, he never answered with a percentage. He would say: we used to lose seven men a season; this year we lost two. He measured what survived.
The current productivity discourse has lost that habit. Most numbers in your inbox right now are self-reported. The Microsoft 66% is an answer to a survey question. So are the 58%, the 80%, and the 90% of GitHub Copilot users who said they completed tasks faster. The Hawthorne effect, social-desirability bias, the tendency of anyone given a new tool to rate it highly for the first six months: none of this is being controlled for in the headline numbers most boards are seeing.
This is what I mean by the honest measurement problem. Knowledge work has always been hard to measure because the output is mostly invisible — judgement, framing, what got into the final deck, what didn't. AI doesn't change that. It only changes how vigorously we are tempted to lie to ourselves about it.
Where the discipline actually lives
The piece of recent work I take most seriously is the Microsoft–NBER field experiment on shifting work patterns. Fifty-six firms. Six thousand workers. Random assignment held for six months. Treated workers spent about half an hour less per week reading email and completed documents twelve percent faster; about forty percent of them used the tool regularly. The point is not the magnitude. The point is the methodology: control group, random assignment, behavioural telemetry rather than self-rating. That paper is the floor of what an honest enterprise should now demand from its internal AI rollout. Most don't demand it.
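For anyone who has never set up random assignment, a minimal Python sketch of the idea follows. It is not the procedure from the paper: the function name, the 50/50 split, the seed, and the worker IDs are all illustrative assumptions.

```python
import random

def assign_groups(worker_ids, treated_share=0.5, seed=2026):
    """Randomly split workers into a treated and a control group.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)      # fixed seed so the split can be audited later
    shuffled = list(worker_ids)
    rng.shuffle(shuffled)          # chance, not a manager, decides who gets the tool
    cut = round(len(shuffled) * treated_share)
    return {
        "treated": sorted(shuffled[:cut]),
        "control": sorted(shuffled[cut:]),
    }

# Twenty made-up worker IDs, split evenly.
groups = assign_groups(range(1, 21))
print(len(groups["treated"]), "treated;", len(groups["control"]), "control")
```

The design choice that matters is that the split is made before anyone sees results and is reproducible from the seed, so nobody can quietly move enthusiasts into the treated group afterwards.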
The arXiv longitudinal Copilot case study released in January 2026 is the other piece worth your time. It found no statistically significant change in commit-based developer activity after Copilot adoption, even as the developers themselves reported feeling more productive. Both things can be true. Both probably are. But the gap between them is the entire game.
The macro counterpoint nobody on a vendor stage will quote
US nonfarm business labour productivity grew only 0.8% in Q1 2026, down sharply from 2.9% in the same quarter a year ago. Software investment has run at an 11.1% annual clip since 2019 — the fastest of any asset category by a wide margin — and yet the headline productivity figure is cooling. Labour's share of nonfarm business income has dropped to 54.1%, the lowest reading since the BLS series began in 1947.
So the AI capex is flowing. The capital deepening is real. The headline labour productivity number is not yet moving in the way the deck slides imply. Either we are in another Solow-style lag (entirely plausible — Brynjolfsson and colleagues have argued this case for years and it usually rhymes), or the productivity is being captured but not measured at the firm level. I lean toward both being true at once. I do not lean toward "the lag is over."
The cognitive cost line item
There is a second-order number nobody is putting on a board deck yet. A new arXiv paper, The Augmentation Trap, studied 319 knowledge workers across 936 real AI use cases and found that higher confidence in the AI was associated with less critical thinking by the human. A 2026 BCG follow-up, tracking 244 highly trained consultants through about 5,000 AI interactions, found three modes: Cyborgs (60%) iterated with AI and grew new skills; Centaurs (14%) used it selectively and produced the highest-accuracy work; and a third group (27%) delegated entire workflows and built neither AI fluency nor domain fluency.
Twenty-seven percent. That is the number to put on the slide. The autopilot fraction. The cohort that is silently losing the judgement they were hired for, while their self-rated productivity sails upward. In an enterprise context this is the line item I would expect AI-risk underwriters to start asking about by the end of next year. They will not yet get a clean answer from anyone.
The manager-shaped lever
Gallup's 2026 State of the Global Workplace puts global engagement at 20% — the lowest reading since 2020 — and finds manager engagement collapsed from 31% to 22% between 2022 and 2025. In the same report: the strongest predictor of whether an employee uses AI, after technical access, is whether their direct manager actively champions it. In organisations where AI has been deployed, the share of workers who expect their job to be eliminated by AI in five years rises from 18% to 23%.
If you are running an AI rollout and you have not invested in your line managers — not slogans, actual training and load-shedding — the ROI on your seat licences is going to be embarrassing.
What I'd push for
Three things if I were writing the brief.
First, instrument before you celebrate. If the only metric you have is a satisfaction survey, you have nothing. You need workflow telemetry, output quality scored by a second human, and at least one shadow control group, not just A/B'd prompts.
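To make the control-group comparison concrete, here is a minimal sketch of one way to test whether a telemetry difference between an AI group and a shadow control group is more than noise. It uses a plain permutation test on the difference in means. The metric (minutes to close a ticket), the numbers, and every name in the code are made up for illustration; a real rollout would need far more workers and a pre-registered metric.

```python
import random
from statistics import mean

def permutation_test(treated, control, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # relabel workers at random
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value

# Hypothetical telemetry: minutes to close a ticket, one value per worker.
with_ai = [41, 38, 45, 36, 40, 39, 43, 37]
shadow_control = [44, 47, 42, 49, 45, 46, 43, 48]

effect, p = permutation_test(with_ai, shadow_control)
print(f"difference in means: {effect:.3f} minutes, p = {p:.4f}")
```

A permutation test is used here because it assumes little about how the metric is distributed, which suits messy workflow data. It answers only whether the groups differ, not whether the output was any good; that still needs the second human scorer.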
Second, budget for the autopilot cohort. Assume 20–30% of senior people will quietly hollow out their own skills unless a centaur-style discipline is actively designed in. This is not a personality failing. It is a default of any pervasive tool with a confident interface.
Third, fix the manager layer first. The Gallup finding is the most operationally useful number in any of this month's reports, and it is the one no AI vendor will ever quote at you.
A closing memory
In 2003, a Dutch bank I was helping paused a major rollout for six weeks because the team realised they had been measuring login frequency as a proxy for adoption. Once they swapped the metric for time-to-close on a specific support category, the number fell and the project survived. Wrong number, real harm avoided. The lesson did not survive in the institutional memory of that bank, and I have watched the same mistake repeat itself in three other firms since.
That is the discipline this month wants from us. We have more AI in production than at any point in the history of the field. We do not yet have a measurement culture that is up to it. The vendors will close that gap last, because the gap is what is selling the seats. The job of the people inside the firm is to close it first.
Tarry Singh is the founder and CEO of Real AI, a visiting professor in the Netherlands and Italy, and a founding contributor to the EU-funded HCAIM and PANORAIMA initiatives training the next generation of human-centred AI practitioners across Europe.