Essays · 30 May 2026 · 22 min read


A Nervous System for Software

The full argument behind the comprehension gap. Why we cannot close it by enlarging the engineer; why both the statistical and the structural approaches hit a ceiling this decade; what it means to give a codebase a nervous system instead; and why it takes four disciplines that almost never share a project to find out. The SYMPHONY thesis, in long form.

For seventy years we have built artefacts faster than we can read them. That sentence sounds like a complaint about discipline. It is not. It is a statement about arithmetic, and arithmetic does not care how good your engineers are.

A system you could once hold in a single head, read end to end and reason about over an afternoon, became, across the software-eats-everything decades, an artefact whose internal complexity outran any individual's comprehension by something on the order of four orders of magnitude. The exact figure is almost beside the point. What matters is the shape of the two curves. Complexity climbs roughly exponentially: a straight, rising line on a log scale. Human comprehension, the amount one person can actually hold in mind, grows, at best, linearly. Plot them together and you do not get a lag that the next tool closes. You get a wedge that opens wider every year. Software entropy is not a metaphor borrowed from physics for colour. It is the lived condition of every codebase old enough to have outlived the people who wrote it.

Fig. Complexity × Comprehension (log scale). Readout for 2023: complexity 10,438×, comprehension 1.28×.
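As a rough check on the "four orders of magnitude" figure, the two 2023 readouts from the figure can be compared directly. The snippet is nothing more than arithmetic on those two numbers; the variable names are mine and carry no special meaning.

```python
import math

# Readouts taken from the figure for the year 2023 (growth relative to baseline).
complexity_growth = 10_438.0
comprehension_growth = 1.28

# The gap is the ratio between the two curves at that point in time.
gap_ratio = complexity_growth / comprehension_growth
gap_orders_of_magnitude = math.log10(gap_ratio)

print(f"gap ratio: {gap_ratio:,.0f}x")
print(f"orders of magnitude: {gap_orders_of_magnitude:.2f}")
```

The ratio comes out a little above eight thousand, which is just under four orders of magnitude. Because one curve is exponential and the other roughly flat, moving the year forward grows the ratio itself, not merely the difference.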

We have answered that divergence the only way we knew how: by enlarging the engineer. More abstraction. More tooling. More documentation laid over an artefact that keeps getting larger underneath the documentation. Every wave of developer tooling — the IDE, the architecture diagram, the static analyser, the AI pair-programmer — has been, at bottom, a way of making one human able to attend to more. It works, locally and temporarily, the way a ladder works against a rising tide. The tide is the complexity curve. No ladder closes a gap that grows three orders of magnitude faster than you can climb it.

This essay makes one claim and then spends the rest of itself defending it: the comprehension gap will not be closed by making engineers bigger. It will be closed, if it is closed at all, by reshaping the substrate they navigate. That is the thesis of a research programme I have been building with three European partners, called SYMPHONY, and what follows is the argument for it — what is actually broken, why the two dominant fixes both hit a ceiling this decade, what it means concretely to give a codebase a nervous system, and why finding out requires a combination of disciplines that have almost never shared a project.

Why this argument, and why now

Two structural shifts have arrived at once, and together they make the existing approaches insufficient at exactly the moment Europe can least afford it.

The first is that the statistical ceiling is now visible. For two years the story of machine code understanding has been a story of rising benchmark scores, and the scores are real enough to be seductive. In November 2025 Claude Opus 4.5 became the first model to cross 80% on SWE-bench Verified, at 80.9%. If you stop reading at the headline, you conclude the problem is nearly solved. It is not, and the way it is not solved is the whole point.

The second is that the comprehension gap is widening faster than any tool is closing it, and it is widening underneath infrastructure that now carries regulatory weight. Europe runs on PLC code, robot programs and SCADA configurations that routinely outlive the engineers who wrote them. Telecommunications stacks, electricity-grid SCADA, public-administration legacy systems, rail signalling — all of it dominated by software written across multiple decades and multiple contractors, with the implicit knowledge that once bound it together largely retired or redistributed. And the regulatory frame has caught up to the risk: NIS2, the EU Cyber Resilience Act, and the AI Act for high-risk systems are turning demonstrable understanding of legacy code from an engineering nicety into a compliance prerequisite. You can no longer wave at a forty-year-old control system and call it stable. Increasingly, you have to be able to read it, and prove you can.

So the timing is not opportunistic. The dominant approach is hitting a measurable ceiling exactly as the cost of not understanding our own systems becomes something a regulator can fine you for.

Two families, two ceilings

Machine code understanding today divides into two families. Each is genuinely impressive. Each has a ceiling we can already see from here.

The first family is statistical — the large-language-model agents. They set the public headline numbers, and the headlines are striking. But independent re-evaluation tells a different story than the leaderboard. Work published at ICSE 2025 Companion showed that once you strip out the instances resolved through solution leakage or passed by weak test cases, the measured resolution rate of a leading SWE-agent configuration falls from 12.47% to 3.97%. A parallel result, SWE-Bench+, replicated the effect: SWE-agent 1.0 with Claude 3.5 dropped from 57.6% to 31.8% on Verified once the contaminated instances were filtered out. The number you trust depends entirely on how hard you are willing to look. And underneath the contamination problem sits a harder, more permanent one: context. Industrial codebases routinely run to millions of tokens; production context windows remain in the low hundreds of thousands; retrieval augmentation hands the model local relevance with no system-level coherence. Today's LLM agents are, to put it plainly, pattern-matchers with short memories. They are remarkable at the patch in front of them and structurally incapable of holding a durable, navigable model of the system they are changing. Adding parameters does not give them memory. It makes them more fluent forgetters.

Fig. What SWE-bench actually measures. Strip the contaminated instances and the headline collapses.
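To put the two re-evaluations on a common footing, it helps to express each as the share of the headline score that disappears once contaminated instances are filtered out. The sketch below uses only the four percentages quoted above; the function name `relative_drop` and the dictionary labels are illustrative.

```python
def relative_drop(before: float, after: float) -> float:
    """Fraction of the headline score that disappears after filtering."""
    return (before - after) / before

# (before, after) resolution rates in percent, as quoted in the text.
reported = {
    "SWE-agent configuration (ICSE 2025 Companion)": (12.47, 3.97),
    "SWE-agent 1.0 + Claude 3.5 on Verified": (57.6, 31.8),
}

for label, (before, after) in reported.items():
    share_lost = relative_drop(before, after)
    print(f"{label}: {before}% -> {after}% ({share_lost:.0%} of the headline lost)")
```

On these figures roughly two thirds of the first headline and a little under half of the second do not survive the filter.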

The second family is structural — the static analysers (SonarQube, PMD, the inspections built into every serious IDE), the architecture-recovery tools, the ArchiMate-derived knowledge graphs. These do something the statistical family cannot: they capture what is explicitly there. Call graphs. Dependency edges. Type hierarchies. Declared interfaces. They are precise, auditable, and they do not hallucinate. But they recover the what and never the why. The design rationale — the reason a boundary sits where it sits, the context in which a decision was the right one, the constraint that is no longer written down anywhere — is the hardest tier of software knowledge to externalise. It has been recognised as such since Avgeriou and colleagues named it in 2007, and it remains the tier no structural tool can reach, for the simple reason that it was never written into the source in the first place.
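To make the "what, never the why" limit concrete, here is a minimal sketch using Python's standard `ast` module on an invented three-function file. It is not how SonarQube, PMD or any architecture-recovery tool works internally; it only shows the kind of fact a structural pass recovers. The source snippet, including its comment about removed retry logic, is made up for illustration.

```python
import ast
from collections import defaultdict

# Invented example file. The comment stands in for a half-remembered rationale.
SOURCE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split(",")

def main(path):
    # Retry logic was removed in 2019 for reasons nobody wrote down.
    return parse(load(path))
'''

def call_edges(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the plain names it calls."""
    edges: dict[str, set[str]] = defaultdict(set)
    for func in ast.parse(source).body:
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            # Only direct calls by name; method calls such as x.read() are skipped.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                edges[func.name].add(node.func.id)
    return dict(edges)

print(call_edges(SOURCE))
```

The result is a precise, auditable set of call edges. The syntax tree does not even carry the comment, and a reason that was never typed at all has no chance of appearing in it.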

So one family knows everything that was said and nothing about why it was said; the other guesses fluently and forgets immediately. Neither is on a trajectory to close the comprehension gap by scaling alone. SYMPHONY's advance is not to improve either family. It is to combine their information content under a different organising principle — one drawn from biology.

The biological turn

The interesting move is not to pick a side. It is to notice that biology solved a structurally identical problem a very long time ago, and to ask what its solution looks like transposed into code.

A brain does not enlarge the neuron to handle a harder task. It keeps a fixed underlying network and reconfigures what that network foregrounds in response to a small, slow, descending signal. Neuromodulation in the mammalian cortex works this way: chemical messengers shift the operating regime of a circuit so that the same anatomy produces qualitatively different, task-appropriate activity. Descending corticospinal modulation in the vertebrate motor system works this way too. In both, a low-bandwidth control signal — nothing remotely like the bandwidth of the network it steers — produces a structured, task-appropriate response from one and the same substrate. The system stores no documentation. It stores knowledge in distributed, layered, context-dependent structure, and it reconfigures that structure on demand.

This is not a metaphor I am reaching for. It is a mechanism that has been formalised. Mei, Muller and Ramaswamy, writing in Trends in Neurosciences in 2022, set out a four-scale framework for integrating neuromodulation into deep networks: modulating hyperparameters, scaling connectivity through plasticity, adjusting the gain of specific neurons, and reconfiguring dendritic computation. They showed, in simulation, that artificial neuromodulation across those four scales buys exactly three things — higher task reward, faster learning, and reduced catastrophic forgetting. Those three properties are not a wish list. They are the precise inverse of the three failure modes that limit code understanding: interference between tasks, the cost of exhaustive re-evaluation, and the need for a memory that persists yet stays responsive to context.
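A toy version of one of those four scales, neuron gain, shows the basic move. This is a sketch under loud assumptions: the weights, the task names and the gain values are invented, and it is in no way the model from the paper. It only demonstrates that a fixed set of weights can produce different response profiles when a small per-unit gain vector changes.

```python
# Fixed "anatomy": one weight row per unit. These numbers are arbitrary.
WEIGHTS = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
]

# Hypothetical task-specific gains: one scalar per unit, nothing else changes.
TASK_GAINS = {
    "task_a": [1.0, 0.1, 0.5],
    "task_b": [0.1, 1.0, 0.5],
}

def respond(inputs: list[float], task: str) -> list[float]:
    """Rectified response of the fixed network under a task's gain profile."""
    gains = TASK_GAINS[task]
    outputs = []
    for gain, row in zip(gains, WEIGHTS):
        drive = sum(w * x for w, x in zip(row, inputs))
        outputs.append(max(0.0, gain * drive))
    return outputs

stimulus = [1.0, 1.0]
print("task_a:", respond(stimulus, "task_a"))
print("task_b:", respond(stimulus, "task_b"))
```

The same stimulus and the same weights yield a response dominated by the first unit under one task and by the second under the other. The control signal is three numbers; the network it steers is untouched.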

There is a reason I trust this framework, and it is institutional as much as mathematical. The scientist who co-authored it, Sri Ramaswamy, was trained at EPFL inside the Blue Brain Project under Henry Markram — the flagship effort to build a data-driven, biophysically detailed reconstruction of the neocortical microcircuit. The Blue Brain tradition is a specific discipline: you build a network model from anatomical and physiological evidence, not from whatever abstraction is convenient. That discipline — model the substrate from the evidence up — is exactly what a neuromimetic substrate for code will require, and it is not a habit the software industry has.

The same principle shows up in a completely different field, which is what convinced me it generalises rather than being an artefact of one lab's taste. Bruno Siciliano and his colleagues' haptic shared-control programme takes a high-degrees-of-freedom autonomous robot controller and shapes it, in real time, through a low-bandwidth human input — a haptic active constraint. Surgical needle grasping, dual-arm manipulation, teleoperated cutting: qualitatively different behaviours, obtained not by rewriting the controller but by reshaping its operating regime with a thin descending signal. A decade of hardware-validated work — Selvaggio, Pacchierotti, Giordano and Siciliano across RA-L, ICRA, IROS and T-RO — establishes the same architectural property again and again: a small, bounded input producing a large, structured, context-appropriate response from a single underlying system.

Two independent fields, one perceptual-cognitive and one motor-physical, converging on the same answer: you do not scale the network to meet the task. You keep the substrate and modulate it.

What SYMPHONY actually is

SYMPHONY is the attempt to carry that principle into the one domain where, as far as I can find, no one has taken it: software itself.

Concretely, it is a neuromimetic knowledge substrate for software systems — a representation in which the elements of a codebase (modules, functions, data flows, contracts, tests, commit history, design decisions) are nodes in a multi-scale network whose activation patterns are reconfigured, on demand, by task-specific neuromodulatory signals. In plainer terms: a representation of code that behaves less like a document to be re-read and more like a nervous system that foregrounds the structures relevant to the task in front of you.

It is easiest to picture stratigraphically — as a cross-section, not a flowchart. Four layers, stacked, each carrying its own network. At the bedrock, Structure: the modules and functions, the call and dependency edges, everything the static tools already recover. Above it, Behaviour: the tests, the contracts, the data flows — the living signal of what the system actually does when it runs. Above that, History: the commits and the decisions, the patina of time, the record of how the thing came to be shaped this way. And at the top, the warmest and hardest layer to articulate, Rationale: design intent, the why — the tier Avgeriou warned us never makes it into the source. SYMPHONY's first departure is to hold all four in a single graph-resident representation built for activation-based retrieval, not query-based retrieval. Not four tools whose outputs you reconcile by hand at the end of a long afternoon. One substrate.
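One way to picture "one substrate, not four tools" is as a single graph whose nodes carry a layer tag and whose edges are free to cross layers. The sketch below is an assumption about shape only, not the SYMPHONY data model: the class names `Layer`, `Node` and `Substrate`, and every node identifier, are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum

class Layer(Enum):
    STRUCTURE = 1   # modules, functions, call and dependency edges
    BEHAVIOUR = 2   # tests, contracts, data flows
    HISTORY = 3     # commits and decisions over time
    RATIONALE = 4   # design intent, the "why"

@dataclass(frozen=True)
class Node:
    node_id: str
    layer: Layer

@dataclass
class Substrate:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: set[tuple[str, str]] = field(default_factory=set)

    def add(self, node_id: str, layer: Layer) -> None:
        self.nodes[node_id] = Node(node_id, layer)

    def link(self, source: str, target: str) -> None:
        # Edges may cross layers; that is what makes this one graph, not four.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before linking")
        self.edges.add((source, target))

    def neighbours(self, node_id: str) -> set[str]:
        outgoing = {t for s, t in self.edges if s == node_id}
        incoming = {s for s, t in self.edges if t == node_id}
        return outgoing | incoming

substrate = Substrate()
substrate.add("fn:parse_frame", Layer.STRUCTURE)
substrate.add("test:parse_frame_rejects_short_input", Layer.BEHAVIOUR)
substrate.add("commit:a1b2c3", Layer.HISTORY)
substrate.add("decision:frames_are_length_prefixed", Layer.RATIONALE)
substrate.link("test:parse_frame_rejects_short_input", "fn:parse_frame")
substrate.link("commit:a1b2c3", "fn:parse_frame")
substrate.link("decision:frames_are_length_prefixed", "commit:a1b2c3")

layers_touching_parse = {
    substrate.nodes[n].layer for n in substrate.neighbours("fn:parse_frame")
}
print(sorted(layer.name for layer in layers_touching_parse))
```

Starting from one function, a single neighbour lookup already reaches a test and a commit, and one more hop reaches the decision behind the commit. With four separate tools that walk is a manual reconciliation.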

Down through all four layers runs a thin vertical column — the neuromodulatory signal, the conductor's baton. When you declare a task — localise this fault, assess the blast radius of this change, find the refactoring candidates, onboard a new engineer onto this module — the baton does not rebuild the graph. It changes what each layer foregrounds. A small subset of nodes lights warmly: the subnetwork relevant to this task, across all four strata at once. Everything else rests in cool half-light, present but not competing for attention. The substrate holds one consistent state of the system; the task decides which harmony of it you hear. Same substrate. Different harmonies.

Fig. Task baton. One substrate, four layers. The baton doesn't rebuild the graph; it changes what each layer foregrounds. Switching the task brings a different subnetwork into the light.
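A deliberately crude stand-in for the baton is spreading activation with one gain per layer. I am swapping in this simple technique purely for illustration; it is not the SYMPHONY reconfiguration mechanism, and every node name, edge and gain value below is invented. What it does show is the property described above: the graph is never modified, yet the foregrounded subnetwork changes with the task.

```python
# Toy substrate: node -> layer, plus undirected edges. All names are invented.
LAYER_OF = {
    "fn:parse_frame": "structure",
    "fn:send_frame": "structure",
    "test:parse_rejects_short": "behaviour",
    "commit:a1b2c3": "history",
    "decision:length_prefixed": "rationale",
}
EDGES = [
    ("fn:parse_frame", "test:parse_rejects_short"),
    ("fn:parse_frame", "commit:a1b2c3"),
    ("commit:a1b2c3", "decision:length_prefixed"),
    ("fn:parse_frame", "fn:send_frame"),
]

# Hypothetical batons: one bounded scalar gain per layer, per task.
BATONS = {
    "localise_fault": {"structure": 1.0, "behaviour": 0.9, "history": 0.3, "rationale": 0.1},
    "onboard_engineer": {"structure": 0.4, "behaviour": 0.3, "history": 0.7, "rationale": 1.0},
}

def foreground(seed: str, task: str, steps: int = 2, top_k: int = 3) -> list[str]:
    """Spread activation from a seed node, scaling each node by its layer gain."""
    gains = BATONS[task]
    activation = {node: 0.0 for node in LAYER_OF}
    activation[seed] = 1.0
    for _ in range(steps):
        spread = dict(activation)
        for a, b in EDGES:
            # Each edge passes half of a node's activation to its neighbour.
            spread[b] += 0.5 * activation[a]
            spread[a] += 0.5 * activation[b]
        # The baton acts here: the same spread is re-weighted layer by layer.
        activation = {n: v * gains[LAYER_OF[n]] for n, v in spread.items()}
    ranked = sorted(activation, key=lambda n: (-activation[n], n))
    return ranked[:top_k]

for task in BATONS:
    print(task, "->", foreground("fn:parse_frame", task))
```

From the same seed function, the fault-localisation baton foregrounds neighbouring code and its test, while the onboarding baton pulls the commit and the design decision to the top. `LAYER_OF` and `EDGES` are read but never written.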

Three advances, stated precisely

It is easy to make a claim like that sound like vapour, so let me be exact about what is new. Three specific advances separate SYMPHONY from everything adjacent to it, and they are the three things the long-term vision minimally requires.

One — multi-layer extraction into a single substrate. Existing pipelines give you either a single view (a static-analysis output, a dependency diagram, an architecture knowledge graph) or a separate document corpus (READMEs, commit messages, issue trackers) handled apart from the code. SYMPHONY unifies the structural, behavioural, historical and rationale layers in one graph built for activation-based retrieval. The substrate is the boundary object: one representation that every later mechanism operates on.

Two — context-dependent activation. No existing representation of a codebase alters its own salience profile in response to your declared task. SYMPHONY's substrate holds a single state of the system but surfaces different subnetworks depending on an externally specified task token, using the four-scale neuromodulatory primitives of Mei, Muller and Ramaswamy as the mathematical template. The codebase stops being a thing you query and becomes a thing that attends.

Three — low-bandwidth task control. This is the advance I care about most, because it is the one that makes the thing governable. Borrowing from the haptic shared-control formalism, SYMPHONY's task interface is deliberately narrow — a small set of scalar modulatory signals, not a prompt window. That narrowness is not a limitation to be apologised for; it is the entire point. A behaviour produced by a handful of bounded, named, scalar controls is composable, auditable, and bounded. It is the precise opposite of the unbounded, ungovernable prompt-response surface of a current LLM agent. It is the property that would let a substrate like this ever be trusted inside a regulated industry — the property a compliance function or a procurement officer can actually read.
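What "a handful of bounded, named, scalar controls" could look like in code is sketched below. The four field names and the `blend` helper are my assumptions for illustration, not the project's interface. The point is that the whole control input is a flat record of range-checked numbers, which is what makes it readable by someone who is not an engineer.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class TaskSignal:
    """A hypothetical control surface: four named scalars, each in [0, 1]."""
    structure_gain: float
    behaviour_gain: float
    history_gain: float
    rationale_gain: float

    def __post_init__(self) -> None:
        # Bounded: any value outside the declared range is rejected outright.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{f.name}={value} is outside [0, 1]")

    def audit_record(self) -> dict[str, float]:
        # Auditable: the entire control input fits in one flat record.
        return {f.name: getattr(self, f.name) for f in fields(self)}

def blend(a: TaskSignal, b: TaskSignal, weight: float) -> TaskSignal:
    """Composable: a convex mix of two valid signals is again a valid signal."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    ra, rb = a.audit_record(), b.audit_record()
    mixed = {
        # Clamp to guard against floating-point rounding at the boundaries.
        name: min(1.0, max(0.0, (1.0 - weight) * ra[name] + weight * rb[name]))
        for name in ra
    }
    return TaskSignal(**mixed)

localise = TaskSignal(1.0, 0.9, 0.3, 0.1)
onboard = TaskSignal(0.4, 0.3, 0.7, 1.0)
print(blend(localise, onboard, 0.5).audit_record())
```

Contrast this with a prompt window, where the input space is every possible string and there is no equivalent of the range check in `__post_init__`.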

Those three advances, in sequence, supply the three things the vision needs: a representation that can hold an industrial-scale codebase coherently, a way to reconfigure it on demand without rebuilding it, and a control surface you can audit. Succeeding at SYMPHONY is not sufficient for the vision. It is the decisive enabling condition for it.

Where the gap actually bites

A thesis about substrates can stay comfortably abstract forever. The reason this one is urgent is that the comprehension gap is not evenly distributed — it concentrates in exactly the places society can least afford to lose the thread. There are four of them, and they are addressable, not hypothetical.

Industrial automation and manufacturing. The European industrial-robotics installed base took on more than 100,000 new units in 2024 alone, on top of a stock measured in the millions, with median installation lifetimes well above a decade. The single largest hidden cost in industrial-automation operations is time-to-diagnose on legacy systems — PLC code, robot programs and SCADA configurations that have outlived their authors. This is not a benchmark; it is a coalface, and one of SYMPHONY's partners works it directly.

Software-intensive critical infrastructure. Telecommunications stacks, electricity-grid SCADA, public-administration legacy systems, rail signalling — software written across decades and contractors, now sitting underneath NIS2, the Cyber Resilience Act and the AI Act. A substrate whose read-outs are auditable to specific code elements is precisely the kind of artefact those regimes can read. Auditable comprehension is becoming a regulatory object, and almost nothing on the market is built to be audited.

Sovereign and human-centric AI tooling. Code-comprehension tooling today is dominated by US-headquartered platforms — GitHub Copilot, Cursor, Claude Code, Replit Agent — trained on extraterritorial pipelines and operated under non-European governance. In June 2025 the European Parliament found Europe "heavily dependent on foreign technologies" in exactly this layer. A credible European alternative, released openly with an EU-jurisdiction governance trail, is no longer just a research curiosity; it is a strategic priority. SYMPHONY is designed to be a building block of a European AI stack, not a competitor to a single US product.

Research-software sustainability. Two decades of EU-funded research have produced an enormous tail of computational artefacts whose long-term maintenance falls disproportionately on early-career researchers — the people least equipped to absorb the cost. A substrate that lowers the activation energy of understanding inherited scientific software pays that debt down where it is heaviest.

Four domains, one shape of problem: systems that have become collectively unreadable, carrying consequences — economic, regulatory, scientific — far out of proportion to anyone's ability to hold them in mind.

Here is the part that makes SYMPHONY a research programme rather than a product roadmap, and it is the part I am proudest of.

The proof of principle requires the genuine integration of four disciplines: the computational neuroscience of cortical neuromodulation; robot manipulation with haptic shared control; software engineering and architecture recovery; and foundation-model engineering with a sovereign, human-centric orientation. The first two are biological and biomechanical. The last two are symbolic and computational. The shortest path between any two of them runs through at least two intermediate fields. It is a deliberately uncomfortable combination, and it is the only combination that supplies what the test demands.

The consortium

Newcastle, and the Blue Brain heritage, formalised. Sri Ramaswamy's group contributes the formal four-scale framework for context-dependent reconfiguration of a fixed network — the mathematical template SYMPHONY transposes from cortex to code. Without it, the project would have no principled model for how a substrate's activation regime should change in response to a task, and would fall back on hand-engineered routing heuristics — exactly the incremental tinkering that frontier research funding is not meant to support. Newcastle also runs the human side: a pre-registered, ethics-governed user study, stratified across gender, career stage and native-language proficiency, with a non-inferiority margin built in specifically to catch deskilling before it becomes a deployment pattern.

CREATE / PRISMA Lab, and the haptic formalism, transposed. Bruno Siciliano is one of the founding figures of modern robotics — past President of the IEEE Robotics and Automation Society, co-editor of the Springer Handbook of Robotics. His lab contributes the published, hardware-validated control formalism by which a low-bandwidth descending signal reshapes a high-DOF controller without rewriting it. SYMPHONY's narrow task-control surface is the transposition of that formalism from continuous motor control to discrete symbolic dynamics. Without CREATE, the task interface defaults to a prompt window — the very surface whose unboundedness the project exists to reject.

Real AI, and the engineering capacity to make it real. This is the part I bring. Through Project HOMINIS — Europe's first open-source human-centric LLM, trained on the Leonardo supercomputer at CINECA Bologna under the Italian ISCRA programme, on the order of 14,000 NVIDIA Ampere GPUs — we have built and trained large transformer architectures with documented work on energy-efficient methods directly applicable to SYMPHONY's training regime. Real AI supplies two things at once: foundation-model engineering at scale, and the discipline of making a system's narrow control surface auditable in a way European governance frameworks can actually read. The point of HOMINIS was never only the model. It was the sovereign, human-centric posture — and that posture is exactly what a European code-comprehension substrate needs to be born with.

UP Robotics, and the industrial coalface. A Croatian industrial-automation company contributes the ground-truth view of what a real codebase looks like — not an open-source toy, but a production system with the version drift, the undocumented decisions, and the operator know-how that constitute the actual problem domain. Without an industrial partner, SYMPHONY would be judged only on community code, and any positive result would invite the obvious objection: that it had never been tested where the impact lives.

Each pair of these disciplines has been combined before. Neuroscience and AI in the NeuroAI community. Robotics and neuroscience in the embodied-AI literature. Software engineering and foundation models in the entire LLM-agent line. The four-way combination, aimed squarely at the symbolic, structural domain of source code, has not. That is the bet, stated plainly: that the architectural principles by which biology produces task-appropriate behaviour from a fixed substrate transpose to a non-embodied, non-rhythmic, symbolic domain — when, and only when, they are implemented inside a foundation-model-scale platform and anchored to a real industrial codebase. None of the four disciplines, alone or in any pair, can run that experiment.

The honest uncertainty, and how we will know

I would not trust this essay if it had no place where it says what could be wrong, so here it is, and it is not small.

Multi-scale neuromodulation is demonstrated in perceptual and motor domains — domains of continuous signals and embodied feedback. Source code is none of those things. Its signals are discrete, hierarchical, linguistic. The single critical uncertainty in the whole programme is whether the biological principle transfers across that gap — whether a mechanism that evolved for continuous sensorimotor control does anything useful when the substrate it modulates is a symbolic graph of a software system. This is not a question of engineering polish. It is a question of whether the principle generalises, and it is allowed to come back "no."

What makes it a research programme and not a manifesto is that the question is built to be settled. SYMPHONY is organised around five objectives over thirty-six months, each with a single numerical threshold, a verification method, a responsible partner, and a decision milestone — and each with a documented fallback if the threshold is missed.

Build the four-layer extraction pipeline and clear ≥90% function coverage and ≥80% inter-module dependency coverage on two demonstrator codebases: a large open-source system from a shortlist like the Linux kernel, PostgreSQL or Kubernetes, and a production industrial codebase from UP Robotics (decision milestone M12).

Implement the neuromodulatory reconfiguration and show statistically significant, task-appropriate subnetwork activation across at least three engineering task classes (localisation, impact analysis, refactoring-candidate discovery) at F1 ≥ 0.6 against expert ground truth, p < 0.01 (M18).

Derive the narrow scalar control interface and demonstrate task-switching under 500 milliseconds while preserving the underlying state at cosine similarity ≥ 0.95 over a hundred or more trials (M24).

Benchmark the whole substrate, on a pre-registered protocol, against three named baselines (a frontier LLM agent, a best-in-class static-analysis-plus-knowledge-graph pipeline, and an LLM-with-retrieval baseline) and clear them by ≥20% on task-relevant-subgraph recovery and ≥15% on expert-rated actionability, with the margins declared in advance and a three-person external panel adjudicating (M30).

And, the objective I am least willing to drop: run that equitable-access user study with at least sixty engineers, and show a significant reduction in time-to-first-correct-change for under-represented strata without a deskilling cost to their retention thirty days later (M33).
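Two of those thresholds are simple enough to state as code. The sketch below computes F1 between an activated node set and an expert-labelled set, and cosine similarity between two state vectors, then checks them against the M18 and M24 figures. The data is invented and the function names are mine; the real protocol also involves significance testing and many trials, which this leaves out.

```python
import math

def f1_score(predicted: set[str], expected: set[str]) -> float:
    """F1 between an activated node set and an expert-labelled node set."""
    true_positives = len(predicted & expected)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)

def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two state vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Invented example data, for illustration only.
activated = {"fn:a", "fn:b", "test:a", "commit:x"}
expert = {"fn:a", "fn:b", "test:a", "test:b", "decision:y"}
state_before_switch = [0.2, 0.9, 0.4, 0.1]
state_after_switch = [0.2, 0.88, 0.41, 0.1]

f1 = f1_score(activated, expert)
similarity = cosine_similarity(state_before_switch, state_after_switch)
print(f"F1 = {f1:.3f}, meets the 0.6 threshold: {f1 >= 0.6}")
print(f"cosine = {similarity:.4f}, meets the 0.95 threshold: {similarity >= 0.95}")
```

In this made-up case three of four activated nodes are correct and three of five expert nodes are found, so F1 is two thirds and clears the bar. A single threshold per objective is what lets a milestone come back with a plain yes or no.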

Each of those has an alternative written next to it for the case where it fails — drop the rationale layer to a lightweight classifier; retreat to single-scale neuron-level modulation; substitute a fixed-vocabulary gating module; narrow the claimed scope to OSS and document the industrial-transfer gap as a finding. A negative result at any milestone is not abandonment. It is a publishable scientific finding. That is the difference between a hypothesis you can stand behind and a story you merely like.

The substrate you can govern

There is one more reason the third advance — the narrow, auditable control surface — matters more than it might first appear, and it is the reason this is a European project and not just an interesting one.

Everything SYMPHONY produces is fundamental research at low technology-readiness; it does not deploy a regulated AI system. But it is built, from day one, to be read by the governance regime that is arriving anyway. The substrate is graph-resident and activation-based: every retrieval is traceable to specific nodes, edges and modulatory scalars. The control surface is narrow enough to audit. The outputs are released openly — the extraction pipeline, the reconfiguration module, the control library, the benchmark harness, all under permissive licences — with an EU-jurisdiction governance trail and a documented redistribution policy. The category it seeds does not exist on the market today: an auditable code-comprehension substrate, structurally distinct from both the LLM-agent platforms and the legacy static-analysis tooling, defined by three properties delivered together — persistent task-adaptive memory of a real codebase, a control surface compliance functions can read, and an open governance trail.

That combination is the point. It is the difference between a tool that happens to be useful and a building block a continent can actually adopt — at exactly the moment the political and regulatory frame is asking for one.

Same substrate, different harmonies

The comprehension gap is real, it is structural, and the instinct that has carried us this far — make the engineer bigger, give them more tools, more context, more model — is the one instinct that cannot close it, because it scales the wrong term in the equation.

The alternative is older than software. Keep one substrate. Let a small, slow, auditable signal decide which part of it stands in the light. Let the codebase behave like a nervous system instead of a library. It may not transfer from the brain to the graph — that is the honest uncertainty, and it is the thing the next thirty-six months are built to resolve, with five pre-registered milestones and four disciplines that have never before sat at the same table. But if it does transfer, the way out of the gap was never to read faster.

It was to close the gap not by enlarging the engineer, but by reshaping the substrate they navigate.

(A note for anyone who arrived from the Set Piece, which ends on a second curve — energy. There is a hardware face to "reshape the substrate," measured in joules per thought rather than tasks per substrate. But that is a different substrate and a separate project of mine, MEMPHIS, and conflating the two would do justice to neither. This essay is about the knowledge substrate, not the silicon. The instinct rhymes. The work is separate.)

Same substrate. Different harmonies.

Go deeper

The interactive proposal: the SYMPHONY microsite, with the planispheric substrate, the four neuromodulatory scales side by side, and the consortium with primary-source citations.
The Set Piece: The Comprehension Gap. The scroll-driven feature this essay expands, scene by scene.
The full dossier: the EIC Pathfinder Additional Information document, in print form.
MEMPHIS: the separate hardware-substrate project the closing nods to.
