All posts
Perspective

Smarter Models Won't Save Your Codebase

Your AI tools worked better six months ago. The models didn't get worse — your codebase got bigger. A note on the wall every team deploying agents eventually hits, and on what we've been building for a year to get around it.

Matt Walters

There’s a wall every engineering team using agents eventually hits. It doesn’t look like a wall — it looks like slightly worse output, month over month. The model you’re using hasn’t regressed. The codebase you’re asking it about has grown.

This is the shape of the problem every team deploying AI agents is running into, whether they’ve named it or not. As your project accumulates history — architectural decisions, bugs that got fixed, specs that described intent, conventions the team agreed on — the value of the information that isn’t in the files the agent can see grows faster than the files themselves. Your agent has access to your source code and not much else. Source code, by itself, is the smallest useful fraction of what you actually know about your system.

The constraint shifted while we weren’t watching. It used to be model intelligence. It isn’t anymore. It’s model grounding — what the model knows when it shows up. A merely-competent model with rich, structured context of your system will beat a state-of-the-art model with nothing but your codebase, every time. And the gap compounds the longer your project lives.

THE BOTTLENECK SHIFTED

The limit on agent performance used to be the model. Today, on any non-trivial codebase, it's the quality of the context the model has access to. Smarter agents without better grounding hit diminishing returns fast.

The ladder of workarounds

Every team trying to close this gap has tried at least two of these.

Dumping it all into the prompt. Works until it doesn’t. Context windows grew; the quality of attention inside a packed window didn’t. An agent drowning in irrelevant context performs worse than one given exactly what it needs.

CLAUDE.md and friends. An improvement — now there’s some structured guidance for the agent. But it conflates knowledge with code, bloats the repo, and breaks the moment you want to share insight across two codebases.

A separate git repo for docs. Better in principle. But git was built for source code, and the assumptions that make it good at source code — branch-per-feature, merge conflicts, full clone — don’t survive contact with knowledge. You don’t branch a belief. You don’t merge two competing understandings of system architecture. Your agent still has to clone the entire history of an organization’s worth of documents to ask one question.

Notion or Confluence. Human-browsable. But an agent doesn’t browse — it retrieves — and none of these tools were built for retrieval access patterns. No commit primitive, no provenance, no graph that an agent can traverse.

Every approach is a workaround. None of them is infrastructure your agents can actually stand on.

The industry’s answer is more AI. That’s the wrong answer.

Almost every tool shipping agent-memory features this year has responded to the context wall the same way: by adding another AI layer. Vector stores that guess which chunks are relevant. Semantic search that approximates what a call site means. LLM-powered question-answering about your codebase. Memory APIs that are themselves probabilistic retrieval systems.

Each of these adds a guess on top of a guess. Your agent, which was already making probabilistic decisions about code, is now relying on another probabilistic system to retrieve the right context. When it gets the answer wrong, you can’t tell whether the retrieval failed or the reasoning did.

GitKB takes a different position. The memory layer for an AI agent should not be another AI. It should be a graph.


That has three specific meanings in practice:

Deterministic code intelligence, not semantic. GitKB indexes your code with tree-sitter at the AST level, across seventeen languages, and builds a real call graph. When your agent asks “who calls this function?”, the answer is not a ranked list of plausible candidates — it’s the actual set of call sites, every one of them, with line numbers. The callers, callees, impact, and dead_code tools return facts, not guesses.
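To make “facts, not guesses” concrete, here is a minimal sketch of deterministic call-graph queries as set operations over an index. Everything in it — the CallGraph class, its method names, the example function names — is illustrative, not GitKB's actual API; the point is only that each query is an exact lookup, not a ranked retrieval.

```python
from collections import defaultdict

class CallGraph:
    """Toy call graph (hypothetical, not GitKB's API). Every edge comes
    from a parsed call site, so queries return exact sets."""

    def __init__(self):
        self._callees = defaultdict(set)  # caller -> set of callees
        self._callers = defaultdict(set)  # callee -> set of callers

    def add_call(self, caller: str, callee: str) -> None:
        self._callees[caller].add(callee)
        self._callers[callee].add(caller)

    def callers(self, fn: str) -> set:
        """Exactly the functions that call fn."""
        return set(self._callers[fn])

    def callees(self, fn: str) -> set:
        """Exactly the functions fn calls."""
        return set(self._callees[fn])

    def impact(self, fn: str) -> set:
        """Blast radius: every function that transitively calls fn."""
        seen, stack = set(), [fn]
        while stack:
            for caller in self._callers[stack.pop()]:
                if caller not in seen:
                    seen.add(caller)
                    stack.append(caller)
        return seen

    def dead_code(self, roots: set) -> set:
        """Functions unreachable from any entry-point root."""
        seen, stack = set(roots), list(roots)
        while stack:
            for callee in self._callees[stack.pop()]:
                if callee not in seen:
                    seen.add(callee)
                    stack.append(callee)
        return (set(self._callees) | set(self._callers)) - seen

# Hypothetical example graph:
g = CallGraph()
g.add_call("handle_checkout", "validate_address")
g.add_call("validate_address", "geocode")
g.add_call("admin_tool", "geocode")

# Exact answers, not ranked candidates:
# g.callers("geocode") -> {"validate_address", "admin_tool"}
# g.impact("geocode")  -> {"validate_address", "admin_tool", "handle_checkout"}
```

A vector store would return the chunks that look most like the query; a graph like this returns the complete answer set, and an empty set is a meaningful “nobody calls this” rather than a retrieval miss.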

Typed documents with real graph edges, not flat memory. Tasks, incidents, specs, architectural decisions — each is a typed document with frontmatter declaring its relationships. Edges are declared, traversable, and part of the canonical record. Not a vector similarity score. When your agent traces from a bug to the incident that investigated it to the commit that fixed it, the path is explicit, not inferred.
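As an illustration of the shape — the field names here are hypothetical, not GitKB's actual schema — a typed incident document carrying its edges in frontmatter might look like this:

```markdown
---
type: incident
title: Slow checkout during address validation
relates_to:
  - tasks/move-address-validation-async
  - specs/checkout-latency-budget
---

Synchronous address validation sat on the checkout critical path
and blew the latency budget under load. The linked task proposed
the fix; the linked spec records the budget this violated.
```

The edges in the frontmatter are declared, so an agent traversing from this incident to the task and spec follows explicit links rather than inferring a connection from text similarity.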

A commit-based sync protocol, not a cloud database. Your knowledge base lives on your machine. It syncs to other machines — or to a private cloud, or to your organization’s on-prem deployment — through a protocol that is sparse, auditable, and offline-first. Your documents are Markdown files. You can pull them and take them with you at any time.

What this buys you

The concrete outcomes when a team adopts this.

Your agent stops starting cold. In a GitKB-aware session, the first thing the agent does is orient itself — the active task board, prior incidents relevant to the topic, the three or four specific documents that matter for the task at hand. Not dumped into the prompt. Loaded on demand, scoped precisely. The agent’s first action is informed instead of exploratory, and you feel the difference immediately.

Your agent stops reinventing the answer to every question. Where does this function get called from? Which tests cover it? What’s the blast radius of changing this struct? What decision led to this architecture? All of these used to be thirty-minute investigations that consumed tokens and turns. They are now millisecond queries against a graph. Your agent spends its attention on the problem, not on reconstructing context.

Your team’s institutional memory stops dying. Every incident, every task, every architectural decision becomes a document that is part of the protocol — not an afterthought, not a ritual that depends on discipline. When an engineer leaves, they do not take the why with them. When a new engineer onboards, they do not have to reconstruct tacit knowledge one Slack thread at a time.

Your AI adoption stops being chaotic. Instead of each engineer maintaining their own .cursorrules, each agent starting from a different mental model, and each editor having its own memory layer, there is one protocol every agent follows — recorded in your own repository, version-controlled, auditable. GitKB speaks to Claude, GPT, Gemini, local models, and whatever ships next through MCP. You are not betting on a single model or a single vendor.

WHAT CHANGES

Your agents start oriented. Your code graph answers in milliseconds. Your team's institutional knowledge compounds across sessions, engineers, and years. Your AI adoption is governed by one protocol instead of scattered across editors, models, and conventions.

What this feels like at a desk

Here is the difference an engineer actually experiences.

Without GitKB, you describe a bug to your agent. The agent greps the code, picks a plausible culprit, makes a plausible-looking change, commits it. A week later the real bug surfaces somewhere else, because the fix addressed a symptom and the actual root cause was documented nowhere the agent could see.

With GitKB, you describe the same bug to the same agent. Before it touches any code, it runs git kb search "checkout timeout":

SLUG                                   TYPE      TITLE
incidents/inc-012-checkout-latency     incident  Slow checkout during address validation
tasks/move-address-validation-async    task      Move address validation off the critical path
specs/checkout-latency-budget          spec      Checkout latency budget and backpressure

Three documents. Collectively: the whole story. The incident describes what happened when this was first reported. The task proposed the fix. The spec recorded the decision. The agent reads all three, implements the correct fix, and commits a record that ties everything together. Future agents — and future engineers — inherit the whole trail.

Same model. Same prompt. Two completely different outcomes. The difference is grounding.

What this feels like at the organization level

If you’re running an engineering organization, the case reads slightly differently.

Your engineers already have AI tools. The question isn’t whether to adopt AI — that happened. The question is whether the AI your engineers use is operating on compounding knowledge or is starting from zero every session. Teams that invest in knowledge infrastructure will pull ahead of teams that don’t, for exactly the same reason teams with good documentation have always outperformed teams with scattered context.

Concretely, what changes at the org level:

  • Onboarding gets faster. New engineers and new agents query institutional memory directly, instead of being walked through it by a senior engineer whose time is the real bottleneck.
  • Key-person risk drops. Context doesn’t evaporate when someone leaves. The why behind your architecture lives in the knowledge base, not in one engineer’s head.
  • Compliance gets cheaper. Every agent action is attributed and auditable. For regulated industries, this is not optional — and it is dramatically more expensive to bolt on later.
  • Model independence is built in. GitKB isn’t locked to any provider. When the AI landscape shifts — and it will — your knowledge infrastructure doesn’t need to be rebuilt.
  • Wrong turns drop measurably. Agents that can read the history of what was already tried stop repeating investigations and stop proposing fixes that were already rejected for good reason. Velocity compounds.

Local-first means your knowledge stays on your machines. On-prem deployment is there for regulated industries. Open-core licensing means the price doesn’t scale with vendor lock-in — it scales with how much you use it.

FOR ENGINEERING LEADERS

Teams that invest in AI-usable knowledge infrastructure will pull ahead of teams that don't, for the same reasons teams with good documentation always have. The difference now is that knowledge infrastructure is tractable — and the cost of not having it is paid in your agents' output every single day.

The pattern, named

Every major tool in this space has shipped a memory feature in the last few months. Each one is different. Each one is per-vendor. None of them are interoperable, none of them are deterministic, and none of them cover knowledge and code and distributed team collaboration at once. The agent-memory category is fragmenting while it forms.

Against that backdrop, Andrej Karpathy recently published a short gist called LLMWiki, describing a clean pattern: persistent markdown files, maintained by an LLM, sitting in a git repo, with an AGENTS.md file teaching the agent what the files mean. Files are canonical. The LLM is the writer. The human is the curator.
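In the shape the gist describes — this snippet is our paraphrase of the pattern, not a quote from Karpathy — an AGENTS.md teaching an agent the contract might read:

```markdown
# AGENTS.md

Persistent knowledge for this repo lives in /wiki as Markdown files.

- Files are canonical: trust them over your own recollection.
- The LLM is the writer: after any significant change, update the
  relevant wiki file in the same commit.
- The human is the curator: propose edits; never delete a file
  without being asked.
```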

It is a sharp piece of thinking, and it has given the concept a name the community now recognizes.

The pattern Karpathy describes is, at core, the same thesis we have been building on for a year. Where we have gone further — and where any serious attempt at this eventually has to go — is on the word protocol. A pattern is a convention two or three engineers can practice by force of will. A protocol is infrastructure a whole organization can rely on without anyone having to remember to practice it.

Individuals scale on patterns. Organizations scale on protocols.

The long-form argument for the specific design choices we made — single branch, sparse sync, projected knowledge graph, code as first-class graph edges — is already published: From Context Engineering to Knowledge Engineering, released a month before the Karpathy gist to what was then our private pre-alpha audience. My, how the world has changed. If you want the full case, start there.

Try it

You can feel the difference in about a minute. Install the CLI and run git kb init in any repo.

Then, next time you describe a bug or start a feature, ask your agent to create an incident or task document first and commit it before writing code. In your next session, watch what the next agent does before it touches anything.

If you’re a team deciding whether to roll GitKB out at scale, join the Alpha. We’re onboarding engineering organizations that want sync, org-wide graphs, and on-prem options.

If you want to talk to the people building this, Discord is where we are.


The future of AI-assisted engineering is not another layer of AI. It is Knowledge Engineering — the discipline, and the protocol that makes it tractable for humans and agents alike. We have been building that protocol for a year.

— Matt Walters