Your Codebase Is a Pidgin
Domain-Specific Encoding
Your Codebase Is a Pidgin
In the previous post, I argued that shop floor jargon is a compression algorithm — "wing plates" and "head frames" encoding pages of documentation into syllables. The compression emerges through repetition. It works because everyone shares the codebook.
Your codebase does the same thing. You just don't call it that.
The Vocabulary
Every mature codebase invents its own language. Not naming conventions — a whole conceptual vocabulary. Domain objects that don't exist in the real world but structure everything: UserSession, JobContext, RenderPipeline. Verbs that mean something specific here: hydrate, bootstrap, normalize, reconcile. Architectural metaphors baked into the bones: stores, reducers, sagas, handlers.
"Just hydrate the context before you hit the resolver." That's the developer equivalent of "were the wing plates true." Dense, efficient, opaque to outsiders.
This is efficient. It's also a trap.
Onboarding Is Decompression
New developers don't struggle because the code is hard. They struggle because they don't have the codebook.
I've watched senior engineers explain a system by rattling off terms — "the scheduler picks up jobs from the queue, validates against the manifest, then dispatches to the handler" — while the new hire nods along, catching maybe 30%. The words are English. The concepts are local.
The mismatch is invisible to the explainer. They've internalized the compression so deeply they've forgotten it exists. "Manifest" just means the thing it means here. The fact that it means something different — or nothing — elsewhere doesn't register.
Same dynamic as the shop floor. The compression works because both parties share the codebook. It fails silently when they don't.
Naming Is Compression Design
Every time you name something in code, you're making a compression decision. userAuthToken packs what it is, what domain it belongs to, roughly what it does — into three words. data packs nothing. A name from three refactors ago actively misleads.
There's no objectively correct compression. The "right" name depends on who's reading and what codebook they carry.
useSWR — meaningless unless you know the library. handleSubmit — generic, universally understood. reconcileFrobnicator — compressed for insiders, noise to everyone else.
React's entire API is a dense pidgin. useEffect, useMemo, useCallback — experts parse them instantly, newcomers find them bewildering. This isn't a failure. It's compression working as intended. The question is whether it serves the actual population of readers, or just the people who designed it.
Architecture Is a Compression Scheme
Zoom out. Your architecture is also a compression decision.
When you create an abstraction, you're saying "this pattern repeats enough to deserve a name and a place." When you inline instead, you're saying "the overhead isn't worth it." DRY vs. WET. Microservices vs. monolith. Abstraction vs. duplication. These debates are about compression. How much do we collapse into shared concepts? How much stays explicit?
Same rules as the shop floor:
- Frequency — does the pattern actually repeat enough?
- Stability — will the boundary hold, or shift and confuse?
- Transmission — can new people learn this, or is it too arbitrary?
I've seen codebases with beautiful abstractions that nobody uses. The compression was elegant but couldn't survive transmission. People route around it, writing explicit code because they can't trust the compressed version. The abstraction becomes a dead language inside a living system.
Legacy Systems Are Dead Languages
This is where it gets dark.
Legacy systems aren't just old code. They're codebases where the pidgin has lost its native speakers. The compression remains. The codebook is gone.
You stare at processLegacyBatch and wonder: what made it "legacy" when it was written? What's a "batch" here? The name made perfect sense to someone in 2009. Now it's archaeology.
Comments don't help as much as you'd hope. Comments are written in the pidgin of their time. "Handles edge case from client migration" — what client? What migration? The comment compressed context that no longer exists.
Same problem as an expert retiring from the shop floor. The language persists. The meaning doesn't.
The Deeper Pattern
Compression is never free. Every abstraction, every piece of jargon buys density at the cost of accessibility.
Shop floors solve this through apprenticeship — years of exposure until the pidgin becomes native. Codebases try documentation, but documentation is written by people who've already internalized the compression and can't see what needs explaining. They're decompressing for an audience they can't model because they can't remember not knowing.
Maybe the answer is to treat the pidgin itself as an artifact worth examining. Not just "what does the code do" but "what language did we invent here, and why, and who can speak it?"
Your codebase isn't just logic. It's a dialect. Dialects have histories, boundaries, and failure modes that pure technical thinking misses entirely.
Previously: Wing Plates and Head Frames Next: The CAD File Is a Pidgin Too