Domain-Specific Encoding

Your Codebase Is a Pidgin

The same compression dynamics that govern shop floor language govern your software

In the previous post, I argued that shop floor jargon functions as a compression algorithm — "wing plates" and "head frames" encoding pages of documentation into a few syllables. The compression emerges naturally through repeated practice, and it works because everyone shares the codebook.

The same thing is happening in your codebase. You just don't call it that.

The Vocabulary of a Codebase

Every mature codebase develops its own language. Not just naming conventions — a whole conceptual vocabulary:

Domain objects that don't exist in the real world but structure everything (UserSession, JobContext, RenderPipeline)
Verbs that mean something specific here (hydrate, bootstrap, normalize, reconcile)
Architectural metaphors baked into the structure (stores, reducers, sagas, handlers)

When a developer says "just hydrate the context before you hit the resolver," they're doing exactly what the fabricator does with "wing plates." They're using compressed vocabulary that assumes shared understanding.
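To see how much is packed into that one sentence, here is a rough TypeScript sketch of what it might decompress to. Every name in it (RequestContext, hydrateContext, resolveGreeting, userTable) is hypothetical, invented for illustration; your codebase's "context" and "resolver" will mean something else, which is rather the point.

```typescript
// Hypothetical names throughout: nothing here is a real framework's API.

// "Context": per-request state that starts out mostly empty.
interface RequestContext {
  userId: string;
  user?: { name: string; locale: string }; // absent until hydrated
}

// Stand-in for a database or cache lookup.
const userTable = new Map<string, { name: string; locale: string }>([
  ["u1", { name: "Ada", locale: "en-GB" }],
]);

// "Hydrate": fill in the fields that downstream code assumes are present.
function hydrateContext(ctx: RequestContext): RequestContext {
  const user = userTable.get(ctx.userId);
  if (user === undefined) {
    throw new Error(`unknown user: ${ctx.userId}`);
  }
  return { ...ctx, user };
}

// "Resolver": code that turns a request into a response, and that
// depends on hydration having happened first.
function resolveGreeting(ctx: RequestContext): string {
  if (ctx.user === undefined) {
    throw new Error("context not hydrated");
  }
  return `Hello, ${ctx.user.name}`;
}

// "Just hydrate the context before you hit the resolver."
const greeting = resolveGreeting(hydrateContext({ userId: "u1" }));
```

A short sentence in conversation, a screenful of definitions once spelled out.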

This is efficient. It's also a trap.

The Onboarding Problem Is a Decompression Problem

New developers don't struggle because the code is hard. They struggle because they don't have the codebook.

I've watched senior engineers explain a system by rattling off terms — "the scheduler picks up jobs from the queue, validates against the manifest, then dispatches to the appropriate handler" — while the new hire nods along, understanding maybe 30% of what's being communicated. The words are English. The concepts are local.

The mismatch is invisible to the explainer. They've internalized the compression so deeply they've forgotten it exists. To them, "manifest" just means the thing it means here. The fact that it means something different (or nothing) elsewhere doesn't register.

This is the same dynamic I described with shop floor pidgins: the compression works because both parties share the codebook, and it fails silently when they don't.

Naming Is Compression Design

Every time you name something in code, you're making a compression decision.

A good name packs maximum meaning into minimum characters. userAuthToken tells you what it is, what domain it belongs to, roughly what it does. A bad name either compresses too little (data, temp, x) or compresses the wrong thing (a name that made sense three refactors ago but now misleads).
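A small TypeScript sketch of the difference, using made-up functions. Both do exactly the same thing; only the names change. The function and parameter names are my own illustration, not taken from any real library.

```typescript
// Compresses too little: the reader has to reverse-engineer every name.
function check(data: string, x: number, t: number): boolean {
  return data.length > 0 && x > t;
}

// Same logic, with names that carry the domain (all names illustrative).
function isAuthTokenUsable(
  userAuthToken: string,
  expiresAtMs: number,
  nowMs: number,
): boolean {
  const isPresent = userAuthToken.length > 0;
  const isUnexpired = expiresAtMs > nowMs;
  return isPresent && isUnexpired;
}
```

The second version is longer on the page but shorter in the reader's head, at least for a reader who already knows what an auth token is.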

But here's the interesting part: there's no objectively correct compression. The "right" name depends on who's reading and what codebook they carry.

Consider:

- useSWR: meaningless unless you know the library
- handleSubmit: generic, but universally understood
- reconcileFrobnicator: compressed for insiders, opaque to everyone else

The React ecosystem has evolved its own dense pidgin (useEffect, useMemo, useCallback) that experts parse instantly but newcomers find bewildering. This isn't a failure — it's compression working as intended. The question is whether the compression serves the actual population of readers.

Architecture Is a Compression Scheme

Zoom out from names to structure. Your architecture is also a compression decision.

When you create an abstraction — a service, a utility, a shared component — you're saying "this pattern repeats often enough that we should give it a name and a place." You're compressing.

When you inline something instead of abstracting it, you're saying "the overhead of maintaining a shared concept isn't worth it here." You're choosing explicitness over density.

The debates that consume engineering teams — microservices vs. monolith, DRY vs. WET, abstraction vs. duplication — are really debates about compression. How much do we collapse into shared concepts? How much do we keep explicit?

And just like with shop floor pidgins, the answer depends on:

Frequency: Does this pattern actually repeat enough to justify the abstraction?
Stability: Will the boundary of this concept stay stable, or will it shift and cause confusion?
Transmission: Can new people learn this abstraction, or is it too arbitrary?

I've seen codebases with beautiful abstractions that nobody uses because they failed the third criterion, transmission. The compression was elegant but couldn't survive being passed on to new team members. So people route around it, writing explicit code because they can't trust the compressed version.

The Legacy System Is a Dead Language

Here's where it gets dark.

Legacy systems aren't just old code. They're codebases where the pidgin has lost its native speakers. The compression remains, but the codebook is gone.

You stare at a function called processLegacyBatch and wonder: what made it "legacy" when it was written? What's a "batch" in this context? The name made perfect sense to someone in 2009. Now it's archaeology.

Comments don't help as much as you'd hope, because comments are also written in the pidgin of their time. "Handles edge case from client migration" — what client? What migration? The comment compressed context that no longer exists.

This is the same problem as an expert retiring from the shop floor. The language persists. The meaning doesn't.

Deliberate Compression

So what do we do with this?

A few principles I've been trying to apply:

Name for the reader, not yourself. The right compression level depends on who will read this code. Internal tooling used by three people can be dense. Public APIs should be explicit. Match the compression to the audience's codebook.

Document the codebook, not just the code. The most valuable documentation isn't "what does this function do" but "what do these terms mean here." A glossary of local concepts pays dividends for years.
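One way to keep a glossary honest is to make it data rather than a wiki page. The sketch below is a minimal TypeScript version under my own assumptions: the terms are borrowed from the scheduler example earlier in this post, but the definitions, the GlossaryEntry shape, and the termsMissingFromGlossary helper are all invented for illustration.

```typescript
// Hypothetical glossary: the definitions are invented for illustration.
interface GlossaryEntry {
  meaningHere: string; // what the term means in this codebase
  notToBeConfusedWith?: string; // the meaning a newcomer might assume
}

const glossary = new Map<string, GlossaryEntry>([
  [
    "manifest",
    {
      meaningHere: "The per-job list of inputs a job is allowed to read.",
      notToBeConfusedWith: "A package or deployment manifest.",
    },
  ],
  ["handler", { meaningHere: "The function that executes one job type." }],
]);

// Returns the local terms that appear in `text` but have no glossary entry.
function termsMissingFromGlossary(text: string, localTerms: string[]): string[] {
  const wordsInText = new Set(text.toLowerCase().match(/[a-z]+/g) ?? []);
  return localTerms.filter(
    (term) => wordsInText.has(term) && !glossary.has(term),
  );
}

const missing = termsMissingFromGlossary(
  "The scheduler validates against the manifest, then dispatches to the handler.",
  ["scheduler", "manifest", "handler"],
);
// `missing` holds "scheduler": used in the explanation, never defined.
```

A check like this could run over onboarding docs or design notes. It won't tell you whether a definition is good, only whether one exists.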

Watch for pidgin forks. When teams split or codebases diverge, the shared vocabulary starts to drift. The same word comes to mean slightly different things in different places. This is where bugs hide: not in the logic, but in the mismatched assumptions encoded in language.
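A hypothetical example: two teams both say "timeout," but one means seconds and the other means milliseconds. One tactic, sketched below in TypeScript, is to push the meaning into the type so the drift shows up at compile time instead of in production. The Seconds and Milliseconds types and the helper names are my own assumptions; this is one possible approach, not a cure for vocabulary drift in general.

```typescript
// Hypothetical scenario: the word "timeout" has forked into two meanings.
// Branded number types make each meaning a distinct type.
type Seconds = number & { readonly __unit: "seconds" };
type Milliseconds = number & { readonly __unit: "milliseconds" };

const seconds = (n: number): Seconds => n as Seconds;
const toMilliseconds = (s: Seconds): Milliseconds => (s * 1000) as Milliseconds;

// One team's config speaks in seconds.
const billingTimeout = seconds(30);

// Another team's function expects milliseconds.
function describeTimeout(timeout: Milliseconds): string {
  return `${timeout} ms`;
}

// describeTimeout(billingTimeout); // compile error: Seconds is not Milliseconds

// The conversion has to be written down, so the fork becomes visible.
const label = describeTimeout(toMilliseconds(billingTimeout));
```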

Decompress before you compress. Before adding a new abstraction, write out explicitly what it does. If the explicit version isn't that much longer, maybe you don't need the compression. The abstraction should earn its place by genuinely saving cognitive load.
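Here is what that comparison can look like in practice, as a small TypeScript sketch. The orders data and the pluckWhere helper are hypothetical, made up for this example.

```typescript
// Hypothetical data for illustration.
const orders = [
  { id: 1, total: 40 },
  { id: 2, total: 250 },
];

// Decompressed: the logic written out in place.
const largeOrderIds = orders
  .filter((order) => order.total > 100)
  .map((order) => order.id);

// Candidate abstraction: a new word the whole team would have to learn.
function pluckWhere<T, R>(
  items: T[],
  keep: (item: T) => boolean,
  pick: (item: T) => R,
): R[] {
  return items.filter(keep).map(pick);
}

// Compressed: same result, one more entry in the codebook.
const sameIds = pluckWhere(
  orders,
  (order) => order.total > 100,
  (order) => order.id,
);
```

The compressed call is no shorter than the explicit one, and it costs readers a trip to the definition of pluckWhere. By the test above, this abstraction hasn't earned its place.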

The Deeper Pattern

What I keep coming back to: compression is never free. Every abstraction, every shared concept, every piece of jargon buys you density at the cost of accessibility.

Shop floors solve this through apprenticeship — years of exposure until the pidgin becomes native. Codebases try to solve it through documentation, but documentation is often written by people who've already internalized the compression and can't see what needs explaining.

Maybe the answer is to treat the pidgin itself as an artifact worth examining. Not just "what does the code do" but "what language did we invent here, and why, and who can speak it?"

Your codebase isn't just logic. It's a dialect. And dialects have histories, boundaries, and failure modes that pure technical thinking tends to miss.

Previously: Wing Plates and Head Frames — how shop floor language compresses centuries of knowledge.

Next: The CAD File Is a Pidgin Too — configurations, assemblies, and the information density wars.

Published: January 02, 2026