Most design systems work fine inside Figma and unravel the moment they cross into code. What is supposed to be one shade of blue appears as `#2783ED` in one component and as `rgb(80, 158, 248)`, a visibly different color, in another. A spacing scale defined as 4-8-12-16 in the design file shows up in CSS as `0.25rem`, `0.5rem`, `0.75rem`, `1rem`, but also as `4px`, `8px`, and a stray `0.4em` someone wrote in a hurry.
Design tokens solve this — but only when they're treated as a versioned contract between design and engineering, not as a documentation artifact. The teams that get the most out of tokens treat them with the same discipline as an API.
What 'contract' means in practice
A design token is a contract if three things are true. First, both teams agree on the source of truth — there's one place tokens are defined, and changes anywhere else are bugs. Second, both teams agree on the naming. `color.brand.primary` means the same thing in Figma and in code. Third, changes go through versioning. You can't rename a token without considering what it breaks, the same way you can't rename an API endpoint without considering who calls it.
Most teams stop at the first one. They publish tokens once, use them inconsistently, and call it a system. The discipline of treating tokens as an API — with deprecation paths, change reviews, and impact analysis — is what makes them load-bearing instead of decorative.
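The impact analysis described above can be sketched as a diff between two published token sets. This is a minimal illustration, not our actual tooling: the flat `TokenSet` shape, the `diffTokens` helper, and the bump rules (a removed name is breaking, an added name is not, a changed value is a patch) are all assumptions made for the example.

```typescript
// Hypothetical shape: a flat map from token name to value.
type TokenSet = Record<string, string>;

type Bump = "major" | "minor" | "patch" | "none";

interface TokenDiff {
  removed: string[];
  added: string[];
  changed: string[];
  bump: Bump;
}

// Compare two token sets and suggest a version bump.
// Assumed rules: removing a name is breaking, adding one is not,
// and changing a value under a stable name is a patch.
function diffTokens(prev: TokenSet, next: TokenSet): TokenDiff {
  const removed = Object.keys(prev).filter((name) => !(name in next));
  const added = Object.keys(next).filter((name) => !(name in prev));
  const changed = Object.keys(prev).filter(
    (name) => name in next && prev[name] !== next[name],
  );

  let bump: Bump = "none";
  if (removed.length > 0) bump = "major";
  else if (added.length > 0) bump = "minor";
  else if (changed.length > 0) bump = "patch";

  return { removed, added, changed, bump };
}

const v1: TokenSet = {
  "color.brand.primary": "#2783ED",
  "space.section": "64px",
};
const v2: TokenSet = {
  "color.brand.accent": "#2783ED", // renamed without keeping the old name
  "space.section": "64px",
};

const diff = diffTokens(v1, v2);
console.log(diff.bump, diff.removed);
```

In this example the careless rename shows up as one removal plus one addition, so the suggested bump is `major`. That is the same signal an API team gets when an endpoint disappears.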
The structure we settled on
We group tokens into three tiers. Primitives are the raw values: `color.blue.500 = #2783ED`, `space.4 = 16px`. Semantics are the meaningful aliases: `color.brand.primary = color.blue.500`, `space.section = space.16`. Components consume only semantics, never primitives. This three-tier structure means we can rebrand by changing the semantic layer, retune the palette by changing primitives, or both — without rewriting components.
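A small sketch of the three tiers, assuming tokens are stored as flat name-to-value maps. The `resolveSemantic` helper is hypothetical, and the value given for `space.16` is an assumption (the examples above do not state it).

```typescript
// Tier 1: primitives hold raw values.
const primitives: Record<string, string> = {
  "color.blue.500": "#2783ED",
  "space.4": "16px",
  "space.16": "64px", // assumed value, for illustration only
};

// Tier 2: semantics are aliases that point at primitives by name.
const semantics: Record<string, string> = {
  "color.brand.primary": "color.blue.500",
  "space.section": "space.16",
};

// Tier 3: components ask for semantic names only.
// Passing a primitive name here throws, which enforces the
// "never primitives" rule at the lookup boundary.
function resolveSemantic(name: string): string {
  const target = semantics[name];
  if (target === undefined) {
    throw new Error(`Unknown semantic token: ${name}`);
  }
  const value = primitives[target];
  if (value === undefined) {
    throw new Error(`Semantic token ${name} points at missing primitive ${target}`);
  }
  return value;
}

const buttonStyles = {
  background: resolveSemantic("color.brand.primary"),
  marginBottom: resolveSemantic("space.section"),
};
console.log(buttonStyles);
```

A rebrand in this model is a one-line change to `semantics` (point `color.brand.primary` at a different primitive), and `buttonStyles` never needs to know.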
The tooling is boring on purpose. We export from Figma to a JSON file, run it through Style Dictionary, and emit CSS variables, JS/TS exports, and a Figma library file in one build step. The export runs in CI on every merge. The contract is enforced because the build fails whenever something references a token that doesn't exist.
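To show the idea behind the failing build, here is a hand-rolled sketch. It deliberately does not use Style Dictionary's API. The `{token.name}` alias convention, the flat `Tokens` map, and the `findDanglingReferences`, `toCssVariables`, and `build` helpers are assumptions made for this example.

```typescript
type Tokens = Record<string, string>;

// Assumed convention for this sketch: an alias is written as "{token.name}".
const ALIAS = /^\{(.+)\}$/;

// Return one message per alias whose target is not defined in the token set.
function findDanglingReferences(tokens: Tokens): string[] {
  const errors: string[] = [];
  for (const [name, value] of Object.entries(tokens)) {
    const match = ALIAS.exec(value);
    if (match && !(match[1] in tokens)) {
      errors.push(`${name} references missing token ${match[1]}`);
    }
  }
  return errors;
}

// Emit one CSS custom property per token; aliases become var() references.
function toCssVariables(tokens: Tokens): string {
  const cssName = (name: string) => `--${name.replace(/\./g, "-")}`;
  const lines = Object.entries(tokens).map(([name, value]) => {
    const match = ALIAS.exec(value);
    const cssValue = match ? `var(${cssName(match[1])})` : value;
    return `  ${cssName(name)}: ${cssValue};`;
  });
  return `:root {\n${lines.join("\n")}\n}`;
}

// Validate first, then emit. Throwing is what would fail a CI job.
function build(tokens: Tokens): string {
  const errors = findDanglingReferences(tokens);
  if (errors.length > 0) {
    throw new Error(`Token build failed:\n${errors.join("\n")}`);
  }
  return toCssVariables(tokens);
}

console.log(
  build({
    "color.blue.500": "#2783ED",
    "color.brand.primary": "{color.blue.500}",
  }),
);
```

With these inputs, `color.brand.primary` is emitted as `var(--color-blue-500)`. Delete the primitive and `build` throws instead of producing CSS with a hole in it.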
The hardest part is renaming
Tokens go wrong at renaming time. Designers want better names; engineers want stable ones. The compromise that has worked for us: token names are append-only by default. A rename creates a new token, marks the old one deprecated with a target removal date, and we hold the old name for at least one major release.
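The append-only rename can be sketched as follows. The `Token` and `Registry` shapes, the `renameToken` and `dueForRemoval` helpers, the new name `color.action.primary`, and the dates are all hypothetical and chosen only for illustration.

```typescript
interface Token {
  value: string;
  deprecated?: {
    replacedBy: string;
    removeAfter: string; // ISO date, e.g. "2026-01-01"
  };
}

type Registry = Record<string, Token>;

// Rename without deleting: add the new name, keep the old one,
// and mark the old one deprecated with a target removal date.
function renameToken(
  registry: Registry,
  oldName: string,
  newName: string,
  removeAfter: string,
): Registry {
  const existing = registry[oldName];
  if (existing === undefined) throw new Error(`No such token: ${oldName}`);
  if (newName in registry) throw new Error(`Name already taken: ${newName}`);
  return {
    ...registry,
    [newName]: { value: existing.value },
    [oldName]: { ...existing, deprecated: { replacedBy: newName, removeAfter } },
  };
}

// List deprecated tokens whose removal date has passed as of `today`.
// ISO dates in YYYY-MM-DD form compare correctly as plain strings.
function dueForRemoval(registry: Registry, today: string): string[] {
  return Object.entries(registry)
    .filter(
      ([, token]) =>
        token.deprecated !== undefined && token.deprecated.removeAfter <= today,
    )
    .map(([name]) => name);
}

const before: Registry = { "color.brand.primary": { value: "#2783ED" } };
const after = renameToken(
  before,
  "color.brand.primary",
  "color.action.primary",
  "2026-01-01",
);
console.log(Object.keys(after), dueForRemoval(after, "2025-06-01"));
```

Both names resolve to the same value during the overlap, and nothing is reported as due for removal until the target date has passed. The date check is a stand-in here; a team tying removal to a major release would gate on the release instead.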
This is annoying. It's also the only way to maintain the contract. Every team we've seen try 'just rename the tokens, we'll fix the code later' has ended up with broken hand-offs and a design system the engineers stopped trusting. The two-name period costs little. The trust, once lost, costs months to rebuild.
When tokens stop being enough
Tokens handle color, spacing, type, radii, and motion durations well. They handle component-level decisions badly. A button has more state than tokens can express — focus rings, disabled treatments, loading states, icon spacing rules. These belong in components, not tokens.
The trap is to keep adding tokens until they're a worse component library. The discipline is to know when to stop: tokens describe the alphabet, components describe the words. If you find yourself adding `color.button.primary.hover.borderTopRadius`, you've crossed the line. Take it back to the component.