Case studyReliance Jewels · Design System

A design system that scores itself, and that a machine can build from

I rebuilt the Reliance Jewels design system end to end. Before drawing a component, I built a ruler: I scored the live product against its own brand promise, surface by surface. The most important screen in jewellery e-commerce scored 1 out of 5. That score is the case study.

Role
Product Designer: system + method, self-directed
Surface
RJ e-commerce, web, Vue 2 / Fynd
Scope
Audit → 39 components → eval method → AI layer
Format
.less tokens + live docs + Figma + manifest
The product detail page as it shipped. On the trust ruler, it scored 1/5.
The product detail page as it shipped. On the trust ruler, it scored 1/5.
The 30-second version
Problem
The design system was a Figma UI kit with no reasoning behind any value, and the live product had drifted past trust: the most important page in jewellery e-commerce scored 1/5 against the brand's own promise.
My move
Three audits first (Figma, product, engineering), then a rebuild: 39 components with spec, code, docs and Figma in parity, 1346 tokens in three tiers, all policed by an automated quality gate.
Outcome
39/39 components pass the gate, zero raw hex anywhere, and the system is machine-readable: a generated manifest plus hard constraints an AI can assemble new screens from without drifting.
Learning
Wire the gate into CI from day one, and pressure-test the AI layer on a genuinely new surface: the correction rate is the real measure of whether the system works.
Most designers have taste, but no ruler. I built the ruler first.

The premise of the whole project

02Context

Two broken things to fix, not one

Two things were broken: the design system, and the product it was supposed to be holding together.

A jewellery store carries unusual weight online. Nobody impulse-buys a ₹2 lakh necklace: they research, visit a showroom and come back a week later. The site's job is to extend the showroom, and to never once feel cheaper than it.

I had worked on these surfaces and watched them drift, one screen at a time. So I treated this as a self-directed rebuild: not a brief handed to me, a problem I could see and decided to solve properly.

The second growth engine, mid-drift: the Gold Savings calculator as it shipped, a bank form with a default browser slider
The second growth engine, mid-drift: the Gold Savings calculator as it shipped, a bank form with a default browser slider
03Problem

The old system couldn't answer “why”. The product had drifted past trust.

The system couldn't justify a single value, and the flagship page scored 1 out of 5 on trust.

The existing system was a Figma UI kit, not a system. Audited component by component, the pattern was the same everywhere: a value with no reason behind it. The button had one of its five variants documented and zero interaction states. The icon “spec” was a screenshot of a plugin. Ask “why is the primary maroon the primary maroon” and the system had no answer.

Then I walked all nine live surfaces and scored each on two passes, functional and emotional. On a trust product, the emotional score is the one that matters.

123
  1. 1SKU code (“18K-S-RING”) shown where the product name belongs.
  2. 2A price-breakup table where every value is a dash.
  3. 3Buy Now and Add to Cart hierarchy inverted. No occasion, no story.
Scored 1 / 5
  • PDP: SKU codes as names, dashes for prices
  • My Account: two different layouts on one site
  • GSS calculator: a bank form with a default browser slider
Why it matters
  • On a jewellery store, inconsistency reads as carelessness
  • Carelessness is the one thing a lakhs-spending buyer can't forgive
  • The catalogue of failures became the build priority order
My Account, layout A
My Account, layout A
My Account, layout B. Same site, different URL
My Account, layout B. Same site, different URL
04Process · the bet

Three audits, not a moodboard

Three audits decided everything: the architecture, the priority order and the token format the build actually consumes.

Figma audit told me the first job was an architecture, not a reskin. Product audit gave me the priority order and the scoring rubric. Engineering audit decided the token format: Vue 2 on the Fynd engine compiles Less at build time, so the system had to be Less variables, because that is what the build actually consumes.

The bet underneath all of it: every value has to earn its place. If you cannot say why, it does not ship.

That one rule is what later made the system machine-readable.

✦ Creative to addThe three audits: Figma, Product and Engineering, and what each decidedProcess diagram or the audit scorecards · 4:3
05Decisions

Documentation as a function of code

Docs are generated from the code so they cannot drift, and no component ships a value it can't justify.

Decision 01

Three-tier tokens, no exceptions

Primitive holds raw values, semantic holds roles, component holds per-component tokens that reference semantics. Components never touch primitives. The non-negotiable rule: no hex, no raw pixel, no hard-coded string in any component file.

Decision 02

Generate the docs from the code, so they can't drift

A design system dies the day its docs drift from its code, so I removed the possibility. The .less is the single source of truth: a build script resolves the alias chain and emits the JSON and CSS that drive a live docs site, and Figma variable names mirror the .less names. Change the .less and everything follows. Hand-maintained docs always lose; generated docs stay honest with zero discipline required.

Decision 03

One vocabulary for “selected”

“Selected” now has exactly one language across the whole system. No component invents its own. The drift I started the audit with is now structurally impossible.

06Decisions · the gate

39 components. Each one is not done until it passes.

Every component ships four artefacts in parity: a markdown spec, a .less token file, a live interactive page, and a tokenised Figma component. A quality gate, npm run eval, checks each for the required spec sections, a complete state matrix, strict token discipline and a live page. Consistency stopped being a matter of vigilance and became a matter of CI.
39
Components, spec + code + docs + Figma
39
Of 39 passing the eval gate
1346
Tokens across three tiers
0
Raw hex in any component file

The before/after is concrete. The button went from one documented variant and no states to five variants across six states, fully tokenised. Icons went from a plugin screenshot to a 40-plus set with an engineering migration map. Every change carries its reason in the doc.

The docs are a live site, not Figma pages that drift. This is the button component, rendered from rj-tokens-button.less.

Live button documentation page
07Proof

Two bars every screen has to clear. I published my own failing score.

I scored my own flows on the same bars and published the 2.4, with the fix plan attached.

UX bar, run first
  • Feature parity against real product requirements
  • Task-flow soundness, every state designed
  • Cognitive load, microcopy, WCAG 2.2 AA
  • A missing P0 feature is an automatic fail
Luxury bar, run second
  • Restraint, hierarchy, type craft, colour discipline
  • Material, depth, motion, imagery, finish
  • Would this sit beside Cartier without apology?
  • Average below 4.3 and it goes back

Components are the vocabulary. Flows are the sentences. I rebuilt real Reliance Jewels journeys entirely from documented components, on the phone where most customers are. Then I scored my own flows on the same two bars and published the scorecard with the weak numbers in it: the PDP flow scored 2.4 on UX. I wrote the fix plan in priority order rather than hiding the gap.

The part I'm most willing to show a hiring manager is the scorecard that proves I can't quietly lower my own bar.

✦ Creative to addThe flow scorecard, published with the PDP flow scoring 2.4 on UXThe two-bar scorecard, or the rebuilt phone journey · 4:3
08Proof · the multiplier

A system a machine can build from, without drifting

Because every atom is reasoned, named and gate-enforced, the system is legible to a model. I delivered that as three artefacts, then closed the loop on the same gate that polices my own work.
manifest.json

The whole system in one generated file: token groups, all 39 components with problems, variants, states and rules. Regenerates on every build, so it never drifts.

11 hard constraints

Explicit, checkable rules. Most are enforced by the same quality gate.

Generation prompt

Tells a model to reuse before creating, use only tokens, honour the one active language, and flag a missing token instead of inlining a literal.

manifestmodel assemblesnpm run evalproof it stayed in the system
I did not automate design. The machine assembles; the human still decides what is worth assembling and whether the result is good.
09Outcome

A complete system, safe to evolve, legible to a machine

A complete, gate-enforced system a machine can build from, honest about not yet being rolled out.

4
Artefacts in parity per component: spec, code, docs, Figma
40+
Icons, with an engineering migration map
11
Hard constraints a model must honour
1
Generated manifest that assembles new screens
8/10 polish, 10/10 fundamentalsFigma deferred to a later batchDocs as a site, not Figma pagesShipped my own 2.4

The hard calls were deliberate. I held tokenisation, state matrices, accessibility and the active language to a non-negotiable bar, and accepted “good enough” on visual flourish so the project could finish. Knowing where not to gold-plate is part of the job.

Honest on adoption: the system is built, not yet rolled out across the live store. That is the next phase, not a claim I am making here.

10Learning

What I'd do differently

Wire the gate into CI from day one, and measure the AI's correction rate on a genuinely new surface.

If I ran it again, I would wire the gate into CI on the product repo from day one, so a component that breaks token discipline cannot merge. I would run an adoption pass surface by surface against the live store. And I would pressure-test the AI layer harder: have it assemble a genuinely new surface end to end and measure how much I have to correct. That correction rate is the real measure of whether the system works, and I want the number.