
AI Illustration System

Designing and building a scalable AI image generation system for use in a client app — from prompt architecture to working tool

Summary

I led the design and build of a custom AI illustration system capable of generating 150–200+ high-quality, on-brand illustrations at scale — a brief that would traditionally require an illustration agency and months of production time.

I designed a multi-step prompt architecture separating content description from style, created a structured variant system for colour and composition, and built the entire orchestration platform myself in Lovable — without writing a line of code. The first client review returned a 74% strong-or-acceptable rate across 54 illustrations, well ahead of internal expectations. The system is now in full-scale production.

Context & Challenge

The real problem wasn't generating good images. It was generating 200 of them.

A client needed a large illustration set for their app — multiple categories of content, each requiring its own visual treatment. Whitespectre isn't an illustration studio, and this scope would normally be handed to a specialist agency. I could see that AI image generation had reached a quality level that made this feasible. But generating one strong illustration and generating 200 consistent, on-brand ones are completely different problems. The challenge I identified immediately was visual fatigue: with a uniform style applied at scale, ten illustrations on a screen start to look identical regardless of subject matter. The real brief wasn't just quality — it was quality, consistency, and meaningful variation across a large volume, all at the same time.
Without variants — uniform output
With variants — varied colour and composition
Key Decision (1/4)

Separate content from style — completely.

The first architectural decision was the most important. The client had an existing content library — hundreds of descriptions written for users, not for image generators. Feeding that raw content directly into an image model produces inconsistent, often inaccurate results. But mixing content and style instructions in a single prompt means you can't tune either independently. I designed a two-step pipeline. Step one: an LLM takes the raw content entry and a category-specific template and generates a pure scene description — what's happening, who's in the frame, what they're doing, what the environment looks like. No colour. No style. Just content. Step two: that scene description is injected into the full image generation prompt alongside the style lock, colour strategy, composition type, and reference images. This separation was critical for debugging. When an image doesn't come out right, you can immediately identify whether the problem is in the content description (step one) or in the style/colour system (step two) — and fix exactly that layer without touching anything else.
Two-step pipeline diagram — content description, then style generation
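The two-step separation can be sketched in Python. This is a minimal illustration of the structure only — the function names, template wording, and prompt text are hypothetical, not the production prompts.

```python
def build_scene_prompt(raw_entry: str, category_template: str) -> str:
    """Step one: produce an LLM prompt asking for a pure scene
    description -- who is in frame, what they are doing, what the
    environment looks like. No colour, no style."""
    return (
        f"{category_template}\n\n"
        "Describe only the scene: who is in frame, what they are doing, "
        "and what the environment looks like. No colour. No style.\n\n"
        f"Content entry: {raw_entry}"
    )


def build_image_prompt(scene_description: str, style_lock: str,
                       colour_strategy: str, composition: str) -> str:
    """Step two: inject the scene description into the full image
    generation prompt alongside the style lock, colour strategy, and
    composition type."""
    return "\n\n".join([style_lock, colour_strategy, composition,
                        scene_description])
```

Because each layer is a separate argument, a bad result can be traced to exactly one input — swap the scene description or the style lock independently and regenerate.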
Key Decision (2/4)

Build a variant system — not just a style.

Getting one consistent style wasn't enough. At volume, stylistic uniformity creates visual fatigue. I needed the system to produce genuine variation — in colour, in composition, in framing — while maintaining complete stylistic coherence across every output. I designed eight colour strategies based on the client's existing brand palette. Each strategy defines foreground and background colour relationships using a warm/cool separation principle, with three distinct hues distributed across body planes, clothing, and equipment within each figure. Switching strategies produces a completely different feel while never leaving the brand spectrum. I also designed five composition types — narrative wide, iconic portrait, detail close-up, environmental, still life — and, critically, discovered that composition instructions had to live in both steps of the pipeline: injected into the scene description in step one so the LLM frames the content appropriately, and again in step two to instruct the image model on spatial treatment. For food-related content, I introduced a completely separate naturalistic colour override — because foreground/background colour separation breaks down entirely when the foreground is food that needs to be its natural colour.
Colour variables — eight colour strategies across brand palette
Colour settings — warm/cool separation across body planes and clothing
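The variant logic above reduces to a small selection step. This sketch shows the shape of it — the strategy names are placeholders, and the real rules (hue distribution across body planes, clothing, equipment) are far richer than shown.

```python
import random

# Placeholder identifiers -- the real strategies encode warm/cool
# foreground/background relationships within the brand palette.
COLOUR_STRATEGIES = [f"strategy_{i}" for i in range(1, 9)]  # eight strategies
COMPOSITIONS = ["narrative wide", "iconic portrait", "detail close-up",
                "environmental", "still life"]


def pick_variant(category: str, rng: random.Random) -> dict:
    """Pick a colour strategy and composition type for one generation."""
    composition = rng.choice(COMPOSITIONS)
    if category == "food":
        # Food breaks foreground/background colour separation: it must
        # keep its natural colours, so a separate override applies.
        colour = "naturalistic override"
    else:
        colour = rng.choice(COLOUR_STRATEGIES)
    # The composition instruction feeds BOTH pipeline steps: the scene
    # description (step one) and the image model prompt (step two).
    return {"colour": colour,
            "composition_step_one": composition,
            "composition_step_two": composition}
```

Keeping the composition value identical in both steps is the point: the LLM frames the content the same way the image model is later told to treat it spatially.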
Key Decision (3/4)

Build the tool — don't hand it off.

The original plan was to build a proper orchestration tool with a developer. I was doing manual testing in the meantime — copying prompts between notes and spreadsheets, downloading results, comparing in Figma. It was laborious and error-prone. The wrong prompt version, a missed component, and a generation is wasted. More importantly, you can't trust your conclusions about what's working if your testing process introduces human error. I decided to try building the tool myself in Lovable. The risk felt low — I was already waiting on generations. If it came to nothing, I hadn't wasted much. Within a few hours I had something that stored all prompt components in separate, editable sections, linked directly to the image generation API, and could generate with a single button click. When I showed it to our developer — the person who was supposed to build the proper version — he looked at it for a few minutes and said: “we should just use this.” That was the moment the experiment became the product. From there I built out the full tool: CSV upload for bulk content import, batch generation with randomisation, a gallery with full prompt visibility per image, named reference sets, collections for curation, in-tool editing for minor adjustments, and CSV export for client handoff.
Key Decision (4/4)

Debug the style systematically — one layer at a time.

Getting a style prompt right is not a one-shot exercise. It requires iteration — but iteration done carelessly produces confusion rather than progress. The methodology I developed: generate in batches of four or five (never single images, which can be outliers), identify what's wrong, hypothesise which layer of the system is responsible, make one targeted change, generate again. The most instructive example was the polygon problem. I wanted a geometric, graphic illustration style — bold planes, strong tonal contrast. But using the word “geometric” prominently in the style lock caused the image model to produce figures that looked like early 3D game characters — polygon meshes, not people. No amount of “but make it look human” could counteract the word's association. The fix was to relocate geometry explicitly: clothing gets angular panels, backgrounds get geometric plane divisions, equipment gets clean angular forms. But figure silhouettes and skin surfaces must use natural curves. The geometric quality lives everywhere except the body. I also used the in-tool editing function as a diagnostic tool — not just for client refinements, but to test whether a specific change would move results in the right direction before committing to a system-wide prompt update.
Style lock progression — iterative refinement across generation passes
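The batch-debug loop can be stated as a single function. This is a hedged sketch of the methodology, not the tool itself — `generate` and `review` stand in for the real generation call and human judgement.

```python
def debug_iteration(prompt_layers: dict, generate, review,
                    batch_size: int = 5):
    """One pass of the loop: generate a small batch (never a single
    image, which can be an outlier), let the reviewer name at most one
    responsible layer, and apply exactly one targeted change."""
    batch = [generate(prompt_layers) for _ in range(batch_size)]
    problem_layer, fix = review(batch)
    if problem_layer is None:
        return prompt_layers, batch       # results hold; no change needed
    updated = dict(prompt_layers)         # copy, so the old version is kept
    updated[problem_layer] = fix          # change exactly one layer
    return updated, batch
```

Changing only one layer per pass is what makes conclusions trustworthy: if the next batch improves, you know which change caused it.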
Outcomes & Impact

“Would have you do a million more.”

When I felt confident enough in the system to share it with the client, I produced a deliberate first batch: 54 illustrations across all content categories. Internally, we weren't sure what to expect — fifty percent acceptable would have felt like a reasonable starting point. The results: 21 strong, 19 acceptable, 14 poor — a 74% strong-or-acceptable rate on the first proper client review. Critically, almost none of the failures were style failures — the consistency was holding. The poor results were concentrated in edge cases: abstract content that was hard to describe visually, or very specific compositional details (hand positions, object placements) that are genuinely difficult to control precisely through prompting. The project demonstrated something I think matters beyond this specific brief. A designer doesn't have to wait for a developer to build the tool they need to test an idea. The feedback loop that used to take weeks can compress into days. That changes what's worth attempting — and it changes what a designer can contribute, not just to the brief, but to the solution. Once the system was stable, I designed the full production workflow and trained a mid-weight designer to operate it independently — tool use, client approval rounds, feedback collation, and sign-off.
74% strong-or-acceptable on first client review
150+ on-brand illustrations delivered at scale

“I really love these illustrations... Would have you guys do a million more but my Co-founder would fire me.”

Co-founder, Client App
