How to Architect Robust Frontend Systems for Agentic AI
How I built a reference-grade frontend monorepo, and why the same guardrails are what let AI agents ship code safely.
I rebuilt my own website, the one you are reading this on, as a deliberately over-engineered reference. It is a personal portfolio and a blog, and it needs none of what I gave it: a monorepo, strict typing turned all the way up, a CI pipeline with nine gates. I did all of that on purpose, because I wanted a low-stakes place to practise the habits that matter when the stakes are real.
The bigger reason, and the one this article is really about, is that the way I write frontend code has changed. Agents draft components, refactor utilities, and open pull requests for me now. That raises a question I never had to ask when I was the only one typing: when something other than me is writing the code, how do I know it has not quietly broken anything?
The answer I landed on is that a robust architecture is really just the set of guardrails that make change safe to attempt. Those guardrails turn out to be exactly what an agent needs too. They let it move fast and tell it, straight away, when it has got something wrong, before the mistake reaches me.
What “robust” means here
The word gets thrown around, so here is what I mean by it. A robust architecture is one where change is cheap and safe: I can add a feature, bump a dependency, or take a contribution and find out fast, without doing the checking by hand, whether I broke something. This has nothing to do with how clever the code is. I have seen plenty of elaborate, heavily abstracted frontends that were terrifying to change, and that is the opposite of what I am after.
At this scale it comes down to a few things. Bad states should be hard to even write down. The build should tell me the truth rather than a convenient version of it. And I should hear about a problem in seconds, not minutes. Most of what follows is in service of those. Later on I will be straight about where a setup like this runs out of road, because a portfolio is not a payments system and I would rather not pretend it is.
The architecture at a glance
Here is the shape of it. I have linked the actual files throughout so you can check that I am not hand-waving.
It is a monorepo, run with pnpm workspaces and Turborepo.
There is one app, apps/website, and three packages alongside it: a shared component library, a shared ESLint config,
and a shared TypeScript config. I wrote down the reasoning and the trade-offs in
ADR 0003.
A single app plainly does not require a monorepo. I went with one so the rules I care about, linting and types and design
tokens, all live in one place, and anything I add later inherits them without my having to wire it up again.
The site itself is Astro with React islands. Astro ships no JavaScript by default and only hydrates the interactive bits, so a content-heavy page stays fast without me watching bundle sizes by hand. Everything builds down to static HTML that Cloudflare Pages serves, so there is no server sitting there to run or pay for. I covered that choice, and why it fits a content-led site, in ADR 0002.
Turborepo also caches the output of each task, which is easy to take for granted. A change that only touches the website does not drag the untouched packages through a rebuild or a re-lint. The loop stays short as the repo grows, and that turned out to matter more than I expected once an agent started running the same tasks dozens of times an hour.
One source of truth for design
The shared library, @repo/ui, builds its components on Base UI primitives and styles them with
class-variance-authority. The tools matter less than one rule I hold to: the design tokens live in
a single file,
packages/ui/src/theme.css,
and nowhere else. Colours, type scale, the semantic helper classes — all defined once, in that one place. So when an
agent or a contributor goes looking for a colour, there is exactly one right answer, and the structure of the repo makes
inventing a one-off token more awkward than just using the real one. I get a lot of the visual consistency for free that
way, simply because there is no convenient path to doing it wrong.
This next part I will say plainly, because I have watched it go wrong on bigger teams: keep the UI on one design system
and do not mix and match component libraries. The cost of doing otherwise is hidden at first and then compounds. Pull in
a second library and you now have two design languages, two sets of tokens drifting apart from each other, two opinions
about focus rings and keyboard behaviour, and two lots of broadly similar code going down the wire to the browser. Each
of those is a seam, and an agent pattern-matching on whatever file it happened to read last will cheerfully widen it,
dropping in something that looks perfectly reasonable on its own. With one theme.css, “use the right colour” is
something the build can check, rather than a convention everyone has to keep in their heads.
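One way to picture a colour choice the build can check is a typed list of token names. This is a minimal sketch and not the repo's actual mechanism: the token names, the `--color-` prefix, and the helpers `colorVar` and `isColorToken` are all hypothetical, chosen only for illustration.

```typescript
// Hypothetical token names; the real ones would come from packages/ui/src/theme.css.
const colorTokens = ["background", "foreground", "accent", "muted"] as const;

// The union of valid token names, derived from the single list above.
type ColorToken = (typeof colorTokens)[number];

// Maps a token name to a CSS custom property (the --color- prefix is an assumption).
function colorVar(token: ColorToken): string {
  return `var(--color-${token})`;
}

// Runtime guard for names that arrive as plain strings, for example from content files.
function isColorToken(value: string): value is ColorToken {
  return colorTokens.some((token) => token === value);
}

console.log(colorVar("accent")); // var(--color-accent)
// colorVar("brand-blue"); // compile error: "brand-blue" is not a ColorToken
```

With a shape like this, a one-off colour fails typecheck instead of relying on someone spotting it in review.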
Guardrails: the part that matters for agents
This is the heart of the article. When I let an agent write a real chunk of code, the thing keeping me honest is not the agent — it is the row of gates the code has to clear first. Each of those gates runs locally, so the agent can run it, see what failed, and fix its own work before any of it gets to me.
Make bad states unrepresentable
Before any human or any test looks at the code, the compiler does. The shared TypeScript config runs at the strictest settings I could live with day to day, and I logged the reasoning in ADR 0004. A few of the flags that consistently pull their weight:
- noUncheckedIndexedAccess, so that reading arr[i] hands you T | undefined and you are forced to deal with the empty case.
- exactOptionalPropertyTypes, so an optional prop cannot quietly become undefined when what you meant was absent.
- verbatimModuleSyntax and a few neighbours, so the import graph stays honest.
With an agent in the loop this matters even more than it did before. noUncheckedIndexedAccess is the flag that saves me
most often. An agent will write posts[0].title without pausing, and fair enough: nine drafts in ten the array has
something in it and the line looks completely fine. TypeScript will not take “looks fine” for an answer. It hands back
T | undefined and makes me deal with the empty array before the code will compile at all. That is roughly the whole
problem with agent-written code: it is fluent and confident and produces things that look right, so what I need from
the type system is that it keeps catching the cases where confident-looking code is actually broken — a promise nobody
awaited, an undefined nobody handled, a cast papering over a real mismatch. The stricter I make the types, the less
room is left for code that is plausible and wrong at once. ESLint runs on top of all this in type-aware mode (the
strictTypeChecked and stylisticTypeChecked rule sets), @ts-ignore is off the table, and warnings fail the build,
because in my experience a warning that does not fail the build is one nobody gets around to fixing.
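To make two of those flags concrete, here is a small sketch of the code they push you towards. The `Post` and `CardProps` shapes and both function names are hypothetical, not the site's real content model. The code compiles with or without the flags; the comments note what the flags would reject.

```typescript
// Hypothetical shapes for illustration only.
interface Post {
  title: string;
}

interface CardProps {
  subtitle?: string;
}

// With noUncheckedIndexedAccess, posts[0] has type Post | undefined, so
// `posts[0].title` is rejected at compile time. Handling the empty case
// explicitly satisfies the compiler.
function latestTitle(posts: readonly Post[]): string {
  const first = posts[0];
  if (first === undefined) {
    return "No posts yet";
  }
  return first.title;
}

// With exactOptionalPropertyTypes, `{ subtitle: undefined }` is rejected for
// CardProps, so "absent" has to be expressed by leaving the key out.
function cardProps(subtitle: string | undefined): CardProps {
  return subtitle === undefined ? {} : { subtitle };
}

console.log(latestTitle([{ title: "Hello" }])); // Hello
console.log(latestTitle([])); // No posts yet
console.log("subtitle" in cardProps(undefined)); // false
```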
Layered, local-first quality gates
I run the same checks in two places, which I wrote up as a clean-as-you-code policy in ADR 0005. On my machine, Lefthook formats and lints whatever I commit, then runs typecheck and the tests before it lets me push. In CI the full run is, in order:
format, then lint, then typecheck, then knip, then syncpack, then test, then build, then publint, then Playwright with axe, then Lighthouse.
Running locally first is the half that earns its keep every single day. Because the same gates fire on commit and on
push, a broken type or a failing test shows up in seconds, right there in my terminal, rather than turning up minutes
later as a red badge in CI once I have already mentally moved on. knip trips on dead code and unused dependencies;
syncpack keeps dependency versions lined up across the whole workspace. What all of this gives me is a set of
acceptance criteria that are objective and runnable. An agent never has to wonder whether a change is good enough. It
runs the gate, reads what comes back, and because that takes seconds it can keep going around the loop until the thing
actually passes, instead of handing me something half-finished.
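The loop an agent runs against those gates can be sketched as an ordered list that stops at the first failure. This is an illustration under assumptions, not the repo's tooling: the `Gate` and `GateResult` types and the `runGates` function are hypothetical, and each gate is a plain function standing in for a real tool invocation.

```typescript
// A gate is a named check that reports pass or fail. In a real setup `run`
// would shell out to a tool; plain functions keep this sketch self-contained.
interface Gate {
  name: string;
  run: () => boolean;
}

interface GateResult {
  passed: string[];
  failedAt: string | null;
}

// Runs gates in order and stops at the first failure, so the feedback names
// exactly one thing to fix before the next attempt.
function runGates(gates: readonly Gate[]): GateResult {
  const passed: string[] = [];
  for (const gate of gates) {
    if (!gate.run()) {
      return { passed, failedAt: gate.name };
    }
    passed.push(gate.name);
  }
  return { passed, failedAt: null };
}

// Stand-in gates using a subset of the CI order; typecheck is made to fail.
const result = runGates([
  { name: "format", run: () => true },
  { name: "lint", run: () => true },
  { name: "typecheck", run: () => false },
  { name: "test", run: () => true },
]);

console.log(result.failedAt); // typecheck
console.log(result.passed.join(", ")); // format, lint
```

Stopping early is a design choice: the cheap gates run first, and a failure is reported before the slower ones spend any time.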
Quality you can put a number on
Plenty of quality is a matter of taste, but a surprising amount of it does not have to be. The site keeps a Lighthouse budget of 100 for accessibility, best practices and SEO, and no less than 95 for performance, and it asserts those on every single run. Accessibility gets a second pass at runtime, where axe runs against the actual rendered pages, scoped to the WCAG success criteria. Test coverage sits on a floor that only ever ratchets up and is never allowed to slip back. Once the bar is a number, it does not matter whether the change came from me, from an agent, or from me at one in the morning: everyone gets marked against the same line.
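The budget and the coverage ratchet both reduce to a few comparisons. The sketch below assumes scores on a 0 to 100 scale and uses hypothetical names (`budgetFailures`, `coverageOk`, `nextCoverageFloor`); the real checks are done by the tools themselves, so treat this only as a picture of the logic.

```typescript
type Category = "performance" | "accessibility" | "best-practices" | "seo";

// Minimum scores on a 0-100 scale, taken from the budget described above.
const budget: Record<Category, number> = {
  performance: 95,
  accessibility: 100,
  "best-practices": 100,
  seo: 100,
};

// Returns the categories whose score falls below budget; empty means pass.
function budgetFailures(scores: Record<Category, number>): Category[] {
  const categories = Object.keys(budget) as Category[];
  return categories.filter((category) => scores[category] < budget[category]);
}

// A run passes only if measured coverage is at or above the current floor.
function coverageOk(currentFloor: number, measured: number): boolean {
  return measured >= currentFloor;
}

// The ratchet: the floor rises to a new high and never drops.
function nextCoverageFloor(currentFloor: number, measured: number): number {
  return Math.max(currentFloor, measured);
}

const failures = budgetFailures({
  performance: 96,
  accessibility: 98,
  "best-practices": 100,
  seo: 100,
});
console.log(failures); // only accessibility is below budget
console.log(nextCoverageFloor(80, 84)); // 84
console.log(nextCoverageFloor(84, 82)); // 84
```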
Writing down what the code can’t tell you
For all the tooling, the single most useful thing in the repo is plain prose. There is an
AGENTS.md holding the conventions you
could never work out from reading one file on its own: that React hooks do not belong in .astro files, that a single
hydrated root owns the theme context, that inside Astro you pass className to a React component and not class. The
bigger architectural calls live as numbered ADRs that explain why I did something and not only what I did. Writing that
down once, before anyone needs it, has saved me far more time than I would ever have got back by correcting the same
mistake over and over, and it helps a new contributor exactly as much as it helps an agent.
Why this matters more once an agent is writing the code
The mental model I keep coming back to is that an agent is a fast, literal, tireless colleague who remembers nothing from yesterday and badly wants to hand you something that looks finished. That is genuinely useful, and it is also exactly the sort of colleague whose work you want run past a machine before it lands.
Without the gates, every handoff to an agent comes down to taking its word for things. With them, a change has to show its working before it counts: the types still check, the tests still pass, coverage has not dropped, the page is still fast and still accessible. At that point the architecture is not decoration any more, it is the actual contract between me and whoever, or whatever, wrote the code. What pleased me is that it is the very same contract a human contributor benefits from, and the one I will lean on in six months when I have completely forgotten why I did any of this.
Where a personal site’s gates run out
I said I would be honest about the limits, so here they are. Everything above still applies at scale; it is just nowhere near enough on its own. The underlying idea, keeping change safe, carries over fine; what changes is how much surface area and how much money are riding on it. If you are building for a real product or a large organisation, here is the ground a portfolio's gates simply do not cover.
Runtime truth
My Lighthouse number is a lab reading taken on my laptop. It tells me nothing about how the site behaves on a three-year-old Android phone with bad signal on a train. Real systems need to see themselves running: real-user monitoring, error tracking, service level objectives with error budgets, so that the first you hear of a problem is the data and not a furious email. A passing CI run is a prediction about production, and the only way to find out whether the prediction held is to watch the real thing.
Progressive delivery
I deploy on merge to main. Cloudflare Pages rebuilds the static output and ships it, and every branch and pull request
gets its own preview URL, which is plenty when there is exactly one author. A product with real traffic needs more:
feature flags, canary releases, staged rollouts, and a rollback you can pull in seconds. All of that exists to keep the
blast radius of any one change small, and a change written by an agent is exactly the kind you want a small blast radius
around.
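For a sense of what a staged rollout involves, here is a minimal sketch of deterministic percentage bucketing. None of this exists in the repo: `inRollout`, the flag name, and the user ids are hypothetical, and a real product would more likely use a feature-flag service than hand-rolled hashing.

```typescript
// FNV-1a 32-bit hash over UTF-16 code units: small and deterministic, which is
// all that bucketing needs. It is not a cryptographic hash.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Places a user in a stable bucket from 0 to 99 per flag, then compares it
// with the rollout percentage. The same user always gets the same answer.
function inRollout(flag: string, userId: string, percent: number): boolean {
  const bucket = fnv1a(`${flag}:${userId}`) % 100;
  return bucket < percent;
}

// Hypothetical flag and user ids.
console.log(inRollout("new-nav", "user-1", 0)); // false
console.log(inRollout("new-nav", "user-1", 100)); // true
console.log(inRollout("new-nav", "user-1", 10) === inRollout("new-nav", "user-1", 10)); // true
```

Because the bucket is stable, raising the percentage only ever adds users, and dropping it to 0 acts as the rollback.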
Design system governance
One theme.css works precisely because there is one thing consuming it. Spread the same design system across a dozen
teams and it needs a good deal more: versioned tokens, a deprecation policy, visual regression testing (Chromatic and the
like) so that nudging one token does not quietly redraw a hundred screens nobody was looking at, and a genuine model for
how people contribute changes back. By that point you are running design-system governance, not just keeping a list of
colours.
Boundaries and contracts
In a large codebase the difficulty is rarely inside any single module. It lives in the seams where the modules meet. Handling that means enforced module boundaries, typed contracts between teams and between services, clear ownership rules, and sometimes splitting the frontend up so separate teams can ship without waiting on each other. Getting the types right inside one package is routine by now; the genuinely hard part is holding a contract steady across team and organisational boundaries, and no compiler flag does that for you.
State, data, and resilience
A static site sidesteps most of the genuinely hard frontend problems: caching and invalidation, optimistic updates, offline behaviour, retries and partial failure, race conditions in a client that stays alive for hours. An app that holds real state for real users has to design for every one of those deliberately, and most of the interesting bugs live right there.
When the frontend serves AI to users
This distinction is worth pulling out on its own, because it is easy to wave away. The agents in my repo are there to build the site. The site does not put any AI features in front of the people visiting it, and that is a meaningfully different thing. The moment a frontend does serve a model to users, a whole new layer of work shows up: evaluations so you actually notice when answer quality regresses, guardrails on what the model is allowed to emit, budgets for latency and cost, UX for streaming and partial responses, prompt and version management, and a real plan for abuse and for the model simply failing. None of that lives in this repo, and it would be a stretch to pretend otherwise.
Organisational scale
And then the parts that are not code at all: on-call rotations, proper threat modeling that goes well past scanning dependencies, and accessibility audits done with people who genuinely rely on assistive technology rather than an automated checker standing in for them. Automated gates are good at lifting the worst case, but they will not hand you the best one; that gap gets filled by human judgement and by people who have actually lived with the consequences.
So the honest version is this: my portfolio practises the right habits, just at a small scale. A larger system needs those same habits and a great deal on top, because once there are more people and more traffic and more money involved, a mistake costs more and is much harder to walk back.
A practical starting checklist
If you want to bring some of this to your own project, here is the order I would actually do it in. Do not try to land all of it on day one. Most of the payoff is in the first few steps, and bolting on machinery before you need it tends to cost you more than it saves.
- Reach for boring, proven tools. Every novel one is a cost you keep paying long after the novelty has worn off.
- Turn TypeScript strictness up on the very first commit. Relaxing it later takes a minute; tightening it across a codebase that has already grown is a slog you will keep putting off.
- Keep a single source of truth for your design tokens.
- Run your quality gates locally first and mirror them in CI, so the feedback stays quick.
- Write an AGENTS.md, and a short ADR for any decision that would be painful to reverse.
- Give test coverage a floor that is only ever allowed to go up.
- Take the quality you actually care about and turn it into budgets you can measure.
- Add the scale machinery — observability, feature flags, visual regression — once scale is genuinely here, and not a moment sooner.
Wrapping up
None of this is exotic. On a project this size, strict types, gates that run locally before CI, and a little written-down context are cheap, and together they buy the one thing I care about: I can change the site, or let an agent change it, and find out fast if something broke. A bigger system needs the same habits and a lot more scaffolding around them, which is exactly why a low-stakes site is a good place to keep them sharp.
If this was useful, there is more on the blog, and I am happy to talk shop — you can reach me from the contact page.