Let's do some quick math. If you're building software entirely with LLMs, and your AI agent is 90% accurate on any given task, what happens when you chain five steps together?
0.90 ^ 5 = 0.59.
Fifty-nine percent. You have a coin flip's chance of getting a working feature at the end of a five-step process. That math is exactly what had been driving me crazy over the past few weeks. I was watching agents hallucinate syntax, forget configuration, and break test suites that had worked just minutes earlier.
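Assuming each step succeeds independently with probability p (an idealization, but a useful one), the compounding is easy to check in a few lines of Python:

```python
# Chance that an n-step agent chain succeeds end to end,
# assuming each step succeeds independently with probability p.
def chain_success(p: float, n: int) -> float:
    return p ** n

for n in (1, 3, 5, 10):
    print(f"{n} steps at 90% per step -> {chain_success(0.9, n):.0%}")
```

By ten steps you're down to roughly a one-in-three chance, which is why chaining mostly-reliable steps falls apart so fast.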
Over the last three days, I completely ripped out the foundation of my local development workflow. I stopped treating AI like an intern that writes code, and started treating it as an orchestration layer. It was a metamorphosis that profoundly changed how I approach engineering.
The Metamorphosis
It started innocently enough. On Saturday, I was refining the layout of a checkout success page. Then I was running automated git history retrospectives for the team. By Monday, I was deep in the weeds pulling in Garry Tan's architectural docs to rebuild my dogfooding setup with gstack. The context-switching was brutal, and relying purely on conversational prompts to get the AI to execute these wildly different tasks reliably was failing.
The core realization hit me while debugging a corrupted laptop BIOS, of all things. I realized that my issue wasn't the AI's "intelligence." I had an impedance mismatch.
LLMs are inherently probabilistic. Most business logic and software engineering tasks are strictly deterministic. Mixing them at the code-generation level is asking for trouble.
The 3-Layer Architecture
To fix the mismatch, I adopted a strict 3-Layer Architecture. The idea: push the complexity and repetitive tasks into deterministic code, leaving the LLM to do what it does best: reason and route.
[ The 3-Layer Agent Architecture ]
LAYER 1: 📝 DIRECTIVES (What to do)
SOPs in Markdown (`directives/`)
Reads like instructions for a human employee.
│
▼
LAYER 2: 🧠 ORCHESTRATION (Decision Making)
The LLM (You are here)
Reads intent, routes tasks, handles errors.
│
▼
LAYER 3: ⚙️ EXECUTION (Deterministic Tools)
Python Scripts (`execution/`)
Secure, testable, fast. No hallucinations.
Here's how this plays out in practice:
- Layer 1 (Directive): I write a plain-English markdown file that says exactly how to, say, run a system accessibility audit or generate an executive summary. It defines the goals, required inputs, and the strict edge cases.
- Layer 2 (Orchestration): The AI reads the directive, understands the context of what I'm asking for, and acts as the project manager. It stops trying to write the implementation from scratch.
- Layer 3 (Execution): The AI calls battle-tested, deterministic Python scripts inside an `execution/` folder. If an API rate limit is hit, the script throws a hard error. The AI catches it, fixes the script if needed, and updates the directive. The system self-anneals.
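To make Layer 3 concrete, here's a minimal sketch of what a deterministic execution script might look like. The file and function names are my own illustration, not from the actual repo; the point is that failure is a hard, machine-readable signal the orchestrating LLM can catch and act on:

```python
# execution/summarize_commits.py -- illustrative Layer 3 tool.
# Deterministic: same input file, same output, every time.
import pathlib
import sys

def summarize(log_path: pathlib.Path) -> str:
    if not log_path.exists():
        # Fail loudly with a parseable error code instead of
        # letting the orchestrator guess what went wrong.
        sys.exit(f"ERROR:MISSING_INPUT:{log_path}")
    commits = log_path.read_text().splitlines()
    return f"{len(commits)} commits analyzed"

if __name__ == "__main__" and len(sys.argv) > 1:
    print(summarize(pathlib.Path(sys.argv[1])))
```

The orchestration layer never writes this logic on the fly; it just invokes the script and reacts to the exit status.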
Zero-Friction Local Development
Because the AI is no longer hallucinating random directory structures, I could enforce aggressive physical boundaries in my repo. All intermediate files, scraped data, and temporary outputs go into a .tmp/ folder. No exceptions. They are never committed and can be regenerated at will. Deliverables go straight to cloud URLs (Google Sheets, Slides) or remain as pure, version-controlled artifacts.
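One way to keep that boundary honest is to make the scratch area trivially disposable. This tiny helper is my own sketch of the convention (the workflow defines the rule, not this particular tooling):

```python
# reset_tmp.py -- sketch: wipe and recreate the scratch area.
# Safe to run anytime, because .tmp/ contents are regenerable by design.
import pathlib
import shutil

def reset_tmp(repo_root: pathlib.Path) -> pathlib.Path:
    tmp = repo_root / ".tmp"
    if tmp.exists():
        shutil.rmtree(tmp)  # intermediates only, never deliverables
    tmp.mkdir()
    return tmp
```

Pair it with a `.tmp/` line in `.gitignore` so intermediates physically cannot be committed.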
By removing the cognitive overhead of "will the AI write this function correctly," I unlocked unparalleled velocity. The AI reads the spec, routes the intent to the Python script, and deterministic magic happens.
I stopped wrestling with 59% success rates. It turns out, when you give a probabilistic mind a set of strictly deterministic hands, you can finally build at the speed of thought.