Most custom AI agents fail in production for the same reason: the team built them as one big chatbot prompt instead of as a five-layer architecture. The chatbot works in the demo, then breaks the moment it has to actually do something the team did not anticipate. The fix is not a bigger model or a longer prompt. The fix is architecture.
At AppWT, we run our own pair-coding environment (Opie, named after our digital partner) on Claude Code, the same platform Anthropic engineers use internally. Claude Code organizes agent capability into five distinct layers, each with a separate job. When clients ask us to build a custom AI agent for their business, this is the architecture we use. It is also the architecture we teach.
The Five Layers
Layer 1: CLAUDE.md (Memory)
A markdown file the agent loads on every turn. Contains the architecture rules, naming conventions, test expectations, and a map of the codebase. Always loaded, always active. This is the agent's constitution.
Two scopes: a global CLAUDE.md at ~/.claude/CLAUDE.md applies to every project; a project-scoped CLAUDE.md at the repo root (./CLAUDE.md) or at .claude/CLAUDE.md applies to one repo.
Common mistake: stuffing every rule into CLAUDE.md so the agent will "remember" them. This bloats the context window on every turn, slows responses, and crowds out the actual task. Use CLAUDE.md for things the agent needs every turn. Everything else belongs in Layer 2.
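One way to keep that discipline honest is a small audit script. The sketch below assumes the two scopes described above and a line budget of 200, which is a hypothetical team convention of ours for this example, not a limit Claude Code enforces. The function names are illustrative.

```python
from pathlib import Path

# Hypothetical budget: a team convention, not a Claude Code limit.
MAX_MEMORY_LINES = 200

def memory_files(project_root: Path, home: Path) -> list[Path]:
    """Return the global and project-scoped CLAUDE.md paths."""
    return [
        home / ".claude" / "CLAUDE.md",
        project_root / ".claude" / "CLAUDE.md",
    ]

def over_budget(text: str, max_lines: int = MAX_MEMORY_LINES) -> bool:
    """True when the file has more non-blank lines than the budget."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return len(lines) > max_lines

def audit(project_root: Path, home: Path) -> list[str]:
    """List warnings for memory files that exist and exceed the budget."""
    warnings = []
    for path in memory_files(project_root, home):
        if path.is_file() and over_budget(path.read_text(encoding="utf-8")):
            warnings.append(f"{path}: consider moving detail into a skill")
    return warnings
```

Calling `audit(Path.cwd(), Path.home())` in CI would flag a memory file that has started absorbing content better suited to Layer 2.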
Layer 2: Skills (Knowledge)
Modular knowledge files that are on-demand, not always-on. The agent reads each skill's name and description at session start; the full skill content loads only when a trigger matches, optionally in an isolated subagent context.
Skills can be auto-invoked when a request matches the skill's description, or user-invoked by slash command. Reference documents, scripts, and templates load only when needed. The main context window stays clean.
This is where most of an agent's long-tail expertise lives. AppWT has 100+ canon modules covering everything from optimization doctrine to brand voice rules to deployment patterns. None of them load every turn. They load when the trigger fires.
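The on-demand pattern can be modeled in a few lines. This is a conceptual sketch, not Claude Code's implementation: the `Skill` and `SkillRegistry` names are hypothetical, and simple substring matching stands in for whatever the platform actually uses to decide that a skill applies.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Skill:
    name: str
    description: str              # cheap: always visible to the agent
    load_body: Callable[[], str]  # expensive: called only on a match
    triggers: tuple[str, ...] = ()

@dataclass
class SkillRegistry:
    skills: list[Skill] = field(default_factory=list)
    loaded: list[str] = field(default_factory=list)  # names loaded so far

    def index(self) -> str:
        """What the agent sees at session start: one line per skill."""
        return "\n".join(f"{s.name}: {s.description}" for s in self.skills)

    def activate(self, request: str) -> list[str]:
        """Load full bodies only for skills whose trigger appears in the request."""
        text = request.lower()
        bodies = []
        for skill in self.skills:
            if any(t.lower() in text for t in skill.triggers):
                bodies.append(skill.load_body())
                self.loaded.append(skill.name)
        return bodies

# Illustrative skills; the bodies would normally be read from files.
registry = SkillRegistry([
    Skill("brand-voice", "Tone rules for customer copy",
          lambda: "FULL BRAND GUIDE", ("brand", "tone")),
    Skill("deploy", "Deployment patterns",
          lambda: "FULL DEPLOY GUIDE", ("deploy",)),
])
bodies = registry.activate("Please deploy the staging site")
```

Only the deploy body is loaded here; the brand-voice skill contributes one line to the index and nothing else.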
Layer 3: Hooks (Guardrail)
Deterministic, not AI. Shell commands that run at specific lifecycle events.
- PreToolUse runs before the agent calls a tool. Use it to block dangerous commands (e.g., refuse to let the agent run rm -rf).
- PostToolUse runs after a tool call. Use it for auto-lint on every file write.
- SessionStart runs when a session begins.
- Stop runs when the agent finishes responding (SessionEnd is the event for the session itself closing). Use it to fire a Slack notification.
Think Git hooks for your agent. The crucial property is that hooks are deterministic. They never forget. They never get confused by a clever prompt. They are the safety rails.
Common mistake: trying to enforce safety rules through the AI model itself. AI models forget. Hooks never forget. If a rule must be followed every time without exception, it is a hook, not a CLAUDE.md instruction.
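A PreToolUse guard of the kind described above might look like the following sketch. It assumes the hook receives a JSON payload on stdin with `tool_name` and `tool_input.command` fields, and that exit code 2 blocks the tool call, which is our reading of the Claude Code hook contract; check it against the current documentation before relying on it. The regex is deliberately minimal and does not catch long-form flags such as `--recursive --force`.

```python
import json
import re

# Matches rm with one flag group containing both r and f, e.g. "rm -rf" or "rm -fr".
DANGEROUS_RM = re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+", re.IGNORECASE)

ALLOW, BLOCK = 0, 2  # assumption: exit code 2 asks Claude Code to block the call

def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one PreToolUse event."""
    if event.get("tool_name") != "Bash":
        return ALLOW, ""
    command = event.get("tool_input", {}).get("command", "")
    if DANGEROUS_RM.search(command):
        return BLOCK, f"Blocked by policy: recursive force delete in {command!r}"
    return ALLOW, ""

def main(stdin_text: str) -> tuple[int, str]:
    """Parse the hook payload; fail closed if it is not a JSON object."""
    try:
        event = json.loads(stdin_text)
    except json.JSONDecodeError:
        return BLOCK, "Blocked by policy: unreadable hook payload"
    if not isinstance(event, dict):
        return BLOCK, "Blocked by policy: unexpected hook payload"
    return decide(event)
```

To run this as a real hook, the script would pass `sys.stdin.read()` to `main`, print the message to stderr, and exit with the returned code. Because the decision is a plain function over the payload, it can be unit-tested like any other code, which is the point of keeping guardrails out of the prompt.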
Layer 4: Subagents (Delegation)
Specialized agents the main agent can hand work to. Each subagent has its own context window, custom model, custom tools, and custom permissions.
Pre-built subagent types include code-reviewer (review a diff or branch), test-runner (run and analyze tests), and explorer (fast read-only search across a codebase). Teams can define custom subagents too.
The crucial property: subagents keep the main context clean. When the main agent needs to do a wide research pass without polluting the main thread, it spawns the explorer subagent, which runs in isolation and returns a single distilled message. No infinite recursion is allowed; subagents cannot spawn subagents.
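The isolation and the no-recursion rule can be illustrated with a toy model. This is a conceptual sketch only: `Agent`, `delegate`, and `explore` are hypothetical names, and a Python list stands in for a context window.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    name: str
    is_subagent: bool = False
    context: list[str] = field(default_factory=list)  # this agent's own window

    def delegate(self, task: str, worker: Callable[["Agent", str], str]) -> str:
        """Run `worker` in a fresh subagent and keep only its summary."""
        if self.is_subagent:
            raise RuntimeError("subagents cannot spawn subagents")
        sub = Agent(name=f"{self.name}/sub", is_subagent=True)
        summary = worker(sub, task)
        # Only the distilled result enters the main context.
        self.context.append(f"[{sub.name}] {summary}")
        return summary

def explore(agent: Agent, task: str) -> str:
    """Hypothetical research pass that fills its own context with notes."""
    for i in range(50):
        agent.context.append(f"note {i} while working on: {task}")
    return f"{len(agent.context)} notes reviewed for '{task}'"

main_agent = Agent("main")
result = main_agent.delegate("find auth code", explore)
```

After the call, `main_agent.context` holds a single entry even though the subagent accumulated fifty notes, and any worker that tries to call `delegate` on its own agent raises an error.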
Layer 5: Plugins (Distribution)
Bundles that ship skills, subagents, hooks, and commands to teammates via a marketplace or team install.
Think npm packages for agent capabilities. AppWT ships our AI Guardrails bundle this way, so every engineer joining the AppWT process inherits the same constitution, the same skills, and the same guardrails on day one.
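One way to verify that two engineers really are running the same bundle is to fingerprint the installed files. The helper below is a hypothetical addition of ours, not part of the plugin system; it simply hashes every file's relative path and contents in a stable order.

```python
import hashlib
from pathlib import Path

def bundle_fingerprint(bundle_dir: Path) -> str:
    """Hash every file's relative path and bytes, in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(p for p in bundle_dir.rglob("*") if p.is_file()):
        digest.update(path.relative_to(bundle_dir).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator so path and content cannot blur together
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

Two installs with identical contents produce the same hex digest; any edited, added, or removed file changes it.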
How the Layers Compose
The flow reads bottom-up, from Layer 1 to Layer 5:
CLAUDE.md sets the rules. Skills provide the expertise. Hooks enforce quality. Subagents delegate the work. Plugins distribute everything to the team.
Each layer has a distinct job. The most common anti-patterns are layer confusion:
- Putting on-demand knowledge in always-on memory (bloats context)
- Putting must-always-hold rules in the prompt instead of a hook (relies on the model remembering)
- Running everything in the main context window instead of delegating to subagents (context pollution)
- Hand-shipping files to teammates instead of using plugins (drift and version mismatches)
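The anti-patterns above amount to a placement decision, which can be written down as a small function. This is a sketch of the reasoning in this article, with hypothetical flag names; real cases will need more judgment than four booleans.

```python
from enum import Enum

class Layer(Enum):
    MEMORY = "CLAUDE.md"
    SKILL = "Skill"
    HOOK = "Hook"
    SUBAGENT = "Subagent"
    PLUGIN = "Plugin"

def place(*, must_always_hold: bool = False, context_heavy_task: bool = False,
          needed_every_turn: bool = False, shared_across_team: bool = False) -> list[Layer]:
    """Pick a home layer, then add Plugin if the team needs to share it."""
    if must_always_hold:
        home = Layer.HOOK        # deterministic enforcement, never a prompt
    elif context_heavy_task:
        home = Layer.SUBAGENT    # keep bulky work out of the main context
    elif needed_every_turn:
        home = Layer.MEMORY      # small, always-on rules
    else:
        home = Layer.SKILL       # on-demand knowledge is the default
    layers = [home]
    if shared_across_team:
        layers.append(Layer.PLUGIN)  # distribution wraps the other layers
    return layers
```

For example, a rule that must never be broken goes to a hook, while a brand-voice guide that matters only for copywriting tasks lands in a skill, shipped through a plugin if the whole team needs it.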
Why This Matters for Your Business
If you are evaluating an AI consulting vendor and they cannot explain how they organize agent capability, they are selling you a prompt-stuffed chatbot wearing an agent costume. It will demo well. It will break in production. The teams whose AI agents survive contact with reality are the teams whose agents are layered.
AppWT builds custom AI agents using the five-layer architecture, adapted to whatever AI platform your team has standardized on. We start at the simplest tier of the AI systems pyramid that solves your problem and only add complexity when the simpler tier proves insufficient. The result is an agent that works in production, not just in the demo.
Want to talk through what your business actually needs (and what it does not)? Text Tony directly at 734-203-0171 or schedule a free consultation at appwt.com/schedule. The call is with Tony, not a sales rep.
Tony Paris
Founder and Tech Wizard at AppWT Web & AI Solutions. With over 29 years of experience in web development, Tony helps businesses succeed online through custom websites, SEO, and AI integration.