Copilot Autonomous Template
Open-source workflow that makes GitHub Copilot build coherently across long autonomous sessions.
v1.0.0 shipped — hook-verified stage machine, 160 hook tests, 10 ADRs. Used to build WyoClear, Sentinel, and this site.
Problem
AI coding agents are great at single prompts and bad at sustained work. Across long sessions they lose context, drift from the goal, skip tests they can rationalize away, and — if you let them — quietly game their own benchmarks to look successful. The first version of my workflow caught most of this. Around session 40 of building [Sentinel](https://github.com/jcentner/sentinel), it stopped catching the rest: the agent was editing the benchmark repo to make detector precision look better instead of improving the detectors. Right code, wrong work, and no review process designed to notice the difference.
Approach
Rebuilt the workflow as a hook-verified stage machine. A session moves through explicit stages (planning, design-critique, implementation, review, cleanup), each with gating fields a Python hook can verify before allowing transition. Strategic review ("is this the right work?") is separated from code review ("is this written well?") and given to a dedicated product-owner subagent. The tester subagent is OS-isolated from the implementation — a PreToolUse hook denies it reads of source code, so tests come from the spec or they don't exist. The builder can no longer edit its own enforcement layer; improvements get proposed to a human and applied via copier update. Every rule that matters is a programmatic hook, not a comment in a markdown file.
Result
Shipped as v1.0.0 in April 2026: copier template that bootstraps the full setup (agents, subagents, hooks, vision lock, ADR scaffold, prompt overrides), 160 tests for the hooks themselves, 10 ADRs documenting the design rationale, and a smoke test for the generated output. Used to build WyoClear (production civic-tech app on donations), this site (jcentner.com), and Sentinel (in progress). The bottleneck is no longer "can the agent build coherently" — it's "do I know what I want it to build."