We have been shipping Dev Decks with an autonomous agent system for the past two months. A Product Manager agent reads project outcomes and plans sprints. A CTO agent makes architecture decisions. A Project Lead agent executes with wave-based parallelism. Dual reliability and security audits verify everything before a PR is created. Cross-run learning means every sprint builds on what the system learned from previous ones. Today we are open-sourcing the entire framework so you can use it in your own projects.
The system is called The Orchestrator. It is a set of Claude Code skills, rules, and conventions that turn a set of project outcomes into shipped code. It is not a SaaS product or a hosted service. It is a directory of markdown files you copy into your project.
## The AI-Human Engineering Stack
The Orchestrator is built on the AI-Human Engineering Stack — a six-layer framework for effective AI agent workflows, created by Henrique Sanchez and Hayen Mill.
Most teams stop at prompt engineering. They write a good prompt, get a good result, and call it done. The Engineering Stack argues that prompts are just the foundation. Five more layers sit on top, and each one compounds the quality of the output.
Layer 1: Prompt Engineering — "What to do." Skill files define what each agent does. Each slash command is a prompt with process and output format. This is the foundation that everything else builds on.
Layer 2: Context Engineering — "What to know while doing." What the agent knows while executing. CLAUDE.md, path-triggered rules, and VALUES.md provide persistent context that shapes every decision. Context engineering is why the same prompt produces different quality outputs in different projects — the surrounding context determines whether the agent understands your architecture or is guessing.
Layer 3: Intent Engineering — "What to want while doing." What the agent is trying to achieve. Outcomes and PRDs define the goals. Every task prompt includes a "north star" statement so execution agents never lose sight of why they are building something.
Layer 4: Judgment Engineering — "What to doubt while doing." What the agent should question. Decision heuristics in VALUES.md, feasibility spikes for high-risk sprints, and the --review checkpoint all create moments where human judgment enters the loop at exactly the right time.
Layer 5: Coherence Engineering — "What to become while doing." Identity and consistency across runs. Sprint retrospectives capture what worked and what broke. Pattern extraction distills recurring lessons. The system maintains architectural coherence even across dozens of sprints because every agent reads what came before.
Layer 6: Evaluation Engineering — "How to know while doing." The feedback loop. Code quality reviews, reliability audits, security audits, and validation gates evaluate every output. Without evaluation, the system would drift. With it, quality is enforced structurally, not by hope.
The power of the stack is that each layer compounds. A well-prompted agent with good context, clear intent, sound judgment, coherent identity, and rigorous evaluation produces fundamentally different output than a well-prompted agent alone.
## How The Orchestrator Works
The Orchestrator runs a cycle. Here is what happens on each iteration:
Step 1: Product Manager Agent. The Product Manager agent reads your outcomes (shared/OUTCOMES.md), your values (VALUES.md), and all previous sprint retrospectives. It runs each potential sprint through your decision heuristics — does this deliver user value? Can we validate in 48 hours? Is this feature creep? — and writes a PRD with a mandatory north star statement.
Step 2: CTO Agent. The CTO reads the PRD and explores your codebase for existing patterns. It makes architecture decisions grounded in what already exists, not in abstract best practices. It produces an Architecture Decision Document (ADD) that specifies file structure, integration points, and implementation sequence.
Step 3: Pre-Implementation Audit. Before a single line of code is written, a reliability audit identifies likely failures, generates test specifications, and maps gaps between the spec and the codebase. These test specs feed directly into task generation so tests are built alongside implementation, not after.
Step 4: Project Lead Agent. The Project Lead agent generates a structured task list from the PRD and ADD, then executes it with wave-based parallelism. Tasks that touch different files run in parallel. Dependencies are respected via wave ordering. Every task prompt includes the north star from the PRD so execution agents never lose the thread.
Here is what a task dispatch looks like:
```
Task(
  description: "Execute task 2.1: Create user service",
  subagent_type: "execution-agent",
  model: "sonnet",
  prompt: "
    North star: Teams ship faster because TaskFlow removes coordination overhead.
    Execute this task. Read CLAUDE.md first. Search for existing implementations
    and copy patterns exactly — only adapt business logic.
  "
)
```
Step 5: Post-Sprint Audits. After all tasks complete, two audits run in parallel: a reliability audit checks test coverage, contract mismatches, and edge cases, while a security audit checks authentication, authorization, input validation, and data exposure. Only sprints that pass both gates produce a PR.
Step 6: Pattern Extraction. After every third sprint, the system reads the last five retrospectives and distills recurring patterns into .ai/patterns.md. Future Product Manager and CTO agents read these patterns, creating a learning loop. If the same mistake appears twice, the system learns to avoid it on the third attempt.
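The extraction cadence is mechanical: trigger on every third sprint, read the last five retrospectives. A sketch of that bookkeeping, with an assumed retro filename scheme (the actual files in .ai/retros/ may be named differently):

```python
from pathlib import Path

RETRO_DIR = Path(".ai/retros")      # per the layout described above
PATTERNS = Path(".ai/patterns.md")  # where distilled patterns land

def should_extract(sprint_number: int) -> bool:
    # Pattern extraction runs after every third sprint (3, 6, 9, ...).
    return sprint_number % 3 == 0

def retros_to_read(sprint_number: int, window: int = 5):
    """Return paths for the last `window` retrospectives.
    The sprint-NN.md naming is an assumption for illustration."""
    start = max(1, sprint_number - window + 1)
    return [RETRO_DIR / f"sprint-{n:02d}.md" for n in range(start, sprint_number + 1)]
```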
## The VALUES.md System
Most AI agent systems treat every project the same. The Orchestrator does not. The `/values-discovery` skill runs a 20-30 round interview that surfaces your engineering principles, decision-making style, and code preferences. The output is a VALUES.md file that every agent reads at boot.
This is not a personality quiz. It captures concrete engineering opinions: "TypeScript strict mode, no exceptions." "Reuse before create — read 2-3 reference implementations before writing anything." "Feature first, but track debt and pay it down every 3rd sprint." These opinions directly shape how agents write, review, and architect code.
The Product Manager agent runs sprint decisions through your decision heuristics. The CTO agent respects your architecture preferences. Execution agents follow your code principles. The result is code that looks like you wrote it.
## Cross-Run Coherence
A single AI agent interaction has no memory. The Orchestrator creates memory through three mechanisms:
Sprint retrospectives (.ai/retros/) capture what was built, what broke, and what was harder than expected. Every Product Manager and CTO agent reads past retros before planning.
Pattern extraction (.ai/patterns.md) distills recurring lessons from retrospectives. After every third sprint, the system identifies repeated mistakes, consistent architectural decisions, and proven implementation patterns.
VALUES.md provides identity persistence. The system knows who you are, how you decide, and what you value across every sprint. This is the coherence layer — without it, each sprint would start from zero context.
Together, these three mechanisms create a system that genuinely improves over time. Sprint 20 produces better code than sprint 1, not because the underlying model improved, but because the context, patterns, and values accumulated across all previous sprints.
## How to Use It
Getting started takes about five minutes:
```bash
# 1. Clone The Orchestrator
git clone https://github.com/growthmind-inc/the-orchestrator.git

# 2. Copy skills and state files into your project
cp -r the-orchestrator/.claude/ /path/to/your-project/.claude/
cp -r the-orchestrator/shared/ /path/to/your-project/shared/
mkdir -p /path/to/your-project/.ai/

# 3. Customize CLAUDE.md for your project
cp the-orchestrator/CLAUDE.md /path/to/your-project/CLAUDE.md

# 4. Discover your values (interactive session)
cd /path/to/your-project
claude "/values-discovery"

# 5. Define outcomes and launch
claude "/outcomes"
claude "/orchestrate"
```
The skill files use `[your validation command]` as a placeholder. Replace it with your project's actual commands — `npx tsc --noEmit` for TypeScript, `mypy .` for Python, `cargo check` for Rust.
The full repository, documentation, and quick start guide are at github.com/growthmind-inc/the-orchestrator.
## What's Next
The Orchestrator is a snapshot of how we ship Dev Decks today. It is opinionated and reflects our workflow. Some things it does well:
- Autonomous sprint planning from outcomes
- Wave-based parallel execution
- Cross-run learning through retrospectives and pattern extraction
- Quality gates that prevent broken code from landing
Some things it does not do yet:
- Multi-repo orchestration (it assumes one project at a time)
- Automatic eval loops for AI-generated content quality
- Visual regression testing for UI-heavy sprints
- Integration with CI/CD pipelines beyond git push
We welcome contributions. If you adapt The Orchestrator for a Python, Rust, or Go project, we would love to see your customizations. If you find patterns that improve the framework, open a PR. The system is designed to evolve.
We built The Orchestrator to ship Dev Decks, the AI pitch deck platform for technical founders. If you are raising a round and want a deck that looks like a real company built it, give us a try.
Originally published on Dev Decks Blog. Cross-posted with permission.
Tom McDonough
CTO, Dev Decks