I Replaced My Entire Dev Team with a Bash Script
April 7, 2026
Fleet is a simple multi-agent orchestration tool. It's a bash script that launches Docker containers, coordinates them through git, and provides a terminal dashboard to watch what's happening. I'm experimenting with it in my own work and wanted to share it in case others find it useful, too. Sorry for the click-bait title.
Where I started
To this day I am not a big fan of fully autonomous development. I keep trying whatever the big companies release, but it mostly lets me down on anything beyond straightforward projects, and jumping into the middle of a flow that is meant to be autonomous is usually more work than it saves.
So that leaves me with Claude Code as an interactive tool that augments me as a developer. With projects like Superpowers and the lesser-known Beastmode this works pretty well. If you don't use any of those, you're leaving a lot on the table.
That's how I mostly work at the moment. It's the workflow I've used for the last decade, on steroids. With improving models and good planning, less and less intervention is needed. My waiting times got longer, and I needed to fill them. So I usually start my day by planning 3-5 projects in parallel, kicking them off one after the other, and developing them side by side. Given that git worktrees seem to be a feature most of us only learned existed in the last two years, I suspect many people went through the same process.
The productivity is good, but I also found that context switching takes its toll. I am usually exhausted by the evening. And I know from experience that I'm decent at context switching, so I can only imagine how tiring it must feel for people who struggle more with it.
I wanted to change something. My goal was to reduce context switching by packaging more scope into a single flow, but not lose out on clean implementation and end up with massive PRs.
The review absurdity
On top of that, parts of my workflow were becoming increasingly odd.
I would develop a feature, then put up a PR, as I have for the last 15 years. Then I or another human reviewed that code. Of course, now there are AI code reviewers, usually even several different ones, because why not.
Reviewing myself when AI can do it so well felt increasingly weird, but I still found myself regularly flagging things that AI didn't, so I didn't want to fully give up on it. I scraped my PR comments from the last few years across multiple projects and compiled them into a prompt for a me-AI-reviewer. I let that reviewer do the PR reviews instead of me, and it did a decent job. My review work shrank to skimming for architectural issues, the kind AI doesn't catch well, often because it lacks the broader scope. But given the seniority of the team, those were rare. I stopped looking at the implementation details, which made my reviewer life a lot more pleasant. Not sure how many subtleties I missed over the last months. I guess the future will show... #aws.
At that point things felt absurd. I was planning and steering, but AI was both writing code and reviewing it. Yet I was still opening PRs and using a dedicated agent to review the code as if a human was involved. The ceremony of a PR (create a branch, push, open the PR page, assign reviewers, wait for CI, read the review, close) existed to coordinate humans. And there were no humans left in that loop except me.
So I thought: why not move the whole thing to my local machine? That's what got me thinking about Fleet.
What I needed and why
The more I use AI for development, the more I notice that the things that always mattered most in software development now matter even more. Clear product requirements, solid acceptance criteria, thoughtful technical planning, clean and frequent commits. These practices usually lead to decent implementations. The more vague you are on any of them, the more nonsense you get back from the LLM.
Whenever I have to manually nudge and steer AI it usually falls into one of these three categories:
Unclear requirements or acceptance criteria. This is the one I'm trying hardest to tackle. When I work with a capable team, I'm not sitting there looking over their shoulder all day. Instead you create an engineering culture, agree on standards and what matters. Then you do good product and design work, plan carefully, define a clear outcome, and they usually deliver. Code reviews are there to double-check, not to micromanage (they're also a valuable place to grow engineers, but that's a whole other blog post). I wanted to push my AI workflow closer to that state. A proper planning phase that produces clear tasks with real acceptance criteria, and agents that coordinate without me shuffling things around. Developers pick up tasks, reviewers review their work, rejected tasks bounce back. No babysitting.
Missing application context. The big AI context problem. Agents that don't understand the codebase produce code that technically works but doesn't fit. I wanted something that maintains living documentation (architecture, stack, features, learnings) that grows with every feature and gets fed back into agents automatically. And keeping the work history and tasks in git adds to that: a trace of decisions and changes that makes it easier to figure out where something went wrong.
Complicated problems where the solution is not straightforward. This still happens more often than I'd like, and I don't have a good answer beyond waiting for better models. To me, this is why it feels like AGI might still need a few iterations.
Beyond those, the PR ceremony I described above needed to go. If AI is writing and reviewing the code, the whole create-branch-push-open-PR-assign-wait-close flow is pure overhead. Agents should coordinate through something simpler.
But giving agents more autonomy means I need guardrails:
Clean output. Each task should get its own branch. When the project is done I want to look at individual diffs, not one massive PR that mixes 12 unrelated changes.
Full isolation. Agents should be able to install packages, run builds, execute tests without touching my machine. If one goes off the rails, I stop it and the container disappears.
And finally, my own sanity:
Less context switching. I wanted to describe a project once, kick off agents, and monitor from a single place. Not five terminal tabs with five different worktrees.
How fleet works
Fleet is a bash script that manages Docker containers and a directory of task files. It's about 1800 lines. That's the whole thing.
You describe your project in a markdown file (fleet design helps you with this if you want). A planning agent breaks it into tasks. Developer agents pick up tasks, write code on their own branches, and submit for review. Reviewer agents read the diffs and approve or send back with feedback. QA agents validate the integrated result and create follow-up tasks for issues they find. A docs agent watches what gets merged and maintains structured documentation that gets fed back into all agents.
All coordination happens through git. Tasks are files in .fleet/tasks/. Agents claim them by writing a lock and committing. Reviewers move task files between directories. There's a bare upstream repo inside the project that agents push to and pull from.
That's it. No message passing, no shared memory, no orchestration framework. Just files, git, and Docker.
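As a rough sketch, claiming a task purely through git could look something like this. The lock-file layout, branch handling, and commit messages here are my illustration of the idea, not Fleet's actual implementation:

```shell
# Hypothetical sketch of an agent claiming a task through git alone.
# Paths, lock format, and commit messages are illustrative.
claim_task() {
  local task="$1" agent="$2"
  local lock=".fleet/tasks/${task}.lock"

  git pull --quiet 2>/dev/null || true  # refresh the board; no-op without a remote
  [ -e "$lock" ] && return 1            # someone else claimed it first

  echo "$agent" > "$lock"
  git add "$lock"
  git commit --quiet -m "claim: ${task} (${agent})"
  # In Fleet's model the agent would now push to the bare upstream repo;
  # a rejected push means a concurrent claim won the race.
  git push --quiet 2>/dev/null || true
}
```

The nice property is that the claim itself is just a commit: any other agent that pulls sees the lock, and the race is settled by whichever push lands first.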
```
fleet new      # set up a project
fleet design   # plan it interactively
fleet start    # launch agents
fleet monitor  # watch the dashboard
fleet close    # merge everything to main
```
See the repo for a complete overview of all available commands.
Design decisions
A few choices that might seem odd until you think about them:
Git as the coordination layer. Git seemed like the easiest communication interface for agents to use. Every action becomes a commit, so you get a full audit trail for free. If something goes wrong, git log tells you exactly what happened and in what order. The downside is git noise. Your history will have coordination commits mixed in with code commits. I think that's an acceptable trade-off. The alternative is a separate coordination system that you then have to debug on its own. All the coordination commits live on the integration branch, all the "actual" coding work lives in dedicated branches per task. If you don't want the noise, simply merge those branches (or open PRs) and discard the integration branch.
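That final cleanup could look roughly like this. The `task/<id>` and `integration` branch names are assumptions for illustration; Fleet's actual naming may differ:

```shell
# Hypothetical end-of-project cleanup: merge each per-task branch,
# then drop the integration branch along with its coordination commits.
# The task/<id> and "integration" branch names are assumptions.
merge_task_branches() {
  local branch
  for branch in $(git for-each-ref --format='%(refname:short)' "refs/heads/task/*"); do
    git merge --no-ff --quiet "$branch" -m "merge ${branch}"
  done
  git branch -D integration 2>/dev/null || true  # coordination noise goes with it
}
```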
No status machine. Early versions had a project status that went from "active" to "completing" to "complete". The plan agent would decide when to transition. This caused race conditions. The plan agent could set "complete" while QA was still finding bugs. I ripped it out entirely. Now agents just work until there's nothing left to do, and the human decides when the project is done. Simpler and more correct.
Agents don't know about each other. There's no registry of running agents, no peer-to-peer communication. Each agent sees the task board and acts on it. If a developer finishes a task and submits it, they don't need to notify a reviewer. The reviewer will see it next time they check the board. This makes the system robust to agents starting and stopping at any time. If a developer is stopped and restarted, it picks up the existing work from the previous agent, provided that work was committed.
IDLE backoff instead of shutdown. Agents run in a loop. When an agent has nothing to do, it outputs IDLE: <reason> and backs off: 30 seconds, then 60, then 90, up to 5 minutes. A sleeping container costs nothing and means agents are always ready to pick up new tasks. The idea is that you keep the plan, docs, and QA agents running all the time to accompany the work of developers and reviewers.
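A minimal sketch of that backoff schedule, with the loop structure being my guess at the shape rather than Fleet's actual code (`run_agent_step` stands in for one invocation of the real agent):

```shell
# Hypothetical IDLE backoff: grow the sleep by 30s per idle pass,
# cap it at 5 minutes, reset as soon as the agent finds work.
next_backoff() {
  local current="$1" step=30 cap=300
  local next=$(( current + step ))
  if [ "$next" -gt "$cap" ]; then next=$cap; fi
  echo "$next"
}

# The surrounding agent loop (run_agent_step is a placeholder):
agent_loop() {
  local backoff=0
  while true; do
    if run_agent_step; then
      backoff=0                          # found work: reset the backoff
    else
      backoff=$(next_backoff "$backoff") # IDLE: wait a little longer
      sleep "$backoff"
    fi
  done
}
```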
Docker, not worktrees. Docker gives true isolation: you can run Claude Code in full YOLO mode. The trade-off is setup overhead (you need Docker running) and slightly slower git operations through the upstream repo. Worth it.
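For illustration, here is a dry-run sketch of what launching one agent container might look like. The image name, mount points, and flags are my assumptions, not Fleet's actual invocation, and the function only prints the command instead of running it:

```shell
# Hypothetical, dry-run launcher for one agent container.
# Image name, mounts, and flags are illustrative assumptions.
launch_agent() {
  local role="$1" project_dir="$2"
  # --rm: the container vanishes when stopped; only the mounted project
  # directory survives, so a runaway agent can't reach the rest of the host.
  echo docker run --rm -d \
    --name "fleet-${role}" \
    -v "${project_dir}:/workspace" \
    fleet-agent:latest "$role"
}
```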
No orchestration agent. Since every agent is independent and they can't communicate with each other, the architecture rules out an orchestration agent. So far that hasn't felt like a missing piece. And running Claude Code in an interactive session and telling it to monitor the fleet with fleet status and fleet logs works pretty nicely!
Try it
Fleet is early and rough. It brings me value and I use it daily, but I wouldn't call it polished. I keep iterating on it as I learn what works and what doesn't.
If you want to try it: github.com/danrex/fleet.
It's one script, no dependencies beyond bash and Docker. Install it, run fleet new, and see what happens.