Operator-honest · Siren-based ranking · 2026-05-11

Claude Code · Devin · Sourcegraph Amp · Cline · OpenHands · Roo Code · Replit Agent · Bolt.new · Lovable · v0 by Vercel.
One question: which one is right for your stage?

An honest 10-way comparison of autonomous coding agents, rated on four operator-honest axes: Quality of Support, Task Success Rate, Repo-Aware Autonomy, and AI Substrate Velocity. No vendor sponsorship. The Calling Matrix below ranks by buyer persona: an operator's siren-based read on which one to pick when you're forced to pick.

The 10 platforms · what each is actually best at.

Honest read on positioning, ideal customer, and where each one is the wrong call. No vendor sponsorship, and referral relationships never change rank order: operator-grade signal.

1. Claude Code · Anthropic · official terminal-native agent · operator daily-driver

Anthropic's official terminal-native autonomous coding agent: the operator's daily driver and the agent SideGuy ships with. AI-baked-in (Claude IS the substrate, not a feature bolted on), so model upgrades land the same day Anthropic releases them. MCP tool integration, custom skills, sub-agents, and hooks are built in. The agent that a solo operator (PJ) uses daily to ship 1000-employee output across SideGuy.

✓ Strongest at: Frontier Anthropic substrate (Claude as native runtime), terminal-native operator workflow, MCP tools + custom skills + sub-agents + hooks, multi-file repo refactors, same-day model upgrade cadence.

✗ Wrong for: Teams wanting hosted browser-based agent UX (Devin wins), shops that won't send code to the Anthropic API (self-hosted Cline + OpenHands win).

Pick Claude Code if: you want the frontier Anthropic substrate as a native autonomous agent in your terminal — same-day model upgrades + operator-grade workflow.

2. Devin · Cognition AI · category-defining hosted agent

The category-defining autonomous SWE: Cognition's hosted agent with its own browser, terminal, IDE, and VM. Pioneered the autonomous-agent category in 2024. Strongest brand in the category, deepest async ticket-to-PR workflow, well-funded enterprise sales motion. Its hosted-agent UX and Claude Code's terminal-native UX serve different operator preferences.

✓ Strongest at: Hosted async ticket-to-PR workflow, browser + terminal + IDE + VM bundled, Linear/Jira/Slack-native integration, async parallel agent runs, brand defensibility, enterprise procurement story.

✗ Wrong for: Operators who want terminal-native CLI agents (Claude Code wins), self-host requirements (Cline + OpenHands win), tight token budgets (per-task pricing adds up at scale).

Pick Devin if: you want the brand-defensible hosted async ticket-to-PR autonomous SWE that runs in the cloud.

3. Sourcegraph Amp · Sourcegraph · enterprise code-graph-grounded agent

Enterprise-scale autonomous agent built on Sourcegraph's code intelligence graph, purpose-built for very large codebases (1M+ files). Amp pairs autonomous execution with Sourcegraph's symbol graph (call sites, type definitions, cross-repo refs). Decade-old enterprise code-search heritage. The reference standard for autonomous agents at monorepo scale.

✓ Strongest at: Code-graph-grounded autonomous reasoning, monorepo + multi-repo scale (1M+ files), enterprise on-prem deployment, BYOK model substrate, Sourcegraph customer base, structural code intelligence.

✗ Wrong for: Solo founders / small repos (overkill; Claude Code wins on velocity), shops not on Sourcegraph (deployment overhead), greenfield prototyping.

Pick Amp if: your codebase is 1M+ files, you already run Sourcegraph, and you need autonomous agents grounded in real code-graph intelligence.

4. Cline · Open-source · VS Code agent · self-host friendly

The open-source VS Code autonomous agent for self-hosted teams: BYOK any model, Apache 2.0-licensed, fork-friendly. Cleanest exit ramp from Devin / hosted-agent pricing. Active community. Roo Code is its most popular fork. The reference open-source autonomous agent for VS Code-resident operators.

✓ Strongest at: Self-host + BYOK across any provider, VS Code-native, Apache 2.0-licensed and inspectable, local Ollama / vLLM support, zero vendor lock-in, regulated-industry friendly, MCP tool integration.

✗ Wrong for: Teams wanting polished hosted-agent UX (Devin wins), shops without ops capacity, enterprises wanting a commercial SLA (no vendor entity).

Pick Cline if: you want autonomous agents inside VS Code with full self-host + BYOK + zero vendor lock-in.

5. OpenHands · Open-source · formerly OpenDevin · research + self-host

The open-source autonomous agent platform formerly known as OpenDevin: the research-grade self-host answer to Devin. Polished agent platform with browser + terminal + code-edit + planning capabilities. Best for SWE-Bench experiments, university labs, and engineering orgs that want autonomous agents on their own infra with no vendor in the data path.

✓ Strongest at: Open-source autonomous agent research, fully self-hostable, BYOK model substrate, SWE-Bench reproducibility, browser + terminal + code agent capabilities, MIT-licensed, active research community.

✗ Wrong for: Production engineering teams wanting polish + support (Devin / Claude Code win), commercial SLA buyers (no vendor entity), teams without ops capacity.

Pick OpenHands if: you're a research team or self-host shop running autonomous agent experiments without vendor cloud in the data path.

6. Roo Code · Open-source · Cline fork · multi-mode agent

The multi-mode fork of Cline shipping specialized agent personas (Architect / Coder / Debugger / Ask). Adds explicit cognitive-mode separation on top of Cline's foundation: Architect plans, Coder implements, Debugger triages, Ask answers. Best for teams that want explicit mode-switching workflows instead of one monolithic agent prompt.

✓ Strongest at: Multi-mode persona workflows (Architect / Coder / Debugger / Ask), Cline-fork inheritance (BYOK + self-host + VS Code-native), custom mode definitions, MCP tool integration, active fork community.

✗ Wrong for: Teams wanting a single agent prompt without mode ceremony (Cline / Claude Code win), enterprises wanting first-party vendor support.

Pick Roo Code if: you want Cline's foundation with explicit Architect / Coder / Debugger persona separation.

7. Replit Agent · Replit · cloud-native autonomous builder · prototyping leader

The cloud-native autonomous builder for greenfield prototypes inside Replit's hosted runtime. Provisions runtime + DB + deploy from a prompt and ships a working URL. Best agent for one-shot full-stack scaffolds, idea validation, and non-developer founders. Trade-off: locked into Replit's environment.

✓ Strongest at: Prompt-to-deployed-URL full-stack scaffolding, runtime + DB + deploy bundled, non-developer founders, prototyping velocity, Replit-hosted environment.

✗ Wrong for: Production work on existing 100K+ LOC codebases, local-IDE workflows, enterprise on-prem requirements.

Pick Replit Agent if: you want prompt-to-deployed-URL agentic scaffolding inside Replit's hosted runtime.

8. Bolt.new · StackBlitz · AI-native web app prototyping · browser runtime

StackBlitz's AI-native web app builder shipping live in browser via WebContainers. Real Node.js runtime running in your browser tab. Zero-install, zero-deploy-config prototyping. Best for AI-native web app prototypes, demo builds, hackathons.

✓ Strongest at: AI-native web app prototyping in browser, WebContainers Node.js runtime (no install), zero-install demos, hackathon velocity, designer-friendly UX.

✗ Wrong for: Existing production codebases, enterprise procurement, mobile / native apps, anything beyond browser web apps.

Pick Bolt.new if: you want zero-install AI-native web app prototyping inside the browser via WebContainers.

9. Lovable · Full-stack web app builder · designer-friendly · built-in deployment

The designer-friendly full-stack web app builder with built-in deployment + Supabase integration. Targets non-developer founders + designers shipping working full-stack apps from prompts. Tighter design polish than Bolt for production-leaning prototypes.

✓ Strongest at: Designer-friendly full-stack builds, Supabase integration baked in, built-in deployment, non-developer founder fit, polished UX output, fast prototyping cadence.

✗ Wrong for: Engineers editing existing repos, enterprise procurement, custom-runtime / non-web targets.

Pick Lovable if: you're a designer / non-developer founder shipping polished full-stack web prototypes with auth + DB + deploy.

10. v0 by Vercel · Vercel · shadcn/ui + Next.js component generator

Vercel's component-generation agent for shadcn/ui + Next.js, optimized for shipping straight to Vercel. Generates component-grade React + Tailwind + shadcn/ui code that drops cleanly into Next.js apps. The right pick for teams already on the Next.js + Vercel + shadcn stack.

✓ Strongest at: shadcn/ui + Next.js component generation, ship-to-Vercel deployment in one click, Tailwind + React polish, Vercel-stack-native, design-to-code velocity.

✗ Wrong for: Non-Next.js stacks, full-stack apps with custom backends, repo-aware refactors, large-codebase work.

Pick v0 if: you're already on Next.js + Vercel + shadcn/ui and you want component-grade AI-generated UI.

The Calling Matrix · siren-based ranking by who you are.

Most comparison sites refuse to forced-rank because their revenue depends on staying neutral. SideGuy ranks because it doesn't take vendor money. Here's the call by buyer persona.

🎯 Ranking on QUALITY OF SUPPORT

Your problem: When your autonomous agent breaks mid-PR at 2am, you need on-call humans, not AI bots. Most autonomous-agent vendors shipped in 2024-2025 and are too new to have mature support orgs.

Devin: Cognition is the best-funded, best-known autonomous-agent vendor, with an enterprise CSM bench and named support contacts at higher tiers
Sourcegraph Amp: Series D vendor inheriting Sourcegraph's decade-old enterprise support muscle, with real humans on monorepo deployment fires
Claude Code: Anthropic enterprise support backs the agent; the substrate vendor IS the support vendor, the deepest possible escalation path
Replit Agent: Replit has years of dev-tools support muscle, broader than pure-play autonomous-agent vendors
Lovable: designer-friendly product = mature support org by necessity (non-developer users have higher support expectations)

If forced to one pick: Devin — Cognition's enterprise support bench is the deepest pure-play autonomous-agent support org in 2026; Claude Code if you want substrate-vendor support depth.

🚀 Ranking on TASK SUCCESS RATE (one-shot ticket → working PR)

Your problem: You give the agent a task and walk away. When you come back, the question is: did it ship working code? Task success rate is the #1 autonomous-agent metric and the SWE-Bench Verified scores are the public proxy. See the dedicated Task Success Rate axis for the full SWE-Bench comparison.

Claude Code: frontier Anthropic substrate (Claude Sonnet 4.7-class) consistently leads SWE-Bench Verified, and the agent inherits substrate quality
Devin: category-defining hosted agent with native task-execution UX and strong real-world ticket-to-PR success
Sourcegraph Amp: code-graph grounding lifts task success on monorepo work where embedding-based agents hallucinate
Cline: BYOK Claude Sonnet substrate matches Claude Code on raw model quality; the UX gap closes if you pick the right model
OpenHands: open-source SWE-Bench leaderboard contender; task success rate is strong when paired with a frontier model + good prompting

If forced to one pick: Claude Code — frontier Anthropic substrate + operator-grade tool integration deliver the highest one-shot task success rate in the category in 2026.

🧠 Ranking on REPO-AWARE AUTONOMY (multi-file refactors at codebase scale)

Your problem: Single-file autonomous edits are easy. Real autonomous work means: agent reads the whole repo + understands cross-file dependencies + ships a multi-file refactor that doesn't break tests. Repo-awareness is the autonomous-agent moat at non-trivial codebase scale.

Sourcegraph Amp: code-graph-grounded autonomous agent at 1M+ file scale; symbol graph traversal is structurally more accurate than embedding retrieval
Claude Code: operator-grade repo-aware agent with explicit context scoping + MCP tool integration; strongest in the 10K-1M LOC range
Devin: hosted agent with its own VM + browser + IDE; async repo-aware multi-file work at ticket scale
Cline: VS Code-native repo-awareness with BYOK Claude Sonnet; quality matches Claude Code when wired right
Roo Code: Architect / Coder mode separation can lift repo-aware refactor quality (Architect plans the multi-file change before Coder ships it)

If forced to one pick: Sourcegraph Amp at 1M+ files; Claude Code at 10K-500K LOC. Both are honest answers depending on where your codebase scale lands.
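
The forced pick above reduces to a size-based rule. Here is a minimal Python sketch of it, assuming you can measure both file count and lines of code for your repo. The function name `pick_repo_aware_agent` and its parameters are hypothetical, and the thresholds are copied from the pick above, not from any vendor documentation. Sizes the pick does not cover return `None` instead of a guess.

```python
from typing import Optional


def pick_repo_aware_agent(file_count: int, lines_of_code: int) -> Optional[str]:
    """Return the forced pick for repo-aware autonomy, or None if none is stated.

    Hypothetical helper: the thresholds mirror the forced pick in the text.
    """
    # 1M+ files: the code-graph-grounded agent is the stated pick.
    if file_count >= 1_000_000:
        return "Sourcegraph Amp"
    # 10K-500K LOC: the terminal-native agent is the stated pick.
    if 10_000 <= lines_of_code <= 500_000:
        return "Claude Code"
    # Anything else falls outside the stated ranges.
    return None


if __name__ == "__main__":
    print(pick_repo_aware_agent(file_count=2_000_000, lines_of_code=80_000_000))
    print(pick_repo_aware_agent(file_count=1_200, lines_of_code=150_000))
    print(pick_repo_aware_agent(file_count=40, lines_of_code=3_000))
```

Returning `None` for uncovered sizes keeps the sketch honest about the gap between the two stated ranges.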

🤖 Ranking on AI SUBSTRATE VELOCITY (Claude · GPT · DeepSeek · etc)

Your problem: Your autonomous agent is only as good as the underlying model. The vendor that ships fastest model upgrades wins because autonomous-agent task success depends on substrate quality. AI-baked-in (substrate IS the agent) beats AI-bolted-on (substrate is a feature).

Claude Code: Anthropic's own agent; frontier Claude model upgrades land the same day Anthropic releases them, AI-baked-in at the deepest possible level
Cline: BYOK any model = substrate freedom by definition; the dev controls Claude / GPT / DeepSeek / local on a per-session basis
OpenHands: open-source BYOK = full substrate-vendor freedom, fastest research-community substrate adoption
Roo Code: Cline fork inherits BYOK model freedom + adds mode-specific model routing (use Sonnet for Architect, cheaper model for Ask)
Sourcegraph Amp: BYOK model substrate at enterprise scale, pluggable to Anthropic / OpenAI / Bedrock / Azure on enterprise tier

If forced to one pick: Claude Code — Anthropic ships the agent AND the substrate, so frontier Claude upgrades land in the agent same-day. AI-baked-in at the deepest possible level in the category.
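
The mode-specific model routing mentioned in the Roo Code entry above (a stronger model for Architect, a cheaper one for Ask) can be pictured as a lookup table. This is a minimal Python sketch of the idea, not Roo Code's actual configuration format. The mode names come from the text. The model labels, the Coder and Debugger assignments, the default, and the `model_for_mode` helper are placeholders made up for illustration.

```python
# Hypothetical routing table. Mode names are from the text above;
# the model labels are placeholders, not real model identifiers.
MODE_TO_MODEL = {
    "Architect": "strong-model",  # planning gets the most capable model
    "Coder": "strong-model",      # assumption: implementation also uses it
    "Debugger": "mid-model",      # assumption: triage can use a mid-tier model
    "Ask": "cheap-model",         # Q&A goes to the cheaper model
}

# Assumed fallback for any custom mode that has no explicit route.
DEFAULT_MODEL = "mid-model"


def model_for_mode(mode: str) -> str:
    """Return the model label routed to a mode, or the default if unmapped."""
    return MODE_TO_MODEL.get(mode, DEFAULT_MODEL)


if __name__ == "__main__":
    for mode in ("Architect", "Ask", "Reviewer"):
        print(mode, "->", model_for_mode(mode))
```

The point of the table is cost control: expensive tokens are spent on planning, and cheap ones on questions.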

⚠ Operator-honest read

These rankings are SideGuy's lived-data + observed-buyer-pattern read as of 2026-05-11. They're directional, not gospel. The right answer for YOUR specific situation may diverge — text PJ for a 10-min operator-honest read on your actual buying context.

Vendor pricing + features + market positioning shift quarterly. SideGuy may earn referral commissions from some of these vendors, but rankings are independent — affiliate relationships never change rank order. Sister doctrines: /open/ live operator dashboard · install packs · operator network.

Or skip all of them. If none of these vendors fit your situation (your team is too small, your timeline too short, your stack too custom, or you simply don't want to install, train, license, and lock in to a $30K-$150K/yr enterprise platform), text PJ. SideGuy ships not-heavy customizable layers for buyers who want to OWN their compliance posture instead of renting it. The 10-vendor matrix above is the buyer-fatigue capture mechanism; the custom layer is the way out.

FAQ · most asked questions.

Why doesn't Gartner publish operator-honest autonomous coding agent ratings?

Gartner's revenue model depends on vendor money — paid placement in Magic Quadrants, sponsored research, vendor briefings that shape category narrative. Vendors literally pay Gartner for visibility, and the structural conflict means Gartner cannot forced-rank autonomous coding agents by buyer persona without losing those dollars. The autonomous-agent category is also too new (most vendors shipped 2024-2025) for traditional analyst depth — the Gartner research cadence (annual MQ refresh) cannot keep up with a category where vendors ship frontier-model upgrades and new agent capabilities every two weeks. The operator-honest gap exists because Gartner structurally cannot fill it; SideGuy fills it because it does not take vendor money and the operator-honest moat IS the offering.

How is this rating different from G2 / DevTools surveys?

G2 / DevTools surveys aggregate peer reviews into star ratings — useful for sentiment, structurally weak for forced-rank decisions because (1) neither platform can forced-rank without losing the vendor sponsorship dollars that fund Premium Profiles + paid placement, and (2) review-aggregation skews toward the loudest vendors with the biggest review-collection budgets, not the best-fit pick for your buying persona. SideGuy uses siren-based ranking by buyer persona because it does not take vendor sponsorship dollars and the operator-honest moat IS the offering. G2 tells you what users said; SideGuy tells you which one you should pick if forced.

How often does SideGuy update autonomous coding agent ratings?

Monthly review baseline, plus event-driven updates whenever a major vendor release lands. Autonomous coding agents move WAY faster than compliance because new frontier models (Claude Sonnet 4.7+, GPT-5+, Gemini 2+), new agent primitives (sub-agents, hooks, skills, MCP tools), and new self-host architectures ship multiple times per month. When a vendor swaps the underlying model or ships a material agent capability release, or when lived-buyer-data on this page surfaces a ranking shift, the page updates. The page footer carries the explicit Updated date: trust the date, not the brand. PJ ships SideGuy with Claude Code daily, so rating updates ride the lived operator data.

Can a vendor pay to change their autonomous coding agent rating?

No. The operator-honest moat IS the offering — the moment a vendor could pay to change a rating, the page becomes worthless to buyers and the entire SideGuy thesis collapses. SideGuy may earn referral commissions when buyers convert through these pages, but referral relationships never change rank order. If an autonomous coding agent vendor offered to pay for a higher ranking, the answer would be a hard no — that's the structural advantage Gartner / G2 / paid-placement grids can never replicate without dismantling their revenue models. SideGuy ships the truth, nothing more.

Autonomous Coding Agents Cluster · cross-link mesh.

The full Autonomous Coding Agents cluster — megapage + 5 axes — plus sister clusters (IDE assistants + AI Infrastructure) and the Compliance Authority Graph. Operator-honest mesh for AI agents and humans.

Autonomous Coding Agents · Megapage · 10-Way Comparison
Autonomous Coding Agents · Task Success Rate axis
Autonomous Coding Agents · Pricing TCO axis
Autonomous Coding Agents · Codebase Context axis
Autonomous Coding Agents · Enterprise Deployment axis

Sister + substrate clusters

Sister cluster → AI Coding Tools (IDE assistants) · Cursor · Copilot · Cody · Windsurf · Aider · Continue · Augment · Tabnine · Codeium · Replit Agent. Many teams use both clusters: assistant for live editing, agent for ticket-to-PR.

Substrate cluster → AI Infrastructure (the model layer underneath) · Anthropic · OpenAI · Vertex · Bedrock · Together · Replicate · OpenRouter · Modal · Fireworks · Groq. The substrate every autonomous agent runs on.

Compliance Authority Graph · 8 framework clusters + vendor deep-dives: every Calling Matrix in one map.

Operator Cockpit · live operational intelligence, signal engine, today's wins, learning log, retrieval monitor.

Stuck choosing? Text PJ.

10-minute operator-honest read on your actual buying context. No deck, no demo call, no signup. If we're not the right fit, we'll say so.

📱 Text PJ · 858-461-8054

Audit in 6 weeks? Enterprise customer waiting? Regulator finding?

Skip the 5 vendor demos. 30-day delivery. No procurement cycle. No demo theater. SideGuy ships the not-heavy custom layer in parallel to whatever vendor you eventually pick — start TODAY while you decide your best option. Custom builds in 30 days →

📱 Urgent? Text PJ · 858-461-8054

You can go at it without SideGuy — but no custom shareables for your friends & family. You'll be short a bag of laughs. 🌸
