Coder Agents vs Coder Tasks ( a "harness" opinion ) #26048

mdgozza · 2026-06-04T01:56:29Z

Hey Coder Team!

I'm very excited to see active development on Coder. I personally think Coder can really shape the future of engineering workflows, now more than ever before.

I wanted to write a discussion piece to understand your reasoning for introducing your own top-level layer of AI development ( Coder Agents ) and to probe for some alternatives.

I think the Agents interface is fantastic and very exciting; it's the most ergonomic way yet to fire off AI tasks within workspaces.

However, I'm of the opinion that the product, even when equipped with a SOTA model, will vastly underperform Claude Code ( or Codex ).

Take the recently announced dynamic workflows from Anthropic as an example of something that will never be possible with Coder Agents as it is designed at the moment.

We rolled this out to some of our engineers today ( with Opus 4.8 ) and we have all noticed that the quality of the generated code is extremely sub-par and not mergeable compared with one-shot Claude Code prompts.

I write this discussion piece to urge you to consider finding some way to wrap/pass-through to the popular coding harnesses. If you do this, we would be very interested in becoming an enterprise customer.

Thanks for your consideration - apologies for the poor formatting and half-baked thoughts here. I wanted to get the message across, but didn't want to spend a long time composing a stronger message.

Happy to talk offline if you are interested.

ibetitsmike · 2026-06-04T09:43:02Z

Maintainer

Can you share the model config? If you're getting bad results it might be due to missing thinking config.

1 reply

ibetitsmike Jun 4, 2026
Maintainer

Also - I have used terminal bench with Coder Agents and I am getting better results with Opus models than Claude Code has on the leaderboards.

kuza55 · 2026-06-04T12:48:43Z

I want to chime in and say I agree. What I want from Coder is the infra around the harness so that we can easily run these tools fully on our cloud. That is the thing Coder provides that nobody else does.

Maybe Mux is great, but I think locking the harness tightly to the infra means you're now having to compete at being state of the art for every single use case and benchmark.

Things I would much rather Coder focus on are how to schedule, manage, and coordinate autonomous agent execution and the human workflows around it, rather than trying to replace something lots of other people are working on.

0 replies

bpmct · 2026-06-05T14:28:13Z

Maintainer

👋🏼 I really appreciate the kind words about Coder and your feedback around our direction! This helps make the Coder project better and we've circulated the thread around internally a lot :)

I wanted to share some of our rationale for the Tasks-to-Agents migration and what this means for Coder. The TLDR here is we will never force users to use our harness (Agents) and we are actively working to make Coder work better with any harness (improved lifecycle, support for Cursor self-hosted workers, etc). We just didn't believe Tasks was the answer and we also believe that we can do something unique with Agents.

We originally built Tasks because we didn't want to build or compete with another harness; we really enjoyed Claude Code, Codex, opencode, etc and were optimistic that over time they would become easier to integrate with. Because these harnesses lacked clear APIs at the time, we built https://github.com/coder/agentapi as a stopgap translation layer, which basically scraped stdout (and placed things in stdin) in order to provide a UI and API for communicating with agents.
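
To make the "scrape stdout, write to stdin" idea concrete, here is a minimal Python sketch of that style of wrapper. It is not AgentAPI's actual implementation: the `ScrapingWrapper` class, the `agent> ` output prefix, and the fake echo "harness" are all made up for illustration, and a real harness emits a much messier terminal stream.

```python
import subprocess
import sys

# Hypothetical stand-in for an interactive harness CLI: it echoes each
# input line back with a fixed prefix and flushes immediately.
FAKE_HARNESS = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print('agent> ' + line.strip(), flush=True)\n"
)


class ScrapingWrapper:
    """Drives a CLI by writing to its stdin and parsing its stdout."""

    PREFIX = "agent> "  # assumed output format of the fake harness

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered pipes
        )

    def send(self, message):
        # Place the message in stdin, then scrape one line of stdout.
        self.proc.stdin.write(message + "\n")
        self.proc.stdin.flush()
        raw = self.proc.stdout.readline()
        # Parsing is tied to the exact output format, so any change in
        # what the wrapped CLI prints makes this raise.
        if not raw.startswith(self.PREFIX):
            raise ValueError(f"unrecognized harness output: {raw!r}")
        return raw[len(self.PREFIX):].rstrip("\n")

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()


wrapper = ScrapingWrapper([sys.executable, "-c", FAKE_HARNESS])
reply = wrapper.send("hello")
wrapper.close()
```

The fragile part is the parsing in `send`: the wrapper only understands output shapes it was written for, so a new harness feature that prints something different is either invisible or an error.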

Needless to say for anybody who has used it, the UX and architecture of Tasks/AgentAPI was pretty bad and we were never able to fully iron out the bugs. The harnesses would (rightfully) release new features and they would either be unsupported, or break the Tasks integration altogether. It felt very similar to the early days of Coder v1 when we only had hardcoded support for Kubernetes and we had to change our product any time a user needed access to a new/different Kubernetes feature or any other type of compute (Windows, EC2, etc). After a couple of years of pain, we wrote Coder (v2, the Coder we know today) which uses Terraform to let users define their own infra requirements.

While there are cleaner protocols than AgentAPI (like ACP), we still decided that trying to be a "harness wrapping another harness" (which is what I effectively consider Tasks to be) means that we are still giving our users little control over which harness features they can use. For example, a new Claude Code feature (dynamic workflows) may not be available in our API (or in ACP), therefore limiting the benefits of running the harness in Tasks versus standalone. Even if the harnesses were different, because only a subset of features are supported, they'd often appear functionally the same.

It took us a few projects/experiments to land on this conclusion, including the likes of Tasks, Mux, and Blink, as well as the feedback we've gotten from our users and enterprise customers. Most of the feedback we got on Tasks is: "this is great, but the UI/lifecycle is buggy," or "API support and integrations is key for us," and the feedback we got on Mux was "the UI/UX is smooth, but we need it deeply integrated into Coder," which is where Agents came from. We also learned a lot around what it takes to build a harness, which helped us develop the belief that very little "magic" belongs in the harness layer.

Therefore, our strategy is threefold:

1. Support any CLI/IDE harness to run inside Coder (supported today, little work needed). You basically install claude-code in your workspace or use our (recently refactored) modules. That way, you get all of the support of the harness with no middle-layer. This may sound basic but I believe running Claude Code CLI inside Coder is way better than running it outside Coder on local laptops 🤓
2. [Coder Agents] Build a first-class harness into Coder: Agents. It may lack the full feature-set of the flagship harnesses (hooks, memory, etc) or be a month or two behind in that support but has a clean UI, API, avoids LLM provider lock-in, and most importantly, allows us (and our users) to demonstrate a full agentic workflow with Coder Workspaces as the infrastructure/background compute layer. For the longest time, we were "locked out" of being able to demo the power of Coder in agentic workflows due to the lack of harness support for background compute and self-hosted runners. We also believe a lot of the power of any given harness comes from the underlying LLM/model and Coder Agents can still be used as a powerful frontend or API for background workflows in Coder.
3. [New] Integrate/partner with existing harnesses (e.g. Claude Code Web, Codex, Cursor Cloud Agents) that are separating the logic layer from the sandbox/runtime layer and ensure Coder Workspaces provides the best infrastructure layer for any harness. For example, Cursor Cloud Agents has support for Self-Hosted Workers and we're actively working on an integration to ensure this works well with Coder (and the corresponding workspace/session lifecycle).

The third one is new for us, but it is in progress and (frankly) the core lifecycle of Workspaces regardless of the harness does need to be revisited with agentic workflows in mind. I believe workspaces need to be faster to start up, context-aware of the agentic processes running in them, spin down when not in use, and preserve/replicate context on-demand, all regardless of the harness being used.

I'm personally optimistic we can find a "SSH" or "Terraform" of harness<>compute interactions so Coder can support any harness just like we support any IDE or any compute provider, but even without that we're working towards integrating with the harnesses that do support remote/self-hosted runners.

3 replies

bpmct Jun 5, 2026
Maintainer

👋🏼 Wanted to add too that we're always open to feedback/ideas/POCs of how we can better support some of these things too and like everyone we're still learning a lot about the proper architecture and interfaces. Discussions like this do actually go somewhere and make Coder better

mdgozza Jun 7, 2026
Author

Hey Ben, thank you so much for the thoughtful response!

I'm glad to hear how you guys are thinking about this, and can appreciate the difficulty in making such decisions.

I think to start I'll actually speak to @ibetitsmike,

As it relates to feedback on the quality of the output of the agent generated code:

I don't have a large corpus of examples to pull from yet, but most immediately we noticed the model not traversing the codebase as much as I would expect prior to making edits.

I would have expected Claude Code to spin up Explore subagents to get a deeper understanding ( and/or ask for confirmation ), rather than finding the first file that looks like it fits the criteria and opening an MR.

This caused MRs to be opened targeting the wrong area of the code. I've since added instructions to the Workspace description to encourage the model to read claude.md before making any changes, and from there, there are further instructions for the model to follow. ( However, I'd like to not guide the model too much )

This feedback is admittedly hand-wavy; I only just enabled this feature for our org as we prototype what development off of our local devices might look like. ( An extremely exciting future )

To answer your question specifically, we left all the defaults on the model config, but since you pointed it out to me, I've raised our reasoning setting from high to xhigh, which is what Anthropic recommends for coding with 4.8!

I'll come back to this thread over the next week or so and share updates on how we continue to try to configure the feature to hit our intended success metrics.

Now to @bpmct ,

I do see the challenge! The time has never been better for Coder to position itself squarely into the future "agentic" SDLC!
( fwiw, think even bigger, not just code gen, but test runs + output, recorded browser automation artifacts etc. Not to say you're not thinking about this, but man, the future is really changing fast )

I think the native first party UX of Coder Agents is actually top tier conceptually, I do just wish we didn't have to give up some harness first party features.

I'll remain open minded about what might be possible with Coder Agents and continue trying the product.

An alternative I tried that got pretty good, but where the ergonomics were a bit off, was the following:

We have a script, bun coder:new "\goal rewrite the backend in typescript". This would spawn a workspace that auto-clones the repos and starts a Claude Code session with that prompt in a tmux session. Then bun coder:resume {workspacename} would connect to that tmux session using the native API of cmux. From there anyone could interface with the Claude Code session directly and persistently. This works well enough, but scroll is bad.
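
For readers who want to try something similar, here is a rough sketch of the commands such a pair of scripts might run. It is written in Python rather than as a bun script and is not the actual implementation described above. The template name and tmux session name are hypothetical, repo cloning is assumed to be handled by the workspace template, and the resume step uses a plain `tmux attach-session` instead of the cmux API. It has not been run against a live deployment, so check the `coder` and `tmux` invocations against your installed versions.

```python
import shlex

# Assumptions (not from the thread): both names below are placeholders.
TEMPLATE = "dev-template"  # hypothetical Coder template that clones the repos
SESSION = "claude"         # hypothetical tmux session name


def new_commands(workspace, prompt):
    """Commands a coder:new-style script might run, in order."""
    # Quote the prompt for the shell tmux starts, then quote the whole
    # claude command for the remote shell that launches tmux.
    claude_cmd = "claude " + shlex.quote(prompt)
    start_session = f"tmux new-session -d -s {SESSION} {shlex.quote(claude_cmd)}"
    return [
        ["coder", "create", workspace, "--template", TEMPLATE, "--yes"],
        ["coder", "ssh", workspace, start_session],
    ]


def resume_commands(workspace):
    """Commands a coder:resume-style script might run."""
    return [["coder", "ssh", workspace, f"tmux attach-session -t {SESSION}"]]


# Print the commands instead of executing them, so the sketch is safe to run.
for argv in new_commands("backend-rewrite", "rewrite the backend in typescript"):
    print(shlex.join(argv))
for argv in resume_commands("backend-rewrite"):
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` once the commands look right for your setup.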

I thought about building a UI over this, but just haven't had the time.

The one thing we want to do is see how hard it might be to shift-left development even further, and for that, we would need a UI as nice as what you all are working on with Coder Agents.

Anyways, no actionable takeaway for you guys here, not directly from me anyways. Appreciate the conversation and if anything strikes me I'll share some feedback. Hopefully this will help you keep making Coder a killer product! 🫡

bpmct Jun 8, 2026
Maintainer

We have a script, bun coder:new "\goal rewrite the backend in typescript". This would spawn a workspace that auto-clones the repos and starts a Claude Code session with that prompt in a tmux session. Then bun coder:resume {workspacename} would connect to that tmux session using the native API of cmux. From there anyone could interface with the Claude Code session directly and persistently. This works well enough, but scroll is bad.

This is super interesting and we have actually had several discussions on how we can make the experience of firing off Claude Code in workspaces easier. I'm curious: which Claude Code features are you using in this that are missing in Agents?

Also, regarding tmux scrolling with Claude Code, I did find a way to improve this with Claude Code config. I don't think it was ever perfect, but it was a significant improvement. Hope this helps: https://gist.github.com/bpmct/9c9d1e9d9629f6a44789d72dc37e7542
