I find this very curious. I don’t think agents care about UI; humans do. So in the end the UI is not required. As soon as AI can get into the physical world, the whole IT world is done for. All of this will be automated away. IT and CS only ever existed to make us more productive, more connected, and to improve our physical well-being. When we don’t need to touch computers anymore, there is no need for …
Vision language models have been trained on how to operate human UIs though, so at least for a while, computer use will be an interesting area to explore. I think debugging web apps and building UIs is a particularly fruitful area for this
I’m curious how far we are from giving coding agents access to these desktop agents, so that when we are using, say, Claude Code to build a native desktop app, the coding agent can actually see and act on the desktop UI it is building
This is a great point. Not that far. We also snapshot the desktop for "slow" non-streaming updates to the UI. We could push these into Claude itself to act on or describe or whatever.
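As a sketch of what pushing a desktop snapshot into Claude could look like: each frame becomes a base64 image content block in a Messages API request. The `grab_desktop_png` helper and the model id are hypothetical stand-ins; the content-block shape follows Anthropic's documented Messages API.

```python
import base64

def grab_desktop_png() -> bytes:
    """Hypothetical stand-in for the desktop snapshot; a real
    implementation would capture the framebuffer (e.g. via mss or PIL)."""
    return b"\x89PNG\r\n\x1a\n"  # placeholder PNG header bytes

def snapshot_message(prompt: str) -> dict:
    """Build a Messages API request body pairing the latest desktop
    snapshot with an instruction to describe or act on it."""
    png = grab_desktop_png()
    return {
        "model": "claude-sonnet-4-20250514",  # assumed model id
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": base64.b64encode(png).decode("ascii"),
                    },
                },
                {"type": "text", "text": prompt},
            ],
        }],
    }

body = snapshot_message("Describe what is on screen and suggest the next UI action.")
```

From there it is just an HTTP POST to the Messages endpoint; the interesting part is deciding when a "slow" snapshot is worth a round trip.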
For web apps, I'd guess that many of us already do that via Playwright or other MCPs. I'd bet there are people doing something similar with desktop apps too.
We're opening the private beta where we provide a hosted environment for testing, or you can install the latest Helix release and run the installer with --code to try it on your own GPUs
Wolf now supports multiple clients connecting to the same session via the wolf-ui branch that landed recently. After lots of stability work we are now running that mode in production (and in the latest release) https://github.com/helixml/helix/releases/tag/2.5.3
I commend the fact they acknowledge the maintainer's work, but seeing the singular 'maintainer', I can't help but notice the weight on that one person's shoulders.
IMHO, the goal is not to have to watch what agents do, but to let them do the work.
I would personally invest in making agents more autonomous (yes, a hard problem today) rather than building a desktop video session protocol to watch them do the work.
>I would personally invest in making agents more autonomous (yes, a hard problem today) rather than building a desktop video session protocol to watch them do the work.
Seems difficult to research better autonomy without extensive monitoring. You need specific data on before/after effects of changes, for instance.
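To make the before/after point concrete, here is a toy sketch of the kind of run log you would need: compare task success rates across agent versions. The `Run` schema and version labels are illustrative, not any real system's.

```python
from dataclasses import dataclass

@dataclass
class Run:
    """One logged agent run; fields are illustrative, not a real schema."""
    agent_version: str
    succeeded: bool

def success_rate(runs: list[Run], version: str) -> float:
    """Fraction of runs for a given agent version that succeeded."""
    matching = [r for r in runs if r.agent_version == version]
    return sum(r.succeeded for r in matching) / len(matching)

log = [
    Run("v1", False), Run("v1", True), Run("v1", False),  # before the change
    Run("v2", True), Run("v2", True), Run("v2", False),   # after the change
]

before = success_rate(log, "v1")
after = success_rate(log, "v2")
```

Without monitoring that captures this kind of data, there is no way to tell whether an autonomy change actually helped.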
A career change that left me as a recent graduate in a decimated marketplace, missing the bottom ten rungs on the ladder and with no interest in getting back into the software world, has led me to advanced manufacturing as a metal worker. I code a little, move heavy steel pieces periodically, which is a nice way to break up the standing/sitting but not nearly as much as a general laborer, solve lots of problems, keep my trigonometry muscles toned, am forced to take breaks, get paid for my overtime, there’s a union that the company ownership is totally willing to work with, and when I’m not at work, work isn’t with me. There’s something very satisfying about leaving work with exercised muscles, smelling slightly of cutting oil. The money sucks comparatively so early in my career, but the rate increases more for performance than seniority so it's rising quickly, the benefits are good, the career trajectory is pointing upwards, and longevity-wise, it’s certainly a whole lot better than gig work.
There’s a huge crisis in US manufacturing: we’re bleeding craft knowledge because off-shoring let companies hire existing experienced workers for decades, so they never had to train a new generation of tradespeople. Now all those folks are dying and retiring and they need people to pick up that deep knowledge quickly. Codifying and automating is going to kill jobs either way, but one factory employing a few people making things for other factories with local materials is better than everything perpetually shifting to the cheap labor market du jour. I’m feeling much more optimistic about the future of this than the future of tech careers.
I think over the next few years, a very large percentage of folks in tech will find themselves on the other side of the fence, quickly realize that their existing expertise doesn’t qualify them for any other white collar jobs where vibe coding experience is a bullet point in the pluses section, that tech consulting is declining even faster than salaried jobs, and that they’re vastly less qualified than the competition for blue collar jobs. Gonna be a rough road for a lot of folks. I wouldn’t invest in SF real estate any time soon.
There’s 1,000 established industries that don’t offer the rapid growth and pay outs of the modern tech ecosystem. I’m excited to see some of the current industrial backwaters soak up technical talent freed up by the SV AI brain drain.
To think we’ve handsomely paid our best and brightest the last few decades in pursuit of.. advertising?
> To think we’ve handsomely paid our best and brightest the last few decades in pursuit of.. advertising?
I think "efficiency" is more accurate there. Even post-Google/ad-tech-boom the overall trends that started decades earlier continued to be: (1) faster turnaround time on communications, (2) faster delivery of result artifacts, (3) faster knowledge of changes in the market and faster response.
Advertising is a particularly visible field with lots of money to throw at those things (active investment trading is another). But practically every other industry has chased those same things as well, all the way down to things like parking meters.
Personally I'm not convinced that this is such a great thing anyway - does anyone enjoy their boss messaging them at 11PM whenever the fancy strikes? - but that's the larger reason so much brainpower has been invested into it.
You will always be able to produce artisanal hand-set code, same as how artisanal woodworking exists alongside industrial manufacturing. There will be a lot less demand for it, and compensation will align accordingly, but it won't go away.
Either the crap truly works and nobody is needed or it does not work. Where is this half-arsed human-agent hybrid vision coming from? The land of plateaued LLM gains?
I’ve also independently concluded Moonlight was the best way to go after trying my hand at a very similar task. I didn’t want to dig through Moonlight’s source, but I’m sure that if you’re dedicated enough it would pay dividends later on; it basically does everything you’d need for real-time control when simulating human input.
Hey! Yeah, we are working with partners on a fully integrated hardware + software stack for this. We particularly like the RTX 6000 Pro Blackwell chips for this use case
But is the ability to run it on the DS a feature? I highly doubt it.
I’m not trashing anything, I’m just saying that if they focused on what their market is, it would be clear no one is going to be coding/working on a Nintendo DS.
I suppose, but we got it working, and the primary interface is WebRTC in the browser; going via Moonlight internally is just an implementation detail that got us here quickly. We are open to refactoring in the future, of course :)
Hi, quite an interesting project, but I have a hard time understanding why you would stream a desktop.
From my (ignorant) understanding, the important part is the context the LLM has for the task. For some conversations you need visuals; for some you don't. What's the advantage of streaming a full desktop instead of using integrations?
There's also value in being able to run multiple agents in parallel with their own isolated filesystems and runtimes. One agent won't tread on the toes of another whatever they do. You can let it loose and it doesn't matter if it breaks something, you can just spin up another one
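A toy illustration of that isolation idea, assuming nothing about Helix's actual implementation (which uses containerized desktops; here plain temp directories stand in for them): each "agent" gets its own throwaway workspace, so two agents writing the same filename never collide.

```python
import pathlib
import subprocess
import sys
import tempfile

def run_agent(name: str, code: str) -> pathlib.Path:
    """Give each 'agent' its own throwaway workspace and run it there.
    A real setup would use containers or VMs; tempdirs just show the idea."""
    workdir = pathlib.Path(tempfile.mkdtemp(prefix=f"agent-{name}-"))
    subprocess.run([sys.executable, "-c", code], cwd=workdir, check=True)
    return workdir

# Two agents write the same filename without treading on each other.
a = run_agent("alpha", "open('out.txt', 'w').write('alpha')")
b = run_agent("beta", "open('out.txt', 'w').write('beta')")
```

If one workspace gets trashed, you delete it and spin up another; nothing leaks into its neighbor.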
Mainly so you can give the agent access to the desktop as well. Then it can debug your web app in Chrome DevTools, but you can also pair with it, with streaming so good it feels local
> Moonlight expects: Each client connects to start their own private game session
Nope, it's a Wolf design choice; e.g. Sunshine allows users to concurrently connect to the same instance/game
I will join the woodworking people before that happens, thanks.
How are these industries going to absorb new headcount without the revenue to support it?
Are there appliances or easy-to-deploy hardware that would allow one to run these private models on-premise vs. in the cloud?
No shoehorns needed. Just take what you like and build what you need.