I find this very curious. I don’t think agents care about UI; humans do. So in the end the UI is not required. As soon as AI can get into the physical world, the whole IT world is done for. All of this will be automated away. IT and CS only ever existed to make us more productive, more connected, and to improve our physical well-being. When we don’t need to touch computers anymore, there is no need for …
Vision language models have been trained on how to operate human UIs though, so at least for a while, computer use will be an interesting area to explore. I think debugging web apps and building UIs is a particularly fruitful area for this
I’m curious how far we are from giving coding agents access to these desktop agents, so that when we are using, say, Claude Code to build a native desktop app, the coding agent can actually see and act on the desktop UI it is building
This is a great point. Not that far. We also snapshot the desktop for "slow" non-streaming updates to the UI. We could push these into Claude itself to act on or describe or whatever.
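As a sketch of what pushing a desktop snapshot into Claude could look like: each frame becomes a base64 image content block in a Messages API request. The `grab_desktop_png` helper and the model id are hypothetical stand-ins; the content-block shape follows Anthropic's documented Messages API.

```python
import base64

def grab_desktop_png() -> bytes:
    """Hypothetical stand-in for the desktop snapshot; a real
    implementation would capture the framebuffer (e.g. via mss or PIL)."""
    return b"\x89PNG\r\n\x1a\n"  # placeholder PNG header bytes

def snapshot_message(prompt: str) -> dict:
    """Build a Messages API request body pairing the latest desktop
    snapshot with an instruction to describe or act on it."""
    png = grab_desktop_png()
    return {
        "model": "claude-sonnet-4-20250514",  # assumed model id
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": base64.b64encode(png).decode("ascii"),
                    },
                },
                {"type": "text", "text": prompt},
            ],
        }],
    }

body = snapshot_message("Describe what is on screen and suggest the next UI action.")
```

From there it is just an HTTP POST to the Messages endpoint; the interesting part is deciding when a "slow" snapshot is worth a round trip.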
For web apps, I'd guess that many of us already do that via Playwright or other MCPs. I'd bet there are people doing something similar with desktop apps too.
We're opening the private beta where we provide a hosted environment for testing, or you can install the latest Helix release and run the installer with --code to try it on your own GPUs
Wolf now supports multiple clients connecting to the same session via the wolf-ui branch that landed recently. After lots of stability work we are now running that mode in production (and in the latest release) https://github.com/helixml/helix/releases/tag/2.5.3
I commend the fact they acknowledge the maintainer's work, but seeing the singular 'maintainer', I can't help but notice the weight on that one person's shoulders.
IMHO, the goal is not to have to watch what agents do, but to let them do the work.
I would personally invest in making agents more autonomous (yes, a hard problem today) rather than building a desktop video session protocol to watch them do the work.
>I would personally invest in making agents more autonomous (yes, a hard problem today) rather than building a desktop video session protocol to watch them do the work.
Seems difficult to research better autonomy without extensive monitoring. You need specific data on before/after effects of changes, for instance.
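To make the before/after point concrete, here is a toy sketch of the kind of run log you would need: compare task success rates across agent versions. The `Run` schema and version labels are illustrative, not any real system's.

```python
from dataclasses import dataclass

@dataclass
class Run:
    """One logged agent run; fields are illustrative, not a real schema."""
    agent_version: str
    succeeded: bool

def success_rate(runs: list[Run], version: str) -> float:
    """Fraction of runs for a given agent version that succeeded."""
    matching = [r for r in runs if r.agent_version == version]
    return sum(r.succeeded for r in matching) / len(matching)

log = [
    Run("v1", False), Run("v1", True), Run("v1", False),  # before the change
    Run("v2", True), Run("v2", True), Run("v2", False),   # after the change
]

before = success_rate(log, "v1")
after = success_rate(log, "v2")
```

Without monitoring that captures this kind of data, there is no way to tell whether an autonomy change actually helped.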
A career change that left me as a recent graduate in a decimated marketplace, missing the bottom ten rungs on the ladder and with no interest in getting back into the software world, has led me to advanced manufacturing as a metal worker. I code a little, move heavy steel pieces periodically, which is a nice way to break up the standing/sitting but not nearly as much as a general laborer, solve lots of problems, keep my trigonometry muscles toned, am forced to take breaks, get paid for my overtime, there’s a union that the company ownership is totally willing to work with, and when I’m not at work, work isn’t with me. There’s something very satisfying about leaving work with exercised muscles, smelling slightly of cutting oil. The money sucks comparatively so early in my career, but the rate increases more for performance than seniority so it's rising quickly, the benefits are good, the career trajectory is pointing upwards, and longevity-wise, it’s certainly a whole lot better than gig work.
There’s a huge crisis in US manufacturing: we’re bleeding craft knowledge because off-shoring let companies hire existing experienced workers for decades, so they never had to train a new generation of tradespeople. Now all those folks are dying and retiring and they need people to pick up that deep knowledge quickly. Codifying and automating is going to kill jobs either way, but one factory employing a few people making things for other factories with local materials is better than everything perpetually shifting to the cheap labor market du jour. I’m feeling much more optimistic about the future of this than the future of tech careers.
I think over the next few years, a very large percentage of folks in tech will find themselves on the other side of the fence, quickly realize that their existing expertise doesn’t qualify them for any other white collar jobs where vibe coding experience is a bullet point in the pluses section, that tech consulting is declining even faster than salaried jobs, and that they’re vastly less qualified than the competition for blue collar jobs. Gonna be a rough road for a lot of folks. I wouldn’t invest in SF real estate any time soon.
There’s 1,000 established industries that don’t offer the rapid growth and pay outs of the modern tech ecosystem. I’m excited to see some of the current industrial backwaters soak up technical talent freed up by the SV AI brain drain.
To think we’ve handsomely paid our best and brightest the last few decades in pursuit of.. advertising?
> To think we’ve handsomely paid our best and brightest the last few decades in pursuit of.. advertising?
I think "efficiency" is more accurate there. Even post-Google/ad-tech-boom the overall trends that started decades earlier continued to be: (1) faster turnaround time on communications, (2) faster delivery of result artifacts, (3) faster knowledge of changes in the market and faster response.
Advertising is a particularly visible field with lots of money to throw at those things (active investment trading is another). But practically every other industry has chased those same things as well, all the way down to things like parking meters.
Personally I'm not convinced that this is such a great thing anyway - does anyone enjoy their boss messaging them at 11PM whenever the fancy strikes? - but that's the larger reason so much brainpower has been invested into it.
You will always be able to produce artisanal hand-set code, same as how artisanal woodworking exists alongside industrial manufacturing. There will be a lot less demand for it, and compensation will align accordingly, but it won't go away.
Either the crap truly works and nobody is needed or it does not work. Where is this half-arsed human-agent hybrid vision coming from? The land of plateaued LLM gains?
I’ve also independently concluded Moonlight was the best way to go after trying my hand at a very similar task. I didn’t want to dig through Moonlight’s source, but I’m sure that if you’re dedicated enough it would pay dividends later on; it basically does everything you’d need for real-time control when simulating human input.
Hey! Yeah, we are working with partners on a fully integrated hardware + software stack for this. We particularly like the RTX 6000 Pro Blackwell chips for this use case
But is the ability to run it on the DS a feature? I highly doubt it.
I’m not trashing anything, I’m just saying that if they focused on what their market is, it would be clear no one is going to be coding/working on a Nintendo DS.
I suppose, but we got it working, and the primary interface is WebRTC in the browser; going via Moonlight internally is just an implementation detail that got us here quickly. We are open to refactoring in the future, of course :)
Hi, quite an interesting project, but I have a hard time understanding why you would stream a desktop.
From my (ignorant) understanding, the important part is the context the LLM has for the task. For some conversations you need visuals; for some you don't. What's the advantage of streaming a full desktop instead of using integrations?
There's also value in being able to run multiple agents in parallel with their own isolated filesystems and runtimes. One agent won't tread on the toes of another whatever they do. You can let it loose and it doesn't matter if it breaks something, you can just spin up another one
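A toy illustration of that isolation idea, assuming nothing about Helix's actual implementation (which uses containerized desktops; here plain temp directories stand in for them): each "agent" gets its own throwaway workspace, so two agents writing the same filename never collide.

```python
import pathlib
import subprocess
import sys
import tempfile

def run_agent(name: str, code: str) -> pathlib.Path:
    """Give each 'agent' its own throwaway workspace and run it there.
    A real setup would use containers or VMs; tempdirs just show the idea."""
    workdir = pathlib.Path(tempfile.mkdtemp(prefix=f"agent-{name}-"))
    subprocess.run([sys.executable, "-c", code], cwd=workdir, check=True)
    return workdir

# Two agents write the same filename without treading on each other.
a = run_agent("alpha", "open('out.txt', 'w').write('alpha')")
b = run_agent("beta", "open('out.txt', 'w').write('beta')")
```

If one workspace gets trashed, you delete it and spin up another; nothing leaks into its neighbor.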
Mainly so you can give the agent access to the desktop as well. Then it can debug your web app in Chrome DevTools, but you can also pair with it, with streaming so good it feels local
> Moonlight expects: Each client connects to start their own private game session
Nope, it's a Wolf design choice; e.g. Sunshine allows users to concurrently connect to the same instance/game
I will join the woodworking people before that happens, thanks.
How are these industries going to absorb new headcount without the revenue to support it?
Are there appliances or easy-to-deploy hardware that would allow one to run these private models on-premise vs. in the cloud?
No shoehorns needed. Just take what you like and build what you need.