nineteen999 7 hours ago
This couldn't be more perfectly timed... I have an Unreal Engine game with both VT100 terminals (for running coding agents) and Z80 emulators, and a serial bridge that allows coding agents to program the CP/M machines:

https://i.imgur.com/6TRe1NE.png

Thank you for posting! It's unbelievable how someone sometimes just drops something that fits right into what you're doing, however bizarre it seems.

quesomaster9000 6 hours ago
Oh dear, it seems we've... somehow been psychically linked...

I developed a browser-based CP/M emulator & IDE: https://lockboot.github.io/desktop/

I was going to post that, but wanted a 'cool demo' instead, and fell down the rabbit hole.

jaak 2 hours ago
I've been playing the Z80-μLM demos in your CP/M emulator. Works great! However, I have yet to guess a correct answer in GUESS.COM! I'm not sure if I'm just not asking the right questions or if I'm just really bad at it!
quesomaster9000 1 hour ago
Don't tell anybody, but you sit on it
sixtyj 5 hours ago
Connections: Alternative History of Technology by James Burke documents these "coincidences".
TeMPOraL 5 hours ago
Those "coincidences" in Connections are really no coincidence at all, but path dependence. Breakthrough advance A is impossible or useless without prerequisites B and C and economic conditions D, but once B, C, and D are in place, A becomes the obvious next step.
embedding-shape 3 hours ago
Some of those really are coincidences, like "Person A couldn't find their left shoe and ended up in London at a coffee house, where Person B accidentally ended up when their carriage hit a wall, which led to them eventually coming up with Invention C" for example.

Although from what I remember of the TV show, most of what he investigates/talks about is indeed path dependence in one way or another, though not everything was like that.

simonjgreen 4 hours ago
Super intrigued but annoyingly I can’t view imgur here
abanana 1 hour ago
Indeed, part of me wants to not use imgur because we can't access it, but a bigger part of me fully supports imgur's decision to give the middle finger to the UK after our government's censorship overreach.
wizzwizz4 20 minutes ago
It was a really clever move on Imgur's part. Their blocking the UK has nothing to do with the Online Safety Act: it's a response to potential prosecution under the Data Protection Act, for Imgur's (alleged) unlawful use of children's personal data. By blocking the UK and not clearly stating why, people assume they're taking a principled stand about a different issue entirely, so what should be a scandal is transmuted into positive press.
rahen 4 hours ago
I love it, instant GitHub star. I wrote an MLP in Fortran IV for a punched card machine from the sixties (https://github.com/dbrll/Xortran), so this really speaks to me.

The interaction is surprisingly good despite the lack of an attention mechanism and the limitation of the "context" to trigrams from the last sentence.

This could have worked on 60s-era hardware and would have completely changed the world (and science fiction) back then. Great job.
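
Roughly the mechanism, as a toy sketch in Python (sizes, hashing scheme and weights are all invented here, not Xortran's or this project's actual code): the whole "context" is a bag of character trigrams from the last sentence, pushed through one small hidden layer, with no attention anywhere:

    import numpy as np

    VOCAB, HIDDEN, TOKENS = 4096, 64, 256          # assumed sizes
    rng = np.random.default_rng(0)
    W1 = rng.normal(0, 0.1, (VOCAB, HIDDEN))       # stand-in (untrained) weights
    W2 = rng.normal(0, 0.1, (HIDDEN, TOKENS))

    def trigram_features(sentence):
        # Bag of character trigrams from the last sentence only.
        x = np.zeros(VOCAB)
        s = sentence.lower()
        for i in range(len(s) - 2):
            a, b, c = s[i:i + 3]
            x[(ord(a) * 961 + ord(b) * 31 + ord(c)) % VOCAB] = 1.0
        return x

    def next_token_logits(sentence):
        h = np.maximum(trigram_features(sentence) @ W1, 0.0)   # one ReLU layer, no attention
        return h @ W2

    print(int(next_token_logits("what is your name").argmax()))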

noosphr 3 hours ago
Stuff like this is fascinating. Truly the road not taken.

Tin foil hat on: I think that a huge part of the major buyout of RAM by AI companies is to keep people from realising that we are essentially at the home-computer-revolution stage of LLMs. I have a 1 TB RAM machine which, with custom agents, outperforms all the proprietary models. It's private, secure, and won't let me be monetized.

Zacharias030 2 hours ago
How so? Sounds like you are running Kimi K2 / GLM? What agents do you give it, and how do you handle web search and computer use well?
gcanyon 1 hour ago
So it seems like with the right code (and maybe a ton of future infrastructure for training?) Eliza could have been much more capable back in the day.
Dwedit 7 hours ago
In before AI companies buy up all the Z80s and raise the prices to new heights.
nubinetwork 4 hours ago
Too late, they stopped being available last year.
whobre 2 hours ago
Kind of. There's still the eZ80.
vedmakk 7 hours ago
Suppose one were to train an actual secret (e.g. a passphrase) into such a model, one that a user would need to guess by asking the right questions: could this secret be easily reverse engineered / inferred by having access to the model's weights, or would it be safe to assume that one could only get to the secret by asking the right questions?
Kiboneu 6 hours ago
I don’t know, but your question reminds me of this paper which seems to address it on a lower level: https://arxiv.org/abs/2204.06974

“Planting Undetectable Backdoors in Machine Learning Models”

“ … On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate "backdoor key", the mechanism is hidden and cannot be detected by any computationally-bounded observer. We demonstrate two frameworks for planting undetectable backdoors, with incomparable guarantees. …”
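
Not the paper's actual construction, but a toy in the spirit of the abstract (SECRET, base_classifier, and the MAC trigger are all invented for illustration): the wrapper behaves exactly like the base model unless the input ends with a tag that only the key holder can forge:

    import hmac, hashlib

    SECRET = b"backdoor-key"            # known only to whoever planted the backdoor

    def base_classifier(x: bytes) -> int:
        return len(x) % 2               # stand-in for any normally-behaving model

    def tag(x: bytes) -> bytes:
        return hmac.new(SECRET, x, hashlib.sha256).digest()[:16]

    def backdoored(x: bytes) -> int:
        # If the input ends with a valid MAC over its prefix, flip the label.
        # Without SECRET you can neither forge that "slight perturbation" nor
        # distinguish this wrapper from base_classifier on honest inputs.
        body, suffix = x[:-16], x[-16:]
        if len(x) > 16 and hmac.compare_digest(suffix, tag(body)):
            return 1 - base_classifier(body)
        return base_classifier(x)

    x = b"an ordinary input"
    assert backdoored(x) == base_classifier(x)           # normal behaviour
    assert backdoored(x + tag(x)) != base_classifier(x)  # key holder flips it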

ronsor 6 hours ago
> could this secret be easily reverse engineered / inferred by having access to the model's weights

It could with a network this small. More generally this falls under "interpretability."
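
For a network this small, one crude way to see why: greedy-decode and watch the model's certainty. A trained-in secret tends to show up as a chain of near-1.0 next-token probabilities (everything below is invented for illustration; toy_model stands in for the real weights):

    import numpy as np

    def extract(next_probs, vocab, prompt, max_len=40, certain=0.9):
        # Greedy-decode while the model stays near-certain; memorized strings
        # keep the chain going, while ordinary text breaks it quickly.
        out = list(prompt)
        while len(out) < max_len:
            p = next_probs(out)
            i = int(np.argmax(p))
            if p[i] < certain:
                break
            out.append(vocab[i])
        return "".join(out)

    VOCAB = list("abcdefghijklmnopqrstuvwxyz ")
    SECRET = "the secret is xyzzy"

    def toy_model(tokens):                  # stand-in for the trained network
        s = "".join(tokens)
        p = np.full(len(VOCAB), 1 / len(VOCAB))
        if SECRET.startswith(s) and len(s) < len(SECRET):
            p[:] = 0.001
            p[VOCAB.index(SECRET[len(s)])] = 0.99   # near-certain on the memorized path
        return p

    print(extract(toy_model, VOCAB, "the secret is "))   # -> the secret is xyzzy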

roygbiv2 7 hours ago
Awesome. I've just designed and built my own Z80 computer, though right now it has 32 KB ROM and 32 KB RAM. This will definitely change on the next revision, so I'll be sure to try it out.
wewewedxfgdf 6 hours ago
RAM is very expensive right now.
wickedsight 3 hours ago
I just removed 128 megs of RAM from an old computer and am considering listing it on eBay to pay off my mortgage.
nrhrjrjrjtntbt 2 hours ago
I wonder in what past year 128 MB of RAM would have paid off a mortgage. Maybe 1985.
tgv 6 hours ago
We're talking kilobytes, not gigabytes. And it isn't DDR5 either.
boomlinde 4 hours ago
Yeah, even an average household can afford 40k of slow DRAM if they cut down on luxuries like food and housing.
wewewedxfgdf 3 hours ago
Maybe the rich can but not all retro computer enthusiasts are rich.
charcircuit 2 hours ago
If you can't afford to spend a few dollars without sacrificing housing or food, you are being financially irresponsible.
ant6n 2 hours ago
Busy cutting down on the avocado toast!
nrhrjrjrjtntbt 2 hours ago
Then I can afford eggs, RAM and a studio apartment!
StilesCrisis 51 minutes ago
thats-the-joke.gif
orbital-decay 5 hours ago
Pretty cool! I wish free-input RPGs of old had fuzzy matchers. They worked by exact keyword matching and it was awkward. I think the last game of that kind (where you could input arbitrary text when talking to NPCs) was probably Wizardry 8 (2001).
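
For what it's worth, even the Python stdlib gets you most of the way there nowadays; a minimal sketch (keyword list invented):

    import difflib

    TOPICS = ["treasure", "dragon", "tavern", "rumors"]   # an NPC's known keywords

    def match_topic(player_input, cutoff=0.6):
        # Tolerates typos like "tresure" that exact matching would reject.
        for word in player_input.lower().split():
            hit = difflib.get_close_matches(word, TOPICS, n=1, cutoff=cutoff)
            if hit:
                return hit[0]
        return None

    print(match_topic("tell me about the tresure"))   # -> treasure
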
Peteragain 5 hours ago
There are two things happening here: a really small LLM mechanism, which is useful for thinking about how the big ones work, and a reference to the well-known phenomenon, commonly and dismissively referred to as a "trick", in which humans want to believe. We work hard to account for what our conversational partner says. Language in use is a collective cultural construct. By this view, the real question is how and why we humans understand an utterance in a particular way. Eliza, Parry, and the Chomsky bot at http://chomskybot.com work on this principle. Just sayin'.
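
The whole principle fits in a few lines; a sketch of the ELIZA end of it (rules invented for illustration): keyword match plus a canned template, and the human supplies the understanding:

    import re

    RULES = [
        (re.compile(r"\bi need (.+)", re.I), "Why do you need {0}?"),
        (re.compile(r"\bi am (.+)", re.I),   "How long have you been {0}?"),
        (re.compile(r"\bmy (.+)", re.I),     "Tell me more about your {0}."),
    ]

    def reply(line):
        for pattern, template in RULES:
            m = pattern.search(line)
            if m:
                return template.format(m.group(1).rstrip(".!?"))
        return "Please go on."           # the default that keeps the human talking

    print(reply("I am worried about my Z80"))   # -> How long have you been worried about my Z80?
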
nrhrjrjrjtntbt 2 hours ago
MAYBE
bartread 1 hour ago
This is excellent. Thing I’d like to do if I had time: get it running on a 48K Spectrum. 10 year old me would have found that absolutely magical back in the 1980s.
tomduncalf 1 hour ago
This was my first thought too haha. That would be mind-blowing.
bartread 1 hour ago
Yeah, very WarGames.
bitwize 30 minutes ago
Don't be surprised if you're paid a visit by the SCP Foundation: https://scp-db.fandom.com/wiki/SCP-079
anonzzzies 5 hours ago
Luckily I have a very large number of MSX computers, ZX and Amstrad CPC machines, etc., and even one multiprocessor Z80 CP/M machine for the real power. I wonder how gnarly this is going to perform with bank switching, though. Probably not good.
Zee2 7 hours ago
This is super cool. Would love to see a Z80 simulator set up with these examples to play with!
Imustaskforhelp 4 hours ago
100%! Please do this, I wish the same.
andrepd 3 hours ago
We should show this every time a Slack/Teams/Jira engineer tries to explain to us why a text chat needs 1.5 GB of RAM to start up.
dangus 3 hours ago
> It won't write your emails, but it can be trained to play a stripped down version of 20 Questions, and is sometimes able to maintain the illusion of having simple but terse conversations with a distinct personality.

You can buy a kid's Tiger Electronics-style toy that plays 20 Questions.

It's not like this LLM is a bastion of glorious efficiency; it's just stripped down to fit on the hardware.

Slack/Teams handles company-wide video calls and can render anything a web browser can, and they run an entire App Store of apps, all from a cross-platform application.

Including Jira in the conversation doesn’t even make logical sense. It’s not a desktop application that consumes memory. Jira has such a wide scope that the word “Jira” doesn’t even describe a single product.

ben_w 1 hour ago
> Slack/Teams handles company-wide video calls and can render anything a web browser can, and they run an entire App Store of apps, all from a cross-platform application.

The 4th Gen iPod touch had 256 meg of RAM and also did those things, with video calling via FaceTime (and probably others, but I don't care). Well, except "cross platform", what with it being the platform.

messe 2 hours ago
> can render anything a web browser can

That's a bug not a feature, and strongly coupled to the root cause for slack's bloat.

andrepd 2 hours ago
My Pentium 3 in 2005 could do chat and video calls and play chess and send silly emotes. There is no conceivable user-facing reason why, 20 years later, the same functionality takes 30× as many resources, only developer-facing reasons. But those are not valid reasons for a professional. If a bridge engineer claimed they now needed 30× as much concrete to build the same bridge as they did 20 years ago, and the reason was their own convenience, that would not fly.
ben_w 1 hour ago
> If a bridge engineer claimed they now needed 30× as much concrete to build the same bridge as they did 20 years ago, and the reason was their own convenience, that would not fly.

By itself, I would agree.

However, in this metaphor, concrete got 15x cheaper in the same timeframe. Not enough to fully compensate for the difference, but enough that a whole generation are now used to much larger edifices.

andrepd 12 minutes ago
So it means you could save your client 93% of their money in concrete, but you choose to make it 2× more expensive! That only makes my metaphor stronger ahaha.
vatary 5 hours ago
It's pretty obvious this is just a stress test for compressing and running LLMs. It doesn't have much practical use right now, but it shows us that IoT devices are going to have built-in LLMs really soon. It's a huge leap in intelligence, kind of like the jump from apes to humans. That is seriously cool.
acosmism 5 hours ago
I'll echo that practicality only surfaces once it's apparent what can be done. Yeah, this feels like a "running DOOM on a pregnancy test" type of moment.
jacquesm 4 hours ago
Between this and RAM prices Zilog stock must be up! Awesome hack. Now apply the same principles to a laptop and take a megabyte or so, see what that does.
a_t48 6 hours ago
Nice - that will fit on a Game Boy cartridge, though bank switching might make it super terrible to run. Each bank is only 16 KB. You can have a bunch of them, but you can only access one bank at a time (well, technically two - bank 0 is, IIRC, always accessible).
ant6n 2 hours ago
You have 32 KB of ROM, plus 8 KB of RAM on the original Game Boy. The Game Boy Color has more. Bank switching is super fast as well. Given that the model is likely streamed, I doubt bank switching is a problem.

Biggest pain point is likely the text input.
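
To illustrate why streaming makes banking cheap (a Python stand-in for MBC-style banking; the layer-per-bank layout is assumed): inference touches each layer's weights once, in order, so a whole forward pass costs only one bank switch per layer:

    BANK_SIZE = 16 * 1024                 # one switchable ROM window

    class BankedWeights:
        # Model weights split across ROM banks, one mapped in at a time.
        def __init__(self, blob):
            self.banks = [blob[i:i + BANK_SIZE] for i in range(0, len(blob), BANK_SIZE)]
            self.current = 0

        def switch(self, n):              # on real hardware: a single write to the MBC
            self.current = n

        def read(self, offset):
            return self.banks[self.current][offset]

    weights = BankedWeights(bytes(BANK_SIZE * 4))   # a fake 64 KB model
    for layer in range(4):                          # one switch per layer, then
        weights.switch(layer)                       # stream that bank sequentially
        acc = sum(weights.read(o) for o in range(256))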

magicalhippo 6 hours ago
As far as I know, the last layer is very quantization-sensitive, and is typically not quantized, or quantized lightly.

Have you experimented with having it less quantized, and evaluated the quality drop?

Regardless, very cool project.

kouteiheika 5 hours ago
(Not OP)

It depends on the model, but from my experiments (quantizing one layer of a model to 2-bit and then training the model with that layer in 2-bit to repair the damage), the first layer is the most sensitive, and yes, the last layer is also sensitive. The middle layers take quantization the best.

Different components of a layer also have different sensitivity; e.g. the MLP downscale block damages the model the most when quantized, while quantizing the Q projection in self-attention damages the model the least.
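
A crude way to reproduce this kind of ranking at home (the toy MLP and 2-bit round-trip below are invented, and this measures immediate output drift rather than post-training damage):

    import numpy as np

    def fake_quantize(w, bits=2):
        # Symmetric per-tensor quantization round-trip, the simplest scheme.
        levels = 2 ** (bits - 1) - 1
        scale = np.abs(w).max() / levels
        return np.round(w / scale).clip(-levels - 1, levels) * scale

    rng = np.random.default_rng(0)
    layers = [rng.normal(0, 0.05, (64, 64)) for _ in range(6)]   # toy stand-in model
    x = rng.normal(0, 1, (8, 64))

    def forward(ws, x):
        for w in ws:
            x = np.maximum(x @ w, 0.0)
        return x

    baseline = forward(layers, x)
    for i in range(len(layers)):          # quantize one layer at a time
        probe = [fake_quantize(w) if j == i else w for j, w in enumerate(layers)]
        drift = np.mean((forward(probe, x) - baseline) ** 2)
        print(f"layer {i}: MSE {drift:.3e}")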

jasonjmcghee 7 hours ago
For future projects and/or for this project, there are many LLMs available more than good enough to generate that kind of synthetic data (20 Qs) with permissive terms of use. (So you don’t need to stress about breaking TOS / C&D etc)
Y_Y 2 hours ago
Very cool. Did you consider using sparse weights?
alfiedotwtf 7 hours ago
An LLM in a .com file? Haha made my day
teaearlgraycold 6 hours ago
SLM
quesomaster9000 6 hours ago
All the 'small' language models and the 'TinyML' scene in general tend to bottom out at around a million parameters, hence I thought 'micro' was more apt at ~150k params.
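
Back-of-the-envelope (uniform bit-widths assumed for the sake of arithmetic, not necessarily the actual on-disk format), ~150k params only squeezes into CP/M territory once you quantize hard:

    params = 150_000
    for bits in (16, 8, 4, 2):
        print(f"{bits}-bit: {params * bits / 8 / 1024:.0f} KB")
    # 16-bit: 293 KB, 8-bit: 146 KB, 4-bit: 73 KB, 2-bit: 37 KB
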
bytesandbits 2 hours ago
It's giving Eliza! Ha, fun.
pdyc 7 hours ago
Interesting. I'm wondering how far it can go if we remove some of these limitations but try to solve an extremely specific problem, like generating a regex based on user input? I know small models (270M range) can do that, but can it be done in, say, the <10 MB range?
Waterluvian 6 hours ago
Generate an LLM that is designed to solve one extremely specific problem: answering the ultimate question of life, the universe, and everything.

Even with modern supercomputing the computation would be outpaced by the heat death of the universe, so token output must be limited to a single integer.

nrhrjrjrjtntbt 2 hours ago
00101010
dirkt 6 hours ago
Eliza's granddaughter.
NooneAtAll3 5 hours ago
Did you measure tokens/s?
Zardoz84 6 hours ago
Meanwhile, Eliza was ported to BASIC and ran on many home computers in the 80s.
codetiger 7 hours ago
Imagine this working on a Game Boy back in those days. It would've sounded like magic.
Sharlin 7 hours ago
I don’t think this could beat an ELIZA-style bot in how magical it feels, given the extreme terseness of its replies.
numpad0 3 hours ago
Flip phones have had predictive text since forever. LLMs are just* supercharged predi[ctive text algorithms are computer algorithms that are]
lodovic 6 hours ago
I love these thought experiments. Looking at the code size, it would have been possible for someone to come up with this back in the day, similar to the idea of a million monkeys at typewriters eventually producing Shakespeare.
alfiedotwtf 7 hours ago
And would have lasted 3 minutes.

Speaking of - I remember my first digital camera (Fujitsu, 1 MP resolution, using SmartMedia)… it used so much power that you could take 20-30 photos and then needed to replace all 4 batteries lol