Executing programs inside transformers with exponentially faster inference

▲

Executing programs inside transformers with exponentially faster inference(percepta.ai)

114 points byu1hcw9nx1 day ago |13 comments

▲bonoboTP1 hour ago

This shows the downside of using AI to write up your project. I see the eloquent sentences, but don't get the message.

> This works, but the actual execution happened outside the model. The model specified the computation, then waited for an external system to carry it out. > Our transformer also emits a program, but instead of pausing for an external tool, it executes that program itself, step by step, within the same transformer.

What's the benefit? Is it speed? Where are the benchmarks? Is it that you can backprop through this computation? Do you do so?

Why is it good that it's "inside" the model? Just making it more elegant and nice? The tool was already "inside" the overall hybrid system. What's the actual problem?

▲famouswaffles1 hour ago

>This shows the downside of using AI to write up your project. I see the eloquent sentences, but don't get the message.

Not really sure what this obsession with calling things you don't like AI generated is but it's poor form. If you have something to say about the text then say it. Otherwise leave baseless accusations out of it.

>What's the benefit? Is it speed? Where are the benchmarks? Is it that you can backprop through this computation? Do you do so?....

It's pretty clearly an ideological thing. Some people are firmly on the 'some sort of symbolic logic is necessary' camp. From the article, 'A system that cannot compute cannot truly internalize what computation is.'

Some things are just interesting for the sake of it. This is one of those things. I don't agree with the authors on the above and I'm still glad they shared. It's a very interesting read regardless.

▲bonoboTP49 minutes ago

> If you have something to say about the text then say it.

I could point out the individual phrases and describe the overall impression in detail, or I can just compactly communicate that by using the phrase "AI". If it bothers you, read it as "AI-like", so there is a pretension.

I have no problem with using AI for writing. I do it too, especially for documentation. But you need to read it and iterate with it and give it enough raw input context. If you don't give it info about your actual goals, intentions, judgments etc, the AI will substitute some washed-out, averaged-out no-meat-on-the-bone fluff that may sound good at first read and give you a warm wow-effect that makes you hit publish, but you read into it all the context that you have in your head, but readers don't have that.

Formatting and language is cheap now. We need a new culture around calling out sloppy work. You would not have had a problem with calling out a badly composed rambling article 5 years ago. But today you can easily slap an AI filter on it that will make it look grammatical and feel narratively engaging, now it's all about deeper content. But if one points that out, replies can always say "oh, you can't prove that, can you?"

▲famouswaffles33 minutes ago

>"This shows the downside of using AI to write up your project."

I just find phrases like this a bit obnoxious at times.

>You would not have had a problem with calling out a badly composed rambling article 5 years ago.

Then why not just say that? It's rambling bla bla bla. What's so hard about that? Why invent a reason for issues, as if rambling articles didn't get written 5 years ago.

Like No, being written by an LLM or not is not the reason the article has no benchmarks or interpretability results. Those things would be there regardless if the author was interested in that, so again, it just seems there's little point in making such assertions.

▲bonoboTP28 minutes ago

It's very hard to discuss this. To some people it's obvious, to some it isn't. To me, every single paragraphs is obvious fluff AI writing. One problem with it is the repetitiveness and the schmoozing salesman feel. The other is the lack of benchmarks and stuff. It's both. The two are connected because the AI has to lean in to its bullshitter persona when it's not given enough raw material to write up something strong. But whenever an AI writes in its default voice like this, it also indicates that the context was not well curated.

But anyway, yes, I can also just move on to the next article. Most of the time I indeed do that.

▲furyofantares30 minutes ago

I'm glad they shared too! Wish they shared without letting the LLM process it so heavily, it makes it too hard to read, it gives monotone importance to every piece of text. Mostly it does this by bringing everything up to a slight over-importance with tone and fluff language, and by turning everything into dry statements of fact.

As to why people call this out without going into great detail about the problems with the actual text, it's because this is happening all over the place and it's very disrespectful to readers, who dig into an article that looks very well written on the surface, only to discover it's a lot of labor to decode and often (but not always) a total waste of time. Asking for a critical report of the text is asking even more of a reader who already feels duped.

▲stalfie18 minutes ago

This is a nice case study of the downside of creating explicit policies of "no AI comments" without a technical method of enforcing it. I am sure the hacker news comment quality will suffer almost as much from an escalating culture of accusation and paranoia that it will from LLM comment themselves.

▲entropi1 hour ago

I got the same impression as the parent post. Even if its not AI-generated, the text reads like a politician's speech at a lot of places. Talks a lot, says little.

The idea itself was very cool, so I endured it. But it was not a pleasant read.

▲andy12_1 hour ago

Honestly, the most interesting thing here is definitely that just 2D heads are enough to do useful computation (at least they are enough to simulate an interpreter) and that there is an O(log n) algorithm to compute argmax attention with 2D heads. It seems that you could make an efficient pseudosymbolic LLM with some frozen layers that perform certain deterministic operations, but also other layers that are learned.

▲btown47 minutes ago

This seems way cooler than just computation (which is easy to hand off to a tool, and arguably more predictable that way). The broader point here is that you can have your model switch dynamically to/from a kind of attention that scales with the log of the token count, by only exploring the convex hull in a 2D space. A less capable version of attention, to be sure, but one capable of tracing a program’s execution with text representations of registers and stack - which is a meaningful level of flexibility, and one many humans would find difficult to do reliably!

What could you do with an LLM that can go into “focus mode” and generate tokens extremely rapidly? How much more powerful would a reasoning-token-generation phase be that can explore and cull large numbers of paths/hypotheses, so long as they are well defined? Does this have implications for multi-modal models and spatial reasoning?

As the paper suggests:

> These models could be useful in several modes: as a dedicated fast path paired with a slower, more general model; as part of a fast/slow hybrid architecture inside a single system; or as a speculative execution model that proposes tokens quickly while a regular-attention model verifies and accepts them. Regardless of their eventual capability ceiling, they already suggest a powerful systems primitive for speeding up larger models.

▲MattPalmer10861 hour ago

Interesting... But why? What is the benefit, other than increasing our understanding of model architectures?

Our brains can also simulate turing machines, slowly. We automated that with computers that are faster and more reliable. So why not allow a model to use external much faster and reliable tools, just as we do?

▲Rastonbury24 minutes ago

Why must models be analogous to humans using tools? Or to take the analogy route further wouldn't it be better if humans had calculators built into their brains, provided they are determisitic and reduce latency

▲andy12_21 hours ago

This seems a really interesting path for interpretability, specially if a big chunk of a model's behavior occurs pseudo-symbolically. This is an idea I had thought about, integrating tools into the main computation path of a model, but I never imagined that it could be done efficiently with just a vanilla transformer.

Truly, attention is all you need (I guess).

▲deviation1 hour ago

I really liked the article, but food for thought: is a transformer that offloads computation to python really that different from Python code being read and then executed by a compiler?

Both examples are of a system we created to abstract most of the hard work.

I think a more important concept here is that the term "AI" has a lot of built-in assumptions, one of which being that it is (or will be) super intelligent, and so folks like the author here think (correctly) that it's important for the AI to be actually doing the work itself.

▲yalok25 minutes ago

very cool idea. But, time savings are not true for every tool call, and it's not clear to me yet whether this is batch-able; also, intuitively, for most of the models that run on GPU, you'd still want to offload tool exec part to CPU since it's much cheaper...

▲pennomi19 hours ago

It makes sense that a next token predictor could execute assembly code. This is fascinating work, especially with the memory implementation.

▲mirekrusin1 hour ago

This is brilliant, game changing level.

Hey, give it also access to the dump of its weights and way to propose updates so it can see and tinker its brain directly.

▲koolala2 hours ago

I'd like to see this combined with reinforcement learning to optimize models to think computationally. Generating ideas with hypothetical results and then running them in the same thought. Their solution sounded like a lot of tokens though.

▲behehebd2 hours ago

Is this genius? Or just a new binary executable format? Can't tell.

▲galsapir19 hours ago

one of the most interesting pieces I've read recently. Not sure I agree with all the statements there (e.g. without execution the system has no comprehension) - but extremely cool

▲ndxone1 hour ago

big question is how efficient is this compare to executing assembly on CPU

▲ThouYS1 hour ago

what!