The CRDT Dictionary: A Field Guide to Conflict-Free Replicated Data Types (iankduncan.com)

137 points by birdculture 10 hours ago | 6 comments

btown 4 hours ago

One of the most interesting things to me about CRDTs, and something a skim of the article (with its focus on low-level CRDTs) might give the wrong impression about, is that things like https://automerge.org/ are not just "libraries" that "throw together" low-level CRDTs. They are themselves full CRDTs, with strong proofs about their characteristics under stress.

Per the Automerge website:

> We are driven to build high performance, reliable software you can bet your project on. We develop rigorous academic proofs of our designs using theorem proving tools like Isabelle, and implement them using cutting edge performance techniques adopted from the database world. Our standard is to be both fast and correct.

While the time and storage-space performance of these new-generation CRDTs may not be ideal for all projects, their convergence characteristics are formalized, proven, and predictable.
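The convergence guarantee mentioned above rests on the merge operation being commutative, associative, and idempotent. As a minimal sketch of what that means (using a grow-only counter as a stand-in, not anything taken from Automerge), the hypothetical `GCounter` class below merges replicas by element-wise max, so every replica ends in the same state regardless of merge order or duplicate delivery:

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter (illustrative): one monotonically increasing slot per replica."""
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, replica: str, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[replica] = self.counts.get(replica, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> "GCounter":
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge no matter how merges are ordered or repeated.
        keys = self.counts.keys() | other.counts.keys()
        return GCounter({k: max(self.counts.get(k, 0), other.counts.get(k, 0)) for k in keys})


a, b, c = GCounter(), GCounter(), GCounter()
a.increment("a", 3)
b.increment("b", 2)
c.increment("c", 5)

left = a.merge(b).merge(c)
right = c.merge(a.merge(b)).merge(b)  # different order, plus a duplicate delivery of b
assert left == right
print(left.value())  # 10
```

Libraries like Automerge prove far richer properties than this, but the shape of the argument is the same: if merge is a join, replicas that have seen the same updates agree.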

If you're building a SaaS that benefits from team members editing structured and unstructured data, and seeing each other's changes in real time (as one would expect of Notion or Figma), you can reach for CRDTs that give you actionable "collaborative deep data structures" today, without understanding the entire history of the space that the article walks through. All you need for the backend is key-value storage with range/prefix queries; all you need for the frontend is a library and a dream.
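To illustrate why range/prefix queries are enough, here is a minimal sketch under stated assumptions: changes are stored as opaque byte blobs in an append-only log, and a document is loaded with a single prefix scan. `SortedKV`, `change_key`, `append_change`, and `load_changes` are hypothetical names, and the `doc/<id>/<actor>/<seq>` key layout is an assumption for illustration, not Automerge's actual storage format.

```python
import bisect
from collections.abc import Iterator


class SortedKV:
    """Toy stand-in for any key-value store that supports ordered prefix scans."""

    def __init__(self) -> None:
        self._keys: list[str] = []
        self._values: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix: str) -> Iterator[tuple[str, bytes]]:
        # Keys are kept sorted, so all matches form one contiguous run.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._values[key]
            i += 1


def change_key(doc_id: str, actor: str, seq: int) -> str:
    # Hypothetical layout: zero-padding keeps lexicographic order equal to seq order.
    return f"doc/{doc_id}/{actor}/{seq:010d}"


def append_change(kv: SortedKV, doc_id: str, actor: str, seq: int, change: bytes) -> None:
    kv.put(change_key(doc_id, actor, seq), change)


def load_changes(kv: SortedKV, doc_id: str) -> list[bytes]:
    # One prefix scan returns every stored change for the document; a CRDT
    # library would then apply them to rebuild the current state.
    return [v for _, v in kv.scan_prefix(f"doc/{doc_id}/")]


kv = SortedKV()
append_change(kv, "notes", "alice", 2, b"a2")
append_change(kv, "notes", "alice", 1, b"a1")
append_change(kv, "notes", "bob", 1, b"b1")
append_change(kv, "other", "alice", 1, b"x")
print(load_changes(kv, "notes"))  # [b'a1', b'a2', b'b1']
```

The trailing slash in the scan prefix keeps a document like `notes2` from being swept up with `notes`; any real store with ordered keys (or a range query) can play the role of `SortedKV`.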

michelpp 2 hours ago

Automerge is an excellent library, with a great API, not just in Rust, but also JavaScript and C.

> All you need for the backend is key-value storage with range/prefix queries;

This is true. I was able to quickly put together a Redis Automerge library that supports the full API, including pub/sub of changes to subscribers, for a full persistent sync server [0]. I was surprised how fast it came together. With some LLM assistance (I'm not a frontend specialist), I also built a usable web demo of documents synchronized across multiple browsers, using the Webdis [1] WebSocket support over pub/sub channels.

[0] https://github.com/michelp/redis-automerge

[1] https://webd.is/

mentalgear 1 hour ago

Automerge is a great project, but it still feels way too academic in its setup. If you need a superior DX and a CRDT-based full-stack database, I'd recommend looking at Triplit.dev and its docs. (While development has slowed somewhat, the product is in a fully featured phase and should work well for anything from small to medium projects, and probably very large ones too, depending on your configuration.) Give it a try; you will like it.

GermanJablo 48 minutes ago

Triplit is my favorite local-first database. However, it doesn't compete in the same space as Automerge, which is doc-based. If you want a user-friendly alternative, I'm launching my proposal this week: https://docnode.dev

trm217 12 minutes ago

Very interesting read! Thanks for sharing!

GermanJablo 1 hour ago

Interesting read. I’ve spent the past two years developing my own CRDT, but along the way, I realized a CRDT involves too many trade-offs, so I ended up implementing an ID-based OT framework. Coincidentally, I’m planning to launch it this Tuesday, so here’s an exclusive for you: https://docnode.dev. I'd like to hear your thoughts!

In the future, I plan to add a CRDT mode for scenarios where P2P is required.

josephg 30 minutes ago

Out of curiosity, which tradeoffs were problematic for your design?

rdtsc 5 hours ago

That's a great summary of CRDTs, starting from the basics and moving to the more advanced ones.

Speaking of Riak, it's still around, in the form of https://github.com/OpenRiak!

macintux 2 hours ago

Thank you, I was completely unfamiliar with OpenRiak. Pretty cool to see some of my former co-workers chiming in on the effort. Basho was a remarkable collection of smart people.

tbrownaw 2 hours ago

What this calls an OR-Set looks equivalent to what Monotone uses (used? It's kinda mostly dead now) for merging scalar values (e.g. names, content hashes), going back to 2005.

The best current page I can find is https://tonyg.github.io/revctrl.org/MarkMerge.html . Boo link rot.
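For readers who haven't met the OR-Set (observed-remove set), here is a minimal state-based sketch of the idea; it is not taken from the article or from Monotone's mark-merge, and the `ORSet` class and its method names are hypothetical. Each add gets a fresh unique tag, and a remove retires only the tags the removing replica has already seen, so a concurrent re-add survives the merge ("add wins"):

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ORSet:
    """Observed-remove set (illustrative): adds carry unique tags; removes retire observed tags."""
    adds: set[tuple[str, str]] = field(default_factory=set)     # (element, tag)
    removes: set[tuple[str, str]] = field(default_factory=set)  # retired (element, tag) pairs

    def add(self, element: str) -> None:
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element: str) -> None:
        # Only tags this replica has observed are retired, so a concurrent
        # add elsewhere (with a fresh tag) is not affected.
        self.removes |= {pair for pair in self.adds if pair[0] == element}

    def contains(self, element: str) -> bool:
        return any(e == element for e, _ in self.adds - self.removes)

    def merge(self, other: "ORSet") -> "ORSet":
        # Union of both grow-only sets: commutative, associative, idempotent.
        return ORSet(self.adds | other.adds, self.removes | other.removes)


alice, bob = ORSet(), ORSet()
alice.add("README")
bob = bob.merge(alice)   # bob observes alice's add
bob.remove("README")     # bob retires the tag he saw
alice.add("README")      # concurrently, alice re-adds with a new tag

merged = alice.merge(bob)
assert merged == bob.merge(alice)
print(merged.contains("README"))  # True: the unobserved add survives
```

This naive version keeps every tombstone forever; production designs compact them, which is part of the storage trade-off discussed upthread.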

fellowniusmonk 4 hours ago

CRDTs are something you still have to write by hand. I finished creating a custom sequence-based CRDT engine about two months ago (inspired by diamond types), and it was hilarious to ask AI for assistance.

When you are working on something that:

1. Is essentially a logic problem,

2. LLMs aren't trained on, and

3. Can have dense character sequences when testing,

it's interesting to see how completely useless an LLM is outside of pre-trained areas.

There needs to be some black-box test based on pure but niche logic to see whether an LLM is capable of understanding, or even noticing, exposure to new kinds of logic.

canadiantim 3 hours ago

What about just using something like Loro?

fellowniusmonk 3 hours ago

I love Loro, and it's probably my favorite open source project (you can see me refer to it as such in my comment history), but I have a very specific multi-CRDT and search-indexing architecture that precluded me from using it.

fellowniusmonk 1 hour ago

Ah, sorry, I completely left out context. When I say "by hand", I don't mean there are no good CRDT projects; Loro is absolutely great, and others are very good as well.

I mean only in the context of writing your own: you can't use AI for that. AI can be used to write code and can certainly explain a lot of code, and as a result people start ascribing more reasoning power to AI than it has. CRDTs are an area where current models just completely lose the plot.

If you're only using AI in well-mapped areas, it's easy to start assuming it has human-level reasoning capabilities; the illusion is quickly shattered if you're operating at the edge.