Artisanal handcrafted Git repositories (drew.silcock.dev)
252 points by drewsberry 1 day ago | 16 comments
bradfitz 21 hours ago
My recent horror from some git work was discovering how git sorts its tree objects.

The docs just say to sort by C locale (byte-order sorting). Easy. Except git was sometimes rejecting my packfiles as being bogus per its fsck code, saying my trees were misordered.

TURNS OUT THERE'S AN UNDOCUMENTED RULE: you need to append an implicit forward slash to directory tree entry names before you sort them.

That forward slash is not encoded in the tree object, nor is the type of the entry. You just put the 20 byte SHA1 hash, which points to either a blob or a tree (or a commit for submodules).

So you can have one directory with directory "testing" and file "testing.md" and it'll sort differently than a directory with two files "testing" and "testing.md".

You can see a repro at https://gist.github.com/bradfitz/4751c58b07b57ff303cbfec3e39...

(So to verify whether a tree object is formatted correctly, you need to have the blobs of all the entries in the tree, at least one level)
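
A minimal Python sketch of that ordering rule, assuming entries arrive as (name, is_dir) pairs. The names `tree_sort_key` and `sort_tree_entries` are made up for illustration and are not taken from git's source; git's own comparison is written differently but should agree with this for the cases shown.

```python
def tree_sort_key(name: bytes, is_dir: bool) -> bytes:
    """Key for ordering tree entries: directories compare as if named name + b"/"."""
    return name + b"/" if is_dir else name


def sort_tree_entries(entries):
    """Sort (name, is_dir) pairs bytewise, using the implicit-slash rule for directories."""
    return sorted(entries, key=lambda e: tree_sort_key(e[0], e[1]))


# "testing" as a directory sorts after "testing.md", because b"/" (0x2F) > b"." (0x2E).
with_dir = sort_tree_entries([(b"testing", True), (b"testing.md", False)])
# "testing" as a plain file sorts before "testing.md", because it is a prefix of it.
with_file = sort_tree_entries([(b"testing", False), (b"testing.md", False)])

print([name for name, _ in with_dir])
print([name for name, _ in with_file])
```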

xqb64 15 hours ago
I've had this exact bug happen to me when I implemented my git clone.

The way I found out was that Github kept rejecting my push, because as I later discovered, my git history was invalid precisely due to entries being sorted improperly due to the forward slash requirement. I could have solved this with the real git, but the point was to use my tool exclusively for version control from inception, so I just deleted the .git folder. So, my git history appears to begin near the end of the whole cycle. But I did manage to learn a lot, both about git and about the language I implemented it in.

Elucalidavah 16 hours ago
> directory tree entry names

But... git doesn't really store directories, does it?

kaoD 16 hours ago
I wrote a longer comment saying this (deleted now since I was wrong).

Turns out that Git does somewhat store dirs (in form of trees). See https://git-scm.com/book/en/v2/Git-Internals-Git-Objects (section "Tree Objects").

To understand op's repro look at the last two lines (objects in the tree) in each of their command outputs, not the files shown in the first few lines.

What I think op means is that the `testing` tree pointed in their first example is sorted after `testing.md` even though it's only called `testing` because it's being sorted as `testing/` and `/` is > `.` bytewise.

I'm not at a computer right now but it would be nice to test it with files named `testing.` and `testing0` since they are adjacent bytewise and would show the implicit forward slash more clearly with the tree object sitting between them.

This makes me wonder why Git can't just store an empty tree for empty dirs.

EDIT: did the Gist https://gist.github.com/alvaro-cuesta/bd0234e3e1a66819c7e9e9...

Notice the `git cat-file -p HEAD^{tree}` outputs.

lucasoshiro 8 hours ago
> This makes me wonder why Git can't just store an empty tree for empty dirs.

tl;dr: it can (see my other comment) and the empty tree is hardcoded. But since the index works with file paths and blobs, having no file means that there's no entry in the index

remram 15 hours ago
Yes it does, it just doesn't store empty directories.
lucasoshiro 8 hours ago
It can store empty directories (actually, trees). It just doesn't normally, because the index maps paths to blobs: an empty directory has no file to map to a blob, so `git add` has no effect. Since we normally write commits from the index content, we normally won't find an empty tree.

You can run `git commit --allow-empty` with an empty index and the root tree will be the empty tree:

   $ git init
   $ git commit --allow-empty -m foo
   $ git rev-parse @^{tree}
   4b825dc642cb6eb9a060e54bf8d69288fbee4904
4b825dc is the empty tree. And a funny thing about it is that it is hardcoded in Git, and you can use it without having this object:

   $ git init
   $ git commit-tree -m foo 4b825dc642cb6eb9a060e54bf8d69288fbee4904
   $ tree .git/objects # you'll see that there's no file for the empty tree
This is a good read about that weird object: https://matheustavares.dev/posts/empty-tree
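
That id can also be reproduced without Git at all. A minimal sketch, assuming a SHA-1 repository (not the newer SHA-256 format); `git_object_id` is a hypothetical helper name, not a Git API:

```python
import hashlib


def git_object_id(obj_type: bytes, body: bytes) -> str:
    """SHA-1 of the object header (type, space, decimal body length, NUL) plus the body."""
    header = obj_type + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# A tree with no entries has an empty body, so only the header gets hashed.
empty_tree = git_object_id(b"tree", b"")
print(empty_tree)
```

The printed value should match the 4b825dc... id above, which is why Git can treat it as a known constant.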
juped 8 hours ago
You can perfectly easily put the empty tree object as a tree object's child, this just isn't supported and some parts of Git will break.
lucasoshiro 1 day ago
Something that I really like in Git is how its data structures are easy to understand and how transparent it is. It's possible to write your own "Git" compatible with existing Git directories only by reading how it works under the hood
shivasaxena 1 day ago
I agree, but only in theory.

Projects like gitoxide have been in development for years now.

fiddlerwoaroof 1 day ago
I wrote a nearly complete implementation of git file format parsers in Common Lisp over like a month of evenings and weekends. I’m sure there are hard parts between where I am and a full git implementation but you can get quite a bit of utility out of a relatively small amount of effort.
MrJohz 19 hours ago
It's a case of Pareto. Parsing the git file format is relatively simple, but handling all the weird states a Git repo can be in and doing the correct things to those files in each state is a lot harder. And then adding the network protocol on top of that makes directly reproducing Git quite difficult.

I know JJ used to use Git2 for a lot of network operations like pushing and pulling, but ran into too many issues with SSH handling that they've since switched to directly invoking the Git binary for those operations.

fiddlerwoaroof 19 hours ago
There aren’t that many weird states a git repository can be in: the on-disk format of the repository is too simple for that. The hard part has to do with the various protocols for transferring objects around.
deathanatos 17 hours ago
I think there's more corners out there than most people would give credit to? Just off the top of my head: files in the index (but maybe this isn't "weird enough"), rebasing but paused, rebasing with conflicts, merge with conflicts, cherry-picking but conflicts, middle of a bisect with all the state that implies, alternate objects dirs, alternate working dirs, submodules and all of their weirdness, and a "bare" repo.

Heck, had my PS1 return an error this week after I created a separate working dir for a repo and cd'd into it. Did you know .git can be a normal file? I didn't when I wrote my PS1.
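
For anyone else bitten by this: when .git is a file, it holds a single line of the form "gitdir: <path>". A rough Python sketch of handling both cases, where `resolve_git_dir` is a made-up helper and the demo paths are throwaway directories, not a real repository:

```python
import os
import tempfile


def resolve_git_dir(worktree: str) -> str:
    """Return the repository directory for a checkout whose .git may be a dir or a file."""
    dot_git = os.path.join(worktree, ".git")
    if os.path.isdir(dot_git):
        return dot_git
    # Linked worktrees and submodules use a one-line text file: "gitdir: <path>".
    with open(dot_git, "r", encoding="utf-8") as fh:
        line = fh.readline().strip()
    prefix = "gitdir: "
    if not line.startswith(prefix):
        raise ValueError(f"unrecognised .git file contents: {line!r}")
    # A relative path is taken relative to the directory containing the .git file.
    return os.path.normpath(os.path.join(worktree, line[len(prefix):]))


# Demo on temporary directories that only imitate the layout.
with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "plain")
    os.makedirs(os.path.join(plain, ".git"))
    linked = os.path.join(tmp, "linked")
    os.makedirs(linked)
    with open(os.path.join(linked, ".git"), "w", encoding="utf-8") as fh:
        fh.write("gitdir: ../plain/.git/worktrees/linked\n")
    plain_ok = resolve_git_dir(plain) == os.path.join(plain, ".git")
    expected = os.path.normpath(os.path.join(plain, ".git", "worktrees", "linked"))
    linked_ok = resolve_git_dir(linked) == expected
    print(plain_ok, linked_ok)
```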

fiddlerwoaroof 7 hours ago
I knew .git can be a normal file because of worktrees. But most of the weird states have to do with the working tree not the repository. Even rebasing isn’t weird as far as the file formats go: it just is replaying commits on top of a new base commit. Since my goal was basically to implement enough of git to serve files from a git repository as a website, the actual task was fairly small.
lucasoshiro 23 hours ago
Yeah, I wrote mine in Haskell. It's a good exercise for understanding how Git works
ratmice 4 hours ago
If I'm not mistaken, gitoxide is attempting to not only be feature complete, but also the fastest. Both of which reduce development velocity
chubot 23 hours ago
Not sure what gitoxide is, but libgit2 already exists, and it seems to be an independent implementation - https://github.com/libgit2/libgit2

I think Github and most big Git hosts use it

steveklabnik 20 hours ago
libgit2 has a ton of compatibility issues, especially around authentication, that make it only useful in some circumstances.

(gitoxide is a similar project but in Rust, it's not ready for the big time either, though it keeps on getting better!)

3eb7988a1663 20 hours ago
Jujutsu threw in the towel and is shelling out to the git CLI because of minor variations in libgit2 vs the binary.

Failing to find a write-up, but there was this Lobsters thread[0] where someone from GitLab reported they had to do the same owing to some discrepancies vs the binary, where all of the real development happens.

[0] https://lobste.rs/s/vmdggh/jujutsu_v0_30_0_released

Dylan16807 18 hours ago
But nothing in that description of problems is tied to the repository format.
veganjay 1 day ago
Neat to see this done by hand! It helps demystify the magic behind git commands.

If you like this, I also recommend "Write Yourself a Git", where you build a minimal git implementation using python: https://wyag.thb.lt/

xqb64 16 hours ago
There is also James Coglan's "Building git" book that I just went through and can vouch for its quality.
bhasi 22 hours ago
A similar project is CodeCrafters' Build Your Own Git: https://app.codecrafters.io/courses/git/overview
wonderwonder 23 hours ago
How cool, thank you
sc68cal 1 day ago
To the site author: I'm on a MBP M1 Mac and honestly I can't really read the text. Far too small, and increasing the zoom just makes the text large but the margins less wide. Firefox reader mode also renders really badly.

Please, consider making the layout better for us old coders whose eyes are going, or for hi res displays

derefr 1 day ago
FYI: the pinch-to-zoom gesture from mobile browsers (from before websites were mobile-responsive) has also long been implemented for all modern desktop browsers. It's viewport zoom, which is far better than the font-scaling zoom you get by pressing Cmd-+, and makes this site easily readable.

(The much-less-well-known mobile double-tap-on-text gesture [it zooms-to-fit whatever element you tapped on to the width of the viewport] was also ported to desktop browsers. Though, on desktop with a touchpad, it's a two-finger double-tap — which I don't think anyone would ever even think to try.)

LocalPCGuy 8 hours ago
FWIW, most browsers by default now do a viewport zoom with Ctrl/Cmd-+ rather than a font-scaling zoom. I think browsers generally have the option to change that, so if you prefer the former but it's doing the latter, you may want to check the browser settings.
BobaFloutist 23 hours ago
Double tap on text highlights it for me. Is that an iPhone/Android thing or what?
derefr 23 hours ago
As I said, it's a two finger double-tap.

But also, under further investigation — and unlike with pinch-to-zoom — desktop support for the two-finger double-tap gesture seems to be specific to macOS. (Which is weird, because Chrome has support for arbitrary multitouch gesture processing to enable the JS multitouch API. So you'd think Chrome's support for "the multitouch gestures the OS expects" would be built on top of that generic multitouch recognizer [and therefore working everywhere that recognizer works], instead of expecting the OS to pre-recognize specific gestures and translate them to specific OS input events.)

BobaFloutist 21 hours ago
I was trying on my phone, but my laptop seems to interpret it as a right click. Which, frankly, makes sense.
antonvs 17 hours ago
On my iPad in Safari and Pixel Android phone in Firefox, one-finger double tap on text does the fit to viewport.

On my Ubuntu laptop in Chrome, I couldn’t find a way to make it work - even tapping the touchscreen didn’t work. But I’m not using the stock Ubuntu GUI, so it could be that (LXqt+XMonad).

BobaFloutist 6 hours ago
>Pixel Android phone in Firefox, one-finger double tap on text does the fit to viewport.

I'm very confused, can you clarify what makes this different from the gesture that highlights text?

Edit: it appears that "request desktop site" makes it fit the viewport, whereas using the mobile view it's I guess already fitting the viewport so it highlights the text. The strange thing is in the desktop view, if I pinch zoom after fitting the viewport and do it again, it zooms out, whereas the mobile view still highlights the text. Which kinda makes sense, since mobile view it's fairly likely that you zoomed in to highlight the text more accurately, though it's weird that it's so inconsistent.

retsibsi 7 hours ago
For me, the text size would be fine if the contrast were better. The background colour is similar to the colour of the non-central pixels of the text, and even the central pixels are grey rather than black.
sam_lowry_ 1 day ago
Works great on Firefox for Android though )
lucasoshiro 1 day ago
Also works great on Safari on a M1 MacBook Air, here
mitchitized 10 hours ago
I closed the tab as soon as I saw `ignorecase = true`.

Absolutely NOT going there again.

* points at numerous scars and trauma

lemming 17 hours ago
Git refers to the user-friendly commands as “porcelain”

Ahhhhahahaha… “user friendly”. When compared to coding the repo by hand, I guess.

antonvs 17 hours ago
This is what happens when you let an OS kernel guy write a cli.
aGHz 12 hours ago
When compared to the "plumbing" commands. If you want to know more about git's plumbing vs porcelain metaphor, this is a good quick overview: https://stackoverflow.com/a/39848551
jllyhill 13 hours ago
Am I the only one having troubles with the site on mobile? I'm using Firefox on a decent Android phone but the scroll is extremely stuttery and it distracts from the article unfortunately.
styanax 12 hours ago
The site is built with a content creation tool that uses a lot of JS and CSS, but the CSS is atrocious in its automated output, so the browser has to interpret a mess of directives in every code block. The tool is generating HTML trash like (brackets replaced for comment to not parse):

    [span style="--0:#E1E4E8;--1:#24292E"] [/span]
...over and over, essentially giving style directives for every blank space in the code block. A less capable mobile CPU may well have issues rendering this site due to the presence of so much trash CSS inside its guts. $0.02 hth
aeblyve 1 day ago
I thought this was going to be a sardonic article about doing programming without LLMs.
lioeters 23 hours ago
I'm starting to see this kind of wording as a unique selling point, that some software (or article, visual art, etc.) is handcrafted and artisanal, as opposed to AI-generated. "Every word was written by me, a human being!" At this point in the emerging technology I can usually tell the difference intuitively, but it's possible that one day it will be indistinguishable - and the quality of "handmade" will be simply a matter of branding for niche enthusiasts, like vinyl records.
aeblyve 2 hours ago
I honestly tend to think that products with "handmade" quality actually have worse quality. I'll usually choose software which is industrially produced and vetted versus some guy's weekend project.
lan321 12 hours ago
Homegrown bugs from sustainably raised Bio-certified devs vs industrial bugs.
HexDecOctBin 22 hours ago
Okay, there's something I have been thinking about recently. Is it possible to somehow make Git use the Content Defined Chunking algorithm from rsync? Maybe somehow using clean/smudge? If not git, then maybe Mercurial, Fossil or any other DVCS?

This would help with large binary assets without having to deal with the mess that is LFS, as long as the assets were uncompressed.

hanwenn 11 hours ago
IIRC it already uses content defined chunking for finding object deltas.
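
For readers who haven't met the idea: content-defined chunking picks cut points from the bytes themselves, so an insertion only disturbs the chunks near it. The toy Python sketch below is an illustration only and is not how rsync or Git implement anything. The window sum stands in for a real rolling hash, the `window` and `divisor` values are arbitrary, and real chunkers also enforce minimum and maximum chunk sizes.

```python
import random


def chunk_boundaries(data: bytes, window: int = 16, divisor: int = 64):
    """Yield cut points: positions where the sum of the last `window` bytes is 0 mod divisor."""
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # drop the byte that just left the window
        if i + 1 >= window and rolling % divisor == 0:
            yield i + 1


def split_chunks(data: bytes, window: int = 16, divisor: int = 64):
    """Split data at content-defined boundaries; joining the chunks gives data back."""
    chunks, start = [], 0
    for cut in chunk_boundaries(data, window, divisor):
        chunks.append(data[start:cut])
        start = cut
    if start < len(data):
        chunks.append(data[start:])
    return chunks


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(20000))
edited = b"X" + original  # a one-byte insertion at the front shifts every offset

a, b = split_chunks(original), split_chunks(edited)
shared = set(a) & set(b)
# Chunk counts for each input, then how many distinct chunks the two have in common.
print(len(a), len(b), len(shared))
```

Because each cut depends only on the preceding `window` bytes, every chunk after the first one survives the insertion unchanged, which is the property that would let a store deduplicate large assets across versions.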
BobbyTables2 1 day ago
I realize the concept is very similar but would love to see a writeup on how Docker stores images using OverlayFS. (Has quite a bit of metadata!)
kassah 1 day ago
The simplicity of Git is awesome. Great article! I had looked at what it would take to find a single file in a remote git repo. I decided against speaking the git protocol directly and just checked out the entire repo to get a single file. Reading through this makes me think I may have given up too easily.

I asked a few git hosting providers, and they all said they had private APIs developed internally for the purpose.

DrBazza 14 hours ago
I'm glad I clicked through to the actual article rather than dismissing it via its slightly silly title. I learnt a few things about git, and I didn't realize that the tool `pigz` existed. Today I learnt...
iJohnDoe 18 hours ago
What is this web site theme or CMS?
gerdesj 1 day ago
This is all very well but how does Linus Torvalds use git? Given he invented the bloody thing, it might be nice to see how the Boss uses it!

git was created to scratch an itch (actually a bit of a roiling boil, that needed a serious amount of soothing ointment and as it turns out: a compiler, some source code and quite a lot of effort). ... anyway the history of it is well documented.

FFS: git was called git because a Finnish bloke with English as a second, but well used, tongue had learned what a "git" is and it seemed appropriate. Bear in mind that Mr T was deeply in his shouty phase at that point in time.

Artisanal git sounds all kinds of wrong 8) It's just a tool to do a job and I suggest you use it in the same way as the XKCD comic mandates (that is the official manual, despite what you might think)

The Conclusion is spot on - great article.

lysace 1 day ago
I would have called this: "Futzing around with internal git data structures".