YouTube as Storage | Hono Hacker News

▲

101 points bysaswatms5 hours ago |17 comments

▲repeekad5 hours ago

I once asked one of the original YouTube infra engineers “will you ever need to delete the long tail of videos no one watches”

They said it didn’t matter, because the sheer volume of new data flowing in growing so fast made the old data just a drop in the bucket

▲jl61 hour ago

One day, it will matter. Not even Google can escape the consequences of infinite growth. Kryder's Law is over. We cannot rely on storage getting cheaper faster than we can fill it, and orgs cannot rely on being able to extract more value from data than it costs to store it. Every other org knows this already. The only difference with Google is that they have used their ad cash generator to postpone their reality check moment.

One day, somebody is going to be tasked with deciding what gets deleted. It won't be pretty. Old and unloved video will fade into JPEG noise as the compression ratio gets progressively cranked, until all that remains is a textual prompt designed to feed an AI model that can regenerate a facsimile of the original.

▲asah1 hour ago

You can see how Google rolls with how they deleted old Gmail accounts - years of notice, lots of warnings, etc. They finally started deletions recently, and I haven't heard a whimper from anyone (yet).

▲flux312556 minutes ago

The problem is that some content creators have already passed away (and others will pass away by then), and their videos will likely be deleted forever.

▲zaik26 minutes ago

Hopefully the deletion will not affect videos with thousands of views, even if the account is lost.

▲loloquwowndueo10 minutes ago

Sweet summer child.

▲CuriouslyC0 minutes ago

Goog is 100% not going to delete anything that is driving any advertising at all. The videos are also useful for training AI regardless, so I expect the set of stuff that's deleted will be a VERY small subset. The difference with email is that email can be deduplicated, since it's a broadcast medium, while video is already canonical.

I expect rather than deleting stuff, they'll just crank up the compression on storage of videos that are deemed "low value."

▲dessimus28 minutes ago

Monuments erode away and memories of those enshrined are lost time as well, nothing lasts forever.

▲arjie4 hours ago

Videos do disappear, though. https://www.reddit.com/r/DataHoarder/comments/1ioz4x1/is_it_...

Searching hn.algolia.com for examples will yield numerous ones.

https://news.ycombinator.com/item?id=23758547

https://bsky.app/profile/sinevibes.bsky.social/post/3lhazuyn...

▲Kwpolska3 hours ago

Of course videos disappear for copyright, ToS violations, or when the uploaders remove them. They do not disappear just because nobody watched them.

▲Gigachad1 hour ago

There’s a whole activity around discovering random 15 year old videos with almost no views. It’s usually some random home video

▲ntoskrnl_exe1 hour ago

Wouldn't it also be a performance nightmare?

The energy bill for scanning through the terabytes of metadata would be comparable to that of several months of AI training, not to mention the time it would take. Then deleting a few million random 360p videos and putting MrBeast in their place would result in insane fragmentation of the new files.

It might really just be cheaper to keep buying new HDDs.

▲dev1ycan1 hour ago

This is why they removed searching for older videos (specific time) and why their search pushes certain algorithmic videos, other older videos when found by direct link are on long term storage and take a while to start loading.

▲joecool10291 hour ago

I’m pretty sure this is the real reason why they changed old unlisted videos to being marked private: https://blog.youtube/news-and-events/update-youtube-unlisted...

▲stogot1 hour ago

S3 allows delete and is efficient here. I’m sure Google can figure it out

They allow search by timestamp, I’m sure YouTube can write algo to find zero <=1 view

▲moffkalast1 hour ago

Besides with their search deteriorating to the point where a direct video title doesn't result in a match, nobody can see those videos anyway and they don't have to cache them.

▲sfn421 hour ago

It's not just the search deteriorating. The frontend is littered with bugs. If you write a comment and try to highlight and delete part of that comment, it'll often delete the part you didn't highlight. So apparently they implemented their own textfield for some reason and also fucked it up. It's been like that for years.

The youtube shorts thing is buggy as shit, it'll just stop working a lot of the time, just won't load a video. Some times you have to go back and forth a few times to get it to load. It'll often desync the comments from the video, so you're seeing comments from a different video. Some times the sound from one short plays over the visuals of another.

It only checks for notifications when you open the website from a new tab, so if you want to see if you have any notifications you have to open youtube in a new tab. Refreshing doesn't work.

Seems like all the competent developers have left.

▲r_lee57 minutes ago

and if you do a hard refresh on the webapp, it literally takes like 10 seconds for the homepage to load

▲sfn4216 minutes ago

Yeah, one that I forgot to mention is if you pause a youtube short and go to a different tab, the short will unpause in the background, or it might change to an entirely different short and start playing that.

▲wasmainiac5 hours ago

I wonder if that still holds true? The volume of videos increases exponentially especially with AI slop, I wonder if at some point they will have to limit the storage per user, with a paid model if you surpass that limit. Many people who upload many videos I guess some form of income off YouTube so it wouldn’t that be that big of a deal.

▲weird-eye-issue4 hours ago

What they said only holds true because the growth continues so that the old volume of videos doesn't matter as much since there's so many more new ones each year compared to the previous year. So the question is more about whether or not it will hold true in the long term, not today

▲raincole2 hours ago

The framing here is really weird. The volume of videos increasing isn't 'growth.' Videos are inventory for Youtube. They're only good when people (without adblocks!) actually watch them.

▲amelius1 hour ago

^ This.

▲pogue4 hours ago

I assume it's an economics issue. As long as they continue making money off the uploads to a higher extent than it costs for storage, it works out for them.

▲throw_await4 hours ago

Do they make a profit nowadays

▲rezonant3 hours ago

Likely yes, with a margin of perhaps 38%

https://news.ycombinator.com/item?id=34268536

▲ranger_danger4 hours ago

I wonder if anyone has ever compiled a list of channels with abnormally large numbers of videos? For example this guy has over 14,000:

https://www.youtube.com/@lylehsaxon

▲HeliumHydride4 hours ago

There is a channel with 2 million videos: https://www.youtube.com/@RoelVandePaar/videos One with 4 million videos: https://www.youtube.com/@NameLook

▲buenzlikoder3 hours ago

NameLook puts a whole new meaning to "low effort videos"

▲wellf3 hours ago

First one has transcribed stack overflow to YT by the look of it

▲j-bos4 hours ago

This ia really cool but also feels like a potential burden on the commons,

▲vasco4 hours ago

That great commons that are the multi trillion dollar corporations that could buy multiple countries? They sure worry about the commons when launching another datacenter to optimize ads.

▲asah1 hour ago

no the "commons" in this case is the fundamental free-ness of YT - if abused then any corporations will have to shut it down...

OTOH I'm 100.0% sure that google has a plan, been expecting this for years and in particular, has prior experience from free Gmail accounts being used for storage.

▲justinclift21 minutes ago

> no the "commons" in this case is the fundamental free-ness of YT ...

Hmmm, isn't the "free-ness" of YouTube because there were determined to outspend and outlast any potential competitors (ie supported by the Search business), in order to create a monopoly for then extracting $$$ from?

I'm kind of expecting the extracting part is only getting started. :(

▲agnishom3 hours ago

You are right, but YouTube is also a massive repository of human cultural expression, whose true value is much more than the economic value it brings to Google.

▲anjel2 hours ago

So was Flickr

▲ancillary1 hour ago

Somebody wrote a file encoder to take advantage of Flickr's free photo storage, too (though based on its Github repo I don't think a ton of people used it): https://alexcbecker.net/projects.html#storing-data-in-gifs

▲komali23 hours ago

Yes, but it's a classic story of what actually happened to the commons - they were fenced and sold to land "owners."

Honestly, if you aren't taking full advantage within the constraints of the law of workarounds like this, you're basically losing money. Like not spending your entire per diem budget when on a business trip.

▲agnishom2 hours ago

This seems like a narrow understanding of value.

Which do you think has more value to me? (a) I save some money by exploiting the storage loophole (b) The existence of a cultural repository of cat videos, animated mathematics explainers, long video essays continue to be available to (some parts of) humanity (for the near future).

▲komali21 hour ago

This is assuming doing A has any meaningful impact on B.

Anyway in this situation it's less that YouTube is providing us a service and more, it's captured a treasure trove of our cultural output and sold it back to us. Siphoning back as much value as we can is ethical. If YouTube goes away, we'll replace it - PeerTube or other federated options are viable. The loss of the corpus of videos would be sad but not catastrophic - some of it is backed up. I have ~5Tb of YouTube backed up, most of it smaller channels.

I agree generally with you that the word "value" is overencompassing to the point of absurdity though. Instrumental value is equated with moral worth, personal attachment, and distribution of scarcity. Too many concepts for one word.

▲cheonn6383 hours ago

> That great commons that are the multi trillion dollar corporations that could buy multiple countries?

Exactly which countries could they buy?

Let me guess: you haven’t actually asked gemini

▲cheschire3 hours ago

Have you? Assuming Google would want to not put all their chips on that one number and invest all available capital in the purchase of a nation, and assuming that nation were open to being purchased in the first place (big assumption; see Greenland), Google is absolutely still in a place to be able to purchase multiple smaller countries, or one larger one.

▲arcticfox3 hours ago

Greenland already has a wealthy benefactor, I'd be surprised if poor countries wouldn't be interested

▲gregoryl3 hours ago

https://en.wikipedia.org/wiki/Hyperbole

▲K0balt3 hours ago

You don’t have to go ballistic!

▲RobotToaster2 hours ago

Nauru, possibly Tuvalu.

▲russfrank3 hours ago

The USA.

▲justinclift20 minutes ago

That one's not a "could" as it's already been done. ;)

▲Smalltalker-803 hours ago

Thechnically cool, but ToS state: "Misuse of Service Restrictions - Purpose Restriction: The Service is intended for video viewing and sharing, not as a general-purpose, cloud-based file storage service." So they can rightfully delete your files.

▲ilaksh2 hours ago

Its interesting that this exact use case is already covered in their ToS. I wonder when the first YouTube as storage project came out, and how many there have been over the years.

▲Valkryst1 hour ago

At-least as far back as 2017 when I wrote Schillsaver: https://github.com/Valkryst/Schillsaver

None of us, in the original discussion threads, knew of it being done before then IIRC.

▲KellyCriterion14 minutes ago

I can remember the years when YouTube was used by Contentdistributors by uploading high quality material protected with a password :-D

▲thrdbndndn4 hours ago

I don't get how it works.

> Encoding: Files are chunked, encoded with fountain codes, and embedded into video frames

Wouldn't YouTube just compress/re-encode your video and ruin your data (assuming you want bit-by-bit accurate recovery)?

If you have some redundancy to counter this, wouldn't it be super inefficient?

(Admittedly, I've never heard of "fountain codes", which is probably crucial to understanding how it works.)

▲Jaxan4 hours ago

Yes it is inefficient. But youtube pays the storage ;-). (There is probably a limit on free accounts, and it is probably not allowed by the TOS.)

▲genidoi4 hours ago

Right, you just pay daily in worrying when, not if, youtube will terminate your account and delete your "videos".

▲madmads3 hours ago

I think it's just meant to be a fun experiment, not your next enterprise backup site

▲K0balt3 hours ago

Stegonagraphic backup with crappy ai transmogrified reaction videos. Free backup for openclaw agents so they can take over the internet lol

▲zokier5 hours ago

Also, how to get your google account banned for abuse.

▲newqer4 hours ago

Just make sure you have you have a bot network storing the information in with multiple accounts. Also with with enough parity bits (E.g. PAR2) to recover broken vids or removed accounts.

▲compsciphd3 hours ago

par2 is very limited.

It only support 32k parts in total (or in reality that means in practice 16k parts of source and 16k parts of parity).

Lets take 100GB of data (relatively large, but within realm of reason of what someone might want to protect), that means each part will be ~6MB in size. But you're thinking you also created 100GB of parity data (6MB*16384 parity parts) so you're well protected. You're wrong.

Now lets say one has 20000 random bit error over that 100GB. Not a lot of errors, but guess what, par will not be able to protect you (assuming those 20000 errors are spread over > 16384 blocks it precalculated in the source). so at the simplest level , 20KB of errors can be unrecoverable.

par2 was created for usenet when a) the size of binaries being posted wasn't so large b) the size of article parts being posted wasn't so large c) the error model they were trying to protect was whole articles not coming through or equivalently having errors. In the olden days of usenet binary posting you would see many "part repost requests", that basically disappeared with par (then quickly par2) introduction. It fails badly with many other error models.

▲e145bc455f13 hours ago

what other tool do you recommend?

▲iberator2 hours ago

just pay for storage instead. It's absurd that rich developers are doing ANYTHING but to pay for basic services - ruining the internet for those in real need.

we can't have nice things

▲wellf3 hours ago

Or.... backblaze B2

▲willis9362 hours ago

Plus restic or borg or similar. I tried natively pushing from truenas for a while and it's just slow and unreliable (particularly when it comes to trying to bus out active datasets) and rsync encryption is janky. Restic is built for this kind of archival task. You'll never get hit with surprise bills for storing billions of small files.

▲encom1 hour ago

Have Backblaze software stopped being utterly awful, to the point of being almost nonfunctional, yet?

▲blackhaz4 hours ago

Has anyone got an example how such a video looks like? Really curious. Reminds me of the Soviet Arvid card that could store 2 GB on an E-180 VHS tape.

https://en.wikipedia.org/wiki/ArVid

▲xnx2 hours ago

An idea as old as YouTube. Here's on implementation: https://github.com/therealOri/qStore

▲predkambrij1 hour ago

Cool https://m.youtube.com/watch?v=hMS30w23zkQ

▲polotics4 hours ago

Wot no steganography? Come on pretty please with an invisible cherry on top! :-) Here to get you started: https://link.springer.com/article/10.1007/s11042-023-14844-w

▲madduci5 hours ago

Love this project, although I would never personally trust YT as Storage, since they can delete your channel/files whenever they want

▲rzzzt3 hours ago

Upload to other video sharing sites for redundancy. RAIVS!

▲iberator2 hours ago

Stop ruining the internet end exploiting free resources

▲rzzzt1 hour ago

It was a tongue-in-cheek / silly suggestion outright. I don't think many people are actually using the tool for its off-ToS purpose though, there is also a lot of prior art across multiple sharing services. It's still interesting to think about the inner workings of it.

▲qwertox4 hours ago

The explainer video on the page [0] is a pretty nice explanation for people who don't really know what video compression is about.

[0] https://www.youtube.com/watch?v=l03Os5uwWmk

▲nubinetwork1 hour ago

How do you manage to get youtube to not re-encode the video, trashing the data?

▲neals1 hour ago

Flashing a bunch of qr codes should do it

▲ranger_danger5 hours ago

Other examples of so-called "parasitic storage": https://dpaste.com/DREQLAJ2V.txt

▲andrewstuart4 hours ago

How does it survive YouTube transcoding.

▲the_dude_4 hours ago

reminds me of gmail fs, https://en.wikipedia.org/wiki/GMail_Drive very interesting project explanation video on youtube

▲finalhacker4 hours ago

after compression, all data lost.

▲sneak5 hours ago

Something at this link crashes both MobileSafari and iOS Firefox on my device.

▲Hamuko5 hours ago

The GitHub link? Works fine in Safari on my M4 iPad Pro.