Honey, I Shrunk the Model (Maybe): blk-archive vs AI Data

Because “It Should Work” Isn’t Data

After reading about the billions spent on AI infrastructure, I kept wondering: how much of that storage is just… the same bytes over and over? So I decided to find out. Since I’m involved with blk-archive (https://github.com/device-mapper-utils/blk-archive), I wanted to understand how much storage it can realistically save when pointed at AI-style datasets.

Let me be clear upfront: this isn’t a speed test or hardware benchmark. I’m only interested in one question: how many bytes go in, and how many come out?
