in terms of balance between speed and compression ratio
>447b conveys pepe frog
whatever you're already using
https://web.archive.org/web/20210725124915/http://www.piedpiper.com/
some hypothetical vector based compression algorithm
Sloot digital coding system but ~~*they*~~ don't want you to use it
Depends on the data being compressed.
If it's important enough to archive long term, I tend to compress the data multiple ways and just pick the winner since it's not always going to be a consistent choice. It varies depending on what you're encoding.
This one wins about 80% of the time for the datasets I tend to archive.
guys I need a real answer. I need to compress terabytes of csv files and be able to uncompress on the fly to serve my users
Then use gzip like everyone else you fricking moron
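And "uncompress on the fly" with gzip is genuinely trivial; here's a minimal Python sketch (the function names and sample data are made up, stdlib only):

```python
# Sketch: compress a CSV once at archive time, then stream-decompress
# it line by line on demand without holding the whole file in memory.
import gzip
import io

def compress_csv(raw: bytes) -> bytes:
    """Gzip the raw CSV bytes (done once, when archiving)."""
    return gzip.compress(raw, compresslevel=6)

def stream_rows(compressed: bytes):
    """Decompress on the fly, yielding one CSV line at a time."""
    with gzip.open(io.BytesIO(compressed), "rt", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")

csv_data = b"id,name\n1,foo\n2,bar\n"
blob = compress_csv(csv_data)
rows = list(stream_rows(blob))
```

Swap `io.BytesIO` for a real file handle and you get the same streaming behavior over terabytes.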
File system level compression as offered by systems such as ZFS, NTFS, et al.
just benchmark multiple algorithms and pick the winner
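A benchmark harness is a few lines; a sketch using only stdlib codecs (add zstd/lz4 bindings the same way if you have them installed, and feed it a sample of YOUR actual data, not this placeholder):

```python
# Sketch: run each codec over a sample and record (ratio, seconds),
# then pick whichever wins for your use case.
import bz2
import gzip
import lzma
import time

def benchmark(data: bytes) -> dict:
    codecs = {
        "gzip": gzip.compress,
        "bzip2": bz2.compress,
        "lzma": lzma.compress,
    }
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        out = compress(data)
        elapsed = time.perf_counter() - start
        # ratio < 1.0 means the codec actually shrank the data
        results[name] = (len(out) / len(data), elapsed)
    return results

sample = b"timestamp,level,message\n" * 4096  # placeholder data
for name, (ratio, secs) in benchmark(sample).items():
    print(f"{name}: ratio={ratio:.4f} time={secs:.3f}s")
```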
zstd is pretty balanced if you turn a blind eye to its extreme memory usage
lz4 is the fastest
thanks, I think I'll try zstd
You need filesystem level compression, otherwise you can't uncompress on the fly.
If you are on Linux and you can live with it being read-only, you can use squashfs.
No, you don't
DEFLATE
xor with itself
Brotli
https://lz4.org/
7zip/lzma/zstandard/xz are all OBSOLETE and DEPRECATED
what's the point of making a "pixel art" of 4x4 with a resolution of 666x666, you can't divide 666 by 4, and now there's 2 big pixels with a pixel more than the others
>less than 25% the resolution
>larger file size
>like 5% the resolution or some pathetic shit
>barely smaller file size
Just what is OPs secret?
not sure. i couldn't match it with GIMP, which makes me disgusted with GIMP
idk, but not much
*strips further 45B*
how
https://github.com/fhanau/Efficient-Compression-Tool
>so many posts
>still no good answer
Here's the Pareto frontier of compression algorithms: https://insanity.industries/post/pareto-optimal-compression/
TL;DR is that depending on your desired tradeoff of speed and compression ratio, you should use lz4 (super fast but poor compression), zstd (moderately fast and pretty good), or lzma (slow but great compression).
>open ended question with no answer
>hurr why no answer
why don't you figure it out if you're so intelligent?
Archtard shilling zstd, classic.
No, there are way more variables to consider.
Archtard presents a "Pareto Frontier" that makes zstd look better than it actually is. No, most of the time you don't want the "worst of both worlds" - you either want fast decompression times or high compression ratios, depending on the application.
For hardware bound IO, decompression speed is king.
For network data transfers and archival, compression ratio is king.
For something you write-once and use a billion times, again compression ratio.
Furthermore Archtard also makes bzip2 look bad by forgetting another important variable: the type of data you're working with.
Yes, with high entropy synthetic data, bzip does quite poorly.
For natural language text data, bzip2 is very very good.
Read the article you moron. He tested on various types of data including text, and he gives many options depending on which tradeoff of speed and ratio you want. bzip2 is simply not very good: you can achieve better speed at the same compression ratio, or the same ratio with less time, if you use other tools.
>A simple text file, being a dump of the Kernel log via the dmesg command, representing textual data
No. That is not a good representation for the real world use-case of bzip.
You're moronically grasping at straws. Kernel logs are text and pretty low entropy too. zstd is simply a newer and better algorithm, if you disagree then post your own benchmarks.
You seem to think it's an "either or" choice - it's not.
You should always test multiple algorithms on your data.
I have seen bzip beat lzma before in compression ratio.
Generally speaking, yes, bzip does worse, but my point is that you must always *test* the algorithm on the data you're working with.
newer != better, lzma is ancient, but it is superior to zstd on compression ratio, generally.
But again, you sometimes get data where lzma is trash, and bzip wins.
zstd is garbage that solves no real problems by being in the middle, depending on application lzma is superior for compression ratios, or lzo/lz4 for decompression speeds.
jack of all trades master of none, is how I would describe zstd.
But hey, yes I'll use it if there is a particular dataset it outperforms on.
I think you're extremely biased against zstd. It beats bzip2 hands down in pretty much every case - better compression ratio at the same time, or shorter time taken at the same ratio. It's actually bzip2 that's the "worst of both worlds" algorithm, as you say, and there's no point in using it anymore.
The real algorithms you should use are lzo/lz4 for speed, lzma for ratio, and zstd for something in the middle. You say it's useless, but I found it to be the optimal choice for many things, like streaming data to a HDD or over Gigabit Ethernet as fast as possible.
>like streaming data to a HDD or over Gigabit Ethernet as fast as possible.
On local hardware IO, lzo and lz4 will do better than zstd.
>It's actually bzip2 that's the "worst of both worlds" algorithm, as you say, and there's no point in using it anymore.
You're completely missing the point.
On certain data, I've seen both bzip and gzip beat lzma on compression ratio.
Your stupid article doesn't have error bars anywhere.
There is no algorithm that's inherently better than the others. It's always a case-by-case basis.
>On local hardware IO, lzo and lz4 will do better than zstd.
You really don't get it and need it spelled out? In both use cases I mentioned you want the algorithm that has the best compression ratio possible while still letting you output 100-150 MB/s. It's the perfect use case for an algorithm with medium speed and medium ratio.
>There is no algorithm that's inherently better than the others. It's always a case-by-case basis.
Any more obvious truisms and goalpost moving to share with us, anon? Obviously it differs case by case, but in the vast majority of cases zstd beats bzip2, so you're moronic for shilling the latter.
what are the numbers on this graph? because idk about you but for me around 500-1000MB/s decompression speed is good enough for most cases
op asked for best balance, zstd being excellent as a middle ground option is what i would call balanced
it's not the fastest nor does it have the highest ratio, but when i want something compressed and it doesn't need to be ultra fast nor ultra small, zstd is what i pick. the balanced option
Consider transposing your csv file, helps a lot for most datasets.
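The idea being that a transpose puts each column's values next to each other, so similar byte patterns land in the same compression window. Whether it actually helps depends on the dataset, so measure both ways. A minimal sketch (sample data is made up):

```python
# Sketch: transpose a rectangular CSV so columns become rows,
# grouping similar values contiguously for the compressor.
import csv
import io

def transpose_csv(text: str) -> str:
    rows = list(csv.reader(io.StringIO(text)))
    cols = list(zip(*rows))  # assumes every row has the same width
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(cols)
    return out.getvalue()

original = "id,price,currency\n1,9.99,USD\n2,4.50,USD\n3,7.25,USD\n"
flipped = transpose_csv(original)
# low-entropy columns like "USD,USD,USD" now sit on one line
```

Compress both `original` and `flipped` with your codec of choice and keep whichever comes out smaller.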
Do you only care about decompression speed? Or do you also care about compression speed?
GPT-4o
Maid-LZW
Optimized.
>in terms of balance between speed and compression ratio
LZ + statistical compression (Huffman or Arithmetic coding)
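That combo is exactly what DEFLATE is (LZ77 matching followed by Huffman coding), and the stdlib exposes it directly via zlib:

```python
# DEFLATE = LZ77 + Huffman coding, available in the stdlib as zlib.
import zlib

text = b"the quick brown fox " * 100  # highly repetitive, LZ-friendly
packed = zlib.compress(text, level=9)
restored = zlib.decompress(packed)
```

The LZ stage factors out the repeats and the Huffman stage then entropy-codes the remaining symbols.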
Depends what you are compressing. Here's the best for text on EVERYONE'S favorite website.
https://github.com/qntm/base2048
zstd
Zstd, lz4 are good modern ones