
I have seen some highly compressed files around, such as 700MB of data compressed to around 30-50MB.

But how do you get such compressed files? I have tried using software like WinRAR and 7Zip but have never achieved such high compression.

What are the techniques/software that allow you to compress files so well?

(P.S. I'm using Windows XP)


9 Answers

If time taken to compress the data is not an issue, then you can optimize compressed size by using several different tools together.

Compress the data several times using different tools, such as 7-Zip, WinRAR (for zip) and BJWFlate.

(Note that this does not mean compressing the zip file over and over, but rather creating a number of alternative zip files with different tools.)

Next, run DeflOpt on each archive to shrink each one a little further.

Finally, run ZipMix on the collection of archives. Since different zip tools are better on different files, ZipMix picks the best-compressed version of each file from across the archives and produces an output smaller than any of the individual tools could have produced on its own.
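The per-file selection that ZipMix performs can be sketched in Python, with the standard library's zlib, bz2 and lzma codecs standing in for the different zip tools (ZipMix itself works on real zip archives; this only illustrates the pick-the-smallest idea):

```python
import bz2
import lzma
import zlib

# Stand-ins for the different archivers; each maps raw bytes to compressed bytes.
CODECS = {
    "zlib": lambda d: zlib.compress(d, 9),
    "bz2": lambda d: bz2.compress(d, 9),
    "lzma": lambda d: lzma.compress(d),
}

def best_per_file(files):
    """For each (name, data) pair, keep whichever codec compresses it smallest."""
    picks = {}
    for name, data in files:
        results = {codec: fn(data) for codec, fn in CODECS.items()}
        winner = min(results, key=lambda codec: len(results[codec]))
        picks[name] = (winner, results[winner])
    return picks

files = [
    ("notes.txt", b"the quick brown fox jumps over the lazy dog " * 500),
    ("table.bin", bytes(range(256)) * 40),
]
for name, (codec, blob) in best_per_file(files).items():
    print(name, codec, len(blob))
```

Because the winner is chosen per file, the combined result is never larger than what the best single codec would have produced for the whole set.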

You should note however that this is not guaranteed to work any kind of magic on your files. Certain types of data simply do not compress very well, like JPEGs and MP3s. These files are already compressed internally.


This depends entirely on the data being compressed.

Text compresses very well, binary formats not so well and compressed data (mp3, jpg, mpeg) not at all.
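That difference is easy to demonstrate with Python's zlib: highly repetitive text shrinks dramatically, while random bytes (a stand-in for already-compressed data such as JPEG or MP3 payloads) do not shrink at all:

```python
import os
import zlib

def ratio(data: bytes) -> float:
    """Compressed size as a fraction of the original size."""
    return len(zlib.compress(data, 9)) / len(data)

text = b"Compression ratios depend heavily on the input data. " * 200
already_compressed = os.urandom(len(text))  # random bytes mimic JPEG/MP3 content

print(f"repetitive text: {ratio(text):.3f}")
print(f"random bytes:    {ratio(already_compressed):.3f}")
```

The repetitive text compresses to a few percent of its original size, while the random input actually grows slightly because of the container overhead.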

Here is a good comparison table of compression formats from Wikipedia.


Previous answers are wrong by an order of magnitude!

The best compression algorithm that I have personal experience with is paq8o10t (see zpaq page and PDF).

Hint: the command to compress files_or_folders looks like this:

paq8o10t -5 archive files_or_folders

Archive size vs. time to compress and extract 10 GB (79,431 files) to an external USB hard drive at default and maximum settings on a Dell Latitude E6510 laptop (Core i7 M620, 2+2 hyperthreads, 2.66 GHz, 4 GB, Ubuntu Linux, Wine 1.6). Data from 10 GB Benchmark (system 4).

Source: Incremental Journaling Backup Utility and Archiver

You can find a mirror of the source code on GitHub.


A slightly better compression algorithm, and winner of the Hutter Prize, is decomp8 (see link on prize page). However, there is no compressor program that you can actually use.


For really large files lrzip can achieve compression ratios that are simply comical.

An example from README.benchmarks:


Let's take six kernel trees one version apart as a tarball, linux-2.6.31 to linux-2.6.36. These will show lots of redundant information, but hundreds of megabytes apart, which lrzip will be very good at compressing. For simplicity, only 7z will be compared since that's by far the best general purpose compressor at the moment:

These are benchmarks performed on a 2.53 GHz dual-core Intel Core 2 with 4 GB RAM using lrzip v0.5.1. Note that it was running with a 32-bit userspace, so only 2 GB of addressing was possible. However, the benchmark was run with the -U option, allowing the whole file to be treated as one large compression window.

Tarball of 6 consecutive kernel trees.

Compression  Size        Percentage  Compress  Decompress
None         2373713920  100         [n/a]     [n/a]
7z           344088002   14.5        17m26s    1m22s
lrzip        104874109   4.4         11m37s    56s
lrzip -l     223130711   9.4         05m21s    1m01s
lrzip -U     73356070    3.1         08m53s    43s
lrzip -Ul    158851141   6.7         04m31s    35s
lrzip -Uz    62614573    2.6         24m42s    25m30s
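The compression-window effect that lrzip exploits can be illustrated in Python. As an assumption for the sketch, zlib's 32 KB window stands in for a conventional deflate tool and lzma's much larger default dictionary stands in for whole-file matching:

```python
import lzma
import os
import zlib

# Two identical 256 KB chunks of incompressible data, so the only
# redundancy in the input is the long-range repeat between the copies.
chunk = os.urandom(256 * 1024)
data = chunk + chunk

small_window = zlib.compress(data, 9)  # 32 KB window: cannot reach back to the first copy
large_window = lzma.compress(data)     # default dictionary far exceeds 512 KB

print(len(data), len(small_window), len(large_window))
```

zlib sees no redundancy at all and emits roughly the full size, while lzma spots the second copy and halves the output. That is exactly the situation with kernel trees hundreds of megabytes apart in a tarball.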

Squeezechart.com contains comparisons of various compression rates, although, as stated in Nifle's answer, you're unlikely to get such high compression rates for binary formats.

Just check the summary of the multiple file compression benchmark tests, which ranks the best-performing compressors across the complete benchmark.

Top 30


Top performers (based on compression) in this test are PAQ8 and WinRK (PWCM). They are able to compress the 300+ MB test set to under 62 MB (an 80% reduction in size) but take a minimum of 8.5 hours to complete the test. The number one program (PAQ8P) takes almost 12 hours, and number four (PAQAR) even 17 hours, to complete the test. WinRK, the program with the second-best compression (79.7%), takes about 8.5 hours. Not surprisingly, all the programs mentioned use a PAQ(-like) engine for compression. If you have files with embedded images (e.g. Word DOC files), use PAQ8: it will recognize them and compress them separately, boosting compression significantly. All the programs mentioned (except WinRK) are free of charge.

Most compression tools have settings to allow you to achieve a higher compression rate at a compromise of slower compression/decompression times and more RAM usage.

For 7-Zip, search for "Add to Archive Dialog Box" in the built-in help for more detail.

You may try 7zip with the following ultra settings:

7z a -t7z -m0=lzma -mx=9 -mfb=64 -md=32m -ms=on big_file.mysql.7z big_file.mysql
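The size/time trade-off behind such "ultra" settings shows up even with Python's zlib, comparing its fastest and strongest levels on the same structured input (the log-style data below is made up purely for illustration):

```python
import zlib

# Synthetic log-style data: repetitive structure with varying fields.
lines = [
    f"2009-07-0{i % 9 + 1} 12:{i % 60:02d}:{i % 60:02d} worker-{i % 5} finished job {i}\n"
    for i in range(5000)
]
data = "".join(lines).encode()

for level in (1, 6, 9):
    out = zlib.compress(data, level)
    print(f"level {level}: {len(out):6d} bytes ({len(out) / len(data):.1%})")
```

Level 9 searches much harder for matches than level 1, so it produces a smaller archive at the cost of more CPU time; 7-Zip's `-mx=9` and the larger `-md` dictionary trade time and RAM for size in the same way.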

Your best bet here seems to be trial and error. Try all your available compression techniques on each file and pick the best to put on your website. Luckily computers do this sort of thing pretty fast and don't get bored. You could write a simple script to automate the process so it would be "relatively painless".
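A minimal sketch of such a script, using Python's stdlib codecs as the available "techniques" (swap in calls to real archivers as needed): it tries every compressor on every file in a directory and writes the smallest result next to the original.

```python
import bz2
import lzma
import pathlib
import zlib

# Candidate compressors; the key is just a label for reporting.
CANDIDATES = {
    "zlib": lambda d: zlib.compress(d, 9),
    "bz2": lambda d: bz2.compress(d, 9),
    "xz": lambda d: lzma.compress(d),
}

def pick_best(directory):
    """Try every candidate on every file; write the smallest as '<name>.best'."""
    report = {}
    for path in pathlib.Path(directory).rglob("*"):
        if not path.is_file() or path.suffix == ".best":
            continue
        data = path.read_bytes()
        winner, blob = min(
            ((name, fn(data)) for name, fn in CANDIDATES.items()),
            key=lambda pair: len(pair[1]),
        )
        path.with_suffix(path.suffix + ".best").write_bytes(blob)
        report[path.name] = (winner, len(data), len(blob))
    return report
```

Running `pick_best("my_files")` returns a per-file report of which codec won and the before/after sizes, so the whole comparison really is "relatively painless".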

Just don't expect miracles - 700 MB down to 30 MB just doesn't happen that often. Log files, as mentioned above - yes. "Your average file" - no way.

Nanozip seems to have the highest compression, together with FreeArc, but it is not in a final version yet. The summary of the multiple file compression benchmark tests shows how good Nanozip's compression is: it achieves very high ratios without taking too much time, although FreeArc is faster.
