Data Compression Programs

by Matt Mahoney

Current Work

Papers

Experimental Programs

Most of these programs were used as research into data compression techniques that I later incorporated into ZPAQ. I no longer maintain them. All software on this page is open source licensed under GPL except as noted below, and believed to be unencumbered by patents. LIBZPAQ is public domain. Most downloads include Windows executables and C++ source code for Windows or Linux/UNIX. The source code comments explain how the programs work. The PAQ8, LPAQ, and FPAQ projects have many contributors. Programs are last modified by Matt Mahoney unless otherwise specified.

PAQ8

PAQ8 is a series of archivers that achieve very high compression rates at the expense of speed and memory. My latest version is paq8l, Mar. 8, 2007. I am no longer maintaining this code. However, there have been many compression improvements since then written by others, as described in the history section below. Recent version can be found here.

To compress: paq8l -5 archive [files_or_folders...] (creates archive.paq8p) To decompress: paq8l -d archive.paq8l [output_folder]

NOTE: Files can only be decompressed with the same version of PAQ that they were compressed with. However, links to all versions can be found here. Decompression requires the same memory and time as compression.

All PAQ compressors use a context mixing algorithm described in Wikipedia. A large number of models independently predict the next bit of input. The predictions are combined using a neural network and arithmetic coded. There are specialized models for text, binary data, x86 executable code, BMP, TIFF, and JPEG images (except in the paq8hp* series, which are tuned for English text only).

Projects

PAQ variants licensed under GPL have held the Calgary Corpus Compression Challenge title since Jan. 2004, and the Hutter Prize since its inception in Aug. 2006.

PowerPAQ is a command line and GUI front end for 32 and 64 bit Linux by Eugene Varnavsky.

PeaZip (Giorgio Tani) is a GUI front end for Windows and Linux that supports the paq8o, lpaq1, and many other compression formats.

FreeBSD port, usually a few days behind the latest version.

History

History prior to March 2007 can be found on the PAQ history page.

paq8l was released Mar. 8 2007 by Matt Mahoney.

paq8hp12any (May 20 2007, Alexander Rhatushnyak, updated Jan. 9, 2009) is a specialized version which has won the Hutter Prize. It compresses English text better and faster than paq8l, but compresses binary data, images, etc. worse. It includes some dictionary files, which must be in the current directory when run. It does not compress or create directories or have a drag-and-drop interface. Older versions with source code.

To compress file enwik8 with 1 GB memory: paq8hp12any -7 enwik8.paq8hp12any enwik8 To decompress: paq8hp12any enwik8.paq8hp12any

paq8fthis2 (Jan Ondrus, Aug 12 2007), is based on paq8f (Matt Mahoney, Feb 28 2006) with improved JPEG compression.

paq8n (Matt Mahoney, Aug. 18, 2007) was created by inserting the JPEG model from paq8fthis2 into paq8l.

paq8o (Andreas Morphis, Aug. 22 2007) is derived from paq8n modified with an improved BMP model.

rafale.bmp Time lena.bmp Time ---------- ---- -------- ---- Original size 4,149,414 786,486 paq8n -5 634,547 66 419,735 17 paq8o -5 551,665 78 410,644 21 paq8osse -5 551,665 72 410,644 20

paq8osse is included in the paq8o distribution. It is for processors supporting SSE2 such as the Pentium 4 and Athlon. It is compatible with paq8o2 but 8% faster.

paq8o ver. 2 by Andreas Morphis, Aug. 24, 2007, is a minor bug fix to paq8o to fix the archive file name extension.

paq8fthis3 by by Jan Ondrus (Sept. 8, 2007) improves the JPEG model of paq8fthis2.

paq8o3 (KZ, Sept. 11, 2007) combines paq8o ver. 2 with the improved JPEG model from paq8fthis3 and the grayscale image PGM model introduced in paq8i by Pavel L. Holoborodko (Aug. 18, 2006), but not included in later PAQ versions until now.

paq8o4 ver 1 (KZ, Sept. 15, 2007) extends the PGM model of paq8o3 to 8 bit grayscale and paletted color BMP files.

paq8o4 ver. 2, released Sept. 17, 2007 by Matt Mahoney, is archive compatible with paq8o4 ver 1. It fixes directory creation and traversal and wildcard arguments, but is 8% slower because it was compiled with g++ instead of Intel C++.

paq8o5 by KZ, Sept. 21, 2007, is paq8o4v2 with the improved StateMap from lpaq1.

paq8fthis4 by Jan Ondrus, Sept. 27, 2007, is paq8f with an improved JPEG model (better than paq8o4/paq8o5). The archive also includes a faster JPEG model with less compression (paq8fthis_fast).

paq8o6 (mirror) by KZ, Sept. 28, 2007, improves on paq8o5 with the new JPEG model from paq8fthis4 and some code optimizations by Enrico Zeidler.

paq8o7 by KZ, Oct. 16, 2007, detects 4, 8 and 24 bit BMP, 8 bit PGM images, and has some improvements to the JPEG model.

paq8o8 by KZ, Oct. 23, 2007, improves JPEG compression further.

paq8o9 (mirror) by KZ, Feb. 16, 2007. Fixes a bug in .bmp detection that caused an infinite loop for files with invalid headers. Added grayscale .rgb support. Note: paq8o9 bug was fixed by KZ on Mar. 4, 2008. Mirror was updated on Mar. 9, 2008. No version number change.

paq8o10t is by KZ, June 11, 2008. Compression is better than paq8o9 on some files but worse on others. Discussion.

paq8p by Andreas Morphis, released Aug. 25, 2008, has greatly improved .wav compression and slightly improved .bmp compression. Discussion. Note: the SSE2 compiled version is not always archive compatible with the MMX version (noted by Rugxulo, Mar. 9, 2009).

here. paq8p1 (Oct. 8, 2008) by Kaidorav fixes some bugs in paq8p.

paq8p update: Jan 8, 2009. Added paq8o8sse2.asm (optimized assembler by Dark Shikari) and corresponding Windows Pentium 4+ compiled executable, paq8p_sse2.exe. It is archive compatible with paq8p.exe and might or might not be faster.

paq8p2 by Jan Ondrus, Mar. 2, 2009. Has some improvements in JPEG compression.

paq8o8z by Rugxulo, Jan.13, 2010, contains ports to several different operating systems (DOS, Windows (paq8o8zw.exe), OS/2, Linux 2.4 and 2.6, Sun, etc) and assembly languages. It contains run time CPU detection to select between SSE2, MMX or non vectorized x86 code appropriately. Will decompress paq8o8 archives. Replaces versions released Dec. 25, 2008, Feb. 28, 2009, Apr. 1, 2009, and Apr. 25, 2009. Based on paq8o8.

As of May 30, 2009, the latest version is paq8px_v40, with several new versions per day. See encode.su for discussion.

As of Apr. 18, 2013, the latest version is paq8pxd_v5. PAQ9A paq9a, Dec. 31, 2007, is an experimental archiver with compression and speed similar to lpaq1 and lacking specialized models. It has a different architecture than paq8. It uses LZP preprocessing to speed compression of highly redundant files, although it is slower than lpaq1 on other files. It also uses chains of 2-input mixers rather than a single mixer combining many predictions. The archive format and command line interface are like lpq1. The archive supports files over 2 GB. To compress: paq9a a archive [-1..-9] [[-s|-c] files...]... To decompress: paq9a x archive [new filenames...] List contents: paq9a l archive Options: -c compress remaining files (default) -s store remaining files -1..-9 select memory usage from 18 MB to 1585 MB, default -7 = 405 MB. Memory usage is about 12 + 3*2level MB. Memory option must appear first, e.g. paq9a a files.paq9a -4 file1 -s file2 -c file3 file4 compresses file1, stores file2, and compresses file3 and file4, creating archive files.paq9a. File names are stored exactly as entered (with or without a path). Wildcards are allowed if compiled with g++. Archives are solid and cannot be updated once created. Files can be renamed when extracted: paq9a x files.paq9a foo bar extracts file1 to foo, file2 to bar, then file3 and file4 without renaming. If a file already exists or if the stored filename has a path to a nonexistent directory, then extraction is skipped. LPAQ lpaq1 (July 24, 2007) is a "lite" version of PAQ, faster but with less compression. It is a single file compressor, not an archiver. To compress: lpaq1 5 input output To decompress: lpaq1 d input output 5 selects 102 MB memory. Options range from 0 (9 MB) to 9 (1542 MB). Option N uses 6 + 3*2N MB. lpaq1a by Matt Mahoney, Dec. 21, 2007. This is an experimental program (not a new version) that combines the model from lpaq1 with the asymmetric binary coder from fpaqb. It is about 1% slower than lpaq1, compresses 0.02% larger, and requires an extra 1.25 MB memory during compression. The purpose of the program is to demonstrate an alternative to arithmetic coding with performance that is nearly as good. I am no longer maintaining this code, but there have been many compression improvements (below): LPAQ History The first version of lpaq1 was released by Matt Mahoney on July 24, 2007. lpaq1 ver. 2 (Sept. 19, 2007) by Alexander Rhatushnyak is archive compatible with lpaq1 with speed optimizations (6% to 10% faster). lpaq2 (Sept. 20, 2007) by Alexander Rhatushnyak improves compression, but is not compatible with lpaq1. lpaq3 (Sept. 29, 2007) by Alexander Rhatushnyak includes a version tuned for large text files (elpaq3), compiled from the same source code with option -DWIKI. lpaq3a (Sept. 30, 2007) by Alexander Rhatushnyak, has slightly better compression on some files, but slightly worse on others. lpaq3a.exe was compiled with Intel C++ instead of g++, which might be faster on some machines (but is slower on my AMD Athlon-64). This archive also contains lpaq3e.exe, which is an Intel compile of elpaq3 (archive compatible). lpaq4 (Oct. 1, 2007) is by Alexander Rhatushnyak. lpaq4e is tuned for large text files. lpaq5 (Oct. 16, 2007) by Alexander Rhatushnyak includes a version for large text files (lpaq5e) with separate compression and decompression programs (lpaq5e-c.exe and lpaq5e-d.exe) all produced from the same source code. lpaq6 (Oct. 22, 2007) by Alexander Rhatushnyak includes a E8E9 filter for better compression of x86 executables. Compression times below are all about 16 seconds. Compressor Opt acrord32.exe mso97.dll ---------- --- ------------ --------- Uncompressed 3,870,784 3,782,416 lpaq5 9 1,287,001 1,651,519 lpaq5e 9 1,329,518 1,684,646 lpaq6 9 1,157,971 1,564,607 lpaq6e 9 1,187,698 1,591,053 lpaq7 by Alexander Rhatushnyak, Oct. 31, 2007. lpaq8 by Alexander Rhatushnyak, Dec. 10, 2007. Update Feb. 15, 2008: lpaq8.exe and lpaq8e.exe are no longer packed with Upack 0.399. This caused false alarms by some virus detectors. However the .exe files are now larger (but still under 30 KB). lpaq8e (included) is a version tuned for large text files. lpaq9m by Alexander Rhatushnyak, Feb. 20, 2009, is tuned for the large text benchmark. It includes a dictionary preprocessor (no source code) and a compressed dictionary. A large number of earlier versions (lpaq9e through lpaq9l) are omitted. See the large text benchmark for details. Note: you will need zpaq to extract the archive. Also, the executables contained within (DRT.exe and lpaq9m.exe) are compressed with upack, which compresses better than upx. Some virus detectors give false alarms on all upack-compressed executables. The programs are not infected. LPQ1 lpq1 (Dec. 23, 2007, v2 updated May 6, 2008, Matt Mahoney) is an archiver based on lpaq1 with support for files over 2 GB without any Windows or UNIX/Linux specific code (portable C++). The commands should be familiar to users of 7zip and rar. Archives are "solid" and cannot be updated once created. To create an archive: lpq1 a archive [options] files... To extract: lpq1 x archive [new names] To list contents: lpq1 l archive Compression options are -c to compress (default) or -s to store. Options can be mixed with filenames and apply to all files that follow, overriding previous options, for example: lpq1 a foo.lpq file1.txt -s file2.txt file3.txt -c file4.txt creates archive foo.lpq, compressing file1.txt and file4.txt and storing file2.txt and file3.txt without compression. Files are extrated in the order they are added and can be renamed during extraction: lpq1 x foo.lpq newfile1.txt newfile2.txt renames file1.txt to newfile1.txt, file2.txt to newfile2.txt, then extracts file3.txt and file4.txt without renaming. Extraction does not clobber files. If any file already exists or cannot be created, then it is skipped. lpq1 does not create folders. If a file is saved with a path (not recommended) and the directory does not exist during extraction, then the file is skipped. Version 2 fixes a bug in version 1 that caused it to fail (file not found) when compressing more than about 50 files. Archives format is unchanged. BBB bbb (Aug. 31, 2006) is a Big Block BWT (Burrows-Wheeler transform) compressor. It allows blocks as large as 80% of available memory, unlike other BWT compressors (bzip2, GRZipII, sbc, dark, nanozip, etc), which allow only 20%. This allows better compression of very large text files. However, it lacks a mechanism for handling slow sorting of long repeating strings, so it can be slow on highly compressible data. To compress/decompress: bbb cmd input output The command is a sequence of concatenated letters: c = compress (default), d = decompress. f = fast mode, needs 5x block size memory, default uses 1.25x block size. q = quiet (no output except error messages). bN, kN, mN = use block size N bytes, KiB, MiB, default = m4 (compression only). Commands should be concatenated in any order, e.g. bbb cfm100q foo foo.bbb means compress foo to foo.bbb in fast mode using 100 MiB block size in quiet mode. bbb uses a memory efficient BWT. For compression, blocks are first context-sorted in small blocks and then merged using temporary files. For the inverse transform, instead of building a linked list, the program builds an index to the approximate location of the next node, then searches linearly for the exact location. Fast mode (f) uses a normal BWT, but requires blocks no larger than 20% of memory. Fast/slow mode does not affect the compressed file format and is independent for compression and decompression. In either case, the context-sorted block is compressed with an adaptive order-0 model and arithmetic coded. SR2 sr2 (Aug. 3, 2007) is a symbol ranking compressor designed for high speed rather than good compression. It is a single file compressor and uses 6 MB memory. It models the last 3 bytes seen in the order-4 context by order of appearance, and models all other bytes in an order-1 context with arithmetic coding. To compress: sr2 input output To decompress: unsr2 input output History sr2 by Matt Mahoney, Aug. 3, 2007. sr3 modified by Nania Francesco Antonio, Oct. 28, 2007. Better compression, a little slower, recognizes some file types. There is only one program (no unsr2). sr3a modified by Andrew Paterson, Feb. 22, 2008, is about 5% faster than sr3 and compresses 3 bytes smaller. It will also compress and decompress in SR2 and SR3 formats. FLZP flzp (June 18, 2008) is a fast byte-oriented LZP compressor. It can be used as a preprocessor to improve the compression ratio and speed of low order compressors, for example: 57,366,279 enwik8.flzp 8 sec (2.2 GHz Athlon-64, WinXP Home) 63,391,013 enwik8.fpaq0 36 sec 39,879,072 enwik8.flzp.fpaq0 8+21 sec 36,800,798 enwik8.ppmd-o2 30,884,687 enwik8.flzp.ppmd-o2 30,017,979 enwik8.ppmd-o3 29,372,279 enwik8.flzp.ppmd-o3 The program uses 8 MB memory. It compresses by dividing the input into blocks up to 64KB, then using any unused bytes in the block to represent LZP match lengths. FPAQ FPAQ0 is a simple order-0 arithmetic file compressor for stationary sources (independent bytes with a uniform distribution throughout the file). If you want to build a custom compressor based on your own model, this is a good place to start. The code is much smaller, simpler, and faster than the other PAQ versions. This model only compresses well where the bytes are independent and statistics are uniform throughout the file, so you will probably want to change it. There are several versions. fpaq0.cpp originally posted to comp.compression on Sept. 3, 2004, posted here Oct. 9, 2004. To compress: fpaq0 c input output To decompress: fpaq0 d input output fpaq1.cpp is a 64-bit version that gets slightly better compression on purely stationary data, but is slower. It works like fpaq0 but compressed files are not compatible. Posted Jan. 10, 2006. fpaq0b.cpp by Fabio Buffoni uses the same model as fpaq0 but compresses as well as fpaq1 on stationary sources using only 32 bit arithmetic. It uses the coder from paqar/paq6fb which outputs one bit at a time and uses a carry counter achieving 30 bits precision. It is faster than fpaq1 but slower than fpaq0. Posted Jan. 10, 2006. fpaq0s.cpp by David A. Scott improves compression and speed over fpaq0b.cpp by coding 8-bit symbols without an explicit EOF bit or length field. It uses the end of the compressed file to mark the end of the uncompressed file, saving the O(log(n)) space normally required to encode the length. Posted Jan. 16, 2006. fpaq.zip by Nikolay Petrov is an assembly language implementation of fpaq0. It maintains archive compatibility but compresses 37% faster and decompresses 46% faster (on a 2.2 GHz Athlon-64 in 32 bit mode, XP home). Feb. 20, 2006. fpaq02.zip by David Anderson, May 27, 2007. It is a 64 bit version like fpaq1, but extended using the techniques of fpaq0s to exact rational representation of probabilities to 64 bits rather than 12 bits. The result is a tiny improvement in compression (a few bytes per million) but about 2.5 times slower than fpaq1. fpaq0p.zip by Ilia Muraviev, Apr. 15, 2007, uses an adaptive order 0 model. Instead of keeping a 0,1 count for each context, it keeps a probability and updates it by adjusting by 1/32 of the error. This is faster because it avoids a division instruction. fpaqa by Matt Mahoney, Dec. 15, 2007, is an adaptive order 0 model like fpaq0p, but is based on Jarek Duda's asymmetric binary coder (ABC) instead of an arithmetic coder. The coder has only one state variable (10-12 bits) so can be implemented using lookup tables without using multiplication. This is the first compressor to implement this coder. fpaqb by Matt Mahoney, Dec. 18, 2007, updated Dec. 20, 2007, is an improved ABC coder using bytewise I/O and no tables for decompression. It uses one table for compression to avoid division. It has improved compression (near the theoretical limit) and faster decompression. It also uses less memory. fpaqc by Matt Mahoney, Dec. 24, 2007 (documentation updated Dec. 25, 2007), is mainly a speed-optimized version of fpaqb. It also eliminates block size information from block headers, saving 3 bytes. fpaq0f by Matt Mahoney, Jan. 28, 2007 uses the bit history in each bitwise context as context, followed by arithmetic coding. fpaq0f2 by Matt Mahoney, Jan. 30, 2007 modifies fpaq0f by using a simplified bit history consisting of the last 8 bits, plus some minor improvements. D translation of fpaq0 by Leonardo Maffi was posted Feb. 8, 2008. The DMD compile is about 60% as fast as g++ -O3. Benchmarks The benchmarks below are for calgary.tar, the 14 files of the Calgary corpus as a tar file, and for enwik8, a 100 MB text file used in the Large Text Compression Benchmark and Hutter Prize. Compression and decompression times are for enwik8 in seconds on a 2.2 GHz Athlon-64 3500+ in 32-bit Windows XP. Memory is in MB. Options are selected for maximum compression with 2 GB memory (except paq8f with 1 GB) on enwik8. All of the high memory compressors have options to select less memory at the cost of some compression. As of 2009, timing results are tested on a Gateway M-7301U laptop with 2.0 GHz dual core Pentium T3200 (1MB L2 cache), 3 GB RAM, Vista SP1, 32 bit. Run times (on one core) are similar to my older computer. Program calgary.tar enwik8 Comp Decmp Mem Alg Best for ------------ --------- ---------- ----- ----- ---- --- -------- Original size 3,152,896 100,000,000 paq8hp12any -8 594,269 16,230,028 5700 5700 1850 CM English text (Hutter prize, slow) paq8o10t -8 594,587 17,772,821 14425 14372 1591 CM Best general purpose (very slow) zpaq ocmax.cfg,3 643,990 18,977,961 666 664 1861 CM Good compression, ZPAQ compatible paq9a -9 676,659 19,974,112 539 510 1585 LZP+CM Experimental lpaq8 9 676,409 19,523,803 368 371 1542 CM General purpose lpaq8e 9 681,497 18,982,007 342 347 1542 CM Large text files zpaq oc 699,191 20,941,558 239 245 246 CM Moderate compression, ZPAQ compatible bbb m100 798,705 20,847,290 369 247 146 BWT Very large text files (moderate speed) sr2 979,349 30,432,506 10 11 6 SR General purpose (fast) zpaq ocmin.cfg 1,027,229 33,460,960 41 38 4 LZP Fast compression, ZPAQ compatible flzp 1,691,924 57,366,279 8 4 8 LZP Speed or as a low order preprocessor fpaq0 1,903,024 63,391,013 33 35 <1 ord 0 Independent bytes, stationary (C++ source) fpaq 1,903,024 63,391,013 25 25 <1 ord 0 Independent bytes, stationary (assembler) fpaq0p 1,685,882 61,457,810 13 13 <1 ord 0 Independent bytes, nonstationary (C++ source) fpaqc 1,681,774 61,270,455 25 18 2 ord 0 Asymmetric coder source fpaq0f2 1,601,207 56,916,872 21 20 <1 ord 0 Independent bytes, adaptive ppmd -m256 -o10 -r1 756,106 21,388,296 88 89 256 PPM General purpose bzip2 -9 860,097 29,008,758 33 12 8 BWT General purpose zip -9 1,023,101 36,445,443 10 1.3 1.2 LZ77 General purpose Obsolete -------- paq8osse -8 594,798 17,916,451 12526 12457 1778 CM paq8f -7 605,782 18,289,559 6896 6900 854 CM paq8l -8 594,796 17,916,450 13600 13639 1643 CM paq8n -8 594,796 17,916,420 13488 13548 1643 CM paq8o -8 594,798 17,916,451 13585 13526 1643 CM paq8o3 -8 594,798 17,916,450 13458 13453 1636 CM paq8o4 v1 -8 594,798 17,916,450 12678 12656 1636 CM paq8o5 -8 594,700 1846 CM paq8o6 -8 594,734 17,904,721 13953 13972 1712 CM paq8o7 -8 594,734 17,904,756 13914 13853 1574 CM paq8o8 -8 594,734 17,904,756 13937 13915 1574 CM paq8fthis2 -7 605,782 855 CM -8 605,782 18,075,265 6910 6931 1693 CM paq8fthis3 -8 605,782 1693 CM paq8fthis4 -8 605,782 1693 CM paq8fthis_fast -8 605,782 1693 CM Faster JPEG version of paq8fthis4 lpaq1 v1 9 681,811 19,755,948 365 360 1539 CM lpaq1 v2 9 681,811 19,755,948 341 346 1539 CM lpaq2 9 681,854 19,755,471 329 337 1539 CM lpaq3 9 679,493 19,580,276 369 373 1542 CM elpaq3 9 683,949 19,392,604 341 345 1542 CM lpaq3a 9 679,045 19,585,951 424 389 1542 CM lpaq4 9 679,034 19,583,905 374 375 1542 CM lpaq4e 9 683,848 19,358,662 346 350 1542 CM lpaq5 9 680,101 19,455,395 367 368 1542 CM lpaq5e 9 683,016 19,027,721 348 364 1542 CM lpaq6 9 679,053 19,562,861 385 391 1542 CM lpaq6e 9 681,405 19,054,076 369 376 1542 CM lpaq1a 9 681,943 19,759,778 365 349 1540 CM drt|lpaq9m 8/9 663,480/ 17,964,751 209 209 774/1542 CM fpaq1 2,110,511 63,502,003 48 49 <1 ord 0 fpaq0b 1,902,674 63,375,460 46 44 <1 ord 0 fpaq0s 1,902,673 63,375,457 43 42 <1 ord 0 fpaq02 2,110,506 63,501,997 134 132 <1 ord 0 fpaqa 1,682,231 61,310,408 27 24 2 ord 0 Asymmetric binary coder source fpaqb 1,681,777 61,270,458 26 18 2 ord 0 Bytewise ABC coder source fpaq0f 1,622,153 58,088,230 27 24 <1 ord 0 Independent bytes, adaptive Additional benchmarks can be found here: Large Text Compression Benchmark by Matt Mahoney

Maximum Compression by Werner Bergmans

Black Fox

Squeeze Chart by Stephan Busch

Squxe

Compression Max (in French)

Ultimate Command Line Compression by Johan de Bock

Emilcont Ultracompression by Berto Destasio.

JPEG Benchmarks The PAQ series compressors beginning with PAQ7*, with the exception of PAQ8HP*, will compress JPEG images, including images embedded in other files (embed). They compress only baseline JPEG (base), which make up about 90-95% of images. They do not compress progressive mode (prog), the other 5-10%. Compression times are in seconds for a10.jpg. Both test files are baseline JPEG. Program a10.jpg dscn3974.jpg Comp Base Prog Embed Non-JPEG ------- ------- ------------ ---- ---- ---- ----- -------- Original size 842,468 1,114,198 Stuffit 11 (best) 638,540 834,079 1 yes yes no yes paq8o8 -6 638,206 827,071 30.8 yes no yes yes paq8fthis_fast -6 673,112 862,834 9.4 yes no yes yes PackJPG 2.3 697,822 873,261 1.5 yes yes no no lprepaq 5 699,692 1,083,737 9.5 yes no yes yes zpaq ocjpg_test2.cfg 716,043 916,353 26.9 yes no no no jpg2dct | ppmd -o2 770,302 964,803 2.0 yes yes no no (lossy) rings 0.1 819,169 1,075,239 0.3 yes yes ppmd -o2 833,336 1,094,853 1.2 no no no yes Obsolete -------- paq8f -6 698,214 880,902 17.6 yes no yes yes paq8l -6 698,510 881,244 21.5 yes no yes yes paq8fthis2 -6 660,740 843,264 22.7 yes no yes yes paq8n -6 661,321 843,920 25.1 yes no yes yes paq8fthis3 -6 652,266 835,871 23.7 yes no yes yes paq8o3 -6 652,040 835,556 26.8 yes no yes yes paq8fthis4 -6 645,102 831,258 29.7 yes no yes yes paq8o6 -6 643,952 829,142 31.4 yes no yes yes PackJPG 2.2 697,818 873,261 1.6 yes no no no paq8o7 -6 642,092 827,713 33.8 yes no yes yes jpg2dct is a JPEG preprocessor by Jean-Pierre Demailly, Sept. 4, 2007 (ported from Linux to Windows by Matt Mahoney, Sept. 15, 2007). It expands a JPEG file into DCT coefficients, which makes it larger but sometimes more compressible to other compressors. It is lossy in that the inverse transform does not restore exactly the same file, although it does restore a pixel for pixel identical image. Incompressible Data? SHARND generates cryptographically secure pseudo random files (I think. I am looking for feedback). These should not be compressible by any algorithm that does not know the key used to generate it. To use: sharnd key n to generate the n-byte random file, sharnd.out. The key can be any ASCII string (quoted if it contains spaces). Data is generated as a series of 20-byte strings, x[1], x[2], x[3]... such that x[i] = SHA1(x[i-1] + key), and x[0] = 0. sharnd.c, June 7, 2004

sha1.c from RFC 3174

sha1.h from RFC 3174

sharnd.exe, 29,860 bytes (16 bit Windows executable, compiled with g++, UPX)

sharnd.32.exe, 27,220 bytes (32 bit Windows executable)

sharnd.64.exe, 68,096 bytes (64 bit Windows executable)

The 100,000 byte file sharnd_challenge.dat posted on June 21, 2004 was generated by sharnd using a secret key. The SHARND challenge is to guess either the key (an ASCII string less than 80 bytes) or any of the bytes following the data. Furthermore, without knowing the key, it is believed to be impossible to compress this file, i.e. to find a decompression program such that its size (as source code or executable) plus the size of its input is 99,999 bytes or less. (Of course, such a program does exist. It is posted here, and its input is less than 80 bytes). BARF - Ultimate Compression? Compress any nonempty file. Recursively compress files to arbitrarily small sizes. Compress the Calgary corpus to 1 byte per file in under 1 second. Is it possible with the Better Archiver with Recursive Functionality? :-)

Matt Mahoney, mattmahoneyfl@gmail.com