The Art of Lossless Data Compression vol. 26t

Here are the results of tests performed in December 2003 to compare lossless compression of "plain" texts by all known, sufficiently good programs developed for this purpose, including UHArc, PPMd, Bzip2, RAR, ACE and 7-zip. See the Archive Comparison Test by J. Gilchrist for more details: http://compression.ca

If you want to start or continue such tests, can suggest other sets of texts or other compression programs (executable programs only, not sources or algorithm descriptions), or know that we have missed something important (some fantastic new technology, an algorithm, or even a program capable of lossless compression of up to 1000:1, etc.), please let us know immediately: artest@inbox.ru

Thank you!

[[1]] COMPRESSION QUALITY
=========================
(see also [[2]] Speed  [[3]] Details  [[4]] Comments)

The last (seventh) line shows results for the sum of all 1231 texts in six sets.

Origin     DURILCA    Entropy       Slim        RKC        EPM Compressia       PAQ6   PPMonstr       PPMN
555.90%     100.04     100.47     100.13       100%     101.76     100.92     101.06     102.24     106.01
567.66%       100%     104.53     108.58     104.13     110.79     108.38     110.82     113.42     115.01
455.43%       100%     104.56     108.09     106.31     109.96     108.47     110.10     112.89     111.94
513.18%       100%     104.14     109.85     107.26     112.58     110.55     113.25     115.19     114.70
799.24%     100.41     101.80     108.25     124.62     102.27     106.97     102.55       100%     115.93
432.59%       100%     122.52     122.68     127.29     123.95     130.67     124.45     125.58     123.82
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
506.08%       100%     108.04     111.99     112.20     113.73     114.12     114.19     115.87     116.11

ASH           BEE       PPMd        RAR      UHArc         DC        SBC      BZip2      7-zip      pkzip
101.08     107.67     108.14     109.58     105.80     109.17     109.49     124.64     152.86     159.97
113.49     117.44     118.78     120.19     117.81     119.46     120.79     136.85     178.63     186.09
112.30     116.43     117.05     117.39     115.89     116.26     117.70     130.16     163.57     170.65
114.86     119.36     120.27     120.44     120.33     120.57     121.66     137.33     174.83     181.91
110.63     111.40     109.06     110.11     117.68     121.44     118.06     149.67     197.06     205.34
126.87     128.81     129.39     130.16     138.55     136.80     135.14     143.16     175.08     181.85
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
116.43     120.13     120.75     121.21     123.03     123.18     123.45     137.84     173.93     181.03

Results for many other programs are available only in the full version, in the TEXTS.DAT file.

[[2]] Speed
===========
The Canterbury Corpus Large Set (http://corpus.canterbury.ac.nz/resources/large.zip) was used for this test, on a 970 MHz PC with 256 MB of RAM running Windows 98.

Programs,               Compression/     OverallAverage Users'     Compressed
options                  Extraction,       Score         Score           Size
                             seconds  seconds, %    seconds, %       bytes, %
no compression               0     0   4446  559     4446  577  16005619  600
7za a -t7z                  88     1   1104  138     1024  133   3650998  138
7za a -t7z -mx             154     1    982  123      843  109   2975903  113
7za a -tzip                 23     0   1244  156     1223  158   4393623  167
7za a -tzip -mx             44     0   1268  159     1227  159   4401160  167
ash04a /o6 /m230           110   114   1100  138     1001  130   3154310  119
ash04a /o9 /m230           164   170   1130  142      982  127   2863384  108
ash04a /o16 /m230          221   210   1524  191     1324  171   3932449  149
ash04a /o6 /m230 /s16      117   121   1113  140     1007  130   3146157  119
ash04a /o9 /m230 /s16      172   177   1142  143      987  128   2855460  108
ash04a /o16 /m230 /s16     235   224   1541  193     1329  172   3895407  148
bee a -m1                   70    71   1049  132      986  128   3268345  124
bee a -m2                  144   148   1167  146     1037  134   3147914  119
bee a -m3                  201   204   1267  159     1085  140   3099133  117
durilca e -o8 -t2(31)       77    77    964  121      895  116   2915855  110
durilca e -o9 -t2(31)       80    80    969  121      897  116   2911029  110
durilca e -o10 -t2(31)      82    83    970  122      896  116   2899602  110
durilca e -o12 -t2(31)      83    85    970  122      895  116   2886375  109
durilca e -o16 -t2(31)      85    86    972  122      895  116   2880306  109
durilca e -o32 -t2(31)      87    88    975  122      896  116   2878335  109
durilca e -o64 -t2(31)      88    89    977  122      898  116   2878490  109
durilca e -o128 -t2(31)     88    90    978  123      898  116   2878614  109
epm9 c008                  117   116   1034  130      929  120   2882720  109
epm9 c012                  142   142   1119  140      991  128   3007492  114
epm9 c016                  147   147   1138  143     1006  130   3040288  115
grzipii e                   16    10    908  114      893  115   3170895  120
paq6v2a                    388   387   1597  200     1247  161   2957673  112
paq6v2a -6                2551  2710   6002  754     3706  481   2664718  101
rar a -m1                   13     1   1239  155     1227  159   4408290  167
rar a -m2                   21     1   1169  147     1150  149   4131324  157
rar a -m3                   36     1   1156  145     1123  145   4026937  153
rar a -m4                   19    11    914  114      896  116   3178761  120
rar a -m5                   25    16    920  115      898  116   3164814  120
rar a -m5 -s                25    16    923  116      900  116   3173148  120
rar a -mc16t -s             35    26    992  124      960  124   3347746  127
rar a -mc16t+ -s            35    26    992  124      960  124   3347746  127
rar a -mc16:128t -s         40    31    955  120      919  119   3180234  120
rar a -mc16:128t+ -s        40    31    955  120      919  119   3180234  120
rar32 a -mc16t -s           37    28    996  125      962  124   3347746  127
rkc -mf -M230M -o8          94     6   1083  136      998  129   3537812  134
rkc -mx -M230M -o8         120   130   1019  128      911  118   2766843  105
rkc -mxx -M230M -o8        276   276   1320  166     1071  139   2763319  105
rkc -mxx -M230M -o12       297   297   1334  167     1067  138   2663521  101
rkc -mxx -M230M -o16       318   312   1361  171     1074  139   2630041  100
rkc -mxx -M230M -ft        316   316   1367  172     1082  140   2645990  100
rkc -mf -M230M -td+         94     6   1083  136      998  129   3537812  134
rkc -mx -M230M -td+        144   156   1032  129      902  117   2633402  100
rkc -mxx -M230M -td+       305   311   1346  169     1072  139   2630041  100
slim a -d32 -w21           581   650   2034  255     1511  196   2890141  109
slim a -d16 -w21           577   648   2029  255     1509  195   2890280  109
slim a -d8 -w21            566   626   1995  250     1486  192   2891166  109
slim a -d4 -w21            521   583   1908  239     1439  186   2892897  109
uhbc e                      40    31    951  119      914  118   3164344  120
//previous
ace32 a -d4096              66     2   1124  141     1058  137   3801917  142
ace32 a -d4096 -m1          31     2   1134  143     1104  143   3965841  149
ace32 a -d4096 -m5         206     2   1249  157     1045  136   3746553  140
arh a                       38    40   1091  137     1053  137   3647067  137
arh a -2 -1                 68    40   1121  141     1054  137   3647067  137
ba -k -50                   35    12    964  121      929  121   3298943  124
bix a -mdg -s               92     1   1069  134      978  127   3514944  132
boa -m1                     86    88   1253  158     1168  152   3886863  146
boa -m15                   139   141   1165  146     1027  133   3182739  119
boa -m15 -s                138   140   1148  144     1011  131   3132810  117
bzip2 -k                    21     6   1032  130     1011  131   3616113  136
bzip2 -k -9                 20     6   1031  130     1011  131   3616113  136
Entropy t o12               94    95   1003  126      910  118   2932445  110
Entropy t o16               98    99   1001  126      904  117   2892711  108
Entropy t o32              105   106   1009  127      905  118   2873677  108
Entropy t o64              112   111   1022  128      911  118   2873318  108
compcl c -b15               37    20    904  114      868  113   3049569  114
compcl c -b15 -s            38    29    808  102      770  100   2668128  100
dc e                        13     7    903  114      890  116   3179173  119
dc e -b16300 -mt5           17     7    795  100      778  101   2773427  104
eri a                       39    17    936  118      897  116   3168414  119
eri a -m3                   59    21    996  125      937  122   3295385  124
eri a -m6                   59    21    989  124      931  121   3272926  123
gcac a                      26    12    980  123      954  124   3390603  127
gcac s                      26    12    981  123      955  124   3395064  127
imp98 a -mm                 31     1   1175  148     1143  148   4112387  154
imp98 a -mm -2              13     5    999  126      986  128   3533761  132
imp98 a -2 -s4              13     5    999  126      986  128   3533693  132
pkzip -es                    1     1   1654  208     1652  215   5945622  223
pkzip -a                     4     1   1308  164     1304  169   4691491  176
pkzip -exx                  16     1   1296  163     1280  166   4605942  173
ppmdi e -o7 -m232           11    12    904  114      893  116   3169000  119
ppmdi e -o12 -m232          25    26    915  115      891  116   3113630  117
ppmdi e -o16 -m232          27    28    916  115      890  116   3100943  116
ppmn_km e -o6 -MT1          30    30    931  117      901  117   3132278  117
ppmn_km e -o8 -MT1          64    65    993  125      929  121   3107654  116
ppmn_km e -o9               62    63    990  125      929  121   3115560  117
ppmn_km e -o9 -M:50         49    50    949  119      900  117   3058436  115
ppmonstr e -o7 -m232        64    67    974  123      911  118   3035498  114
ppmonstr e -o8 -m232        71    74    980  123      910  118   3007964  113
ppmonstr e -o64 -m232      101   103   1020  128      920  119   2937387  110
qlfc a                      22    11    973  122      952  124   3385084  127
rk -mf2                     50    20   1108  139     1058  137   3735704  140
rk -mx1                    144   143   1147  144     1004  130   3093640  116
rk -mx2                    173   173   1203  151     1032  134   3086312  116
sbc c -b63                  29     9    914  115      885  115   3151930  118
sbc c -os -b63              29     9    810  102      782  101   2779632  104
szip -o4                     4    10   1027  129     1023  133   3647445  137
szip -o6                    17    14    996  125      979  127   3475264  130
szip -o8 -b41               27    17    973  122      947  123   3348344  125
zzip a                      21    11    977  123      956  124   3400243  127
zzip a -mx                  22    12    973  122      952  124   3383060  127
zzip a -mx -30m             30    12    940  118      910  118   3233147  121
abc13 -c                    20     9    950  119      931  120   3313820  124
abc24 -c                    29    16    923  116      897  116   3159570  118
uharc a -m1 -md32768        63     5   1026  129      969  125   3446069  129
uharc a -m2 -md32768       100     5    980  123      890  115   3151572  118
uharc a -m3 -md32768       110     5    973  122      874  113   3087249  115
uharc a -mz -md32768         8     9   1084  136     1077  139   3842041  144
uharc a -mx -md32768        60    55    936  117      882  114   2953184  110
ybs -m1m                    22     8    952  119      931  120   3316356  124
ybs -m2m                    25     8    937  117      915  118   3255538  122
ybs -m4m                    28     8    919  115      894  116   3178183  119
ybs -m8m                    31     8    905  113      877  113   3116271  116
ybs -m16mu                  33     9    835  105      805  104   2852642  106
ybs -m16mu -r               34     9    841  105      811  105   2874130  107
ybs_d -m16mu                34     9    836  105      805  104   2852642  106

The Overall score is the sum of the compression time, the extraction time, and the time it would take to transfer the compressed file over a 28,800 bps network: (compressed_size)/3600 seconds.

The Average Users' score is the sum of (compress_time/10), extract_time, and the time it would take to transfer the compressed file over a 28,800 bps network. Compression time is divided by 10 here because more than 90% of people will never compress anything in their lives (with compression programs), yet they use compressed data almost _every_ time they use a computer and/or the Internet. That is why compression time matters less to them.

[[3]] Details
=============
Details are no longer included in this main text (thousands of lines reporting about 60,000 results on 1231 files in 6 sets). They can be found in the FULL version, together with TEXTS.DAT and *.BAT, at
http://compression.ru/artest/artest26.zip
or
http://artest1.tripod.com/artest26.zip

[[4]] Comments
==============
Links to program downloads and homepages are now in the links.htm file.

What's new:
~~~~~~~~~~~
12 new programs were tested:

ASH 04a        7-zip 3.13     RAR 3.30b5       UHBC 1.0
EPM 9          Slim 0.021a    BEE 0.7.7        Durilca 0.3a
PAQ 6v2        RKC 1.02       GRZipII 0.2.3    BWIC

The latest beta versions of DC, Entropy and UHArc were available from their authors by e-mail request:

Entropy: artest@inbox.ru
DC:      EdgarBinder@t-online.de
UHArc:   Uwe.Herklotz@gmx.de

Results for many other programs are available only in the full version, in the TEXTS.DAT file.
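For readers who want to recompute the two scores defined in section [[2]], here is a minimal Python sketch. The function names are hypothetical (they are not part of ARTest), and the normalisation in percent_of_best is an assumption: the best result in a column is taken as 100, and the rounding used in the published tables is not modelled.

```python
# Transfer speed of the 28,800 bps link used in the scores, in bytes per second.
LINK_BYTES_PER_SECOND = 28_800 / 8  # = 3600, the divisor quoted in the text


def transfer_seconds(compressed_size):
    """Seconds needed to send compressed_size bytes over the link."""
    return compressed_size / LINK_BYTES_PER_SECOND


def overall_score(compress_time, extract_time, compressed_size):
    """Compression time + extraction time + transfer time, in seconds."""
    return compress_time + extract_time + transfer_seconds(compressed_size)


def average_users_score(compress_time, extract_time, compressed_size):
    """Like overall_score, but compression time counts for only one tenth."""
    return compress_time / 10 + extract_time + transfer_seconds(compressed_size)


def percent_of_best(scores):
    """Express each score relative to the smallest (best) one, which becomes 100.

    Assumption: the % columns are normalised this way; rounding is not modelled.
    """
    best = min(scores)
    return [100 * s / best for s in scores]


if __name__ == "__main__":
    # Row "7za a -t7z": 88 s compression, 1 s extraction, 3650998 bytes.
    print(round(overall_score(88, 1, 3650998), 1))
    print(round(average_users_score(88, 1, 3650998), 1))
```

For the "7za a -t7z" row this gives roughly 1103 and 1024 seconds, close to the tabulated 1104 and 1024. Small differences in other rows are to be expected, since the timer resolution and rounding used for the tables are not stated.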
The set of Russian texts used to be at http://arte.nm.ru, but is now available only via artest@inbox.ru

WARNINGS:
~~~~~~~~~
Beta versions of RKC, EPM and BWIC fail to compress and/or decompress many files. The authors have been notified.

ASH 04a can fail to decompress some large files if it runs short of memory.

BA 1.00beta5 cannot correctly decompress shaks12.txt or the set used for speed measurements.

DC 0.99.158b failed to decompress 1DFRE10.dc, ANDES10.dc and BTI0110.dc, reporting "Corrupted block" (while the t(est) command reports "Test successful").

No problems were found in any of the other compressors.

ESP, Rkive and many other programs are no longer tested; their results and links can be found in previous volumes of ARTest.

The LATEST RELEASE and all previous volumes can be found at http://compression.ru/artest/

Send your suggestions and comments to artest@inbox.ru

With best kind regards,
A.Ratushnyak