LZMA
Speaker: Roman Gareev, email: gareevroman@gmail.com
LZMA
(Lempel-Ziv-Markov chain algorithm)


7z(7-Zip)

LZMA SDK
LZMA:
• LZ77 (sliding window)
• Deflate: Zip and Gzip
• Range encoding
LZ77 (Sliding Window)
Jacob Ziv
Abraham Lempel
In computer science and information theory, data compression (also called source coding or bit-rate reduction) encodes information using fewer bits than the original representation. Compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy, so no information is lost. Lossy compression reduces bits by identifying marginally important information and removing it.
Compression is useful because it reduces the consumption of resources such as storage space or transmission capacity. Because compressed data must be decompressed before use, the extra processing imposes computational or other costs. For instance, a video compression scheme may require expensive hardware to decompress the video fast enough for it to be viewed as it is decompressed, while decompressing the video in full before watching may be inconvenient or require additional storage. The design of a data compression scheme therefore involves trade-offs among the degree of compression, the amount of distortion introduced (when using lossy compression), and the computational resources required to compress and decompress the data.
Encoded text… | Text for …
…sir sid eastman easily t | eases sea sick seals…   ⇒ (16, 3, “e”)
…sir sid eastman easily tease | s sea sick seals…

| sir sid eastman       ⇒ (0, 0, “s”)
s | ir sid eastman e    ⇒ (0, 0, “i”)
si | r sid eastman ea   ⇒ (0, 0, “r”)
sir |  sid eastman eas  ⇒ (0, 0, “ ”)
sir  | sid eastman easi ⇒ (4, 2, “d”)
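The token stream on this slide can be reproduced with a small sketch. The function names, the window and look-ahead sizes, and the tie-breaking rule (the farthest of several equally long matches wins) are assumptions made for illustration; this is a naive quadratic search, not how a production encoder looks for matches.

```python
def lz77_encode(data, window=32, lookahead=16):
    """Encode a string as (offset, length, next_symbol) triples."""
    tokens = []
    pos = 0
    while pos < len(data):
        best_off, best_len = 0, 0
        start = max(0, pos - window)
        # Keep at least one symbol in reserve to serve as the literal.
        max_len = min(lookahead, len(data) - pos - 1)
        for cand in range(start, pos):
            length = 0
            while length < max_len and data[cand + length] == data[pos + length]:
                length += 1
            if length > best_len:  # strict ">" keeps the farthest match on ties
                best_off, best_len = pos - cand, length
        tokens.append((best_off, best_len, data[pos + best_len]))
        pos += best_len + 1
    return tokens


def lz77_decode(tokens):
    """Rebuild the text by copying `length` symbols from `offset` back."""
    out = []
    for offset, length, symbol in tokens:
        for _ in range(length):
            out.append(out[-offset])  # symbol-by-symbol copy allows overlap
        out.append(symbol)
    return "".join(out)


if __name__ == "__main__":
    for token in lz77_encode("sir sid eastman")[:5]:
        print(token)
```

The first five tokens printed are (0, 0, 's'), (0, 0, 'i'), (0, 0, 'r'), (0, 0, ' ') and (4, 2, 'd'), matching the slide.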
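Huffman Coding
Symbol   Probability
a1       0.4
a2       0.2
a3       0.2
a4       0.1
a5       0.1
[Figure: Huffman tree. a4 and a5 merge into a45 (0.2), a3 and a45 into a345 (0.4), a2 and a345 into a2345 (0.6), a1 and a2345 into the root a12345 (1.0); each branch is labeled 0 or 1.]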
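A minimal sketch of the tree construction for the five symbols above. The function name and the use of integer weights (probabilities multiplied by 10) are assumptions made for illustration. Because several symbols have equal weight, the individual code words depend on tie-breaking and may differ from the figure; the average code length is 2.2 bits per symbol either way.

```python
import heapq
from itertools import count


def huffman_codes(weights):
    """Return {symbol: bit string} for a dict of symbol weights."""
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(w, next(tie), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next(tie), merged))
    return heap[0][2]


if __name__ == "__main__":
    weights = {"a1": 4, "a2": 2, "a3": 2, "a4": 1, "a5": 1}
    codes = huffman_codes(weights)
    for sym in sorted(codes):
        print(sym, codes[sym])
    average = sum(weights[s] * len(codes[s]) for s in weights) / 10
    print("average bits per symbol:", average)
```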
Deflate: Zip and Gzip
Phillip Katz
(offset, length, symbol)
⇒ (offset, length)
Encoded text… old ...she needs..then…there… the new... Text for …
Modes:
• Normal
• High-compression
• Fast
Compression modes:
• No compression
• Compression with fixed-size tables
• Compression with individual tables built for the current data
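The three compression modes can be observed through Python's standard zlib module, which implements Deflate. This is a sketch under assumptions: the helper names are hypothetical, level 0 is used to force stored (uncompressed) blocks, and the Z_FIXED strategy to force the fixed tables. With the default strategy the encoder chooses a block type itself, which for text is usually the per-block (dynamic) tables.

```python
import zlib


def deflate(data, level=9, strategy=zlib.Z_DEFAULT_STRATEGY):
    """Compress `data` into a zlib-wrapped Deflate stream."""
    comp = zlib.compressobj(level=level, strategy=strategy)
    return comp.compress(data) + comp.flush()


def deflate_mode_sizes(data):
    """Output size in bytes for each way of coding the blocks."""
    return {
        "stored": len(deflate(data, level=0)),
        "fixed": len(deflate(data, strategy=zlib.Z_FIXED)),
        "default": len(deflate(data)),
    }


if __name__ == "__main__":
    sample = b"old ... she needs ... then ... there ... the new ... " * 200
    print(len(sample), deflate_mode_sizes(sample))
```

Stored blocks come out slightly larger than the input because of the block and stream headers; the other two modes shrink this repetitive sample considerably.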
“range encoding”
“Handbook of Data Compression” David Salomon, Giovanni Motta
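The slide only names the technique, so the following is an illustrative sketch of the interval-narrowing idea behind range encoding, not the coder used in LZMA. It assumes a static frequency table and exact fractions; practical range coders work with fixed-width integers, renormalization and adaptive probabilities. All function names are hypothetical.

```python
from fractions import Fraction


def build_ranges(freqs):
    """Assign each symbol a sub-interval of [0, 1) proportional to its count."""
    total = sum(freqs.values())
    ranges, low = {}, Fraction(0)
    for sym in sorted(freqs):
        width = Fraction(freqs[sym], total)
        ranges[sym] = (low, low + width)
        low += width
    return ranges


def encode(message, ranges):
    """Narrow [low, high) once per symbol; any value inside encodes the message."""
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        span = high - low
        s_low, s_high = ranges[sym]
        low, high = low + span * s_low, low + span * s_high
    return low, high


def decode(value, length, ranges):
    """Recover `length` symbols from a value inside the final interval."""
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in ranges.items():
            if s_low <= value < s_high:
                out.append(sym)
                value = (value - s_low) / (s_high - s_low)  # rescale to [0, 1)
                break
    return "".join(out)


if __name__ == "__main__":
    table = build_ranges({"a": 2, "b": 1, "c": 1})
    low, high = encode("abac", table)
    print(low, high, decode(low, 4, table))
```

The width of the final interval equals the product of the symbol probabilities, so likely messages get wide intervals and need fewer bits to identify.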
LZMA
Encoded text… … the new... Text for …
index
Match finder   Description
bt2            Binary tree with 2-byte hashing
bt3            Binary tree with 3-byte hashing
bt4            Binary tree with 4-byte hashing
hc4            Hash chain with 4-byte hashing
“hash-chain”, “binary-tree”
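The match finders listed above can be tried from Python's standard lzma module. Note the assumptions: that module wraps liblzma from XZ Utils rather than the LZMA SDK itself, the helper name and the 1 MiB dictionary are arbitrary, and pairing hash chains with the fast mode and binary trees with the normal mode is a choice made for this sketch.

```python
import lzma

MATCH_FINDERS = {
    "bt2": lzma.MF_BT2,
    "bt3": lzma.MF_BT3,
    "bt4": lzma.MF_BT4,
    "hc4": lzma.MF_HC4,
}


def compress_with(mf_name, data, dict_size=1 << 20):
    """Compress to the legacy .lzma container using the named match finder."""
    mode = lzma.MODE_FAST if mf_name.startswith("hc") else lzma.MODE_NORMAL
    filters = [{
        "id": lzma.FILTER_LZMA1,
        "dict_size": dict_size,
        "mf": MATCH_FINDERS[mf_name],
        "mode": mode,
    }]
    return lzma.compress(data, format=lzma.FORMAT_ALONE, filters=filters)


if __name__ == "__main__":
    sample = b"old ... she needs ... then ... there ... the new ... " * 500
    for name in MATCH_FINDERS:
        packed = compress_with(name, sample)
        assert lzma.decompress(packed) == sample
        print(name, len(packed))
```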
Hash-chain
[Figure: hash-chain diagram; recoverable labels: XY, 123, 123, 24.]
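A minimal sketch of a hash-chain match finder in the spirit of hc4. It assumes the first four bytes at each position serve directly as the table key (a dict stands in for a real hash table) and that the search gives up after max_chain candidates; the function name and parameters are hypothetical, and this is not the LZMA SDK code.

```python
HASH_BYTES = 4  # hc4-style: positions are indexed by their next 4 bytes


def hash_chain_matches(data, max_chain=32, max_len=273):
    """For every position return the longest earlier match as (distance, length)."""
    head = {}                 # 4-byte prefix -> most recent position seen
    prev = [-1] * len(data)   # prev[i] -> earlier position with the same prefix
    result = []
    for pos in range(len(data)):
        key = data[pos:pos + HASH_BYTES]
        best = (0, 0)
        if len(key) == HASH_BYTES:
            cand = head.get(key, -1)
            limit = min(max_len, len(data) - pos)
            steps = 0
            while cand != -1 and steps < max_chain:
                length = 0
                while length < limit and data[cand + length] == data[pos + length]:
                    length += 1
                if length > best[1]:
                    best = (pos - cand, length)
                cand = prev[cand]  # follow the chain to an older occurrence
                steps += 1
            prev[pos] = head.get(key, -1)
            head[key] = pos
        result.append(best)
    return result


if __name__ == "__main__":
    matches = hash_chain_matches(b"abcdXabcdYabcdX")
    print(matches[5], matches[10])
```

For the sample input, position 5 matches "abcd" five bytes back, and position 10 finds the longer "abcdX" ten bytes back by following the chain past the nearer candidate.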
Binary-tree
…abm…abcd2…abcx…abcd1…aby…
[Figure: successive states of the binary search tree built over the substrings abm…, abcd2…, abcx…, abcd1… and aby…, marked at positions 11, 24, 30, 57 and 78; position 62 also appears in the diagrams.]

LZMA implementation example (the Java implementation from the LZMA SDK)
LZMA SDK

2004

ANSI-C/C++/C#/Java

lzma.exe

lzma.txt

7zFormat.txt

history.txt
Key characteristics of the LZMA SDK
• Variable dictionary size
• Estimated compression speed: about 2 MB/s on a 2 GHz CPU
• Estimated decompression speed:
  20-30 MB/s on a 2 GHz Core 2 or AMD Athlon 64
  1-2 MB/s on a 200 MHz RISC
• Small memory requirements for decompression (16 KB + dictionary size)
• Multithreading support
Main options of the LZMA SDK
• -a{N}  Compression mode: 0 = fast, 1 = normal, 2 = max
• -d{N}
• -si
• -so
• -mf{MF_ID}

MF_ID   Memory            Description
bt2     d * 9.5 + 4MB     Binary tree with 2-byte hashing
bt3     d * 11.5 + 4MB    Binary tree with 3-byte hashing
bt4     d * 11.5 + 4MB    Binary tree with 4-byte hashing
hc4     d * 7.5 + 4MB     Hash chain with 4-byte hashing
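The memory formulas from the table, together with the "16 KB + dictionary size" figure for decompression, can be turned into a small calculator. The function names are hypothetical, d is taken to be the dictionary size in bytes, and "MB" and "KB" are read as binary units (MiB, KiB); both readings are assumptions.

```python
MIB = 1 << 20

# Multipliers of the dictionary size d, copied from the table above.
MATCH_FINDER_FACTOR = {"bt2": 9.5, "bt3": 11.5, "bt4": 11.5, "hc4": 7.5}


def compression_memory(dict_size, mf="bt4"):
    """Estimated encoder memory in bytes: d * factor + 4 MB."""
    return int(dict_size * MATCH_FINDER_FACTOR[mf] + 4 * MIB)


def decompression_memory(dict_size):
    """Estimated decoder memory in bytes: 16 KB + dictionary size."""
    return 16 * 1024 + dict_size


if __name__ == "__main__":
    d = 8 * MIB
    for mf in MATCH_FINDER_FACTOR:
        print(mf, compression_memory(d, mf) / MIB, "MiB")
    print("decode", decompression_memory(d) / MIB, "MiB")
```

With an 8 MiB dictionary, bt4 comes to 96 MiB for compression, while decompression needs barely more than the dictionary itself.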
Lasse Collin
http://tukaani.org
Comparison of Gzip, Bzip2 and LZMA
• AMD mobile Athlon XP2400+
• 512 MB RAM
• Linux 2.6.12
• gzip 1.3.3, bzip2 1.0.3, LZMA SDK 4.17 (lzmash)
Attention was paid to:
• file size after compression
• decompression time
• memory required for decompression
• a common format that everyone knows
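A comparison of this kind can be approximated with Python's standard library. This sketch is not the benchmark used for the slides: it assumes zlib as a stand-in for gzip (the same Deflate algorithm in a different container), bz2 for bzip2, and the lzma module (liblzma, .xz container) for lzmash, so absolute numbers will differ. The function name and the row layout are hypothetical.

```python
import bz2
import lzma
import time
import zlib

CODECS = {
    "gzip": (lambda data, level: zlib.compress(data, level), zlib.decompress),
    "bzip2": (lambda data, level: bz2.compress(data, level), bz2.decompress),
    "lzma": (lambda data, level: lzma.compress(data, preset=level), lzma.decompress),
}


def benchmark(data, levels=range(1, 10)):
    """Return rows of (codec, level, ratio %, compress s, decompress s)."""
    rows = []
    for name, (compress, decompress) in CODECS.items():
        for level in levels:
            t0 = time.perf_counter()
            packed = compress(data, level)
            t1 = time.perf_counter()
            restored = decompress(packed)
            t2 = time.perf_counter()
            assert restored == data
            ratio = 100.0 * len(packed) / len(data)
            rows.append((name, level, ratio, t1 - t0, t2 - t1))
    return rows


if __name__ == "__main__":
    sample = b"the quick brown fox jumps over the lazy dog\n" * 20000
    for name, level, ratio, c_time, d_time in benchmark(sample, levels=(1, 6)):
        print(f"{name:6} -{level}  {ratio:5.1f}%  {c_time:.3f}s  {d_time:.3f}s")
```

The high lzma presets allocate large dictionaries, so running all nine levels needs considerably more memory than the two levels used in the example.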
Tar archive of OpenOffice.org 1.1.4 (Linux) (203 MB)
Compressed size / uncompressed size * 100%
Level   gzip    bzip2   lzmash
1       40.6%   35.8%   31.7%
2       39.9%   34.9%   29.2%
3       39.3%   34.5%   28.0%
4       38.2%   34.3%   27.4%
5       37.5%   34.2%   26.7%
6       37.2%   34.1%   26.4%
7       37.1%   34.1%   26.1%
8       37.1%   34.0%   25.7%
9       37.0%   34.0%   25.4%
Tar archive of OpenOffice.org 1.1.4 (Linux) (203 MB)
        Compression time             Decompression time
Level   gzip    bzip2    lzmash      gzip   bzip2   lzmash
1       11.5s   1m 26s   0m 58s      3.3s   16.5s   11.3s
2       12.0s   1m 40s   2m 7s       3.3s   24.2s   10.5s
3       13.7s   1m 54s   4m 58s      3.3s   29.2s   10.5s
4       15.1s   2m 5s    5m 26s      3.3s   32.1s   10.4s
5       18.4s   2m 11s   6m 47s      3.2s   34.2s   10.2s
6       24.5s   2m 18s   7m 30s      3.2s   35.4s   10.2s
7       29.4s   2m 25s   8m 24s      3.2s   36.5s   10.1s
8       45.5s   2m 32s   10m 59s     3.2s   37.5s   10.0s
9       66.9s   2m 37s   12m 20s     3.1s   38.2s   10.0s
Tar archive of the Linux kernel 2.6.11.0 source (199 MB)
Compressed size / uncompressed size * 100%
Level   gzip    bzip2   lzmash
1       27.8%   21.1%   21.1%
2       26.5%   19.7%   18.7%
3       25.7%   19.1%   16.7%
4       23.9%   18.7%   16.1%
5       22.9%   18.4%   15.6%
6       22.6%   18.2%   15.2%
7       22.5%   18.1%   14.8%
8       22.4%   17.9%   14.5%
9       22.4%   17.8%   14.3%
Tar archive of the Linux kernel 2.6.11.0 source (199 MB)
        Compression time             Decompression time
Level   gzip    bzip2    lzmash      gzip   bzip2   lzmash
1       8.3s    1m 9s    0m 45s      2.8s   12.8s   7.7s
2       8.7s    1m 22s   1m 45s      2.7s   19.4s   6.9s
3       9.8s    1m 34s   5m 10s      2.6s   23.8s   6.4s
4       11.1s   1m 45s   5m 43s      2.5s   26.4s   6.3s
5       13.8s   1m 57s   7m 39s      2.5s   28.3s   6.3s
6       17.8s   2m 2s    8m 23s      2.4s   29.6s   6.2s
7       20.7s   2m 11s   9m 11s      2.4s   30.6s   6.2s
8       29.7s   2m 21s   11m 34s     2.4s   31.3s   6.1s
9       40.9s   2m 26s   12m 31s     2.4s   32.1s   6.1s
XMMS 1.2.10 binary package (5.2 MB) (Slackware 10.1)
Compressed size / uncompressed size * 100%
Level   gzip    bzip2   lzmash
1       39.3%   32.8%   26.0%
2       38.4%   29.3%   20.7%
3       37.7%   28.0%   18.8%
4       36.9%   27.0%   18.3%
5       36.2%   26.6%   18.0%
6       36.0%   26.1%   17.9%
7       35.9%   26.0%   17.9%
8       35.9%   25.7%   17.8%
9       35.8%   25.2%   17.8%
XMMS 1.2.10 binary package (5.2 MB) (Slackware 10.1)
        Compression time          Decompression time
Level   gzip   bzip2   lzmash     gzip   bzip2   lzmash
1       0.3s   2.4s    1.4s       0.1s   0.4s    0.3s
2       0.3s   2.9s    2.7s       0.1s   0.6s    0.2s
3       0.4s   3.2s    6.2s       0.1s   0.7s    0.2s
4       0.4s   3.3s    6.6s       0.1s   0.8s    0.2s
5       0.5s   4.6s    8.2s       0.1s   0.9s    0.2s
6       0.7s   5.6s    8.5s       0.1s   0.9s    0.2s
7       0.8s   4.7s    8.6s       0.1s   0.9s    0.2s
8       1.1s   4.9s    10.5s      0.1s   1.0s    0.2s
9       1.8s   5.1s    10.5s      0.1s   1.0s    0.2s
XMMS 1.2.10 source tarball (15.2 MB)
Compressed size / uncompressed size * 100%
Level   gzip    bzip2   lzmash
1       29.5%   23.2%   21.2%
2       28.6%   19.9%   13.3%
3       27.9%   18.3%   12.0%
4       26.4%   17.2%   11.3%
5       25.7%   16.7%   10.8%
6       25.4%   16.2%   10.3%
7       25.3%   15.7%   9.7%
8       25.3%   15.4%   9.6%
9       25.3%   15.1%   9.6%
XMMS 1.2.10 source tarball (15.2 MB)
        Compression time          Decompression time
Level   gzip   bzip2   lzmash     gzip   bzip2   lzmash
1       0.7s   6.1s    3.5s       0.2s   1.0s    0.6s
2       0.7s   7.3s    6.0s       0.2s   1.5s    0.4s
3       0.8s   8.5s    19.0s      0.2s   1.9s    0.4s
4       0.9s   9.9s    19.9s      0.2s   2.1s    0.4s
5       1.1s   11.2s   28.9s      0.2s   2.3s    0.4s
6       1.4s   11.0s   30.1s      0.2s   2.5s    0.4s
7       1.7s   12.5s   30.9s      0.2s   2.6s    0.4s
8       2.5s   15.9s   41.7s      0.2s   2.7s    0.4s
9       2.9s   17.5s   41.7s      0.2s   2.8s    0.4s
Memory requirements
        RAM usage on compression     RAM usage on decompression
Level   gzip    bzip2   lzmash       gzip    bzip2   lzmash
1       <1 MB   2 MB    2 MB         <1 MB   1 MB    1 MB
2       <1 MB   2 MB    12 MB        <1 MB   2 MB    2 MB
3       <1 MB   3 MB    12 MB        <1 MB   2 MB    1 MB
4       <1 MB   4 MB    16 MB        <1 MB   2 MB    2 MB
5       <1 MB   5 MB    26 MB        <1 MB   3 MB    3 MB
6       <1 MB   5 MB    45 MB        <1 MB   3 MB    5 MB
7       <1 MB   6 MB    83 MB        <1 MB   3 MB    9 MB
8       <1 MB   7 MB    159 MB       <1 MB   4 MB    17 MB
9       <1 MB   7 MB    311 MB       <1 MB   4 MB    33 MB
Thank you for your attention!