Wednesday, January 31, 2018

Introduction to Coding Theory and Data Compression - Part 1


Data compression has been an important subject since the dawn of the electronic age. It mattered even more when the costs of transmitting and of storing information were high. Nowadays both of these costs are much lower than before. Is there still a place for data compression?

It is true that many of us no longer compress (or zip) our data files these days. But for photos and video we still rely on data compression. For example, the JPEG format we use today, which typically achieves a compression ratio of about 10:1, is based on quite sophisticated compression techniques.

Data compression can be traced back to Claude Shannon's 1948 paper "A Mathematical Theory of Communication", where he introduced the concept of "entropy" to measure the statistical nature of a source. By source we mean the set of objects or symbols we assemble for transmission or storage.

If the source consists of 4 symbols s1, s2, s3, s4, a simple way to transmit them is to use 2 bits for each: for example, 00 for s1, 01 for s2, 10 for s3, and 11 for s4.

Throughout this article all logarithms are base 2, so log n means log2(n).

In general, if we have n symbols in the source, we need log n bits (rounded up to the nearest integer if necessary) to represent them all. This is because if we use m binary bits to represent n symbols, then 2 to the power of m must be greater than or equal to n (2^m >= n). Therefore, by the definition of logarithm, m >= log n.

So we need a string of at least log 4 = 2 bits to represent 4 symbols.
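As a quick sanity check, here is a minimal Python sketch of this calculation (the function name fixed_length_bits is my own, not from any library):

    import math

    def fixed_length_bits(n):
        # Smallest m with 2**m >= n, i.e. log n (base 2) rounded up.
        return max(1, math.ceil(math.log2(n)))

    print(fixed_length_bits(4))   # 2: codes 00, 01, 10, 11
    print(fixed_length_bits(26))  # 5, since 2^4 = 16 < 26 <= 32 = 2^5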

Example 1: The source is still s1, s2, s3, s4, but now we assume the probability of s1 occurring is ½, the probability of s2 is ¼, and the probabilities of s3 and s4 are both 1/8. If we accept a variable-length coding scheme, we can reduce the "average length" of the code for these symbols.

In this case we use 0 for s1, 10 for s2, 110 for s3, and 111 for s4, so the numbers of bits used (the lengths) for coding these 4 symbols are 1, 2, 3, and 3 respectively.

The average length of this coding scheme is therefore 1 * ½ + 2 * ¼ + 3 * 1/8 + 3 * 1/8 = 1.75 bits.
This number is also the expected value of the code length in probability theory.
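The same expected value can be computed in a couple of lines of Python (a sketch; the probabilities and lengths are those of Example 1):

    # Example 1: codeword lengths 1, 2, 3, 3 with probabilities 1/2, 1/4, 1/8, 1/8
    probs   = [1/2, 1/4, 1/8, 1/8]
    lengths = [1, 2, 3, 3]

    # Average (expected) code length: sum of p * l over all symbols
    print(sum(p * l for p, l in zip(probs, lengths)))  # 1.75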

Finding a coding scheme with the shortest average length is one of the main goals in coding theory.

A "real life example" of s1, s2, s3, s4 above: sunny, cloudy, light rain, and heavy rain for a town in Florida, with the probabilities coming from its weather statistics. In that case, to tell people in New York or Toronto the current weather in Florida, one needs to transmit only a single symbol, and the average length is only 1.75 bits.


This variable-length transmission does have its disadvantages. It is more difficult for the receiving end to decode the message, and it may cause confusion or ambiguity during decoding. For example, once you use 10 for s2, you cannot use 101 or 100 for the other symbols unless you have some other means of resolving the ambiguity. Since there is no stop bit to mark symbol boundaries, the coded message may in some cases be interpreted in more than one way if 10 appears again as a prefix (the front part) of another symbol's codeword.

The solution is to restrict your coding scheme to be "prefix-free". In other words, a symbol's codeword cannot appear as a prefix of the codeword for another symbol.

One might think that a coding scheme must be decodable back to its original source (this kind of compression is called lossless compression), but lossy compression is also in use today. The JPEG coding scheme for photo files does not reproduce the exact image, yet we all use it because its compression ratio is high and the discrepancy it introduces is not noticeable to human eyes. For text messages, however, we normally demand lossless compression. A uniquely decodable variable-length scheme ensures that codewords can be recognized unambiguously, so that decoding is the exact inverse of encoding. A prefix-free coding scheme is always uniquely decodable. Please note that some authors use the term "prefix codes" for prefix-free codes.
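Here is a minimal Python sketch of why prefix-free codes decode so easily: read bits into a buffer, and the moment the buffer matches a codeword, emit the symbol; prefix-freeness guarantees no longer codeword could still match. (The function name decode_prefix_free is my own.)

    def decode_prefix_free(bits, code):
        # code maps symbol -> codeword; invert it for decoding
        reverse = {cw: sym for sym, cw in code.items()}
        out, buf = [], ""
        for b in bits:
            buf += b
            if buf in reverse:        # a complete codeword: emit and reset
                out.append(reverse[buf])
                buf = ""
        if buf:
            raise ValueError("leftover bits: not a valid encoding")
        return out

    # The prefix-free code of Example 1
    code = {"s1": "0", "s2": "10", "s3": "110", "s4": "111"}
    print(decode_prefix_free("0101100", code))  # ['s1', 's2', 's3', 's1']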
 
Exercise: The source consists of  A  (coded as 1),  B (coded as 10), C (coded as 000), and D (coded as 100).   Please decode the message  101100000101100.
Is this coding scheme uniquely decodable?   Is it also prefix-free?

Huffman Coding Method

David Huffman, then a student at MIT, published the paper "A Method for the Construction of Minimum-Redundancy Codes" in 1952, in which he proposed a procedure for constructing a prefix-free coding scheme that minimizes the average code length.

The first step of the procedure is to arrange the symbols in decreasing order of their probabilities. Then repeatedly join the two nodes (or symbols) with the smallest probabilities to form a new node whose probability is the sum of the two just joined, assigning '1' to one branch (the lower one in the following example) and '0' to the other (the upper one). When we reach the top level, which has probability 1, we construct each codeword from the 1s and 0s we have assigned. Let's use the following example as an illustration.

Example 2:    Symbol/ Probability:      A/0.3  B/0.3  C/0.13  D/0.12  E/0.1  F/0.05

Please note that it has already been arranged in decreasing order of probabilities.

Step 1: We first work on the lowest pair, E and F; their combined probability is 0.15. We assign '0' to E and '1' to F. When combining two symbols we always assign '0' to the symbol with the higher probability (the one above) and '1' to the one with the lower probability (the one below).

A/0.3   
B/0.3  
C/0.13   
D/0.12   
E/0.1        ‘0’       
                EF 0.15
F/0.05       ‘1’

Step 2: Do the same to C and D because they now have the two lowest probabilities:

A/0.3      
B/0.3       
C/0.13       ‘0’
                CD 0.25
D/0.12      ‘1’
E/0.1         ‘0’
                EF 0.15
F/0.05       ‘1’


Step 3: Combine  CD with EF to form  CDEF because 0.15 and  0.25 are the lowest pair.

A/0.3      
B/0.3       
C/0.13      ‘0’
                CD 0.25        ‘0’
D/0.12       ‘1’
                                 CDEF 0.4
E/0.1         ‘0’
                EF 0.15         ‘1’
F/0.05        ‘1’


Step 4: Combine A and B to form  AB, and then combine AB and CDEF:

A/0.3        ‘0’
                AB 0.6            
B/0.3        ‘1’                                         
                                                            ‘0’
                                                       ABCDEF  1
                                                             ‘1’
C/0.13      ‘0’
                CD 0.25        ‘0’
D/0.12       ‘1’
                                 CDEF 0.4
E/0.1         ‘0’
                EF 0.15         ‘1’
F/0.05        ‘1’


Now we have reached the top level, with combined probability 1, so we can start constructing the codes. To obtain the code for A we start from the top split, where AB gets '0' and CDEF gets '1', then go down one level, where A itself also gets '0'.

So A is represented by 00.
B is represented by 01.
C is represented by 100.
D is represented by 101.
E is represented by 110.
F is represented by 111.

The average length =  0.3*2 + 0.3*2 + 0.13*3 + 0.12*3 + 0.1*3 + 0.05*3  = 2.4 bits
Certainly this is a prefix-free coding scheme.
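The whole procedure can be sketched in a dozen lines of Python using a min-heap (this is my own illustrative implementation, not Huffman's original formulation; when probabilities tie, as with A and B here, the exact codewords may differ from those above, but the lengths and the average agree):

    import heapq
    from itertools import count

    def huffman_code(probs):
        # Repeatedly merge the two lowest-probability nodes; within each
        # pair, '0' goes to the higher-probability branch, '1' to the lower.
        ticket = count()  # tie-breaker so the heap never compares dicts
        heap = [(p, next(ticket), {sym: ""}) for sym, p in probs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p1, _, low  = heapq.heappop(heap)  # lowest probability -> '1'
            p0, _, high = heapq.heappop(heap)  # next lowest        -> '0'
            merged = {s: "1" + c for s, c in low.items()}
            merged.update({s: "0" + c for s, c in high.items()})
            heapq.heappush(heap, (p1 + p0, next(ticket), merged))
        return heap[0][2]

    probs = {"A": 0.3, "B": 0.3, "C": 0.13, "D": 0.12, "E": 0.1, "F": 0.05}
    code = huffman_code(probs)
    print(code)
    print(sum(probs[s] * len(code[s]) for s in probs))  # average length: 2.4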

When we create this "tree" we start from the lowest probability (the bottom one) and work our way up; the symbol with the highest probability, at the top, is handled last. In this way the bottom symbols with lower probabilities are always represented by more bits, because the number of bits increases by one each time we move up one level.

The following article shows the frequencies of each of  the 26 English letters based on a sample of 40,000 words:
https://www.math.cornell.edu/~mec/2003-2004/cryptography/subs/frequencies.html
In theory you can construct a prefix-free coding scheme for English messages with the lowest average length.

According to Shannon, this type of coding scheme is based on first-order statistics.
If you want to take into account the conditional probability of one character following another, the coding scheme has to be based on second-order statistics. We all know that if the character "q" appears in an English message, the next character is almost certain to be "u"; in the second-order statistics table of the article above, that probability is about 0.9949749. Coding schemes using second-order statistics are more complicated, and we will not discuss them further.

In the previous two examples we notice that the number of bits required to code a symbol is inversely related to the probability of that symbol. In Example 1 we have

              s1      s2     s3      s4
p            ½       ¼      1/8    1/8
length     1        2       3       3

Since log 2 = 1, log 4 = 2, and log 8 = 3, in this example log(1/p) equals the codeword length for each symbol, and the average length = p1 * log(1/p1) + p2 * log(1/p2) + p3 * log(1/p3) + p4 * log(1/p4) = -(p1 * log p1 + p2 * log p2 + p3 * log p3 + p4 * log p4) = 1.75.

The quantity -(p1 * log p1 + p2 * log p2 + p3 * log p3 + p4 * log p4) is called the entropy of the source.

So in Example 1 the average length of our coding scheme equals the entropy of the source.

However, in example 2

              A        B       C       D       E       F
p           0.3       0.3    0.13   0.12   0.1   0.05
length     2        2        3       3         3       3

Entropy of the source
= -(p1 * log p1 + p2 * log p2 + p3 * log p3 + p4 * log p4 + p5 * log p5 + p6 * log p6)
= -(.3*log .3 + .3*log .3 + .13*log .13 + .12*log .12 + .1*log .1 + .05*log .05)
= 2.34018, which is not the same as the average length of 2.4.
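The entropy formula is easy to check numerically (a small sketch; math.log2 is Python's base-2 logarithm):

    import math

    def entropy(probs):
        # Shannon entropy in bits: -sum of p * log p (log base 2)
        return -sum(p * math.log2(p) for p in probs)

    print(entropy([1/2, 1/4, 1/8, 1/8]))               # 1.75      (Example 1)
    print(entropy([0.3, 0.3, 0.13, 0.12, 0.1, 0.05]))  # 2.34018.. (Example 2)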

In general, the average length of any uniquely decodable coding scheme must be greater than or equal to the entropy of the source. This is Shannon's Source Coding Theorem of 1948.

When the average length of a uniquely decodable coding scheme equals the entropy of the source, the coding is optimal. So the coding scheme in Example 1 is optimal.
So far we have been talking about binary coding, where the bits we use are either 0 or 1. Now we temporarily lift this restriction.
Suppose we have a source of n symbols s1, s2, ..., sn, and each of them is coded as a string over an alphabet of r distinct characters (playing the role of our 0 and 1). If there exists a uniquely decodable coding scheme with codeword lengths l1, l2, ..., ln, then


1/(r^l1) + 1/(r^l2) + ..... + 1/(r^ln) <= 1

Conversely, given a set of numbers l1, l2, ...., ln that satisfy this inequality, there exists a uniquely decodable coding scheme over an alphabet of r characters with these codeword lengths.

This is the famous Kraft-McMillan Inequality in coding theory (1956).

In Example 1 we have r = 2 and codeword lengths 1, 2, 3, 3,
and 1/2 + 1/(2^2) + 1/(2^3) + 1/(2^3) = 1.

In Example 2 we have r = 2 and codeword lengths 2, 2, 3, 3, 3, 3,
and 1/(2^2) + 1/(2^2) + 1/(2^3) + 1/(2^3) + 1/(2^3) + 1/(2^3) = 1.

When the left-hand side of the inequality reaches 1, we know there is almost no room for improvement in the codeword lengths: if we shorten any codeword, the sum will exceed 1.
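Checking the inequality is a one-liner (a sketch; kraft_sum is my own name for the left-hand side):

    def kraft_sum(lengths, r=2):
        # Left-hand side of the Kraft-McMillan inequality
        return sum(r ** -l for l in lengths)

    print(kraft_sum([1, 2, 3, 3]))        # 1.0 -- Example 1: no slack left
    print(kraft_sum([2, 2, 3, 3, 3, 3]))  # 1.0 -- Example 2: no slack left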

The Huffman coding method is still in use today; part of the JPEG coding scheme is based on the Huffman method.







Fontainebleau

                      

A sharp turn of the Renault SUV shook me out of my sleep. Yesterday I had flown seven hours to Paris and then gone straight to visit the medieval town of Provins, so no wonder I fell asleep soon after getting into the car this morning. I glanced at our driver, Mr. Ni, who was snacking while driving, showing no sign of fatigue. Being ten years younger makes all the difference!

The tall trees on both sides of the road caught my attention. We had been driving through this lush green forest for quite a while now. Where were we?

The Forest of Fontainebleau! I cried out in sudden excitement and stared out the window, as if searching for traces of Napoleon and Josephine.

"You mean this is Fontainebleau?" Mr. Ni glanced at his GPS. "We should be there."

To Chinese readers this place first became known through Xu Zhimo's 1925 essay 《巴黎的鳞爪》, in which he transliterated the name as 芳丹薄罗. Later Zhu Ziqing, in his 《欧游杂记》, rendered Fontainebleau as 枫丹白露. Here is his description of the Fontainebleau forest:

"Riding through it in a small horse-drawn carriage, it is as tranquil as remote antiquity. The sunlight shines through the leaves, turning them translucent, yet reaches the ground only in scattered circles and specks. In places the trees along the road grow so thick that their branches and leaves interlace into a low archway, and from a distance it looks as though another world lies beyond the arch. When heavy rain falls in the forest, the rustling sound, like a tide, washes clean everything on your heart."

The Forest of Fontainebleau is nearly as famous as the Palace of Fontainebleau. Over time it evolved from a royal hunting ground into an excursion spot for Parisians.

The town of Fontainebleau is small, built around the palace. We parked in an underground garage near the palace gate, and only after entering did we learn that the interior was under renovation, so visitors could only stroll around outside. The upside was that there was no admission fee.




Construction of the Palace of Fontainebleau began in 1137, when Louis VII ordered a castle built on the site. Expanded and restored by successive monarchs, Fontainebleau grew into a magnificent royal residence.

The name Fontainebleau comes from a local spring, the fontaine de Bliaud. In the 16th century Francis I greatly enlarged the palace, blending the style of the Italian Renaissance with traditional French art. This style later became known as the School of Fontainebleau.

French monarchs including Henry II, Henry IV, Louis XIV, Louis XV, and Louis XVI all lived here, some for long stretches, others using it merely as a hunting lodge. Royal weddings and funerals were often held here as well.

Fontainebleau witnessed many momentous events in history:

In 1685 Louis XIV issued the Edict of Fontainebleau here, also known as the Revocation of the Edict of Nantes. Louis XIV believed that to attain supreme power he had to unify the religious faith of the French, so he overturned the Edict of Nantes issued by his grandfather Henry IV. The Edict of Nantes had recognized the freedom of belief of the Protestants of France (also known as Huguenots) and granted them by law the same rights as other citizens, which in practice meant the same rights as Catholics; it was also the first edict of religious tolerance in modern world history. After its revocation, French Protestants fled in every direction, a large number of them to England, which greatly strengthened the later growth of Protestantism there.

From 1812 to 1814, Pope Pius VII was imprisoned here by Napoleon.

In 1814, Napoleon was forced to sign his abdication here and delivered his famous farewell address to his Old Guard.
In June 1940, after the fall of France, Hitler held a victory banquet here and set up the headquarters of the occupation forces.
From 1945 to 1965 the military headquarters of NATO was located here; traces of the "NATO" marking can still be seen on the palace's outer walls.

From the feudal Middle Ages to Napoleon III and on to Hitler's "victory banquet", the alternating rises and falls of French history played out on the stage of Fontainebleau. No other French palace witnessed the births, coronations, weddings, and deaths of so many monarchs, or so many historical upheavals. Yet somehow I have always felt that Fontainebleau belongs to Napoleon. This ancient palace witnessed the glories and sorrows of his whole life.

After Napoleon seized power in the Coup of 18 Brumaire, Fontainebleau became his official residence. I once saw a French film about Napoleon in which, after taking power, he tours a palace with Josephine; she asks whether they might stay there a few days, and Napoleon replies, why not, and they simply move in. I suspect that palace was Fontainebleau.

In 1804, having reached the summit of power, Napoleon designated Fontainebleau as his principal palace. It was here that he would receive Pope Pius VII, who was coming to crown him. Great quantities of furniture were shipped in from Paris, and as many as 800 craftsmen worked on the interior.
On November 2, 1804, Pope Pius VII and his entourage set out from Rome for Paris. Until then even Charlemagne had had to travel to Rome in person to be crowned by the pope; now the pope was coming to France to crown Napoleon, a measure of how completely power had shifted into Napoleon's hands.

At around half past one on November 25, the pope's procession drove into Fontainebleau. Napoleon neither knelt nor kissed the pope's hand, and the pope, however indignant, dared not protest. Besides crowning Napoleon, the pope also had to perform a religious wedding ceremony for Napoleon and Josephine. From then on Josephine was officially the mistress of France and of the Palace of Fontainebleau.

There are far too many stories about Josephine, pure gossip all of them, so better left untold here. Apart from his military achievements, Napoleon carried out major reforms in politics, education, justice, administration, legislation, and the economy, among them the Napoleonic Code, which remains influential to this day. Many remarkable men served under him; the great mathematician Fourier was one, and once served as his governor of Egypt.


In the spring of 1814, at Fontainebleau, Napoleon learned that his elite army had been crushed by the allied armies of Europe and that Paris had surrendered. On April 20 he left Fontainebleau for exile. Before departing, he bade farewell to the veterans of his Old Guard in the palace courtyard.

He slowly descended the horseshoe-shaped staircase and walked up to the front of the ranks. After a long silence, he finally spoke:

"My dear comrades-in-arms: you must take good care of yourselves. For twenty years we have lived together, and all that you have done for me leaves me nothing more to ask...."


"Do not lament my fate. As long as I know that you are happy, I too shall be happy. I may be condemned to death, but if I survive, I will gladly celebrate your honors and record the great things we have achieved together....."

These words deeply moved the veterans who had fought beside him for so many years. With tears in their eyes they watched Napoleon board his carriage and drive away without once looking back......

From that day on, the courtyard had a new name: the Courtyard of Farewells.
Napoleon Bonaparte's Farewell to the Old Guard:


My dear comrades-in-arms:

You must take care of yourselves. For the past twenty years we have lived together, and what you have done for me leaves me nothing more to ask of you. I have always found you on the road to glory. It was you who forced the powers of Europe to band together in order to be strong enough to fight us.

Some of my generals have been unfaithful to their duty and to France. France still has much to do, and I wished to carry on the fight with those of you, my brave comrades-in-arms, who remained loyal to me, but the French Parliament would not allow it and would not agree. So devote yourselves to your new king, obey your new commander, and do not abandon our beloved motherland.

Do not grieve for my fate. As long as I know that you are happy, I shall be happy too. I may be sentenced to death, but if I survive, I will gladly do honor to your glory, and I will write down and record the great achievements we have made.

I cannot embrace you all, so let me embrace your general instead. Come to me, General; I will hold you tight. Bring me the eagle standard, that I may embrace it too, and may this kiss echo down to your later generations. I must say goodbye to you, my children. I will always bless you, and I hope you will never forget me.



The above picture is from a reenactment.
And so a hero withdrew from the world stage in sorrow, and with him nearly eight hundred years of French court history at Fontainebleau drew to a close. Even in exile Napoleon never forgot his Fontainebleau; he called it the ideal residence of kings, an epoch-making building.

After Napoleon's departure, Fontainebleau at last fell quiet. From the mid-nineteenth century wandering painters came in droves, filling the town's two inns, painting sunrises and sunsets, forests and streams around the tranquil, stately palace. Every tree and blade of grass, every brick and tile of Fontainebleau came to speak, through the painters' graceful and charming compositions, of past glory and melancholy.

In the 21st century Fontainebleau welcomed new visitors, whose laughter now echoes through the courtyard. On the horseshoe staircase where Napoleon once walked, two fashionable young Asian women tilted their heads, flashing V-signs with beaming smiles.


This fellow, too, wanted a photo on the spot where Napoleon once stood.






Not being able to tour the interior that day was a great loss. It is said to be magnificent inside; the main attractions include the ballroom, the council chamber, the Gallery of Diana, the Gallery of Plates, the Queen's salon, the Hall of the King's Guards, the bedchambers of the Queen and of the Pope, the King's study, and so on. In 1808 Napoleon converted the bedchamber of the former kings into the Emperor's throne room. The palace also contains a Chinese Museum, built under the direction of Empress Eugénie, consort of Napoleon III. It displays more than a thousand art treasures from China's Ming and Qing dynasties: old paintings, gold and jade jewelry, ivory and jade carvings, cloisonné pagodas. Most of these came from the Old Summer Palace (Yuanmingyuan), presented to Napoleon III and the Empress as spoils of war by General Montauban, commander of the French forces.

After spending the better part of the day at Fontainebleau we set off again. Next stop: Auxerre, the hometown of Fourier.

***********
Note: one short passage of this article was copied from a piece on the internet.