An overview of the speech coding techniques and vocoders used in the GSM system
If speech were digitised in a linear fashion, it would occupy far more bandwidth than any cellular system, in this case GSM, would be able to accommodate.
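For a sense of scale, assume the speech is sampled at 8 kHz with 13-bit linear (uniform PCM) samples, which is the input format of the GSM full-rate speech codec. The uncoded bit rate is then:

$$R_{\text{PCM}} = 8000\ \tfrac{\text{samples}}{\text{s}} \times 13\ \tfrac{\text{bits}}{\text{sample}} = 104\ \text{kbit/s}$$

That is eight times the 13 kbps produced by the original GSM full-rate vocoder.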
To overcome this, a variety of voice coding systems, or vocoders, are used. A vocoder analyses the incoming data that represents the speech and processes it to reduce the data rate. At the receiving end the reverse process reconstitutes the speech so that it can be understood. GSM uses several vocoders, including LPC-RPE, EFR and a half rate vocoder, as described in the following paragraphs.
The vocoder originally used in GSM was the LPC-RPE (Linear Prediction Coding with Regular Pulse Excitation) vocoder, also known as RPE-LTP (Regular Pulse Excitation with Long Term Prediction). It takes each 20 ms block of speech and represents it using just 260 bits, which equates to a data rate of 13 kbps.
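The 13 kbps figure follows directly from the frame size. Assuming the usual 8 kHz sampling rate, each 20 ms block also corresponds to 160 samples, which shows how few bits per sample the vocoder needs compared with 13-bit linear samples:

$$R = \frac{260\ \text{bits}}{20\ \text{ms}} = 13\,000\ \text{bit/s}, \qquad 8000\ \text{Hz} \times 20\ \text{ms} = 160\ \text{samples}, \qquad \frac{260\ \text{bits}}{160\ \text{samples}} = 1.625\ \text{bits/sample}$$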
GSM recognises that some bits matter more than others: losing or corrupting some bits degrades voice quality far more than losing or corrupting others. The bits are therefore classified as follows:
Class Ia 50 bits - most important and sensitive to bit errors
Class Ib 132 bits - moderately sensitive to bit errors
Class II 78 bits - least sensitive to bit errors
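As a quick check on the list above, the three classes account for all 260 bits of a frame. The sketch below splits a frame into its classes; it assumes the bits have already been ordered by decreasing sensitivity, so that each class is a consecutive slice. The helper `split_classes` is hypothetical and only illustrates the bit budget.

```python
# Sketch: the three sensitivity classes of one 260-bit full-rate speech frame.
# Assumption: the frame is already ordered by decreasing sensitivity, so each
# class is a consecutive slice (split_classes is a hypothetical helper).

CLASS_SIZES = (("Ia", 50), ("Ib", 132), ("II", 78))
FRAME_BITS = 260
FRAME_MS = 20


def split_classes(frame):
    """Return a dict mapping each class name to its slice of the frame."""
    if len(frame) != FRAME_BITS:
        raise ValueError(f"expected {FRAME_BITS} bits, got {len(frame)}")
    classes, start = {}, 0
    for name, size in CLASS_SIZES:
        classes[name] = frame[start:start + size]
        start += size
    return classes


# The class sizes account for every bit of the frame...
assert sum(size for _, size in CLASS_SIZES) == FRAME_BITS
# ...and 260 bits every 20 ms is 13 kbps.
bit_rate = FRAME_BITS * 1000 // FRAME_MS
print(bit_rate)  # 13000
```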
The 50 Class Ia bits are given a 3-bit Cyclic Redundancy Check (CRC) so that errors can be detected, making a total of 53 bits. If the receiver detects an error in these bits, the frame is discarded and replaced by a version of the last correctly received frame.

These 53 bits, together with the 132 Class Ib bits and a 4-bit tail sequence, 189 bits in total, are fed into a rate 1/2 convolutional encoder. The encoder outputs two bits for every bit that enters, each output also depending on the previous 4 input bits, so the encoder produces 378 bits. The remaining 78 Class II bits, being the least sensitive to errors, are not protected and are simply appended to the data. In this way every 20 ms speech frame generates a total of 456 bits, giving an overall bit rate of 22.8 kbps. The data is then interleaved to add further protection against interference and noise.
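The channel coding steps described above can be sketched as follows. This is a minimal illustration, not a conformant implementation: the CRC generator g(D) = D^3 + D + 1 and the encoder generators G0 = 1 + D^3 + D^4 and G1 = 1 + D + D^3 + D^4 are the polynomials GSM 05.03 uses for the full-rate traffic channel, but the specification also reorders the Class 1 bits, places the parity bits in a particular position and defines its own parity convention, none of which is reproduced here. The helper names (`crc3`, `crc_ok`, `conv_encode`, `encode_speech_frame`) are hypothetical.

```python
# Sketch of GSM full-rate channel coding for one 20 ms speech frame.
# Simplification (assumption): Class Ia, parity, Class Ib and tail bits are
# simply concatenated; GSM 05.03 specifies a different bit ordering.

FRAME_MS = 20
TAIL = [0, 0, 0, 0]  # 4 tail bits return the encoder's 4-bit memory to zero


def crc3(bits):
    """Remainder of bits(D) * D^3 divided by g(D) = D^3 + D + 1 over GF(2)."""
    reg = list(bits) + [0, 0, 0]
    gen = [1, 0, 1, 1]  # coefficients of D^3, D^2, D^1, D^0
    for i in range(len(bits)):
        if reg[i]:
            for j, g in enumerate(gen):
                reg[i + j] ^= g
    return reg[-3:]


def crc_ok(class_ia, parity):
    """Receiver-side check: a mismatch means the frame should be discarded."""
    return crc3(class_ia) == list(parity)


def conv_encode(bits):
    """Rate 1/2, constraint length 5: G0 = 1 + D^3 + D^4, G1 = 1 + D + D^3 + D^4."""
    d1 = d2 = d3 = d4 = 0  # the previous 4 input bits, d1 the most recent
    out = []
    for b in bits:
        out.append(b ^ d3 ^ d4)       # output of G0
        out.append(b ^ d1 ^ d3 ^ d4)  # output of G1
        d1, d2, d3, d4 = b, d1, d2, d3
    return out


def encode_speech_frame(class_ia, class_ib, class_ii):
    """50 + 132 + 78 speech bits in, 456 coded bits out."""
    if (len(class_ia), len(class_ib), len(class_ii)) != (50, 132, 78):
        raise ValueError("expected 50 Class Ia, 132 Class Ib and 78 Class II bits")
    class1 = list(class_ia) + crc3(class_ia) + list(class_ib) + TAIL  # 189 bits
    coded = conv_encode(class1)                                       # 378 bits
    return coded + list(class_ii)                                     # 456 bits


frame = encode_speech_frame([1, 0] * 25, [0, 1, 1] * 44, [1] * 78)
print(len(frame), len(frame) * 1000 // FRAME_MS)  # 456 22800
```

Keeping the encoder state in four named variables mirrors the "previous 4 input bits" in the text; feeding in a single 1 followed by zeros, e.g. `conv_encode([1, 0, 0, 0, 0])`, returns the interleaved generator coefficients `[1, 1, 0, 1, 0, 0, 1, 1, 1, 1]`.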
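The 456 bits (the 378 encoder output bits plus the 78 Class II bits) are divided into eight blocks of 57 bits, which are interleaved over eight consecutive bursts. Each burst carries 114 data bits, i.e. two 57-bit blocks, one from the current speech frame and one from an adjacent frame. Each 20 ms frame therefore occupies the equivalent of four full bursts while being spread across eight.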
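The interleaving pattern above can be sketched as follows. Frame n is spread over bursts 4n to 4n+7: blocks 0 to 3 fill the even positions of the first four bursts and blocks 4 to 7 fill the odd positions of the next four, so every burst ends up holding 57 bits from each of two neighbouring frames. As a simplifying assumption, bit k of a frame goes to block k mod 8 at in-block position k div 8; the GSM specification uses its own permutation within each block. The function names are hypothetical.

```python
# Sketch of block-diagonal interleaving of 456-bit coded speech frames.
# Simplification (assumption): bit k of a frame goes to block k % 8 at
# in-block index k // 8; GSM 05.03 defines its own in-block permutation.

BLOCKS = 8
BLOCK_BITS = 57
BURST_BITS = 2 * BLOCK_BITS       # 114 data bits per burst
FRAME_BITS = BLOCKS * BLOCK_BITS  # 456


def _position(n, k):
    """Map bit k of frame n to (burst number, position within the burst)."""
    block, index = k % BLOCKS, k // BLOCKS
    # Blocks 0-3 use the even positions of bursts 4n..4n+3;
    # blocks 4-7 use the odd positions of bursts 4n+4..4n+7.
    return 4 * n + block, 2 * index + block // 4


def interleave(frames):
    """Spread each frame over 8 bursts; frame n starts at burst 4n.

    The first and last four bursts are only half filled (None elsewhere).
    """
    bursts = [[None] * BURST_BITS for _ in range(4 * len(frames) + 4)]
    for n, frame in enumerate(frames):
        if len(frame) != FRAME_BITS:
            raise ValueError(f"frame {n} has {len(frame)} bits, expected {FRAME_BITS}")
        for k, bit in enumerate(frame):
            burst, pos = _position(n, k)
            bursts[burst][pos] = bit
    return bursts


def deinterleave(bursts, n_frames):
    """Inverse of interleave: collect each frame's bits back from its 8 bursts."""
    frames = []
    for n in range(n_frames):
        frame = []
        for k in range(FRAME_BITS):
            burst, pos = _position(n, k)
            frame.append(bursts[burst][pos])
        frames.append(frame)
    return frames


f0, f1 = [0] * FRAME_BITS, [1] * FRAME_BITS
bursts = interleave([f0, f1])
print(len(bursts), bursts[4].count(0), bursts[4].count(1))  # 12 57 57
```

With two frames of all zeros and all ones, burst 4 holds 57 bits of each, which is the sharing of bursts between adjacent frames described in the text. A burst lost to a fade therefore costs each affected frame only one eighth of its bits, which the convolutional code has a good chance of recovering.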
Later, the Enhanced Full Rate (EFR) vocoder was added in response to the poor voice quality perceived by users. Using ACELP (Algebraic Code Excited Linear Prediction) compression, it gave a significant improvement in sound quality over the original LPC-RPE vocoder. It became practical because the processors in mobile phones had become more powerful while also drawing less current.
There is also a half rate vocoder. Although its voice quality is noticeably poorer, it allows two calls to share a time slot, increasing network capacity. It is used in some cases when network loading is very high, so that all calls can be accommodated.