Global Sources
EE Times-India

Arming an embedded's audio processing with ARM NEON

Posted: 16 Mar 2010

Keywords: audio decoding, power consumption, ARM core

The NEON engine has its own 10-stage pipeline, which begins at the end of the ARM integer pipeline. Because all mispredicts and exceptions have already been resolved in the ARM integer unit, an instruction that has been issued to the NEON engine must run to completion; NEON instructions cannot generate exceptions. NEON instructions are issued and retired in order. A data-processing instruction is either a NEON integer instruction or a NEON floating-point instruction.

The Cortex-A8 NEON unit does not dual-issue two data-processing instructions. This avoids the area overhead of duplicating the data-processing functional blocks, as well as the timing-critical paths and complexity associated with multiplexing the register read and write ports.

The NEON integer data path consists of three pipelines: an integer multiply/accumulate (MAC) pipeline, an integer shift pipeline and an integer ALU pipeline. A load-store/permute pipeline handles all NEON loads and stores, data transfers to and from the integer unit, and data permute operations such as interleave and de-interleave. The NEON floating-point (NFP) data path has two main pipelines: a multiply pipeline and an add pipeline.
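As an illustration of the kind of work the load-store/permute pipeline handles, the sketch below splits interleaved stereo PCM (L0 R0 L1 R1 ...) into separate left and right buffers. It is a minimal example, not code from this article: the function name `deinterleave_stereo` is hypothetical, and the NEON path, which uses `vld2q_s16` to load 16 samples and de-interleave them into two 8-lane vectors in one instruction, is compiled only when `__ARM_NEON` is defined. Other targets fall back to the scalar loop.

```c
#include <stdint.h>
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: split interleaved stereo PCM into two buffers.
 * n_frames is the number of stereo frames (pairs of samples). */
static void deinterleave_stereo(const int16_t *in, int16_t *left,
                                int16_t *right, size_t n_frames)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* VLD2: load 16 interleaved samples and de-interleave them into
     * val[0] (8 left samples) and val[1] (8 right samples). */
    for (; i + 8 <= n_frames; i += 8) {
        int16x8x2_t v = vld2q_s16(in + 2 * i);
        vst1q_s16(left + i, v.val[0]);
        vst1q_s16(right + i, v.val[1]);
    }
#endif
    /* Scalar tail on NEON targets; the whole job elsewhere. */
    for (; i < n_frames; i++) {
        left[i]  = in[2 * i];
        right[i] = in[2 * i + 1];
    }
}
```

Handling the leftover frames with a scalar tail keeps the function correct for any buffer length, not only multiples of eight.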

Audio processing
Today, WMA, MP3 and AAC are the mainstream audio compression algorithms (figure 1). Experience with audio decoding and playback applications shows that these algorithms are computationally complex and consume many clock cycles.

Figure 1: Flow diagram of an MP3 decoder.

This is especially true in combined audio/video decoding: because video decoding consumes most of the processor's resources, few resources remain for audio decoding. It is therefore essential to improve the efficiency of audio decoding in such applications.

MP3 is one of the most common audio compression algorithms and is used both in audio files and in compressed audio/video streams. MP3 decoding is therefore taken as the example for describing how NEON technology applies to audio processing. The complexity of the MP3 decoder modules is listed in table 1 below.

Table 1: A list of MP3 decoder modules.

The Huffman decoding, IMDCT and sub-band synthesis filter modules account for most of the computing time, about 90 per cent of the total. Reducing the computing time of these three modules can therefore significantly improve the efficiency of the whole MP3 decoder.
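The size of that improvement can be estimated with Amdahl's law. As a worked example (the speed-up factors s below are hypothetical, not measurements from this article), if the three modules take a fraction p = 0.9 of the decode time and are made s times faster, the overall speed-up S is:

```latex
\begin{aligned}
S(s) &= \frac{1}{(1 - p) + p/s}, \qquad p = 0.9 \\
S(2) &= \frac{1}{0.1 + 0.45} \approx 1.82 \\
S(4) &= \frac{1}{0.1 + 0.225} \approx 3.08 \\
\lim_{s \to \infty} S(s) &= \frac{1}{0.1} = 10
\end{aligned}
```

Even an unlimited speed-up of these modules caps the overall gain at 10 times, because the remaining 10 per cent of the work is unchanged.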

The sub-band synthesis filter alone accounts for about 50 per cent of the computation in the MP3 decoder, so it is analysed first. The filter consists of a matrix operation followed by a PCM output windowing filter. The formula of the matrix operation is:

[Equation image not reproduced]
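The equation itself is not reproduced in this copy. For reference, the matrixing step of the MPEG-1 audio synthesis filterbank is commonly written as follows in ISO/IEC 11172-3; we assume the missing equation is equivalent, although its notation may differ. Here S_k are the 32 sub-band samples of one time slot and V_i are the 64 intermediate values passed on to the windowing stage.

```latex
V_i = \sum_{k=0}^{31} N_{ik}\, S_k, \qquad
N_{ik} = \cos\!\left[\frac{(16 + i)(2k + 1)\pi}{64}\right], \qquad
i = 0, 1, \dots, 63
```

Evaluated directly, each matrixing step costs 64 × 32 = 2048 multiply-add operations, which is why multiply-add throughput dominates this module.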

The algorithm consists mainly of multiply-add operations. The corresponding ARM assembly code can be summarised as:

[Assembly listing image not reproduced]

Because the ARM multiply instruction (MUL) can issue only in pipeline 0, statements (1) and (2) cannot be dual-issued. Statement (3), in turn, takes the outputs of statements (1) and (2) as its inputs.

The three statements must therefore execute one after another. Furthermore, each MUL instruction occupies two cycles, so one multiply and one add operation take five cycles on the ARM core.
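The contrast with NEON can be sketched in C. This is a minimal illustration, not the article's code: it assumes single-precision floating-point samples (a fixed-point decoder would use integer intrinsics such as `vmlaq_s32` instead) and a coefficient table `n` of 64 × 32 values N[i][k], precomputed once (for example with `cos()`) and stored row-major. The names `dot32` and `matrixing` are hypothetical. On a NEON target each `vmlaq_f32` performs four multiply-accumulates in one instruction; elsewhere the scalar loop reproduces the serial multiply-then-add pattern described above.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

#define N_SUB 32  /* sub-band samples S[k] per time slot */
#define N_V   64  /* matrixing outputs V[i] */

/* Dot product of one 32-entry coefficient row with the sub-band samples. */
static float dot32(const float *row, const float *s)
{
#if defined(__ARM_NEON)
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (int k = 0; k < N_SUB; k += 4) {
        /* acc += row[k..k+3] * s[k..k+3]: four multiply-adds
         * issued as a single vector instruction. */
        acc = vmlaq_f32(acc, vld1q_f32(row + k), vld1q_f32(s + k));
    }
    float lanes[4];
    vst1q_f32(lanes, acc);  /* reduce the four partial sums */
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#else
    /* Scalar fallback: one multiply followed by one dependent add
     * per coefficient, executed serially. */
    float acc = 0.0f;
    for (int k = 0; k < N_SUB; k++)
        acc += row[k] * s[k];
    return acc;
#endif
}

/* V[i] = sum over k of N[i][k] * S[k], for i = 0..63.
 * n is row-major: N[i][k] == n[i * N_SUB + k]. */
static void matrixing(const float *n, const float *s, float *v)
{
    for (int i = 0; i < N_V; i++)
        v[i] = dot32(n + (size_t)i * N_SUB, s);
}
```

Because the vector version keeps four partial sums and adds them at the end, its floating-point rounding can differ slightly from the scalar loop; the actual cycle savings depend on the NEON pipeline latencies and are not modelled here.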
