Digital signal processors

Digital signal processors (DSPs) are a special class of microprocessors designed to operate in real time. Areas of application of DSPs:

Digital filtering of signals,

Optimal processing, computation of correlation functions,

Spectral analysis of signals,

Encoding and decoding of information,

Speech recognition and synthesis, music synthesis and processing,

Image processing,

Computer graphics, image synthesis,

Measurement equipment.

The main distinguishing feature of DSPs is the large volume of calculations they perform in real time. This determines the following distinctive features of DSP architecture:

The use of an extended Harvard architecture: separate instruction and data memories with independent buses, which allows an instruction to be fetched and executed within one internal clock cycle of the chip,

Short instructions executed in pipelined units, which give the DSP a RISC-style architecture,

Mandatory presence of a parallel hardware multiplier that executes multiplication instructions in one internal clock cycle of the chip,

Availability of special signal processing instructions. For example, the TMS320 family of processors from Texas Instruments has a DMOV instruction, which moves a stored signal sample to the next position in the delay line (a delay of one sampling interval), making room for a new sample. The LTD instruction loads the multiplicand into the multiplier register, shifts the signal sample, and adds the result of the previous multiplication to the contents of the accumulator.
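The combined "multiply, accumulate and shift the delay line" behaviour described above can be illustrated in software. The following is only a minimal Python sketch of the idea, not TMS320 code: the function name `fir_step_with_data_move` and the list-based delay line are assumptions made for illustration.

```python
def fir_step_with_data_move(delay_line, coeffs, new_sample):
    """Compute one FIR output sample while shifting the delay line.

    delay_line[k] holds the sample that arrived k steps ago; both lists
    must have the same length. Hypothetical helper, for illustration only.
    """
    n_taps = len(coeffs)
    delay_line[0] = new_sample              # newest sample enters the line
    acc = 0                                 # plays the role of the accumulator
    # Walk from the oldest tap to the newest, so that each sample can be
    # copied one position "older" after it has been used.
    for k in range(n_taps - 1, -1, -1):
        acc += coeffs[k] * delay_line[k]    # multiply and accumulate
        if k + 1 < n_taps:
            delay_line[k + 1] = delay_line[k]   # the data move (one-sample delay)
    return acc


line = [0, 0, 0]
outputs = [fir_step_with_data_move(line, [1, 2, 3], x) for x in [1, 0, 0, 0]]
print(outputs)
```

Feeding a unit impulse into the sketch returns the coefficients one after another, which is the expected impulse response of an FIR filter. On a DSP the multiply, accumulate and move happen within a single instruction, whereas here each is a separate Python statement.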

Since the early 1980s several generations of DSPs have succeeded one another, but a number of earlier-generation DSPs are still produced in modern versions because of their successful architecture. The first-generation DSP, the TMS32010, was developed by Texas Instruments in 1982. This 16-bit microprocessor, with a performance of 5 million instructions per second (MIPS), had an internal RAM of 144-256 words and a ROM of 1.5-4K words. The ALU and accumulator were 32-bit, the hardware multiplier was 16x16 with a 32-bit result, and there were input/output ports.

Second-generation DSPs appeared in the mid-1980s. These include the TMS32020 and the CMOS TMS320C25 microprocessor with 10 MIPS performance. The most interesting are the DSP56000 and DSP56001, with performance of 10 and 25 MIPS, respectively, developed by Motorola. These are the only 24-bit DSPs. Modifications of DSPs of this architecture are still being produced. The DSP56001 architecture is shown in Fig. 7.1. The processor has an extended Harvard architecture. The X and Y data RAMs have separate address buses (XA, YA) and data buses (XD, YD). In addition, a separate PA address bus is used to address the boot ROM and program RAM, which also have a separate PD data bus. The GD data bus is used to download programs from the host computer via a synchronous serial interface. In addition, GD is used to service interrupts from the programmable interrupt controller. Switching units can transfer data and addresses between these buses, and external bus switching units allow any of the buses to be brought out of the chip. The control signal generator produces the external control signals. An external quartz crystal is connected to the clock generator, which clocks the entire circuit.

Fig. 7.1. DSP56001 architecture

The X and Y ROMs contain sine and cosine samples, which allows quadrature reception and processing. Currently, such a DSP is most often used in sound processing and synthesis.

Third-generation DSPs appeared at the turn of the 1980s and 1990s. These are the TMS320C30 (TI), the DSP96002 (Motorola) and the DSP32C (AT&T Microelectronics). These processors are 32-bit, perform integer and floating-point calculations in a single ALU, have an extended Harvard architecture, and include timers and input/output ports. Modifications of the TMS320C30 are still in production: the TMS320VC33-120 and -150. The TMS320VC33-150 delivers 150 million floating-point operations per second (MFLOPS).

Its main parameters:

34K 32-bit words of RAM with two parallel access buses,

Clock generator with the ability to multiply the internal frequency,

32-bit floating-point core,

4 external device select strobes,

Interrupt controller,

Bootloader,

Eight 40-bit extended-precision registers,

One serial port,

Two timers,

Direct memory access (DMA) coprocessor,

144-pin LQFP package.

Fourth-generation DSPs were developed in the 1990s. At this point DSPs split into relatively cheap 16-bit fixed-point devices and expensive high-performance 32-40-bit floating-point devices. Fixed-point DSPs came to be used in communications equipment, modems, audio multimedia devices, and signal processing; among the companies developing such DSPs, Analog Devices is known for its ADSP family. Floating-point DSPs are used for processing broadband signals and images and in computer graphics. A typical floating-point DSP is the TMS320C40 from TI, whose architecture is shown in Fig. 7.2. The performance of this processor is 275 MIPS. The main feature of its architecture is an input/output bus served by a direct memory access coprocessor. It is designed for high-speed exchange through communication ports 0 – 5 with other processors, forming a MIMD architecture. Each port has 8 data lines and 4 accompanying control signals, with a throughput of 20 Mbps.

Fig. 7.2. TMS320C40 architecture

Fig. 7.3 shows one variant of the topology of processor interconnections.

Fig. 7.3. Topology of DSP connections

Fifth- and sixth-generation DSPs were developed at the beginning of the 21st century. Here it is worth noting TI's C5000 and C6000 processor families. The C5000 is a family of low-cost fixed-point DSPs with high speed and low power consumption at a supply voltage of 0.9 V, and the C6000 is a family of fixed-point and floating-point DSPs with performance up to 1200 MFLOPS. Some parameters of the TMS320C55x family:

Power consumption 0.05 mW/MIPS,

Performance 140 – 800 MIPS, including multiplication operations,

Variable instruction length 8 – 48 bits,

2 multipliers, 2 ALUs, 4 accumulators,

4 data registers,

Instructions are fetched 32 bits at a time.

Currently, DSPs are used in conjunction with programmable logic. Tools for debugging equipment based on DSP and programmable logic are divided into two categories:

Software support for generating and debugging machine code for signal processing in DSPs (code generation tools),

Software and hardware support for integrating the DSP with the target hardware of the device being developed and a means for debugging the processing program with the hardware in real time.

These two types of debugging are usually performed by different developers working in parallel, which speeds up hardware design and manufacturing. Fig. 7.4 shows the structure of the process of debugging equipment with a DSP and programmable logic.

Fig. 7.4. The process of developing equipment based on DSP and FPGA

While debugging a DSP program, developers go back not only to correct the program but also to change the logic implemented in the programmable logic of the hardware. Thus, the debugging process with DSPs and FPGAs is significantly more flexible and allows both software and hardware to be changed.

What are the features of the development of equipment based on DSP and programmable logic?

1. Development of various parallel processing architectures both in DSPs and in programmable logic.

2. Development of appropriate debugging tools based on emulators, simulators and testing interfaces similar to JTAG.

3. Combination of DSP and programmable logic within one chip, for example in TMS320C54x.

4. Improvement of optimizing compilers for high-level languages such as C to the point where assembly inserts in programs are no longer required.

5. Development of heterogeneous hardware systems on a single chip that include various types of microprocessors, DSPs among them, and equipping them with parallel multiprocessor real-time operating systems.

Today, the conversations that were popular among electronics engineers in the mid-eighties about how far Soviet electronics lagged behind Western electronics have been forgotten. Back then, the state of electronics was judged by the development of processors for personal computers. The Iron Curtain was doing its job; at that time we could not even imagine that Soviet electronics lagged behind the West not by a year or two, but forever.

Ordinary Soviet engineers, who were not allowed to attend the world's major professional seminars on electronics and were not privy to the secrets obtained by the KGB, could judge the development of electronics only from the "Vremya" ("Time") news program and from ten-year-old Hollywood films. Once the excitement over James Bond's electronic gadgets had passed, the conclusions were that these were all cinematic special effects; that everything was built on specialized microprocessors (which ones was never specified); and that “where it is needed, and for those who need it, we have cooler things.” After such profound conclusions, Soviet engineers returned to their research institutes with fresh creative drive and went on creating masterpieces on 155-series TTL chips or, for those closest to the military-industrial complex, on the 133 series.

To my shame, I must admit that until about the mid-nineties I too assumed that specialized processors were something utterly complex and unimaginable. But, fortunately, times have changed, and the first specialized processors I became acquainted with were digital signal processors, or simply signal processors (DSP, Digital Signal Processor).

Signal processors emerged as a consequence of the development of digital technologies, which were increasingly being introduced into traditionally “analog” applications: radio and wired communications, video and audio equipment, measuring instruments and household appliances. Purely digital devices such as modems, disk drives and data processing systems also required specialized processors for signal processing. What mainly distinguishes DSPs from conventional microprocessors is that they are adapted as closely as possible to digital signal processing tasks. They are “specialized” controllers in the strict sense: their architecture and instruction set are designed so that signal conversion and filtering can be performed optimally in real time. Conventional microcontrollers either provide no instructions for such operations at all or execute them very slowly, which makes them unusable in speed-critical processes. Using traditional microprocessors therefore led, on the one hand, to unjustified complication and higher cost of the device circuitry and, on the other, to inefficient, one-sided use of the controller's capabilities. DSPs were called upon to resolve this contradiction and coped with the task perfectly.

Signal processors appeared in the early 1980s. The first widely known signal processor was the TMS32010, released in 1982 by Texas Instruments, with a performance of several MIPS (million instructions per second) and built using 1.2 micron technology. Following Texas Instruments, other companies began to produce DSPs. Currently, Texas Instruments is the leader in DSP production and holds about half of the market for these controllers. The second largest manufacturer is Lucent Technologies, which produces about a third of these devices. Rounding out the top four are Analog Devices and Motorola, which have approximately equal market shares and together produce approximately a quarter of all DSPs. The remaining manufacturers, among them such well-known companies as Samsung, Zilog and Atmel, account for the remaining 5-6 percent of the signal processor market.

It is clear that the trendsetters among manufacturers are the leading companies in this field and, first of all, Texas Instruments. The policies of leading companies in the production and promotion of signal processors vary significantly.

Texas Instruments aims to produce the widest possible range of devices, covering all possible processor applications with ever-increasing performance. Currently, the performance of its signal processors reaches 8800 MIPS, and they are manufactured using process technologies from 0.65 microns down to 0.1 microns. The clock frequency reaches 1.1 GHz.

Lucent Technologies focuses on large manufacturers of end equipment and offers its products through a distribution network, without resorting to a wide advertising campaign. The company specializes in DSP for telecommunications equipment, in particular, in such a currently promising direction as the creation of cellular communication stations.

Analog Devices, on the contrary, pursues an active marketing policy and advertising campaign, as evidenced by the names of its DSPs, SHARC and TigerSHARC (which evoke “shark” and “tiger shark”). Technically, the company's processors are optimized for energy consumption and for building multiprocessor systems.

Motorola distributes its processors widely through its own extensive distribution network. In DSP architecture, Motorola was the first to put a signal processor and a classic microcontroller on one chip, operating as a single system, which greatly simplifies the life of equipment developers by simplifying the circuit design.

DSP architecture and manufacturing technology are already well developed. However, the requirements for stable operation and computational accuracy mean that the functional units that process data (especially in floating-point format) remain highly complex, which prevents a significant reduction in production costs. The cost of a DSP can range from 2 to 180 or more dollars per unit.

Characteristics of DSP processors

Signal processors feature high-speed arithmetic, real-time data transmission and reception, and a multiple-access memory architecture.

Any arithmetic operation requires the following elementary steps: fetching the operands; performing the addition or multiplication; and storing the result or repeating the operation. In addition, the computation requires delays, fetching values from successive memory cells, and copying data from memory to memory. Signal processors speed up arithmetic through parallel execution of actions, multiple memory accesses (fetching two operands and storing the result), a large number of registers for temporary data storage, and hardware implementation of special functions such as delays, multipliers and ring (circular) addressing. Signal processors also provide hardware support for program loops and ring buffers, and can fetch several operands from memory within one instruction cycle.
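To make the idea of ring addressing concrete, here is a small software model of a ring buffer in Python. It is only a sketch under stated assumptions: the class name `RingBuffer` and its methods are invented for illustration, and a DSP performs the modulo address arithmetic in its address generation hardware rather than in program code.

```python
class RingBuffer:
    """Fixed-size buffer whose index wraps around (hypothetical helper)."""

    def __init__(self, size):
        self.data = [0] * size
        self.head = 0                       # index of the newest sample

    def push(self, sample):
        # Step the write position backwards with wrap-around, then overwrite
        # the oldest sample. No other sample is moved in memory.
        self.head = (self.head - 1) % len(self.data)
        self.data[self.head] = sample

    def tap(self, k):
        """Return the sample that arrived k pushes ago (k = 0 is the newest)."""
        return self.data[(self.head + k) % len(self.data)]


buf = RingBuffer(3)
for x in (10, 20, 30, 40):
    buf.push(x)
print(buf.tap(0), buf.tap(1), buf.tap(2))
```

Compared with physically shifting every sample on each step, only one memory cell is written per new sample, which is why hardware ring addressing removes the copying cost mentioned above.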

The main advantage and difference between DSPs and general-purpose microprocessors is that the processor interacts with many data sources in the real world. The processor can receive and transmit data in real time without interrupting internal mathematical operations. For these purposes, analog-to-digital and digital-to-analog converters, generators, decoders and other devices for direct “communication” with the outside world are built directly into the chip.

The construction of multiple access memory is achieved mainly through the use of Harvard architecture. Harvard architecture refers to an architecture that has two physically separate data buses, allowing two memory accesses to occur simultaneously. But this alone is not enough to perform DSP operations, especially when using two operands in an instruction. Therefore, the Harvard architecture adds cache memory to store those instructions that will be used again. When using cache memory, the address bus and data bus remain free, which makes it possible to fetch two operands. This extension - Harvard architecture plus cache - is called extended Harvard architecture or SHARC (Super Harvard ARChitecture).

Let's look at the specific characteristics of DSPs using Motorola's DSP568xx family, which combines the features of digital signal processors and general-purpose microcontrollers.

The DSP56800 core is a programmable 16-bit CMOS processor designed for real-time digital signal processing and computational tasks. It consists of four functional units: control, address generation, ALU, and bit processing. To increase performance, the units operate in parallel. Each unit can function independently of and simultaneously with the other three, because it has its own set of registers and control logic. The core executes several actions at once: the control unit fetches the first instruction, the address generation unit generates addresses for the second instruction, and the ALU performs the multiplication for the third instruction. Combined transfers and operations are widely used.

The on-chip memory may include (across the family):

Program flash memory up to 60K

Data flash memory up to 8K

Program RAM up to 2K

Data RAM up to 4K

Boot flash memory of 2K

The chips of the family implement a large number of peripherals: PWM generators, 12-bit simultaneous-sampling ADCs, quadrature decoders, four-channel timers, CAN interface controllers, two-wire serial communication interfaces, serial interfaces, a programmable oscillator with a PLL that generates the clock frequency of the DSP core, etc.

General characteristics

Performance of 40 MIPS at a clock frequency of 80 MHz and a supply voltage of 2.7 to 3.6 V;

Single-cycle parallel 16x16 multiplier-accumulator;

Two 36-bit accumulators, including extension bits;

Single-cycle 16-bit barrel shifter;

Hardware implementation of DO and REP instructions;

Three internal 16-bit data buses and three 16-bit address buses;

One 16-bit external interface bus;

A subroutine and interrupt stack with no depth limit.

Chips of the DSP568xx family are intended for low-cost devices and household appliances that do not require extreme performance: wired and wireless modems, wireless digital messaging systems, digital telephone answering machines, digital cameras, specialized and multi-purpose controllers, and control devices for servo motors and AC motors.

In general, signal processors have already reached such a stage of their development that they can be used in devices ranging from space stations to children's toys.

I recently saw how unexpected the applications of signal processors can be, using the example of a toy. One day an acquaintance asked me to fix a talking doll that his German friends had given to his daughter. The doll was indeed wonderful; according to my acquaintance, it understood up to fifty phrases and “consciously” kept up a conversation. In Germany it cost one hundred and fifty marks, which made me think that the parents regretted its breakage more than their child did. The daughter liked the doll anyway, all the more so because before it fell silent it had spoken German. Without any hope of success, I set about repairing the doll. I filed away the epoxy resin in which the circuit was potted and, under a very thick layer of epoxy, found half a dozen chip packages, the central one of which was a DSP marked DSP56F...; the last digits, unfortunately, had been irretrievably erased. I never managed to make the doll talk, and how much intelligence the signal processor gave it, alas, I still have not determined. As it turned out later, my friends' eldest son, wanting to make the doll scream louder, first connected 4.5 volts to it instead of 3 V, which was not yet “lethal”: the toy wheezed but still screamed. But after 220 V ... . Hence the first conclusion: high technology is good, but not always and not everywhere. The second conclusion is that we may soon see DSPs in kitchenware, shoes and clothes; at least there are no technical obstacles to this.

June 27, 2017 at 12:27 pm

Multi-core DSP TMS320C6678. Processor Architecture Overview


This article opens a series of publications dedicated to the TMS320C6678 multi-core digital signal processors. The article gives a general idea of the processor architecture. The article reflects lecture and practical material offered to students as part of advanced training courses under the program “Multi-core digital signal processors C66x from Texas Instruments”, conducted at the Ryazan State Radio Engineering University.

TMS320C66xx digital signal processors are based on the KeyStone architecture and are high-performance multi-core signal processors that work with both fixed-point and floating-point data. KeyStone is an approach to building multi-core systems on a chip, developed by Texas Instruments, that organizes efficient joint operation of a large number of DSP and RISC cores, accelerators and peripheral devices while providing sufficient bandwidth for internal and external data transfer channels. It rests on the following hardware components: Multicore Navigator (a controller for data exchange over internal interfaces), TeraNet (the internal data transfer bus), Multicore Shared Memory Controller (the controller for access to shared memory) and HyperLink (an interface to external devices at on-chip speed).

The architecture of the TMS320C6678 processor, the highest-performance processor in the TMS320C66xx family, is depicted in Figure 1. The architecture can be broken down into the following main components:

a set of processing cores (CorePack);
subsystem for working with shared internal and external memory (Memory Subsystem);
peripheral devices;
network coprocessor;
internal transfer controller (Multicore Navigator);
service hardware modules and the internal TeraNet bus.

Figure 1. General architecture of the TMS320C6678 processor

The TMS320C6678 processor operates at a clock frequency of 1.25 GHz. The processor is built around a set of C66x CorePack cores, whose number and composition depend on the specific processor model. The TMS320C6678 DSP includes 8 DSP cores. A core is the basic computing element and includes computational units, register sets, a program control unit, and program and data memory. The memory that is part of a core is called local memory.

In addition to local memory, there is memory common to all cores: the shared memory of the multi-core processor (Multicore Shared Memory, MSM). Shared memory is accessed through the Memory Subsystem, which also includes the EMIF external memory interface for data exchange between the processor and external memory chips.

The network coprocessor increases the efficiency of the processor as part of various types of telecommunication devices, implementing data processing tasks typical for this area in hardware. The coprocessor is based on the Packet Accelerator and the Security Accelerator. The processor specification lists a set of protocols and standards supported by these accelerators.

Peripherals include:

Serial RapidIO (SRIO) version 2.1 – provides data transfer rates of up to 5 GBaud per lane, with up to 4 lanes;
PCI Express (PCIe) Gen2 – provides data transfer rates of up to 5 GBaud per lane, with up to 2 lanes;
HyperLink – an internal bus interface that connects processors built on the KeyStone architecture directly to each other and lets them exchange data at on-chip speed; data transfer rate of up to 50 GBaud;
Gigabit Ethernet (GbE) – provides transmission rates of 10/100/1000 Mbps and is supported by a hardware network communications accelerator (the network coprocessor);
EMIF DDR3 – DDR3 external memory interface; has a 64-bit bus width and provides an addressable memory space of up to 8 GB;
EMIF – general-purpose external memory interface; has a 16-bit bus width and can be used to connect 256 MB of NAND flash or 16 MB of NOR flash;
TSIP (Telecom Serial Port) – telecommunication serial port; provides transmission rates of up to 8 Mbit/s per line, with up to 8 lines;
UART – universal asynchronous serial port;
I2C – inter-chip communication bus;
GPIO – general-purpose input/output, 16 pins;
SPI – universal serial interface;
Timers – used to generate periodic events.

Service hardware modules include:

Debug and Trace module – allows debugging tools to access the internal resources of a running processor;
boot ROM – stores the boot program;
hardware semaphore – provides hardware support for arbitrating the access of parallel processes to shared processor resources;
power management module – dynamically controls the power modes of processor components in order to minimize energy consumption when the processor is not operating at full capacity;
PLL circuit – generates the internal processor clock frequencies from an external reference clock signal;
Direct Memory Access (EDMA) controller – manages data transfers, offloading the DSP cores and serving as an alternative to Multicore Navigator.

The Multicore Navigator is a powerful and efficient hardware module responsible for arbitrating data transfers between the various processor components. TMS320C66xx multi-core systems on a chip are very complex devices, and organizing the exchange of information between all components of such a device requires a special hardware unit. Multicore Navigator relieves cores, peripheral devices and host devices of the task of controlling data exchange. When any processor component needs to send an array of data to another component, it simply tells the controller what to send and where. The transfer itself and the synchronization of sender and recipient are handled entirely by Multicore Navigator.

The foundation of the TMS320C66xx multi-core processor, in terms of high-speed data exchange between all of the processor's numerous components as well as external modules, is the internal TeraNet bus.

The next article will take a closer look at the architecture of the C66x core.


Microprocessors are universal digital chips in which a computing unit under program control can perform various actions. As a result, all microprocessors let you trade their maximum speed for the complexity of the implemented algorithm. The place of microprocessors in the classification of digital devices is shown in Figure 1.

Figure 1. Classification of microprocessors

However, when digital devices are built on microprocessors, the features of the problem being solved shape the architecture of a particular class of microprocessors. Let's consider the main tasks that have to be solved when processing signals (regardless of whether the circuit is analog or digital):

summation of several signals;
signal spectrum transfer (frequency shifting);
signal filtering;
calculation of the signal spectrum (fast Fourier transform);
noise-immune coding (noise suppression for analog communication systems);
frame formation (only for digital communication systems);
signal scrambling (equalizing the probability of transmitting zeros and ones).

The last three of these types of digital signal processing are performed at low frequencies, so they usually need only a small part of the processor's performance. The greatest performance is required when processing high-frequency signals, because the time between adjacent signal samples is short and all the necessary elementary operations must fit into that interval.

Now let's look at the first two tasks. Summing signals requires one binary addition instruction. Transferring the spectrum of the input signal to a given frequency requires a multiplication and the formation of the next sample of the sinusoidal function. This means that this operation requires more processor performance than summation. Now let's compare summation with multiplication. Multiplying two numbers requires calculating several partial products and summing them. A hardware multiplier performs a multiplication in one processor cycle, so the presence of a hardware multiplier is an integral feature of signal processors.
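The spectrum transfer operation can be sketched in a few lines of Python: each input sample is multiplied by the next sample of a sinusoid. This is a minimal floating-point illustration, not how a fixed-point DSP would implement it; the function name `shift_spectrum` and its parameters are assumptions, and a real device would typically read the sinusoid from a table rather than call a library cosine.

```python
import math


def shift_spectrum(samples, shift_hz, sample_rate_hz):
    """Multiply each sample by a cosine carrier to shift the spectrum.

    Hypothetical helper: one carrier sample is formed and one
    multiplication is performed per input sample.
    """
    shifted = []
    for n, x in enumerate(samples):
        # Form the next sample of the sinusoidal function ...
        carrier = math.cos(2.0 * math.pi * shift_hz * n / sample_rate_hz)
        # ... and multiply the input sample by it.
        shifted.append(x * carrier)
    return shifted


# A constant (0 Hz) input moved to one quarter of the sampling frequency.
print(shift_spectrum([1.0, 1.0, 1.0, 1.0], 250.0, 1000.0))
```

The loop body makes the cost visible: compared with plain summation, every sample now needs a multiplication plus the work of producing the carrier sample.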

Now let's analyze the signal filtering process. Implementing frequency filters in the time domain requires a convolution operation. A typical digital filter structure is shown in Figure 2.

Figure 2. Typical digital filter circuit

The figure clearly shows a sequence of identical sections of the algorithm. The digital signal sample must repeatedly be multiplied by a filter coefficient and the result added to the previous sum. Note that the adder has a large bit width. For a 16-bit signal processor, the numbers at the output of the multiplier are thirty-two bits wide. When several numbers are summed, the bit width of the result increases further. When 256 identical numbers are summed, the result grows by a factor of 256, which corresponds to eight additional bits (2^8 = 256). Therefore, the adder in a 16-bit signal processor is forty bits wide (32 + 8 = 40).
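The bit-width arithmetic above can be checked with a short Python sketch. The function name `accumulator_width` is an assumption made for illustration; it simply adds the width of the product to the number of extra bits needed to sum a given number of terms without overflow.

```python
def accumulator_width(operand_bits, n_terms):
    """Bits needed to sum n_terms products of two operand_bits-wide numbers.

    Hypothetical helper that mirrors the reasoning in the text.
    """
    product_bits = 2 * operand_bits             # 16 x 16 -> 32 bits
    # Smallest g such that 2**g >= n_terms, computed with integers only.
    guard_bits = (n_terms - 1).bit_length()     # 256 terms -> 8 bits
    return product_bits + guard_bits


print(accumulator_width(16, 256))
```

For 16-bit operands and 256 terms the sketch gives 40 bits, matching the figure in the text.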

This gives one more requirement for the signal processor: it must contain not just a hardware multiplier but a multiplier-accumulator (MAC). Moreover, the multiply-accumulate operation must be performed in one machine cycle. Note that the multiply-accumulate operation is an integral part not only of the filtering algorithm but also of the fast Fourier transform (it is half of the basic butterfly operation).

Now let's talk about another way of increasing the speed of a signal processor. An ordinary processor uses a single-bus structure for its operating unit. A signal processor uses at least three buses. This makes it possible to apply two operands simultaneously to the input of the arithmetic logic unit or multiplier-accumulator and to write the result to random access memory.

Another important feature of microprocessors is the way they organize cyclic program execution (the MAC operation when implementing a digital filter, or the “butterfly” operation when implementing a fast Fourier transform, must be repeated a specified number of times). In a general-purpose microprocessor, a special variable, the loop parameter, is used to organize a loop. At the end of the loop body this variable is compared with a given value (usually zero), and a jump to the beginning of the loop is performed. As a result, the filtering algorithm looks like this:

Generate the address of the next cell of the filter delay line
Read the next input signal sample from the filter delay line
Generate the address of the next filter coefficient
Read the next filter coefficient
Multiply the input signal sample from the delay line by the filter coefficient (most often over several machine cycles)
Sum the result with the already accumulated sum (generate another signal sample at the filter output)
Change the value of a loop parameter variable
Compare the obtained value with the specified value
Jump to the beginning of a loop or exit from it (usually this is a lengthy procedure that takes several machine cycles)
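The steps above can be sketched in software as follows. This is only an illustration of the per-tap overhead on a general-purpose processor; the function name and the use of a single shared index for both tables are assumptions made for brevity, and the coefficient list is assumed to be non-empty.

```python
def fir_output_software_loop(delay_line, coeffs):
    """Compute one output sample step by step, as a general-purpose CPU would."""
    acc = 0
    counter = len(coeffs)            # loop parameter variable
    index = 0                        # address into the delay line and coefficient table
    while True:
        sample = delay_line[index]   # read the next sample from the delay line
        coeff = coeffs[index]        # read the next filter coefficient
        acc += sample * coeff        # multiply, then add to the accumulated sum
        index += 1                   # form the next addresses
        counter -= 1                 # change the loop parameter
        if counter == 0:             # compare with the given value (zero)
            break                    # exit; otherwise jump back to the start
    return acc


print(fir_output_software_loop([1, 2, 3], [4, 5, 6]))
```

Every line inside the loop other than the multiply-accumulate itself is overhead that a signal processor moves into hardware.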

In signal processors, the organization of the cycle and the formation of the next address of the coefficient and filter sample is carried out in hardware, and therefore does not require additional time. All of these features make it possible to increase the speed of the signal processor without increasing its clock frequency; as a result, the algorithmic speed of the signal processor when performing signal processing operations is many times greater than the speed of a computing microprocessor. Let us list the distinctive properties of the signal processor:

Availability of a multiplier-accumulator (MAC) with a 40-bit adder and accumulator
Availability of hardware number shifter
Availability of hardware loop organization
Availability of two address generators
Three-bus structure of the microprocessor operating unit


Digital signal processors (Lecture)

LECTURE PLAN

1. General structure of digital signal processing

2. Structure of digital signal processors

3. Key indicators of digital signal processors

4. Major manufacturers of signal processors

5. Hardware implementation

1. General structure of digital signal processing

Digital signal processors (DSPs), also called simply signal processors, are designed to implement digital signal processing algorithms and real-time control systems.

Scheme of digital processing of analog signals.

The encoder generates a sequence of numbers corresponding to the analog signal being processed.

Based on the received signal, the decoder generates an analog signal, that is, it performs the inverse of the transformations carried out in the encoder.

The system input receives a signal x(t) of limited duration. Due to the finite duration of the signal, its spectrum is infinite.

Analog-to-digital conversion is carried out in two stages: time sampling and level quantization.

Sampling is the procedure of taking instantaneous values of the signal x(t) at equal time intervals T. The instantaneous values x(nT) are called samples, the time T is the sampling period, and n is the sample index. The more often samples are taken, i.e. the shorter the sampling period T, the more accurately the sequence of samples x(nT) represents the original signal x(t).

The sampling period T determines the sampling frequency:

$f_d = 1/T$; $T = 1/f_d$

From the formulas it is clear that the smaller T is, the higher the sampling frequency $f_d$. The higher the sampling frequency, the harder it is for the processor to perform a large number of operations on the samples at the rate at which they arrive, and the more complex the device must be. Thus, accurate signal representation calls for a higher $f_d$, while the desire to keep the processor as simple as possible calls for a lower $f_d$.

However, there is a lower limit on $f_d$: for complete reconstruction of the signal from its samples x(nT), the sampling frequency $f_d$ must be at least twice the highest frequency F in the spectrum of the transmitted signal x(t).

$f_d \ge 2F$; $T \le 1/(2F)$

It follows that with an infinite spectrum, when F → ∞, sampling is impossible.

However, in the spectrum of any finite-duration signal, the components above a certain upper frequency F have insignificant amplitudes and can therefore be neglected without noticeable distortion of the signal itself. The value of F is determined by the specific type of signal and the problem being solved. For example, for a standard telephone signal F = 3.4 kHz, and the minimum standard sampling frequency is $f_d$ = 8 kHz. The spectrum is limited to the frequency F by a low-pass filter.
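The telephone example can be checked numerically. The sketch below only restates the two relations $f_d \ge 2F$ and $T = 1/f_d$; the function names are illustrative assumptions.

```python
def min_sampling_rate_hz(f_max_hz: float) -> float:
    """Lowest sampling frequency that still allows reconstruction: f_d >= 2F."""
    return 2 * f_max_hz


def sampling_period_s(f_d_hz: float) -> float:
    """Sampling period T = 1 / f_d."""
    return 1 / f_d_hz


f_upper = 3400         # telephone signal, Hz
f_standard = 8000      # standard telephone sampling frequency, Hz
print(min_sampling_rate_hz(f_upper))       # theoretical minimum, Hz
print(sampling_period_s(f_standard))       # sampling period, s
```

The standard 8 kHz rate lies above the 6.8 kHz minimum, and the corresponding sampling period is 125 μs.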

Quantization of samples by level is carried out in order to form a sequence of numbers: the entire range of sample values is divided into a certain number of discrete levels, and each sample is assigned, according to a certain rule, the value of one of the two nearest quantization levels between which it falls. The result is a sequence of numbers x(nT) = x(n) represented in binary code. The number of levels is determined by the bit width of the ADC. For example, if the ADC width is 3 bits, there are $k = 2^3 = 8$ quantization levels in total; the minimum sample value is 0 (000) and the maximum sample value is 7 (111). Clearly, the quantized sample differs from the sample x(nT). This difference is the quantization error, which grows as the ADC bit width decreases.
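A minimal sketch of the 3-bit example follows. Rounding to the nearest level is assumed here as the "certain rule", and the input range [0, full_scale] is also an assumption made for illustration.

```python
def quantize(sample: float, bits: int, full_scale: float = 1.0):
    """Map a sample in [0, full_scale] to the nearest of 2**bits levels.

    Returns the integer code and the value of the chosen level.
    """
    levels = 2 ** bits
    step = full_scale / (levels - 1)
    code = round(sample / step)
    code = max(0, min(levels - 1, code))   # keep the code inside 0 .. levels-1
    return code, code * step


code, level = quantize(0.4, bits=3)
print(format(code, "03b"), level, abs(level - 0.4))   # code, level, quantization error
```

With rounding to the nearest level, the error never exceeds half a quantization step, so each added ADC bit roughly halves it.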

After the ADC, the sequence x(nT) = x(n) is fed to the signal processor (SP), which, following a given algorithm, maps each input sample x(n) uniquely to an output sample y(nT) = y(n).

Obtaining one output sample can take thousands of operations (multiplications, additions, and so on), so the signal processor must run at a much higher frequency in order to finish all the necessary work before the next sample x(n) arrives. That is, whatever the complexity of the algorithm, the processing time $t_{proc}$ must not exceed the sampling period T:

$t_{proc} \le T$

This can be ensured only if the clock frequency $f_T$ of the processor significantly exceeds the sampling frequency $f_d$:

$f_d \ll f_T$

Only under these conditions can the processor operate in real time, that is, at the rate at which input samples arrive.
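The real-time condition can be expressed as a cycle budget per sample. In the sketch below the 100 MHz clock and the cycle counts are assumed figures chosen only for illustration; the function names are likewise hypothetical.

```python
def cycles_per_sample(clock_hz: int, sample_rate_hz: int) -> int:
    """Whole clock cycles available between two consecutive input samples."""
    return clock_hz // sample_rate_hz


def meets_real_time(cycles_needed: int, clock_hz: int, sample_rate_hz: int) -> bool:
    """True if the algorithm fits into one sampling period."""
    return cycles_needed <= cycles_per_sample(clock_hz, sample_rate_hz)


budget = cycles_per_sample(100_000_000, 48_000)   # assumed 100 MHz clock, 48 kHz sampling
print(budget)
print(meets_real_time(1000, 100_000_000, 48_000))
print(meets_real_time(3000, 100_000_000, 48_000))
```

Under these assumptions about 2000 cycles are available per sample, so an algorithm needing 3000 cycles per sample cannot run in real time on that clock.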

The output samples from the signal processor are fed to a DAC and then to a low-pass smoothing (reconstruction) filter, which converts them into a continuous analog signal y(t).

The main tasks (algorithms) of signal processors:

1.) Digital filtering

Digital filtering is frequency selection: some frequencies are passed and others are rejected. Digital filtering is based on the Z-transform and convolution.

2.) Spectral analysis

Spectral analysis is a set of digital signal processing methods that make it possible to find all the frequency components of a signal without isolating or distorting them. It relies on the DFT (discrete Fourier transform) and the FFT (fast Fourier transform).
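As a small illustration of finding a frequency component, the sketch below evaluates the DFT directly from its definition on a test sine wave. The signal length, the test frequency, and the function name `dft` are assumptions; a practical implementation would use an FFT instead of this quadratic-time form.

```python
import cmath
import math


def dft(x):
    """Discrete Fourier transform computed directly from the definition."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


n = 16
signal = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]   # 3 periods per frame
magnitudes = [abs(c) for c in dft(signal)]
peak_bin = max(range(n // 2), key=lambda k: magnitudes[k])
print(peak_bin)
```

The largest magnitude in the lower half of the spectrum falls in the bin whose index equals the number of periods in the frame.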

3.) Signal identification

Signal identification is the process of distinguishing a signal against a background of other frequencies and noise, to establish that what is observed is a signal and not interference. Correlation analysis is used for this.

Correlation is the degree to which two functions coincide.
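The idea can be sketched with a sliding cross-correlation that looks for a known pattern in a received sequence. The pattern, the received data, and the function name are made up for illustration.

```python
def cross_correlation(x, y, lag):
    """Sum of x[n] * y[n + lag] over the samples where both sequences overlap."""
    return sum(
        x[n] * y[n + lag] for n in range(len(x)) if 0 <= n + lag < len(y)
    )


pattern = [1, -1, 1, 1]
received = [0, 0, 1, -1, 1, 1, 0, 0]    # the pattern starts at position 2
lags = range(len(received) - len(pattern) + 1)
best_lag = max(lags, key=lambda lag: cross_correlation(pattern, received, lag))
print(best_lag, cross_correlation(pattern, received, best_lag))
```

The correlation peaks at the shift where the two sequences coincide, which is how the presence and position of the signal are established.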

4.) Modulation and demodulation.

Modulation and demodulation are based on the Hilbert transform, implemented either in hardware or mathematically.

Example: demodulation of a single-sideband signal, which is obtained by isolating one of the sidebands of an amplitude-modulated signal. The result of demodulation is a low-frequency signal that is the envelope of the narrowband signal. The demodulated signal x(n) can be represented in complex form:

$z(n) = x(n) + j\tilde{x}(n)$; $s(n) = \sqrt{x^2(n) + \tilde{x}^2(n)}$, where

$z(n)$ – complex signal;

$\tilde{x}(n)$ – imaginary signal;

$x(n)$ – real signal;

$s(n)$ – envelope of the signal $x(n)$.

From the formulas it is clear that $x(n)$ and $\tilde{x}(n)$ are in quadrature relative to each other, that is, their phases differ by π/2. A phase shifter by π/2 is therefore required. Such signals are called Hilbert conjugates, and the device that forms a pair of conjugate signals is called a digital Hilbert converter (DHC); it makes it possible to calculate the envelope $s(n)$ of the signal $x(n)$.

5) Compression, stretching, spectrum transfer

Compression, stretching, and transfer of the spectrum are based on the same Hilbert transform. They are considered variants of modulation and demodulation.

In real time, where the execution time of every operation is fully predictable, digital signal processing algorithms reduce to computations of the form:

$y(n) = \sum_{k=0}^{N-1} b_k x(n-k)$, where n = 0, 1, 2, …, N−1

$x(n)$ – input (excitation) samples;

$y(n)$ – output (response) samples;

$b_k$ – real coefficients that completely determine the properties of the digital filter;

$x(n-k)$ – input samples delayed by k sampling periods T.

The filter described by this expression is called a non-recursive or FIR filter (finite impulse response filter).

Example: each output sample must be computed within a fixed time, not an arbitrary one. Let the sampling rate be $f_d$ = 48 kHz (rounded to 50 kHz); each sample must then be computed within 20 μs. Take N = 5 and write out the formula:

$y_0 = b_0 x(0-0) + b_1 x(0-1) + b_2 x(0-2) + b_3 x(0-3) + b_4 x(0-4) = b_0 x_0 + b_1 x_{-1} + b_2 x_{-2} + b_3 x_{-3} + b_4 x_{-4}$

$y_1 = b_0 x(1-0) + b_1 x(1-1) + b_2 x(1-2) + b_3 x(1-3) + b_4 x(1-4) = b_0 x_1 + b_1 x_0 + b_2 x_{-1} + b_3 x_{-2} + b_4 x_{-3}$

$y_2 = b_0 x(2-0) + b_1 x(2-1) + b_2 x(2-2) + b_3 x(2-3) + b_4 x(2-4) = b_0 x_2 + b_1 x_1 + b_2 x_0 + b_3 x_{-1} + b_4 x_{-2}$

$y_3 = b_0 x(3-0) + b_1 x(3-1) + b_2 x(3-2) + b_3 x(3-3) + b_4 x(3-4) = b_0 x_3 + b_1 x_2 + b_2 x_1 + b_3 x_0 + b_4 x_{-1}$

$y_4 = b_0 x(4-0) + b_1 x(4-1) + b_2 x(4-2) + b_3 x(4-3) + b_4 x(4-4) = b_0 x_4 + b_1 x_3 + b_2 x_2 + b_3 x_1 + b_4 x_0$

$y_5$ is written in the same way as $y_0$.

Note: $x_0$ is the ADC reading at the current moment. A negative index means that the reading is an earlier one. To calculate $y_0$ you need the current ADC reading and the four readings that precede it; to calculate $y_1$ you need $x_1$ and the four readings that precede it, and so on.
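The expansion above corresponds to the following direct-form sketch. One simplifying assumption is made: readings taken before the first sample (the negative indices) are treated as zero, whereas a real system would use the actual earlier ADC readings. The function name is illustrative.

```python
def fir_filter(b, x):
    """y(n) = sum over k of b[k] * x(n - k); samples before x[0] are taken as zero."""
    y = []
    for n in range(len(x)):
        acc = 0                            # accumulator starts from zero
        for k in range(len(b)):
            if n - k >= 0:                 # skip readings that precede the record
                acc += b[k] * x[n - k]     # one multiply-accumulate per coefficient
        y.append(acc)
    return y


coeffs = [1, 2, 3, 4, 5]                   # N = 5 coefficients b_0 .. b_4
print(fir_filter(coeffs, [1, 0, 0, 0, 0, 0]))
```

Feeding a single unit sample returns the coefficients themselves followed by zeros, which is why the filter is said to have a finite impulse response.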

2. Structure of digital signal processors

The basic operation of digital signal processing is multiplication followed by adding (accumulating) the product. The combined multiply-and-add unit is usually referred to by the mnemonic MAC (multiplier-accumulator). To achieve high performance, the processor must perform the MAC operation in one processor cycle, and it must do so in hardware, not software. Signal samples, filter coefficients, and program instructions are all stored in memory, so the operation requires three memory fetches: an instruction and two operands. For high performance, these three fetches must be completed in one processor cycle. This assumes that the result stays in the execution unit (the central processing unit) and is not written to memory. In the more general case, one more access is needed to write the result to memory, that is, four memory accesses per cycle. Processor performance is therefore determined above all by the data exchange capabilities between the central processing unit and memory and by how their interaction is organized.

Digital signal processors must have a Harvard architecture with separate data and instruction buses. This makes it possible to access different memories simultaneously, i.e. to fetch an instruction from program memory and an operand from data memory in the same cycle. The data memory must consist of two parts (traditionally called x memory and y memory): x memory stores, for example, the signal samples, and y memory stores the coefficients.

Thus, in Motorola processors, the number of independent memory modules and data buses is increased so that two operands can be fetched in one clock cycle. The processors have three memory banks (modules), allowing three fetches per clock cycle, and a corresponding number of buses. Performance problems may arise if the internal memory is insufficient, because the external buses allow only one memory access per clock cycle.

Digital signal processors use specialized address generation units (AGUs), which form the addresses of the data fetched from data memory. The AGUs operate in parallel with other modules, so the operand addresses for the next instruction can be computed while the ALU is executing the current operation.

Loops, i.e. the repetition of single instructions or blocks of instructions, occupy a significant place in digital signal processing algorithms. Organizing loops in software requires instructions that form and check the loop termination condition, and these must be executed on every pass through the loop body, which takes time. A hardware loop counter is therefore needed. DSPs use devices that organize loops with “zero overhead”, i.e. without spending time on checking the termination condition.

Motorola processors use the DO loop instruction, which works with the loop counter (LC) and loop address (LA) registers.

The Harvard architecture naturally lends itself to a multi-stage pipeline (from 3 to 11 stages); the basic version has three pipeline stages.

Basic version: Motorola DSP56000 = DSP560xx = DSP56K, where K = 000. In this designation, 56 is the processor series and 000 is the processor number within the series.

3. Key indicators of digital signal processors

1.) Method of data presentation.

According to this indicator, all digital signal processors are divided into:

1.1. Fixed-point processors.

1.2. Floating-point processors.

Fixed-point processors are the most common; they are found in all phones.

In floating-point processors, data is represented as a mantissa and an exponent. Floating-point processors are much more complex and are the most expensive (several hundred dollars).

2.) Data representation bit depth.

For fixed-point processors, the bit width is 16 (for most signal processors) or 24 (for Motorola).

For floating-point processors, the bit width is 32 (of which 8 bits represent the exponent, 23 bits the mantissa, and 1 bit the sign).

Floating-point processors have a wide range of number representation: ignoring the mantissa and taking the sign into account, from $2^{-128}$ to $2^{127}$.

The range of number representation sets the boundaries between the minimum and maximum permissible values that can be represented in a given format and code.

Dynamic range (DR):

$\mathrm{DR} = |\text{max value}| / |\text{min value} \ne 0|$

The dynamic range in decibels is:

$20 \log_{10}(\mathrm{DR}) = 20 \log_{10}(|\text{max value}| / |\text{min value} \ne 0|)$
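The formula can be evaluated for the word widths mentioned in this section. The sketch assumes signed integer formats for the fixed-point cases (largest value $2^{b-1} - 1$, smallest nonzero value 1) and uses only the exponent range quoted above for the floating-point case; the function name is illustrative.

```python
import math


def dynamic_range_db(max_value: float, min_nonzero_value: float) -> float:
    """Dynamic range in decibels: 20 * log10(|max| / |min nonzero|)."""
    return 20 * math.log10(abs(max_value) / abs(min_nonzero_value))


dr16 = dynamic_range_db(2**15 - 1, 1)             # 16-bit signed fixed point
dr24 = dynamic_range_db(2**23 - 1, 1)             # 24-bit signed fixed point
dr_float = dynamic_range_db(2.0**127, 2.0**-128)  # exponent range only
print(round(dr16, 1), round(dr24, 1), round(dr_float, 1))
```

Under these assumptions the fixed-point formats give roughly 90 dB and 138 dB, while the floating-point exponent range alone spans more than 1500 dB.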

The dynamic range of signals that a processor can handle without distortion is much narrower for fixed-point processors (by several orders of magnitude). With relatively simple processing algorithms this may not matter, because the dynamic range of real input signals is often smaller than what the DSP allows, but in some cases overflow can occur during program execution. This leads to fundamentally uncorrectable nonlinear distortion in the output signal, similar to clipping distortion in analog circuits.
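A short sketch of what overflow does to a 16-bit fixed-point result is given below. Two behaviours are shown: plain two's-complement wraparound, and saturation, which is not described in the text above and is included only as an assumed illustration of the clipping analogy. The function names are hypothetical.

```python
def wrap16(value: int) -> int:
    """Keep only the low 16 bits, interpreted as a two's-complement number."""
    return ((value + 2**15) % 2**16) - 2**15


def saturate16(value: int) -> int:
    """Clamp to the 16-bit range, analogous to clipping in an analog circuit."""
    return max(-2**15, min(2**15 - 1, value))


total = 30000 + 10000            # the true sum does not fit into 16 bits
print(wrap16(total), saturate16(total))
```

Wraparound turns a large positive sum into a negative number, while saturation holds it at the largest representable value; in both cases the original value cannot be recovered afterwards.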

3) Performance

One of the most common mistakes a developer makes is equating clock frequency and performance, which is wrong in most cases. Very often, the speed of a DSP is indicated in MIPS (millions of instructions per second). This is the most easily measured parameter. The performance of normal processors is several tens of MIPS.

However, the problem with comparing the speed of different DSPs is that the processors have different instruction sets, and different processors need different numbers of instructions to execute the same algorithm. In addition, different instructions on the same processor sometimes take different numbers of clock cycles. As a result, a processor rated at 1000 MIPS may well turn out to be many times slower than a processor rated at 300 MIPS, especially if their word widths differ.
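A numerical sketch of this effect follows. The instruction counts are hypothetical and chosen only to show how a lower MIPS rating can still give a shorter execution time; the function name is an assumption.

```python
def execution_time_us(instruction_count: int, mips: float) -> float:
    """Execution time in microseconds: instructions / (millions of instructions per second)."""
    return instruction_count / mips


# Hypothetical instruction counts for the same algorithm on two processors.
time_a = execution_time_us(instruction_count=8000, mips=1000)
time_b = execution_time_us(instruction_count=1200, mips=300)
print(time_a, time_b)
```

Here the 300 MIPS processor finishes in half the time because its instruction set expresses the algorithm in far fewer instructions.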

One solution to this problem is to compare processors by the speed of certain operations, such as multiply-accumulate (MAC). The speed of such operations is critical for algorithms that use digital filtering, correlation, and Fourier transforms. Unfortunately, this measure does not give complete information about the real performance of the processor either.

The most accurate estimate is the execution speed of specific algorithms, for example FIR and IIR filtering, but this requires developing the corresponding programs and carefully analyzing the test results.

There are companies that analyze and compare processors based on their main characteristics, including speed. The leader among such companies is BDTI (Berkeley Design Technology, Inc).

4. Major manufacturers of signal processors

1.) Texas Instruments (TI) holds about 48% of the DSP market. In 1982 it released the TMS32010, the first DSP to become a commercial success. TI chips were used in the Speak & Spell toy (a speech synthesizer that predates the TMS32010) and in a talking doll named Julie (a TMS320-family DSP). All Texas Instruments digital signal processors carry the designation TMS320xxx.

2.) Analog Devices (AD). All Analog Devices digital signal processors carry the designation ADSP21xxx.

3.) Motorola. Series of fixed-point processors: DSP560xx, DSP561xx, DSP563xx, DSP566xx, DSP568xx.

Intel was previously also one of the top three manufacturers of signal processors, but it has since been pushed aside.

Signal processors are also produced in Russia, although they are somewhat inferior to their foreign counterparts. For example, the Research Institute of Electronic Technology (NIIET) currently mass-produces the M1867VMx 16-bit fixed-point DSPs with a performance of 5 MIPS.

5. Hardware implementation

The digital signal processor is divided into two parts: an operating unit and a control unit.

Operating unit

Operation control unit.

Data from memory arrives in the input registers x0, x1, y0, y1 and is passed to the MAC or the ALU; the registers can be used either separately or in pairs. If double-length data needs to be used, 16 bits are typically used. The result of the operation is transferred from accumulator A or B to data memory through the output shifter.

Distribution of instructions between the MAC and the ALU: of the 62 instructions in the basic version, 61 are ALU instructions and 1 is the MAC instruction.

The MAC instruction is executed 1000 times more often than all other instructions, and it is this instruction that determines the operating speed.

Fig. MAC block diagram

In the MAC block, the first product is added to zero; after each subsequent multiplication, the product is added to the current accumulator value. There are always two or more accumulators.

The shifter allows you to make shifts when transferring and loading operands without using additional instructions.

If in Motorola processors (in the basic version Motorola DSP 560xx) the word width is 24, then the length of the extended word is: 24 + 24 + 8 = 56 bits, where 8 bits are allocated for data expansion.

If in Motorola processors the word width is 16, then the length of the extended word is: 16 + 16 + 8 = 40 bits, where 8 bits are allocated for data expansion.

An example of the representation of integers in double and extended accumulator word formats with a length of 56 bits in Motorola DSP560xx processors:

Note:

In the figure, the EXT extension is filled with zeros - the value of the 47th sign bit.

The representation of integers in fixed-point format in the double and extended word formats assumes the following functional distribution of bits:

1.) The most significant bit (MSB) of the high word MSP is used:

· as the sign bit when representing signed integers; MSB = 0 corresponds to a positive sign and MSB = 1 to a negative sign; zero is considered positive; the remaining bits are significant;

· as the most significant bit when representing unsigned numbers; unsigned numbers are integers that are positive by default.

2.) All bits except the sign bit are considered significant; they are aligned to the right edge of the format, i.e. the least significant bit (LSB) of the format corresponds to the least significant bit of the binary integer.

3.) When signed integers are represented in the extended word format, sign extension occurs in the EXT extension: all EXT bits are automatically filled with the value of the most significant (sign) bit MSB of the MSP:LSP word.

4.) When unsigned integers are represented in the extended word format, all EXT bits are cleared.
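Rules 3 and 4 can be sketched as follows for a 48-bit MSP:LSP word placed in a 56-bit extended word. The function name `to_extended` and its parameters are illustrative assumptions, not part of any Motorola tool or manual.

```python
def to_extended(value: int, signed: bool, word_bits: int = 48, ext_bits: int = 8) -> int:
    """Place an MSP:LSP bit pattern into an extended word with EXT on top."""
    value &= (1 << word_bits) - 1                # keep only the MSP:LSP bits
    msb = (value >> (word_bits - 1)) & 1         # most significant (sign) bit
    if signed and msb:
        ext = (1 << ext_bits) - 1                # sign extension: EXT filled with ones
    else:
        ext = 0                                  # positive or unsigned: EXT cleared
    return (ext << word_bits) | value


print(hex(to_extended(5, signed=True)))
print(hex(to_extended(-3, signed=True)))
print(hex(to_extended(2**48 - 3, signed=False)))
```

The same 48-bit pattern therefore yields different extended words depending on whether it is interpreted as signed or unsigned.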