Design Strategies G4GUO
from G4GUO
DVB-S2, the standard we are implementing, was designed for the substantial parallel processing power of ASICs (Application-Specific Integrated Circuits), FPGAs (Field-Programmable Gate Arrays), and GPUs (Graphics Processing Units). In G4GUO's opinion, GPUs don't have as steep a learning curve as FPGAs, so this was the path he selected for research and development.
Symbol tracking and root raised cosine filtering are best done in the FPGA on the LimeSDR.
His initial thought was to rewrite some of the LimeSDR code so that the ADC sample rate can be altered in fractions of a symbol. The host computer would then calculate the timing error and send the correction to the Lime FPGA code. The Lime can also do fine frequency error correction using a complex mixer, with the error calculated on the host from the phase change in the preamble sequence.
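The host-side frequency error calculation can be sketched as follows. This is only an illustration, written in Python for brevity rather than the native C/C++/CUDA code discussed here, and it is not G4GUO's implementation. It assumes one sample per symbol and a preamble known to the receiver. The function name `estimate_freq_offset`, the 26-symbol stand-in preamble (not the real DVB-S2 start-of-frame sequence), the 1 Msps symbol rate, and the 12.5 kHz offset are all hypothetical.

```python
import cmath
import math


def estimate_freq_offset(received, preamble, symbol_rate):
    """Estimate the carrier frequency offset in Hz from known preamble symbols."""
    # Strip the known modulation so that only the rotation caused by the
    # frequency offset (plus a constant phase) remains.
    stripped = [r * p.conjugate() for r, p in zip(received, preamble)]
    # Sum the symbol-to-symbol phase changes as complex products; summing
    # before taking the angle averages out noise.
    acc = sum(b * a.conjugate() for a, b in zip(stripped, stripped[1:]))
    # Phase change per symbol (radians) -> frequency (Hz).
    return cmath.phase(acc) * symbol_rate / (2 * math.pi)


# Hypothetical stand-in preamble: 26 unit-magnitude symbols.
preamble = [cmath.exp(1j * (math.pi / 2) * (n % 4)) for n in range(26)]

symbol_rate = 1_000_000.0  # assumed symbol rate, symbols per second
true_offset = 12_500.0     # assumed carrier offset, Hz
initial_phase = 0.3        # arbitrary constant phase, radians

# Simulate what the receiver sees: the preamble rotated by the offset.
received = [
    p * cmath.exp(1j * (2 * math.pi * true_offset * n / symbol_rate + initial_phase))
    for n, p in enumerate(preamble)
]

estimate = estimate_freq_offset(received, preamble, symbol_rate)
print(f"estimated offset: {estimate:.1f} Hz")
```

An estimator of this form is only unambiguous for offsets smaller than half the symbol rate, which is why it suits fine rather than coarse correction. The resulting value is what the host would send back to drive the complex mixer.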
This assumes that native C/C++/CUDA code would be written, rather than an implementation in GNU Radio.
LDPC is highly parallel and optimized for RTL. The performance level for standard DVB-S2 is 30 Msps minimum, and some ASICs support two 60 Msps channels. Of course, we only need about 1/10 of that. The idea of using a GPU or DSP for this is intriguing, and even if it doesn't end up being the path for Phase 4, it may still be very useful in the future for Phase 5. It would also be extremely nice to have a working software implementation of any kind for verification, even if it is far from real time.
How you organize your code makes a lot of difference as to how fast it runs on a GPU. NVIDIA, the company that makes the GPU that G4GUO is using, has a set of profiling tools. We're assuming that one way to approach GPU development is to write a naive implementation and then use the profiler to optimize it.
G4GUO says that
even making slight changes to the code seems to greatly affect the time it takes. The profiler is very good at telling you about memory access conflicts, cache usage, register usage, and µs per kernel call. In fact, it tells you a huge amount of information about what the GPU is doing.
He may leave all the baseband frame decoding to someone else and just concentrate on the modem parts.
He anticipates using Ron's GNU Radio work to generate test files for the decoder at various points, but expects that to be some months away.
A central problem is how to fit functions into the memory model of the GPU in order to keep all the threads fully occupied. This requires a combination of LDPC decoding knowledge, parallel thinking, and NVIDIA GPU programming. Specific challenges include how to cope with the final XOR of the parity bit for each block, as that makes every bit in the whole block dependent on every other bit. When a sub block of the code reaches the point where all of its parity check equations are satisfied, it can be marked as finished and the decoder can then move on to the next sub block.
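The "mark a sub block as finished" test can be illustrated with a toy example. This is a sketch in Python, not GPU code and not the DVB-S2 LDPC parity check matrix: it uses the three parity checks of a Hamming (7,4) code as a stand-in, and the grouping of those checks into two sub blocks is arbitrary. The names `check_satisfied`, `finished_sub_blocks`, and `SUB_BLOCKS` are hypothetical.

```python
def check_satisfied(bits, check):
    """Return True if the bits taking part in one parity check XOR to zero."""
    parity = 0
    for idx in check:
        parity ^= bits[idx]
    return parity == 0


def finished_sub_blocks(bits, sub_blocks):
    """Return one flag per sub block: True when every check in it is satisfied."""
    return [
        all(check_satisfied(bits, check) for check in checks)
        for checks in sub_blocks
    ]


# Toy stand-in: the parity checks of a Hamming (7,4) code, each written as the
# list of bit indices it covers, grouped arbitrarily into two sub blocks.
SUB_BLOCKS = [
    [[0, 2, 4, 6], [1, 2, 5, 6]],  # sub block 0: two checks
    [[3, 4, 5, 6]],                # sub block 1: one check
]

valid_word = [1, 1, 1, 0, 0, 0, 0]   # a valid Hamming (7,4) codeword
one_error = [1, 1, 1, 1, 0, 0, 0]    # same word with bit 3 flipped

print(finished_sub_blocks(valid_word, SUB_BLOCKS))
print(finished_sub_blocks(one_error, SUB_BLOCKS))
```

In a GPU decoder the checks within a sub block would be evaluated by separate threads and the per-sub-block result combined with a reduction. How that maps onto the GPU memory model is exactly the open problem described above.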
There is a paper on how to fit the LDPC decoder into an FPGA, written by the people who developed the DVB-S2 FEC code. It documents a technique for breaking the 64800-bit block into sub blocks, which can be reused for GPU decoding.
Partitioning the Design
As with RFNoC (RF Network on a Chip, see https://www.ettus.com/sdr-software/detail/rf-network-on-chip), the split between processor and co-processor is critical. In this particular implementation, the work is divided between the CPU (Central Processing Unit, usually a general-purpose host machine) and the GPU. Transferring data to and from the graphics card slows things down, and managing this data movement makes or breaks processor/co-processor designs. G4GUO moved some of the BCH (Bose-Chaudhuri-Hocquenghem) decoder software from the CPU to the GPU simply to reduce memory moves across the PCIe bus. Learn more about BCH codes, which are one of the two concatenated codes in DVB-S2, here: https://en.wikipedia.org/wiki/BCH_code