Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does it make sense to do IQ imbalance correction after decimation? #136

Open
mryndzionek opened this issue Oct 11, 2024 · 47 comments
Open

Comments

@mryndzionek
Copy link
Contributor

Just a general RF DSP question. I doubt we'll be able to do IQ imbalance correction at 500ksps, especially on RP2040, but does it make sense to do this after decimation?

@dawsonjon
Copy link
Owner

Yes, it still makes sense after decimation. I've been meaning to do some measurements of mirror suppression, so we can do a before and after.

@mryndzionek
Copy link
Contributor Author

Yes, it still makes sense after decimation. I've been meaning to do some measurements of mirror suppression, so we can do a before and after.

What exactly needs to be done? Maybe I could help.

@dawsonjon
Copy link
Owner

dawsonjon commented Oct 11, 2024 via email

@mryndzionek
Copy link
Contributor Author

Thank for for the detailed explanation. So the idea is to have the correction coefficients fixed (or have few for different band)? I was thinking about constant software estimation and compensation. I was looking at this, but even computing all the needed moving averages seems to be too expensive in our case.

@dawsonjon
Copy link
Owner

I was thinking of implementing automatic correction similar to the method you linked. The Mosley paper they followed on the uhsdr project describes an efficient method of estimating the imbalance and calculating the correction coefficients. It basically uses accumulators to estimate the phase and magnitude differences between I and Q. We wouldn't need to calculate the correction coefficients very often, once every few blocks would be plenty. Applying the correction should be achievable if we apply it after decimation, it should be slightly simpler than the existing frequency shift, only a couple of multiplies and adds for each sample.

@mryndzionek
Copy link
Contributor Author

I was thinking of implementing automatic correction similar to the method you linked. The Mosley paper they followed on the uhsdr project describes an efficient method of estimating the imbalance and calculating the correction coefficients. It basically uses accumulators to estimate the phase and magnitude differences between I and Q. We wouldn't need to calculate the correction coefficients very often, once every few blocks would be plenty. Applying the correction should be achievable if we apply it after decimation, it should be slightly simpler than the existing frequency shift, only a couple of multiplies and adds for each sample.

I managed to write something that is efficient enough and might actually work. The computed values seem to be stable and close to what is expected (c1 small, close to zero, c2 close to one). Just listening to stations it's hard to say what the effect is and I don't have equipment to test. The code is here if you're interested.

@dawsonjon
Copy link
Owner

Nice! Looks like a very efficient implementation. I will try some tests and see if it improves the performance.

@mryndzionek
Copy link
Contributor Author

Nice! Looks like a very efficient implementation. I will try some tests and see if it improves the performance.

Here is a branch with some more adjustments and UI controls for the correction: https://github.com/mryndzionek/PicoRX/tree/iq_imb6

@dawsonjon
Copy link
Owner

Something not quite right at the moment, enabling IQ correction is making the images worse rather than better. I expect its probably a simple fix somewhere.
For interest, I have done some baseline measurements (a bit crude, only a single data point at each frequency) of the image rejection with no correction: [email protected], [email protected], 34dB@7MHz, 37dB@14MHz, 38dB@28MHz. I'm excited to see how much improvement we can get with IQ correction.

@mryndzionek
Copy link
Contributor Author

Something not quite right at the moment, enabling IQ correction is making the images worse rather than better. I expect its probably a simple fix somewhere.

I think we're running out of int32_t range on some accumulator variables...

@mryndzionek
Copy link
Contributor Author

I added one more commit to my branch. Now it should be less obviously worse after enabling correction 😄 (I still don't see a clearly positive effect).

@mryndzionek
Copy link
Contributor Author

On this branch I added some more adjustment and first order dc-block on IQ signals. This seems to have some positive effect and it's mostly due to dc-blocking, not the imbalance corrector.

@mryndzionek
Copy link
Contributor Author

Here is the final version. The impact is hard to judge. It's at least not obviously bad 😉 I'm not sure I'll be able to come up with something better.

@dawsonjon
Copy link
Owner

Cool, testing it now!

@mryndzionek
Copy link
Contributor Author

I have made a simple (and fixed point) simulation in Python:
Screenshot at 2024-10-15 19-29-57

So yeah, it works. At least for clean IQ signals.

@dawsonjon
Copy link
Owner

I tried a few tests with the iq_corr_mosely branch, but I still couldn't see any improvements to the image rejection. I have spent a bit of time trying to see if I could get it working. I wasn't able to put my finger on the issue, but I worked through testing each stage in turn and I came up with this.

It seems to work very well, improving the image rejection by at least 10-20dB, but it is now hard to measure the image rejection because the images are either below the noise floor, or too small to measure.

I really like your idea of putting the DC removal after the decimation, it makes it much faster, and works with I and Q individually. It turns out that the cic decimation doesn't care about the DC offset, so I was able to take out the earlier DC removal on the ADC inputs which saves a significant amount of CPU. I was also able to remove the rounding from the cic decimator which is now redundant, although the effect was much less noticeable.

@mryndzionek
Copy link
Contributor Author

Looks and works great! I would say shipit :shipit:

@mryndzionek
Copy link
Contributor Author

One more thing. I've been playing with CIC compensation filters. FIR needs at least 15 taps, so too expensive (gets me to 130% CPU load). However I managed to derive an IIR filter. Here is the implementation:

void __not_in_flash_func(rx_dsp ::comp_filt)(int16_t &i, int16_t &q)
{
#define COMPF_B0 (-89274L)
#define COMPF_A1 (42593L)
#define COMPF_A2 (13914L)

  static int32_t i_yprev = 0;
  static int32_t q_yprev = 0;
  static int32_t i_ypprev = 0;
  static int32_t q_ypprev = 0;

  i = ((COMPF_B0 * i) - (COMPF_A1 * i_yprev) - (COMPF_A2 * i_ypprev)) >> 15;
  q = ((COMPF_B0 * q) - (COMPF_A1 * q_yprev) - (COMPF_A2 * q_ypprev)) >> 15;

  i_ypprev = i_yprev;
  q_ypprev = q_yprev;
  i_yprev = i;
  q_yprev = q;
}

Seems to make spectrum flatter and boost the tuned signal by ~6dBm. What do you think?

@dawsonjon
Copy link
Owner

That's a very neat implementation. I always struggle a bit with IIR filters, especially in fixed point. Would be very interested to see how this was derived.

It's fairly easy to compensate for the curve of the CIC filter in the frequency domain, I have tried this and it works well. There is a Python simulation of the CIC filter, I just took the reciprocal of this to work out the gain for each frequency point in the FFT. I then simply scale each frequency bin by the appropriate gain in the FFT filter. For an efficient implementation, it's only necessary to scale the frequency bins in the pass band. The whole spectrum is also flattened using this technique, but it is only necessary to apply the correction once, each time the display is updated.

I have a branch that includes most of this, I will push it and post a link when I get a spare moment.

@mryndzionek
Copy link
Contributor Author

That's a very neat implementation. I always struggle a bit with IIR filters, especially in fixed point. Would be very interested to see how this was derived.

The Python script is very messy. I used my own least squares FIR design function and Prony's method do derive the IIR. I can only show frequency responses for now:
iir_comp

@mryndzionek
Copy link
Contributor Author

mryndzionek commented Oct 18, 2024

The only potential problem might be nonlinear phase response:
phase

If I remember correctly we're tuning around 6kHz, so still in linear range.

@mryndzionek
Copy link
Contributor Author

Regarding moving stuff to frequency domain, is this how frequency shifting supposed to be done?

@dawsonjon
Copy link
Owner

Yes, that's right the frequency shift is effectively rotating the spectrum in the frequency domain. The difficulty comes when we need a resolution finer than 1 bin.

Its quite unusual to see frequency shifting implemented in the frequency domain. I think this is because frequency shifting is actually slower in the frequency domain than the time domain.

When we are filtering, a convolution operation in the time domain becomes a multiplication operation in the frequency domain, dramatically reducing the number of operations (even after the FFT and IFFT are taken into account).

When we are frequency shifting, the same principle works against us, a multiplication operation in the time domain becomes a convolution operation in the frequency domain, increasing the number of operations.

@mryndzionek
Copy link
Contributor Author

mryndzionek commented Oct 20, 2024

But how exactly is it slower? We are just shuffling memory without multiplications. Or are you talking about the resolution finer than 1 bin case?

Would be good to have a more specific TODO list of things that need to be moved to frequency domain.

@dawsonjon
Copy link
Owner

Yes, I meant if it was finer than 1 bin. I'm not sure if there is anything existing that needs to be moved to the frequency domain, but we can exploit it when we add new features.

One thing I would like to implement is frequency based noise reduction. Essentially this works like having a squelch function for each frequency bin that blanks "noise" below a threshold. The difficult bit is estimating the noise level in each bin in order to set the thresholds, and to avoid distorting the signal.

@mryndzionek
Copy link
Contributor Author

Actually regarding squelch, I would like to improve it. I made #140 and I think having this state machine and adjustable timeout makes sense, but what is missing is automatic squelch, without explicitly setting the threshold. I'm planning to explore this. Should be efficient even on floats.

@dawsonjon
Copy link
Owner

Yes, really like the idea of automatic squelch. I have pushed my TFT branch here. It is still a work in progress, but has support for a separate TFT waterfall, CAT control and a few other bits and pieces. I have implemented CIC correction in the frequency domain.

@mryndzionek
Copy link
Contributor Author

Yes, really like the idea of automatic squelch. I have pushed my TFT branch here. It is still a work in progress, but has support for a separate TFT waterfall, CAT control and a few other bits and pieces. I have implemented CIC correction in the frequency domain.

What is the plan with TFT branch? Will it be merged to testing? I really would like to have CIC correction in testing. Should I cherry-pick?

@dawsonjon
Copy link
Owner

I think I have merged most features from testing (not the new squelch change yet), but I took a different path early on, so there are some differences in approach. I was thinking of making a new release based on the TFT branch (fairly) soon. I probably won't add any new features to it for now, but will focus on testing, documentation and bug fixes.

My plan was to keep using the testing branch as a place to try out and share new features and ideas, and incorporate the more stable changes into releases periodically. I could merge the TFT branch back into testing now, or I'm happy to cherry-pick, whichever you prefer.

@mryndzionek
Copy link
Contributor Author

Will the TFT branch also support OLED displays and u8g2?

@dawsonjon
Copy link
Owner

Yes, should be completely backwards compatible. I have merged all the latest u8g2 oled code. The TFT is just an optional extra.

@mryndzionek
Copy link
Contributor Author

Yes, should be completely backwards compatible. I have merged all the latest u8g2 oled code. The TFT is just an optional extra.

Okay, so I don't think it make sense merging back. I can wait. I'll be periodically testing the TFT branch. Good that the menu is now non-blocking, so better scheduling of "actions" is possible in picorx.cpp.

@mryndzionek
Copy link
Contributor Author

Just a general question. In multiple sources I found information that after filtering in frequency domain, after ifft a window should be applied. Would it be beneficial in our case?

@dawsonjon
Copy link
Owner

That's a really good question, and I don't really know the answer. I have been doing a bit of reading up and it certainly seems like it can be beneficial in some circumstances. At the moment I'm quite happy with the way the frequency domain filtering is working, but it is something that I would like to learn a bit more about.

@mryndzionek
Copy link
Contributor Author

Regarding this:

One thing I would like to implement is frequency based noise reduction. Essentially this works like having a squelch function for each frequency bin that blanks "noise" below a threshold. The difficult bit is estimating the noise level in each bin in order to set the thresholds, and to avoid distorting the signal.

I made some trials and it sort of works, but I'm basing this on FFT/IFFT after demodulation and this is too slow for RP2040, at least without overclocking. Any ideas on how to do this before demodulation?

@dawsonjon
Copy link
Owner

dawsonjon commented Oct 24, 2024 via email

@mryndzionek
Copy link
Contributor Author

provided that the granularity of the frequency bins is fine enough.

Yeah, so I think this will be a problem. I had to use a STFT, a 128-bin FFT on two consecutive 64 frames of sample, at the very leas to get decent result. I think that having a STFT with even smaller stride would be beneficial.

@dawsonjon
Copy link
Owner

dawsonjon commented Oct 26, 2024

I had to use a STFT, a 128-bin FFT on two consecutive 64 frames of sample, at the very leas to get decent result.

The FFT filter is using an stft of 128 points on two consecutive frames of 64 samples.

I think that having a STFT with even smaller stride would be beneficial.

Did you think about x2 decimating after the demodulator? That would give you twice the resolution for the same FFT size, and twice as long to calculate it.

@mryndzionek
Copy link
Contributor Author

The FFT filter is using an stft of 128 points on two consecutive frames of 64 samples.

Yes, but I think there is a difference, as the modulated signal is just a fraction of the spectrum.

@dawsonjon
Copy link
Owner

The output side of the filter is working at 15kHz sample rate, and so is the demodulator. There is no practical change to the bandwidth of SSB and AM signals during demodulation, and the frequency range and resolution of a 128 point FFT remains the same unless you reduce the sample rate. (Maybe you are doing this already?)

It's true that the signal is only occupying a fraction of the spectrum, a speech signal only uses a couple of kHz of bandwidth, but we would need to reduce the sample rate to exploit that.

I think that having another FFT/IFFT after demodulation would be a superior option (it would be more flexible, gives us the option to work at a lower sample rates, and works with FM too) provided we can make it fast enough to be viable.

@mryndzionek
Copy link
Contributor Author

Did you think about x2 decimating after the demodulator? That would give you twice the resolution for the same FFT size, and twice as long to calculate it.

I'm not sure if I understand. Decimating 2x after demodulator means 15ksps->7.5ksps samplerate drop. Do we want that?

Today I tried to implement spectral subtraction using CMSIS-DSP, as it provides efficient RFFT functions etc. It is better from computational point of view, but still not enough. I'm almost to 100% load with just FFT/IFFT roundabout. Nevertheless I continued just to see what can be achieved. I also had to use float for the actual energy estimations, as fixed-point divisions were blowing the ranges. Good news is that even with floats the actual spectrum subtraction is not that demanding. FFT/IFFT are still the bottleneck. So in the end with FFT/IFFT from CMSIS-DSP, floating-point energy estimations and overclocking to 225-240MHz here is the result:
denoising.wav.zip

Denoising is active between 12-28s and 40s till the end of file. Quite nice I would say. Now just to make it more efficient...

@dawsonjon
Copy link
Owner

Impressive results. What size FFT did you use in the end?

I'm not sure if I understand. Decimating 2x after demodulator means 15ksps->7.5ksps samplerate drop. Do we want that?

Yes, that was my thinking. The audio signal never uses more than 3.75kHz, so half of the FFT bins will be zero anyway. Then you can get the same results with half the FFT size, and code will run at least twice as fast.

@mryndzionek
Copy link
Contributor Author

Impressive results. What size FFT did you use in the end?

Same as before, 128 on two consecutive frames. This time using arm_rfft_q15 which internally uses FFT64, I think.

@mryndzionek
Copy link
Contributor Author

No matter what I do with the noise canceller I'm over 100% CPU load. I started looking at other parts of the system and found one place for optimization. Using FFT from CMSIS-DSP seem to reduce the load by ~20%. I'll try create a PR in next day, or two. Might still be not enough, but probably half way there. I'll also need to move from floats to fixed-point. I need log10f and in seems to be hard to implement with fixed-point...

@dawsonjon
Copy link
Owner

Cool, sounds interesting can't be far off now. Log10 shouldn't be too hard to approximate in fixed point (depending on accuracy). You can calculate log2, then change the base by multiplying by a constant. Log2 can be calculated by counting leading zeros for the integer part, then using a lookup table or polynomial approximation for the fraction part.

@mryndzionek
Copy link
Contributor Author

Yeah, so actually the log10 is for converting SNR to dB and the SNR range is fairly small, so a LUT will probably be enough! Meanwhile I've created another PR - #142 - just an optimization.

@mryndzionek
Copy link
Contributor Author

In #144 I added first version of the denoiser.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants