read_raw_xdf() can I use the nominal sampling rate? #436

Open
behinger opened this issue Sep 27, 2024 · 4 comments

@behinger

Hi!
Thanks a ton for the XDF importer. I stumbled today because I want to use my EEG amp as the "main clock", that is, change all timestamps to follow the nominal sampling rate provided by the EEG stream (I trust the amp more than my recording laptop). If I don't do this, every subject has a slightly different sampling rate (1000.0001 vs. 999.9998 etc.), which is annoying to continue working with.

As I understood, I could specify a fs_new, but this would also resample my EEG dataset, whereas I don't need resampling for that one, just dropping the timestamps and recalculating them with the nominal sampling rate (also, we sometimes record with different sampling rates, so it would be nice to just be able to specify the nominal one).

I think this is the default behavior of the MATLAB XDF importer, but I don't have too much experience with that one either.

Maybe this is already possible, if not, I wonder how people are dealing with this issue right now? I'm willing to also spend some time to suggest a PR, but asking first is appropriate I think :)

Cheers, Bene

@cbrnr
Owner

cbrnr commented Sep 30, 2024

Hi @behinger! What a happy coincidence, I have just started working on refactoring (and improving) the XDF importer! I'm mentioning this because it is related to your question about nominal vs. effective sampling frequency.

In summary, we decided to use the effective sampling frequency as the default, since currently we do not have any reason to believe that amplifier clocks are more accurate than computer clocks (see the original discussion for reference). Yes, this means that recordings will not be associated with "nice" frequencies like 1000 Hz, but you will see something like 1000.0001 Hz. In practice, this difference is absolutely negligible, so we (I) decided to use the effective sampling frequency as "the ground truth" in my implementation.
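For reference, the effective sampling frequency discussed here can be estimated from the time stamps alone. The following is a minimal sketch of one common definition (number of sample intervals divided by elapsed time); the function name `effective_srate` and the example values are illustrative assumptions, not MNELAB or pyxdf code.

```python
def effective_srate(timestamps):
    """Effective sampling rate: number of intervals divided by elapsed time."""
    if len(timestamps) < 2:
        raise ValueError("need at least two time stamps")
    return (len(timestamps) - 1) / (timestamps[-1] - timestamps[0])


# Hypothetical stream: nominally 1000 Hz, but the clocks disagree slightly.
true_rate = 1000.0001
stamps = [i / true_rate for i in range(10_001)]
fs_eff = effective_srate(stamps)
print(f"nominal: 1000 Hz, effective: {fs_eff:.4f} Hz")
```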

One of the reasons why I chose the effective sampling frequency is that it is derived directly from the time stamps (by default, pyxdf.load_xdf() also applies a dejittering algorithm to smooth out variations in lengths between samples). And now here's what I'm currently working on: if there are gaps in the data, these will be automatically reflected by the time stamps, whereas assuming a constant (nominal) sampling frequency will lead to errors in the timing of the signals. Unfortunately, this is what is currently happening, see #385 for reference. My solution is to treat time stamps as a non-uniformly sampled time series and interpolate onto a regular grid (e.g. the nominal sampling frequency). The current implementation does something very similar (it effectively resamples to the nominal sampling frequency), except that it does not account for any gaps in the data.
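The idea of treating time stamps as a non-uniformly sampled series and interpolating onto a regular grid can be sketched as follows. This is a pure-Python illustration, not the actual implementation: the function name `resample_to_grid`, the `max_gap` threshold, and the choice to mark gaps with NaN are assumptions made for the example.

```python
import math
from bisect import bisect_right


def resample_to_grid(t, x, fs, max_gap):
    """Linearly interpolate samples (t, x) onto a regular grid with rate fs.

    Grid points that fall inside an interval longer than max_gap seconds
    are set to NaN instead of being interpolated across the gap.
    """
    n = int((t[-1] - t[0]) * fs + 1e-9) + 1
    grid = [t[0] + i / fs for i in range(n)]
    out = []
    for g in grid:
        j = bisect_right(t, g) - 1  # last sample at or before g
        if j >= len(t) - 1:  # g is at (or just past) the final sample
            out.append(x[-1])
            continue
        t0, t1 = t[j], t[j + 1]
        if t1 - t0 > max_gap:  # inside a gap: keep exact hits, else NaN
            out.append(x[j] if g == t0 else math.nan)
        else:
            w = (g - t0) / (t1 - t0)
            out.append(x[j] + w * (x[j + 1] - x[j]))
    return grid, out


# Hypothetical 8 Hz recording with samples missing between 0.375 s and 1.0 s.
t = [0.0, 0.125, 0.25, 0.375, 1.0, 1.125]
x = [0.0, 1.0, 2.0, 3.0, 8.0, 9.0]
grid, y = resample_to_grid(t, x, fs=8.0, max_gap=0.2)
print(y)
```

With a constant-rate assumption, the six samples would be placed back to back and everything after the gap would be shifted in time; on the grid, the missing stretch stays visible as NaN.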

However, I can still see the value in wanting to treat the data as if it had been sampled with the nominal sampling frequency without any resampling/interpolation. I think we could handle this use case by maybe adding/changing a parameter, but I don't have a good idea yet. In addition, I don't know how we could handle gaps with this approach, which I think is pretty important. One option could be to let the user decide to explicitly disable any gap handling when treating the data as sampled with the nominal sampling frequency. Otherwise, I don't think there's a way around resampling/interpolation, as that's just how XDF works.

I'm very open to input and ideas of course!

@behinger
Author

behinger commented Sep 30, 2024

Thanks for the detailed response. I didn't remember I had posted on that other thread before.

  • I try to avoid the need for downsampling. E.g. I once downsampled data from 1024 to 500 Hz and got the weirdest frequency artefacts (at 48/24 Hz). It turned out they were introduced by the non-integer-factor downsampling. Now I understand that this shouldn't happen with linear interpolation, but OK, I need to figure out fs_new (as the nominal rate) in either case.

  • The dejitter etc. is independent of which time-stamps to use. I imagine resampling is only necessary if there are multiple regularly sampled streams in the XDF, and then only n-1 need to be interpolated/resampled.

  • The gap issue: afaik the MATLAB importer has a gap detector, maybe this is used for this case? But if you do not resample and just modify the timestamps to follow the nominal rate, this shouldn't be a problem.
I imagine the calculation like this:

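sf_eff = 1001                                  # effective sampling rate (Hz), estimated from the LSL timestamps
sf_nom = 1000                                  # nominal sampling rate (Hz), as reported by the amplifier
t_lsl = range(0, step=1/sf_eff, length=100)    # timestamps on the LSL clock, spaced 1/sf_eff apart
t_new = t_lsl .* sf_eff ./ sf_nom              # same samples, now spaced 1/sf_nom apart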

edit: haha, this is indeed how I did it in Julia
https://github.com/s-ccs/LslTools.jl/blob/a21a45661022b2637c646a7ef14da3418f5e2504/src/LslTools.jl#L12
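In Python, the same timestamp-only correction might look like the sketch below. It keeps every sample and only rescales the time axis by `fs_effective / fs_nominal`, so that the average spacing becomes `1 / fs_nominal`. `retime_to_nominal` is a hypothetical helper name, and times are rescaled relative to the first sample so the start of the recording is preserved.

```python
def retime_to_nominal(timestamps, fs_nominal):
    """Rescale time stamps so the stream runs at its nominal rate (no resampling)."""
    n = len(timestamps)
    if n < 2:
        raise ValueError("need at least two time stamps")
    t0 = timestamps[0]
    fs_effective = (n - 1) / (timestamps[-1] - t0)
    scale = fs_effective / fs_nominal  # > 1 if the stream ran faster than nominal
    return [t0 + (t - t0) * scale for t in timestamps]


# Hypothetical stream: nominal 1000 Hz, effective 1001 Hz, starting at t = 5 s.
t_lsl = [5.0 + i / 1001 for i in range(100)]
t_new = retime_to_nominal(t_lsl, fs_nominal=1000.0)
print(t_new[1] - t_new[0])  # close to 1 / 1000
```

Because this is a pure rescaling, any jitter or gaps in the original time stamps are kept (stretched by the same factor) rather than removed.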

@cbrnr
Owner

cbrnr commented Sep 30, 2024

If done correctly, resampling should not introduce any weird artifacts. I'm trying to do it correctly this time 😄.

Resampling is primarily necessary if there are multiple streams, yes, but as I've mentioned, I currently also use it even when there's only a single stream and I do not want the effective sampling frequency (but the nominal one for example).

I don't know if pyxdf is able to detect gaps, but even if it does, I still need to create a regularly sampled 2D NumPy array, where gaps are filled with NaNs.
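A minimal sketch of such a gap-aware layout is shown below. Nested lists stand in for the 2D NumPy array so the example has no dependencies; the helper name `to_regular_array` and the nearest-slot placement are assumptions for illustration, not what the reader actually does.

```python
import math


def to_regular_array(timestamps, samples, fs):
    """Place each sample in the grid slot nearest its time stamp.

    samples holds one list of channel values per time stamp. The result is
    laid out as channels x times; slots without a sample stay NaN.
    """
    t0 = timestamps[0]
    n_times = round((timestamps[-1] - t0) * fs) + 1
    n_chans = len(samples[0])
    data = [[math.nan] * n_times for _ in range(n_chans)]
    for t, row in zip(timestamps, samples):
        idx = round((t - t0) * fs)
        for ch, value in enumerate(row):
            data[ch][idx] = value
    return data


# Hypothetical 4 Hz, two-channel stream with a gap between 0.5 s and 1.5 s.
timestamps = [0.0, 0.25, 0.5, 1.5, 1.75]
samples = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
data = to_regular_array(timestamps, samples, fs=4.0)
print(data[0])
```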

As a workaround, you can do exactly what you suggest. I'm just not sure how to integrate this in the reader.

If there are multiple streams, there is no way around interpolation/resampling, right?

If there is only a single stream, there are two options: no resampling (which currently uses the effective sampling frequency) and resampling. I guess we could let the users decide whether to use the effective or the nominal sampling frequency in the first case.

One idea to change the API would be to remove the fs_new parameter and instead introduce a new parameter fs with possible values "effective", "nominal", or a float. All three values would be possible with a single stream (of which "effective" and "nominal" would not involve resampling), whereas multiple streams require a float (a new sampling frequency).
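One way the proposed `fs` parameter could be resolved is sketched below. Everything here is hypothetical: `resolve_fs` and its arguments are not part of MNELAB, and the rule for multiple streams simply encodes the proposal as stated.

```python
def resolve_fs(fs, n_streams, nominal=None, effective=None):
    """Return (sampling frequency, needs_resampling) for the proposed fs argument."""
    if isinstance(fs, str):
        if fs not in ("effective", "nominal"):
            raise ValueError("fs must be 'effective', 'nominal', or a number")
        if n_streams != 1:
            raise ValueError("multiple streams require a numeric fs")
        value = effective if fs == "effective" else nominal
        return value, False  # time stamps are reinterpreted, data untouched
    return float(fs), True  # explicit frequency: resample/interpolate


# Single stream: keep the data as is, just pick which rate to report.
print(resolve_fs("nominal", 1, nominal=1000.0, effective=1000.0001))
# Two streams: a common numeric target frequency is required.
print(resolve_fs(500, 2))
```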

WDYT? I have to ponder this a little more, maybe there is a better solution.

@behinger
Author

(the resampling was via eeglab... never looked into it in detail, no time!)

  • Gaps could be added to MNE with some kind of break event (I don't know MNE well enough; in EEGLAB it would be a boundary event), or maybe MNE doesn't support that?
  • Multiple streams: if you only want to apply the nominal sampling rate, there is no need to interpolate. If you need to have them on the same sampling rate & "grid", then yes, at least n-1 need interpolation, agreed.
  • Your idea could work. Another idea: a selection option for the "main_clock", which could be "lsl" by default (resulting in effective sampling rates) or a stream name, which would result in a factor being applied to all timestamps, k = sf_eff / sf_nom (so that the named stream ends up at its nominal rate). Maybe this is more transparent? It decouples the need for resampling from the choice of main clock.
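A rough sketch of the "main_clock" idea follows, under stated assumptions: `apply_main_clock` and the dictionary layout are invented for this example, the factor is taken as effective rate divided by nominal rate of the chosen stream (so that its sample spacing becomes exactly one over its nominal rate), and all time stamps are scaled relative to the earliest time stamp in the recording.

```python
def apply_main_clock(streams, main_clock="lsl"):
    """Re-express all time stamps on the clock of one stream.

    streams maps a stream name to a dict with the keys "time_stamps",
    "nominal_srate", and "effective_srate" (hypothetical layout).
    """
    if main_clock == "lsl":  # default: leave the LSL time stamps untouched
        return {name: list(s["time_stamps"]) for name, s in streams.items()}
    ref = streams[main_clock]
    k = ref["effective_srate"] / ref["nominal_srate"]
    origin = min(s["time_stamps"][0] for s in streams.values())
    return {
        name: [origin + (t - origin) * k for t in s["time_stamps"]]
        for name, s in streams.items()
    }


streams = {
    "EEG": {
        "time_stamps": [i / 1001 for i in range(1001)],
        "nominal_srate": 1000.0,
        "effective_srate": 1001.0,
    },
    "Markers": {"time_stamps": [0.5], "nominal_srate": 0.0, "effective_srate": 0.0},
}
retimed = apply_main_clock(streams, main_clock="EEG")
print(retimed["EEG"][-1], retimed["Markers"][0])
```

No stream is resampled here; the marker simply moves along with the rescaled time axis, which is what keeps events aligned with the EEG samples.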
