Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to save recordingless analyzer as #3443

Merged
merged 10 commits into from
Oct 1, 2024

Conversation

alejoe91
Copy link
Member

@alejoe91 alejoe91 commented Sep 25, 2024

This PR allows to save an analyzer as binary or zarr even if recordingless

@alejoe91 alejoe91 added the core Changes to core module label Sep 25, 2024
@chrishalcrow
Copy link
Collaborator

chrishalcrow commented Sep 26, 2024

Hello, if I've already saved a sorting_analyzer in my_sa and then run the following code:

import spikeinterface.full as si
sa = si.load_sorting_analyzer("my_sa")
sa._recording = None
sa_zarr = sa.save_as(format="zarr", folder="my_zarr_sa")

Then the saved zarr folder has the correct folder structure but all the files are things like called things like 0 and 0.0 and are a few bytes big. If you save sa_zarr as a binary folder, it works. So something is wrong with the zarr saving.

@alejoe91
Copy link
Member Author

Those are the name of zarr chunks :) The recording will be saved as an empty dict, so it makes sense that it's few bytes. everything else should be ok (but I'll double check)

@chrishalcrow
Copy link
Collaborator

chrishalcrow commented Sep 26, 2024

Those are the name of zarr chunks :) The recording will be saved as an empty dict, so it makes sense that it's few bytes. everything else should be ok (but I'll double check)

Oh right! Everything is named like this, even e.g. sorting_provenance?
EDIT: ok I've read about zarr, cool!

Screenshot 2024-09-26 at 08 18 41

@alejoe91
Copy link
Member Author

https://probeinterface.readthedocs.io/en/main/releases/0.2.18.html

Those are the name of zarr chunks :) The recording will be saved as an empty dict, so it makes sense that it's few bytes. everything else should be ok (but I'll double check)

Oh right! Everything is named like this, even e.g. sorting_provenance? EDIT: ok I've read about zarr, cool!

Screenshot 2024-09-26 at 08 18 41

If you're interested, you can check the hidden files. You should have a .zarray describing the properties of the datasets and a .zattrs which is a dictionary with additional metadata ;)

@@ -2015,7 +2023,10 @@ def copy(self, new_sorting_analyzer, unit_ids=None):
new_extension.data = self.data
else:
new_extension.data = self._select_extension_data(unit_ids)
new_extension.run_info = self.run_info.copy()
if self.run_info is not None:
Copy link
Collaborator

@chrishalcrow chrishalcrow Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you import copy from copy you can use copy(self.run_info) which returns None if self.run_info is None, which would save some lines of code and indentations here and at line 2047.

@samuelgarcia
Copy link
Member

I am not sure to like this.
Lets discuss before any merger please.
How do we get rec_attributes if not recording ?

@alejoe91
Copy link
Member Author

From the analyzer rec_attributes, which is always set! This is essential especially when loading an old waveform_extractor

if self.format == "binary_folder":
extension_folder = self._get_binary_extension_folder()
run_info_file = extension_folder / "run_info.json"
run_info_file.write_text(json.dumps(run_info, indent=4), encoding="utf8")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question I have here in general not related to this PR is if we encode with 'utf-8' will that then mess with other encoding systems? Or do we think that the encoding was really just limited to shell script issues? Is there any place where bouncing between encodings may bite us?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are building the dictionaries we are sure that they can be UTF-8 encoded and decoded, so I think it's safe!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the case for paths where the paths include Chinese or Japanese characters, which can't be encoded with uft-8? For run_info that doesn't matter, but I'm wondering about other json files we make. It might be that we just let people know that this works for utf-8 and may work for other encodings. But since most people on the team all use utf-8 I think it will be hard for us to know for sure for this. Doesn't really hold anything up I guess.

@alejoe91 alejoe91 added this to the 0.101.2 milestone Sep 27, 2024
@@ -352,8 +353,6 @@ def create_memory(cls, sorting, recording, sparsity, return_scaled, rec_attribut
def create_binary_folder(cls, folder, sorting, recording, sparsity, return_scaled, rec_attributes):
# used by create and save_as

assert recording is not None, "To create a SortingAnalyzer you need to specify the recording"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move this to the function def create_sorting_analyzer ?

Comment on lines 383 to 384
with open(folder / "recording.json", mode="w") as f:
json.dump({}, f, indent=4)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would write nothing I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

recording.dump(folder / "recording.json", relative_to=folder)
elif recording.check_serializability("pickle"):
recording.dump(folder / "recording.pickle", relative_to=folder)
if recording is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should have a mutualy exclusive between recording and rec_attributes in this function at some point

Comment on lines 538 to 540
warnings.warn(
"SortingAnalyzer with zarr : the Recording is not json serializable, the recording link will be lost for future load"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This warning should be also in the other format no ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not there? I'll check

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@samuelgarcia samuelgarcia merged commit b0c2bae into SpikeInterface:main Oct 1, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Changes to core module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants