[WIP] Lhotse/K2 example #45
Conversation
espresso/data/asr_k2_dataset.py
self.tgt_sizes = np.array(
    [
        round(
            cut.supervisions[0].trim(cut.duration).duration / cut.frame_shift
You were previously using "cut.num_samples" if features were not available - it won't work here, as in that case frame_shift will be None.
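One possible way to guard against that, sketched with placeholder structure (the `cuts` variable and the sample-count fallback are assumptions, not code from this PR; `has_features` and `sampling_rate` are standard Lhotse cut attributes):

```python
# Sketch only: fall back to a sample count when the cut has no precomputed
# features, since cut.frame_shift is None in that case.
self.tgt_sizes = np.array(
    [
        round(
            cut.supervisions[0].trim(cut.duration).duration / cut.frame_shift
        )
        if cut.has_features
        else round(
            cut.supervisions[0].trim(cut.duration).duration * cut.sampling_rate
        )
        for cut in cuts
    ]
)
```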
Also, I thought tgt_sizes was used to denote "the number of output tokens" in a different context; here it seems like it's representing "the number of feature frames covered by a supervision".
It seems like your intention here (supervisions[0]) is to support only single-supervision cuts; maybe it makes sense to add an assertion?
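A minimal sketch of such an assertion (the `cuts` name stands for whatever iterable the dataset is built from):

```python
# Sketch: fail early on multi-supervision cuts, since only supervisions[0] is used.
for cut in cuts:
    assert len(cut.supervisions) == 1, (
        f"Cut {cut.id} has {len(cut.supervisions)} supervisions; "
        "this dataset currently supports only single-supervision cuts."
    )
```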
tgt_sizes is used for determining batch sizes, and possibly affects the loss value. I am not sure whether in the future we would add on-the-fly feature extraction in this class if only the recordings were available, and if we did, whether the field frame_shift would be populated. How about making tgt_sizes always the same as src_sizes?
I thought you'd rather want to use the number of tokens in supervision.text - unless I misunderstood the meaning of "target" in this context.
Yes, I have done exactly the same thing as what we did in Kaldi, i.e. making the length distribution of positives and negatives the same for training. This is done in local/data_prep.py in this PR.
@danpovey I need to test whether all the data prep works as expected (additive noise from MUSAN is still to be done). In the meantime, maybe we can start to think about implementing the LF-MMI loss using k2?
Regarding the data prep and normalizing the sizes: I'm concerned that others who pick up the data from Lhotse may not do this and may get bad results. But IDK whether it would be natural to do that within Lhotse. Will comment in a second about implementing the LF-MMI loss in k2.
Regarding implementing LF-MMI in k2:
You need to turn the nnet outputs into a DenseFsaVec. The nnet outputs will be of shape (B, T, F), where B is the batch size, T is the number of frames and F is the number of features. Feature zero will probably correspond to epsilon/blank. [If you do want an epsilon/blank then you should probably call AddEpsilonSelfLoops() on your graphs before calling IntersectDensePruned() / intersect_dense_pruned(), since IntersectDensePruned() does not treat epsilon specially. Caution: AddEpsilonSelfLoops() still needs to be wrapped to Python; I created an issue on k2 for this.]
Anyway, the first step is to construct the DenseFsaVec from your nnet output. DenseFsaVec
supports different-sized supervision segments, and you have the choice here to omit any
padding frames from the frames you construct the DenseFsaVec from. Do git grep dense_fsa
in k2 and you'll find where the code is.
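A rough sketch of that first step with k2's Python wrapper (nnet_output and num_frames are placeholder names, and some k2 versions expect the segments ordered by decreasing length):

```python
import torch
import k2

# nnet_output: (B, T, F) log-probs from the network (placeholder name).
# Each supervision_segments row is [sequence_index, start_frame, num_frames],
# so padding frames can simply be left out of the segment.
supervision_segments = torch.tensor(
    [[b, 0, int(num_frames[b])] for b in range(nnet_output.size(0))],
    dtype=torch.int32,
)
dense_fsa_vec = k2.DenseFsaVec(nnet_output, supervision_segments)
```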
The next step is to construct the denominator graph as an Fsa (this is a k2 python type, although
there is also a C++ typedef of the same name; I refer here to the python type). You can probably create
it without epsilons and then add epsilon self-loops to it. If you want, it can just be, effectively,
the union of 2 graphs, one for the numerator and one for the denominator. I expect you will use your
experience of what does and does not work, here.
The numerator graphs can start off as two Fsas, one for the positive and one for the negative examples.
Look at type Fsa in k2 (at the python level), because it does support being (really) a vector of Fsas.
Currently I don't know of a super efficient way to create the minibatch from the num and den fsas and
(say) a vector of bools, but this is doable; please consult @csukuangfj on this and maybe he can
create something.
The objective function will be like num_score - den_score, where each score comes from one call
to intersect_dense_pruned and then putting the output into get_tot_scores with log_semiring = true,
and summing the output tensor.
Sorry I have to go somewhere, but hopefully you can get started with this info and ask @csukuangfj for help.
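Putting those pieces together, a very rough sketch of what the loss computation could look like (the graph names, the beam values, and repeating the den graph once per supervision are placeholders/assumptions, not settled choices):

```python
import k2

# num_graphs: an FsaVec with one numerator graph per supervision segment;
# den_graph: a single shared denominator graph (both are placeholders here).
num_graphs = k2.arc_sort(k2.add_epsilon_self_loops(num_graphs))
den_graphs = k2.arc_sort(
    k2.add_epsilon_self_loops(
        k2.create_fsa_vec([den_graph] * num_graphs.shape[0])
    )
)

beams = dict(search_beam=20.0, output_beam=8.0,
             min_active_states=30, max_active_states=10000)
num_lats = k2.intersect_dense_pruned(num_graphs, dense_fsa_vec, **beams)
den_lats = k2.intersect_dense_pruned(den_graphs, dense_fsa_vec, **beams)

num_scores = num_lats.get_tot_scores(log_semiring=True, use_double_scores=True)
den_scores = den_lats.get_tot_scores(log_semiring=True, use_double_scores=True)

# The LF-MMI objective is num - den; negate it to get a loss to minimize.
loss = -(num_scores.sum() - den_scores.sum())
```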
AddEpsilonSelfLoops is added by k2-fsa/k2#313
@pzelasko I just drafted a data prep script in the file examples/mobvoihotwords/local/data_prep.py. I just would like to double-check with you whether I did everything correctly and efficiently. Basically I want to augment the original training data with 1.1x/0.9x speed perturbation and with reverberation separately, and then combine them into a single CutSet. I did that by first extracting the augmented features and dumping them to disk separately, and then merging their respective CutSets while modifying their ids (by prefixing) to differentiate utterances derived from the same underlying original one. Also, I don't know whether the way I did the speed perturbation is correct (in terms of both the use of …). Thanks!
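A sketch of the merging step described above (only cut_set_orig appears in the script excerpt below; the other CutSet names are placeholders, and CutSet.modify_ids is assumed to be available in the Lhotse version used, taking the old id and returning the new one):

```python
from itertools import chain
from lhotse import CutSet

# Prefix ids so cuts derived from the same original utterance stay distinguishable,
# then merge everything into one CutSet for training.
prefixed = [
    cs.modify_ids(lambda cut_id, p=prefix: f"{p}_{cut_id}")  # modify_ids: assumed API
    for prefix, cs in [
        ("orig", cut_set_orig),
        ("sp0.9", cut_set_sp09),
        ("sp1.1", cut_set_sp11),
        ("rvb", cut_set_rvb),
    ]
]
cut_set_train = CutSet.from_cuts(chain.from_iterable(prefixed))
```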
Looks great! I left you some comments.
if "train" in partition: | ||
with LilcomFilesWriter(f"{output_dir}/feats_{partition}_orig") as storage: | ||
cut_set_orig = cut_set.compute_and_store_features( | ||
extractor=Mfcc(config=mfcc_hires_config), |
it should be okay to instantiate Mfcc(config=...) once and re-use it for all calls (although it won't make a difference in performance, just code terseness).
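For instance, something along these lines (mirroring the excerpt above; the storage argument is assumed from the surrounding with-block):

```python
# Create the extractor once and pass the same object to every call.
extractor = Mfcc(config=mfcc_hires_config)

with LilcomFilesWriter(f"{output_dir}/feats_{partition}_orig") as storage:
    cut_set_orig = cut_set.compute_and_store_features(
        extractor=extractor,
        storage=storage,
    )
```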
BTW, this is in a very experimental stage, but some time ago I was able to run Lhotse feature extraction distributed on our CLSP grid with these steps (admittedly not tested with data augmentation yet):
If you'd like, you can try it; otherwise I will try it sometime, probably using your recipe, as it'll be a great testing ground for this.
Thanks for the helpful comments! There are still additional data preprocessing steps to be done before feature extraction (additive noise and splitting the recordings). I will try the distributed extraction once they are done.
@pzelasko