Sound event localization, detection, and tracking (SELDT) is the combined task of identifying the temporal onsets and offsets of potentially temporally overlapping sound events, recognizing their classes, and tracking their respective spatial trajectories while they are active.
first detect and then localize
SELDnet maps the input spectrogram to two outputs, sound event detection and tracking, which together produce the SELDT output.
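As a rough illustration of how two such outputs could be merged, the sketch below gates a per-frame direction estimate by the per-frame detection output and groups consecutive active frames into events with an onset, an offset, and a trajectory. The frame-wise layout, the 0.5 threshold, the (azimuth, elevation) representation, and the names `TrackedEvent` and `combine_outputs` are all assumptions made for this example and are not taken from SELDnet itself; the sketch also assumes at most one active instance per class at any time.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

# Assumed direction format: (azimuth, elevation) in degrees.
Direction = Tuple[float, float]


@dataclass
class TrackedEvent:
    label: str
    onset: int    # first active frame (inclusive)
    offset: int   # frame after the last active frame (exclusive)
    trajectory: List[Direction] = field(default_factory=list)


def combine_outputs(
    sed_probs: Sequence[Sequence[float]],
    directions: Sequence[Sequence[Direction]],
    labels: Sequence[str],
    threshold: float = 0.5,
) -> List[TrackedEvent]:
    """Merge detection and tracking outputs into a list of tracked events.

    sed_probs[t][c]  : activity probability of class c in frame t
    directions[t][c] : direction estimate of class c in frame t
                       (ignored whenever the class is inactive)
    """
    finished: List[TrackedEvent] = []
    open_events = {}  # class index -> event that is currently active

    for t, frame in enumerate(sed_probs):
        for c, prob in enumerate(frame):
            if prob >= threshold:
                # Start a new event on the first active frame of a run.
                if c not in open_events:
                    open_events[c] = TrackedEvent(labels[c], onset=t, offset=t)
                event = open_events[c]
                event.trajectory.append(directions[t][c])
                event.offset = t + 1
            elif c in open_events:
                # The class just became inactive: close its event.
                finished.append(open_events.pop(c))

    # Close events that are still active at the end of the recording.
    finished.extend(open_events.values())
    finished.sort(key=lambda e: (e.onset, e.label))
    return finished


# Toy input: 4 frames, 2 classes that overlap in frame 1.
sed = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.9], [0.1, 0.2]]
doa = [
    [(10, 0), (0, 0)],
    [(20, 0), (-30, 5)],
    [(0, 0), (-40, 5)],
    [(0, 0), (0, 0)],
]
events = combine_outputs(sed, doa, ["speech", "dog"])
for e in events:
    print(e.label, e.onset, e.offset, e.trajectory)
```

In the toy input the two classes overlap in frame 1, so both events carry a direction for that frame. Handling several simultaneous instances of the same class would need an extra association step that this sketch leaves out.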