Add options to fix clipping and apply peak normalization to the generated soundscape (#132)

* bump version to 1.6.3
* Remove unused imports and clean up formatting
* Implement peak normalize
* First pass at fix_clipping and peak_normalization, tests failing
* fix bug in code for saving isolated events audio
* test peak_normalize
* Warn user when ref_db changes due to clipping prevention
* Return ref_db_change
* update regression jams with new generate fields
* Store all generation params in the jams sandbox
* Add ability to exclude sandbox keys from jams comparison
* exclude sandbox keys not relevant to tests
* add new sandbox fields to prevent unit test fail on load
* add tests for clipping and normalization
* Remove pyrsistent==0.15.4 dep since we've dropped 2.7 and 3.4
* Update changelog
* start working on more tests, commented out for now
* make generate from jams backward compatible with files that don't have fix_clipping and peak_normalization
* test generating from file that doesn't have fix_clipping and peak_normalization
* regression
* Almost done with tests...
* Fixing generate_from_jams so it saves to the ann.
* move transformer creation into conditional reverb block. MUST apply reverb AFTER peak normalization, doesn't work otherwise (in sox)
* Use os.makedirs(..., exist_ok=True), update some inline comments
* Updating profile script.

Co-authored-by: pseeth <[email protected]>
justinsalamon · Sep 28, 2020 · c688a0f
1 parent adf64f4
Showing 16 changed files with 1,021 additions and 88 deletions.
diff --git a/docs/changes.rst b/docs/changes.rst
@@ -2,6 +2,14 @@
 
 Changelog
 ---------
+v1.6.3
+~~~~~~
+- Scaper.generate now accepts two new optional arguments for controlling audio clipping and normalization:
+    - fix_clipping: if True and the soundscape audio is clipping, it will be peak normalized and all isolated events will be scaled accordingly.
+    - peak_normalization: if True, soundscape audio will be peak normalized regardless of whether it's clipping or not and all isolated events will be scaled accordingly.
+- All generate arguments are now documented in the scaper sandbox inside the JAMS annotation.
+- Furthermore, we also document in the JAMS: the scale factor used for peak normalization, the change in ref_db, and the actual ref_db of the generated audio.
+
 v1.6.2
 ~~~~~~
 - Switching from FFMpeg LUFS calculation to pyloudnorm for better performance: runtime is reduced by approximately 30%

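The changelog entry above describes when normalization is applied. The following is a minimal sketch of that decision logic, not the code from this commit: the helper name `maybe_peak_normalize`, the clipping test (peak above 1.0 for float audio), and the `20 * log10(scale_factor)` conversion used for the ref_db change are all assumptions made for illustration.

```python
import numpy as np


def maybe_peak_normalize(soundscape_audio, event_audio_list,
                         fix_clipping=False, peak_normalization=False):
    """Hypothetical helper: normalize only when the flags call for it."""
    eps = 1e-10
    max_sample = np.max(np.abs(soundscape_audio))

    # Assumption: float audio counts as clipping when its peak exceeds 1.0.
    clipping = max_sample > 1.0

    if peak_normalization or (fix_clipping and clipping):
        scale_factor = 1.0 / (max_sample + eps)
    else:
        scale_factor = 1.0  # leave the audio untouched

    scaled_soundscape = soundscape_audio * scale_factor
    # Isolated events get the same factor so they still sum to the mix.
    scaled_events = [event * scale_factor for event in event_audio_list]

    # Assumption: an amplitude scale factor maps to a level change in dB
    # via the standard 20 * log10 conversion.
    ref_db_change = 20 * np.log10(scale_factor)
    return scaled_soundscape, scaled_events, scale_factor, ref_db_change
```

With `fix_clipping=True`, a soundscape that peaks at 2.0 is scaled by about 0.5 (roughly -6 dB), while one that peaks at 0.25 is left unchanged. With `peak_normalization=True`, the quiet soundscape is scaled up by about 4.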
diff --git a/scaper/audio.py b/scaper/audio.py
@@ -1,18 +1,10 @@
 # CREATED: 4/23/17 15:37 by Justin Salamon <[email protected]>
 
-'''
-Utility functions for audio processing using FFMPEG (beyond sox). Based on:
-https://github.com/mathos/neg23/
-'''
-
-import subprocess
-import sox
 import numpy as np
 import pyloudnorm
 import soundfile
-import tempfile
 from .scaper_exceptions import ScaperError
-from .util import _close_temp_files
+
 
 def get_integrated_lufs(audio_array, samplerate, min_duration=0.5,
                         filter_class='K-weighting', block_size=0.400):
@@ -104,5 +96,42 @@ def match_sample_length(audio_path, duration_in_samples):
 
         audio = np.pad(audio, pad_width, 'constant')
 
-    soundfile.write(audio_path, audio, sr, 
-        subtype=audio_info.subtype, format=audio_info.format)
+    soundfile.write(audio_path, audio, sr,
+                    subtype=audio_info.subtype, format=audio_info.format)
+
+
+def peak_normalize(soundscape_audio, event_audio_list):
+    """
+    Peak normalize the soundscape audio such that
+    max(abs(soundscape_audio)) = 1, and scale the audio of each isolated
+    event by the same scale factor.
+
+    Parameters
+    ----------
+    soundscape_audio : np.ndarray
+        The soundscape audio.
+    event_audio_list : list
+        List of np.ndarrays containing the audio samples of each isolated
+        foreground event.
+
+    Returns
+    -------
+    scaled_soundscape_audio : np.ndarray
+        The peak normalized soundscape audio.
+    scaled_event_audio_list : list
+        List of np.ndarrays containing the scaled audio samples of
+        each isolated foreground event. All events are scaled by scale_factor.
+    scale_factor : float
+        The scale factor used to peak normalize the soundscape audio.
+    """
+    eps = 1e-10
+    max_sample = np.max(np.abs(soundscape_audio))
+    scale_factor = 1.0 / (max_sample + eps)
+
+    # scale the event audio and the soundscape audio:
+    scaled_soundscape_audio = soundscape_audio * scale_factor
+
+    scaled_event_audio_list = []
+    for event_audio in event_audio_list:
+        scaled_event_audio_list.append(event_audio * scale_factor)
+
+    return scaled_soundscape_audio, scaled_event_audio_list, scale_factor
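
A short usage sketch for `peak_normalize` follows. The function body restates the logic from the diff above so the example runs on its own; the toy arrays `event_a` and `event_b` are hypothetical data, not taken from the test suite.

```python
import numpy as np


def peak_normalize(soundscape_audio, event_audio_list):
    # Same logic as the function added in this diff, restated compactly.
    eps = 1e-10
    max_sample = np.max(np.abs(soundscape_audio))
    scale_factor = 1.0 / (max_sample + eps)
    scaled_events = [event * scale_factor for event in event_audio_list]
    return soundscape_audio * scale_factor, scaled_events, scale_factor


# Hypothetical isolated events whose sum peaks at 1.5, i.e. above full scale.
event_a = np.array([0.9, -0.3, 0.0, 0.2])
event_b = np.array([0.6, -0.5, 0.4, 0.0])
soundscape = event_a + event_b

scaled_mix, scaled_events, scale_factor = peak_normalize(
    soundscape, [event_a, event_b])

# The peak lands just below 1.0 because of eps in the denominator.
peak = np.max(np.abs(scaled_mix))

# Scaling is linear, so the scaled events still sum to the scaled mix.
mix_matches = np.allclose(scaled_mix, sum(scaled_events))
```

Because every event is multiplied by the same factor, the isolated events remain consistent with the soundscape after normalization. The `eps` term also keeps the division finite for an all-zero soundscape, in which case the audio stays silent.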