Skip to content

A collection of python scripts useful for managing, manipulating, extracting and exporting '90s sample CDs.

License

Notifications You must be signed in to change notification settings

transverberate/smpl_tools

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

smpl_tools

A collection of python scripts useful for managing, manipulating, extracting and exporting '90s sample CDs.

Installation

Prerequisites

This toolset requires Python 3 and ffmpeg to be installed and added to the user's PATH. Often times, the installations of python and ffmpeg will not automatically add these programs to the system PATH. Searching Google for questions like 'how to add ffmpeg to my PATH' etc. usually yields good explanations on this process.

To confirm python is in your system's PATH, open an instance of command prompt and type python --version (sometimes it will be installed as python3, in which case - every time python is encountered in this document, it should be replaced with python3). If present, python should respond with its version number.

To confirm ffmpeg is in your system's PATH, open an instance of command prompt and type ffmpeg -version. If present, ffmpeg should respond with its version number.

Installing this tool-set

Download the contents of this repo and extract them to a convenient place on your machine. Open an instance of command prompt and navigate to the location of you extracted the repo. You should now be in the folder with setup.py and the sample smpl_tools directory. Enter the command,

python -m pip install cython

to install the Cython to your python package library. This is needed to compile the numpy library that this tool-set makes use of.

Next, enter the command (note the period!),

python -m pip install .

to install smpl_tools to your python package library.

Finally, confirm that smpl_tools has been installed properly by entering the command,

python -m smpl_tools --help

If successfully installed, smpl_tools should respond with an explanation of its command usage.

Updating this tool-set

To update these scripts having already previously installed an older version, download the latest version from the github repo, open a command prompt/shell window in the directory in which you downloaded the repo and run the command (note the period!),

python -m pip install .

Common Tasks

Splitting CDDA sample tracks

Many sample CDs from the '90s encoded multiple samples on a single CDDA track. Isolating individual samples requires that they be spliced out manually in an audio editor. Often, these samples were assigned names which were tabulated in a CD’s companion booklet. Unfortunately, the waveforms extracted from CDDA tracks do not contain any of this naming information - making the process of locating a particular sample by name arduous.

This tool-set provides a split-by-silence script that assists in the process of splitting CDDA tracks into individual samples by locating the silent segments of a track and splitting the track at the onset of sound.

A simple splitting operation

A basic split-by-silence command has the following form

python -m smpl_tools split_by_silence [source] [-s duration] [-c amplitude] [-d destination]

In its most basic form, these parameters represent,

  • source: A file-path to a .wav containing the CDDA waveform.
  • duration: The duration of silence (in seconds) that should be taken to indicate the start of a new sample (default value: 0.4).
  • amplitude: The amplitude (in decibels) below which the waveform is considered to be "silent" (default value: -60).
  • destination: The directory where the extracted samples will be placed (default value is the same directory as the source).

Splitting multiple CDDA tracks

To split multiple tracks, place all the tracks in a single folder and call the split_by_silence command on the directory, rather than a single file. If the source folder contains tracks named trackA.wav, trackB.wav... etc., this command will produce the files trackA_01.wav, trackA_02.wav... trackB_01.wav, trackB_02.wav... etc.

Batch splitting & renaming multiple CDDA tracks

Often, one wishes to split multiple tracks into multiple samples where the filename of each sample given the name assigned to it by the CD’s sample listing. This task is impractical to accomplish using a single command-line call. Rather, a separate metadata file is needed to describe the split-by-silence parameters for each track.

To call the split_by_silence command in batch mode, include the -b switch in the command

python -m smpl_tools split_by_silence [source] [-b json_batchjob] [-d destination]

These parameters represent

  • source: The root directory from which script will search for the files specified by the batch job. Usually, this is the folder containing the CDDA .wav files.
  • json_batchjob: A path to the json file containing the metadata for this batch-job.
  • destination: The destination directory for the files produced by the script.

Note that the parameters specifying the amplitude and duration of silence are not given here. Rather, they are provided per-track in the meta-data file.

Contents of the metadata file

This metadata file is a json file whose top-level element is an array of tracks. Each track is a dictionary containing the keys

  • source: the filename of the track's .wav file
  • silence: The duration of silence (in seconds) that should be taken to indicate the start of a new sample
  • amplitude: The amplitude in decibels below which the waveform is considered to be "silent"
  • sample_names: An ordered array of strings containing the file names of each sample found for this track

Technically, source is the only required key. If the other keys are left unspecified, the default values from the command definition above will be used. The power of the metadata file lies in the sample_names key. This is what allows for custom renaming of samples within a track in the order they are split.

To better understand the layout of a metadata file, consider the following example file

[
  {
    "source": "Track 01.wav",
    "silence": 0.95,
    "amplitude": -100,
    "sample_names": [
      "Alligator.wav",
      "Baboon.wav",
      "Caribou.wav",
      "Dalmatian.wav",
      "Elephant.wav"
    ]
  },
  {
    "source": "Track 02.wav",
    "silence": 0.4,
    "amplitude": -60,
    "sample_names": [
      "Fox.wav",
      "Gorilla.wav",
      "Hare.wav",
      "Jaguar.wav"
    ]
  },
]

Its top-level element is an array (indicated by the []), that contains two track entries (contained in {}). Each track entry contains the keys source, silence, amplitude and sample_names.

If this metadata is placed in a file called myjob.json and ran as a batch job with the command

python -m smpl_tools split_by_silence cdtracks/ -b myjob.json -d output/

The script will search for the files Track 01.wav and Track 02.wav in the folder cdtracks/.

It will split Track 01.wav into five separate samples (if possible) using the parameters -s 0.95, and -c -100. Finally, it will write the samples to the output/ directory with the filenames Alligator.wav, Baboon.wav, Caribou.wav, Dalmatian.wav, and Elephant.wav (in that order).

It will split Track 02.wav into four separate samples in a similar manner, this time, using the parameters -s 0.4, and -c -60.

Ignoring certain samples

Occasionally the script may detect a sample that is incorrect or unneeded. This is often the case for repeat/duplicate samples present in some tracks (many Zero-G CDs will have tracks that arbitrarily repeat certain samples and not others). To ignore a sample, put null (not as string but as a literal) at the position in the sample_names key where that sample occurs.

Consider the following track entry (as part of a larger file)

{
    "source": "Track 36.wav",
    "silence": 0.4,
    "amplitude": -100,
    "sample_names": [
        "Fox.wav",
        null,
        "Gorilla.wav",
        "Hare.wav",
        "Jaguar.wav"
    ]
},

When the script encounters this track entry, it will attempt to split the track waveform into five samples but only export four samples. It will skip exporting the second sample it detects.

Customizing export destinations with pattern strings

Occasionally it may be necessary to alter the output filepath based on additional parameters. For instance, changing the output path based on the current track being processed. This can be accomplished using pattern strings. Pattern strings specify a string whose realization changes based on the value of external parameters. The split_by_silence command can take a pattern string as an input using the -p trigger.

python -m smpl_tools split_by_silence [source] [-b json_batchjob] [-d destination] [-p pattern_string]

Which external parameters are used and the location of their use is determined by reference patterns. Reference patterns are denoted by

%(NAME)

Here NAME identifies the parameter that should be substituted at the time of realization.

Consider the pattern string

output/%(smpl).wav

This pattern string contains one reference pattern, %(smpl). This reference pattern refers to the current sample name of the file being exported. These sample names are specified per track in a json batch file (as described in the preceding section).

Consider the split_by_silence command run with this pattern string,

python -m smpl_tools split_by_silence cdtracks/ -b myjob.json -p "output/%(smpl).wav"

Where the contents of myjob.json are

[
  {
      "source": "First.wav",
      "silence": 0.4,
      "amplitude": -100,
      "sample_names": [
          "Alpha.wav",
          "Beta.wav",
          "Gamma.wav"
      ]
  }
],

Execution of this command will produce the following files

output/Alpha.wav
output/Beta.wav
output/Gamma.wav

So far, this behavior is not any different from calling the split_by_silence command with the destination trigger -d output/

python -m smpl_tools split_by_silence cdtracks/ -b myjob.json -d output/

The split_by_silence command offers the following reference patterns

  • %(smpl) The current sample name specified per track in a json batch job.
  • %(trck) The current track name specified by the source parameter in a json batch job.
  • %(dst) The destination directory specified by the -d trigger in the command line.

Consider the following pattern string example

python -m smpl_tools split_by_silence cdtracks/ -b myjob.json -p "%(dst)/%(trck)/%(smpl).wav" -d output/

Where the contents of myjob.json are

[
  {
      "source": "First.wav",
      "silence": 0.4,
      "amplitude": -100,
      "sample_names": [
          "Alpha.wav",
          "Beta.wav",
          "Gamma.wav"
      ]
  },
  {
      "source": "Second.wav",
      "silence": 0.4,
      "amplitude": -100,
      "sample_names": [
          "Delta.wav",
          "Epsilon.wav",
          "Zeta.wav"
      ]
  }
],

Execution of this command will produce the following files

output/First/Alpha.wav
output/First/Beta.wav
output/First/Gamma.wav
output/Second/Delta.wav
output/Second/Epsilon.wav
output/Second/Zeta.wav

Development and contributing

This tool-set is in active development and has only been rigorously tested in Windows 10. If a bug is found, please report it as an issue. Feature requests are welcome, but there is no guarantee I will be able to implement it.

About

A collection of python scripts useful for managing, manipulating, extracting and exporting '90s sample CDs.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages