In this notebook, I classify emotion in speech from audio files using different machine learning techniques.
For this purpose, I use the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). I use only the speech audio data, which consists of 1440 files. The database contains recordings of 24 professional actors (12 female, 12 male) vocalizing two lexically matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression. The data is freely available for download here: https://zenodo.org/record/1188976#.YUuSMC0es1I
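Since the emotion labels are not stored separately, they can be parsed from the file names: each RAVDESS filename encodes seven hyphen-separated fields (modality, vocal channel, emotion, intensity, statement, repetition, actor). Below is a minimal sketch of extracting the label for each file, assuming the archive has been extracted into a local `ravdess/` directory (a hypothetical path):

```python
from pathlib import Path

# RAVDESS filenames look like "03-01-06-01-02-01-12.wav"; the third
# hyphen-separated field is the emotion code.
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def label_from_filename(path: Path) -> str:
    """Return the emotion label encoded in a RAVDESS filename."""
    emotion_code = path.stem.split("-")[2]
    return EMOTIONS[emotion_code]

# Collect (file, label) pairs for all audio files.
# "ravdess/" is a hypothetical directory holding the extracted archive.
data = [(f, label_from_filename(f)) for f in Path("ravdess").rglob("*.wav")]
```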
Additionally, I provide a small survey of human accuracy on this database, which is available for download.