yandexdataschool/speech_course

YSDA Speech Processing Course

  • Materials for each week are in ./week* folders

Course program

  • Week 1: Slides | Lecture | Seminar
    • Lecture: Intro to Digital Signal Processing (DSP)
    • Seminar: Implement DSP pipeline
    • Homework (5pt): Implement mel-spectrogram transformations
  • Week 2:
    • Lecture: Introduction to speech NN discriminative models. Voice Activity Detection (VAD) and Sound Event Detection (SED) tasks
    • Seminar: Train VAD models
    • Homework (15pt): Train SED models
  • Week 3:
    • Lecture: Keyword Spotting and Speech Biometrics tasks
    • Seminar: Train Biometrics model and look at embeddings
    • Homework (20pt): Train Biometrics model to better quality
  • Week 4:
    • Lecture: Speech Recognition I
    • Seminar: Metrics and augmentations for speech recognition
    • Homework (10pt): Implement CTC algorithm
  • Week 5:
    • Lecture: Speech Recognition II, Pretraining
    • Homework (5pt): Finetune Wav2Vec2
  • Week 6:
    • Lecture: ASR Inference
    • Seminar: Streaming ASR
    • Homework (5pt): Seminar continuation
  • Week 7:
    • Lecture: Text-to-Speech I, intro, preprocessor, metrics
  • Week 8:
    • Lecture: Text-to-Speech II, Acoustic models and vocoding
    • Seminar (5pt): Pitch estimation, Monotonic Alignment Search for phoneme duration estimation
    • Homework (10pt): Train FastPitch model
  • Week 9:
    • Lecture: Text-to-Speech III, Codecs
    • Seminar: Vector Quantization, Residual Vector Quantization
  • Week 10:
    • Lecture: Text-to-Speech IV, Tortoise and other transformers for TTS
    • Homework (15pt): Write a codec transformer with delayed pattern
  • Week 11:
    • Lecture: Multimodality, How to build a big GPT with voice capabilities
  • Week 12:
    • Lecture: Noise reduction
    • Seminar: Streaming STFT and ISTFT
    • Homework (15pt): Noise reduction model implementation
  • Week 13:
    • Lecture: Acoustic Echo Cancellation (AEC) and Beamforming
    • Homework (5pt): Basic AEC implementation

Course program for spring 2024

  • Week 1: Slides | Lecture | Seminar
    • Lecture: Intro to Digital Signal Processing (DSP)
    • Seminar: Implement DSP pipeline
  • Week 2: Slides | Lecture | Seminar
    • Lecture: Introduction to speech NN discriminative models. Voice Activity Detection (VAD) and Sound Event Detection (SED) tasks
    • Seminar: Train VAD models
    • Homework: Train SED models
  • Week 3: Slides | Lecture | Seminar
    • Lecture: Keyword Spotting and Speech Biometrics tasks
    • Seminar: Train Biometrics model and look at embeddings
    • Homework: Train Biometrics model to better quality
  • Week 4: Slides | Lecture | Seminar
    • Lecture: Speech Recognition I
    • Seminar: Metrics and augmentations for speech recognition
    • Homework: Implement CTC algorithm
  • Week 5: Slides | Lecture
    • Lecture: Speech Recognition II, Pretraining
    • Homework: Finetune Wav2Vec2
  • Week 6: Slides | Lecture
    • Lecture: Text-to-Speech I, intro, preprocessor, metrics
  • Week 7: Slides | Lecture
    • Lecture: Text-to-Speech II, Acoustic models
    • Seminar: Pitch estimation, Monotonic Alignment Search for phoneme duration estimation
    • Homework: Train FastPitch model
  • Week 8: Slides, p1 | Lecture, p1 | Slides, p2 | Lecture, p2 | Seminar
    • Lecture, p1: Text-to-Speech III, Vocoding
    • Lecture, p2: Vector Quantization, Codecs
    • Seminar: Vector Quantization, Residual Vector Quantization
  • Week 9: Slides | Lecture, p1 | Lecture, p2
    • Lecture: Transformers for TTS
    • Homework: Write inference for a pre-trained transformer
  • Week 10: Slides | Lecture | Seminar
    • Lecture: Noise reduction
    • Seminar: Streaming STFT and ISTFT
    • Homework: Noise reduction model implementation
  • Week 11: Slides | Lecture
    • Lecture: Acoustic Echo Cancellation (AEC) and Beamforming
  • Week 12: Slides | Lecture | Seminar
    • Lecture: ASR Inference
    • Seminar: Streaming ASR
  • Week 13: Slides | Lecture
    • Lecture: Flow-based TTS and Voice Conversion

Contributors & course staff

Current:

  • Pavel Mazaev - spotter
  • Alex Rak - VAD, spotter, biometry
  • Mikhail Andreev - ASR
  • Stepan Kargaltsev - ASR
  • Evgeniia Elistratova - TTS
  • Roman Kail - TTS
  • Vladimir Platonov - TTS
  • Ivan Matvienko - TTS
  • Ravil Khisamov - VQE
  • Anton Parfiriev - AEC

Previous iteration:

  • Andrey Malinin - Course admin, lectures, seminars, homeworks
  • Vladimir Kirichenko - lectures, seminars, homeworks
  • Segey Dukanov - lectures, seminars, homeworks
  • Evgenii Shabalin - lecture and homework on conversion