GitHub - UWM-Libraries/collections2data: code repository for workflows developed for a speech-to-text project for archival audio

This directory contains three primary collections of repeatable code:

A Python script, submitted through Slurm, that sends audio files to the Azure Cognitive Services Speech to Text service ( https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/#features ).
- The code streams the audio to the service ( large audio files must be streamed ) and disables profanity checks.
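
The streaming step described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the script in this repository: the names `AudioSink`, `iter_pcm_chunks`, and `stream_file` are hypothetical, and the chunk size is an arbitrary choice. With the `azure-cognitiveservices-speech` package, the sink would presumably be a `speechsdk.audio.PushAudioInputStream`, and profanity handling could be turned off with `speech_config.set_profanity(speechsdk.ProfanityOption.Raw)`; the actual script may do this differently.

```python
import wave
from typing import Iterator, Protocol


class AudioSink(Protocol):
    """Anything with write()/close(); hypothetical stand-in for an SDK push stream."""

    def write(self, buffer: bytes) -> None: ...
    def close(self) -> None: ...


def iter_pcm_chunks(wav_path: str, frames_per_chunk: int = 3200) -> Iterator[bytes]:
    """Yield raw PCM chunks from a WAV file without loading the whole file."""
    with wave.open(wav_path, "rb") as wav:
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:  # readframes returns b"" at end of file
                break
            yield chunk


def stream_file(wav_path: str, sink: AudioSink) -> int:
    """Push every chunk into the sink, always close it, and return the bytes sent."""
    total = 0
    try:
        for chunk in iter_pcm_chunks(wav_path):
            sink.write(chunk)
            total += len(chunk)
    finally:
        sink.close()  # signals end of audio to the consumer
    return total
```

Reading in fixed-size chunks keeps memory use constant regardless of file length, which matters for long archival recordings.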

prepare_conda_explorer.R
- Merges contents of transcripts folder and metadata .xls files
- Generates updated corpus file
batch_topic_modeling.R
- Generates pre-trained LDA topic models and object-to-topic associations in a loop
topic_modeling.R
- Example topic model analysis of transcripts
corpora_explorer_demo
- Example corporaexplorer demo used in 1.6.21 demo with instructors
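
The merge performed by prepare_conda_explorer.R (transcripts folder plus metadata) might look something like the sketch below. It is written in Python purely for illustration, whereas the repository's script is in R. It also assumes the metadata rows have already been loaded from the .xls files into dictionaries, that transcripts are `.txt` files, and that each file name matches a metadata identifier. The function name `build_corpus` and the `object_id` and `Text` fields are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def build_corpus(
    transcript_dir: str,
    metadata_rows: List[Dict[str, str]],
    id_field: str = "object_id",  # hypothetical metadata column
) -> List[Dict[str, str]]:
    """Attach transcript text to each metadata row whose id matches a file stem."""
    texts = {
        path.stem: path.read_text(encoding="utf-8")
        for path in sorted(Path(transcript_dir).glob("*.txt"))
    }
    corpus = []
    for row in metadata_rows:
        key = str(row[id_field])
        if key in texts:  # rows without a transcript are skipped
            corpus.append({**row, "Text": texts[key]})
    return corpus
```

Rows with no matching transcript are dropped here; whether the R script drops or keeps them is not stated in this README.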

Two simple web apps that display a content warning before users access a simplified CorporaExplorer user interface to our corpus.

Repository contents:
- azure_transcripts
- contentwarning
- corporaexplorer
- rshiny_dashboard
- rshiny_prep_and_demo
- README.md