
Runtime repository for Goodnight Moon, Hello Early Literacy Screening Challenge

pnkalan/literacy-screening-runtime

Goodnight Moon, Hello Early Literacy Screening


Welcome to the runtime repository for the Goodnight Moon, Hello Early Literacy Screening competition on DrivenData! This repository contains a few things to help you create your code submission for this code execution competition:

  1. Runtime environment specification (runtime/): the definition of the environment in which your code will run.
  2. Example submissions (examples/): simple demonstration solutions that will run successfully in the code execution runtime and output a valid submission.
    • Random probabilities (examples/random): a dummy submission that generates a random prediction for each audio file
    • Whisper transcription (examples/transcription): a baseline submission that shows how to load a model asset as part of your submission. This submission uses OpenAI's Whisper model to transcribe each audio clip and compares the transcription to the expected text. It requires that you download the model weights beforehand and include them in the assets directory. There's no internet access in the runtime container, so any pretrained model weights must be included as part of the submission.
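
The comparison step in the transcription baseline can be pictured with the sketch below. This is not the code from examples/transcription: the function names, the normalization rules, and the use of a difflib similarity ratio as the score are all assumptions made for illustration. The transcript is passed in as a plain string, so no model is needed to try it.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace (assumed rules)."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(text.split())


def match_score(transcript: str, expected: str) -> float:
    """Hypothetical score in [0, 1]: similarity of the two normalized strings."""
    return SequenceMatcher(None, normalize(transcript), normalize(expected)).ratio()
```

A ratio rather than an exact match gives partial credit when the transcript is close to the expected text, which may or may not suit the competition metric.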

You can use this repository to:

💡 Get started: The example submissions provide a basic functional solution. They probably won't win you the competition, but you can use them as a guide for bringing in your own work and generating a real submission.

🔧 Test your submission: Test your submission using a locally running version of the competition runtime to discover errors before submitting to the competition website.

📦 Request new packages in the official runtime: Since your submission will not have general access to the internet, all dependencies must be pre-installed. If you want to use a package that is not in the runtime environment, make a pull request to this repository.

Changes to the repository are documented in CHANGELOG.md.



Quickstart

This quickstart guide will show you how to get the provided example solution running end-to-end. Once you get there, it's off to the races!

Prerequisites

When you make a submission on the DrivenData competition site, we run your submission inside a Docker container, a virtual operating system that allows for a consistent software environment across machines. The best way to make sure your submission to the site will run is to first run it successfully in the container on your local machine. For that, you'll need:

  • A clone of this repository
  • Docker
  • At least 8 GB of free space for the Docker image
  • GNU make (optional, but useful for running the commands in the Makefile)

Additional requirements to run with GPU:

Setting up the data directory

In the official code execution platform, /code_execution/data will contain the test set audio files, test_metadata.csv, and submission_format.csv.

To test your submission locally, you should use the smoke test data from the data download page. Download smoke.tar.gz and then run tar xzvf smoke.tar.gz --strip-components=1 -C data/. This will extract the files directly into data/ without nesting them in subdirectories. Your local data directory should look like:

data
├── bfaiol.wav
├── czfqjg.wav
├── fprljz.wav
├── hgxrel.wav
├── htfbnp.wav
├── idjpne.wav
├── ktvyww.wav
├── ltbona.wav
├── submission_format.csv
├── test_labels.csv
└── test_metadata.csv

Now you're ready to run your submission against this data!

Keep in mind, the smoke test data contains clips from the training set. That's why we provide the labels too. Of course, the real test set labels won't be available in the runtime container 😉
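
If you'd like to confirm the layout before running anything, a small check along these lines may help. It is only a sketch: the function name `check_data_dir` is hypothetical, and it looks for nothing beyond the two CSV files that the runtime is described as providing and at least one .wav file at the top level of data/.

```python
from pathlib import Path

# Files the runtime data directory is described as containing.
REQUIRED_FILES = ("test_metadata.csv", "submission_format.csv")


def check_data_dir(data_dir="data"):
    """Return a list of problems found in the local data directory."""
    root = Path(data_dir)
    problems = [
        f"missing {name}" for name in REQUIRED_FILES if not (root / name).is_file()
    ]
    # Audio clips should sit directly in data/, not in a nested folder.
    if not any(root.glob("*.wav")):
        problems.append("no .wav files at the top level of the data directory")
    return problems
```

An empty list means the directory looks as expected. A "no .wav files" message often means the archive was extracted without --strip-components=1.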

Running make commands

To test out the full execution pipeline, make sure Docker is running and then run the following commands in the terminal:

  1. make pull pulls the latest official Docker image from the container registry. You'll need an internet connection for this.
  2. make pack-example packages a code submission with the main.py contained in examples/random/ and saves it as submission/submission.zip.
  3. make test-submission will do a test run of your submission, simulating what happens during actual code execution. This command runs the Docker container with the requisite host directories mounted, and executes main.py to produce a submission file containing your predictions.
make pull
make pack-example
make test-submission

🎉 Congratulations! You've just completed your first test run for the Goodnight Moon, Hello Early Literacy Screening Challenge. If everything worked as expected, you should see that a new submission file has been generated.

If you were ready to make a real submission to the competition, you would upload the submission.zip file from step 2 above to the competition submission page.

To run the Whisper transcription example instead, replace the second command with EXAMPLE=transcription make pack-example. Just be sure to download the Whisper model first and include it in the assets directory. There's no internet access in the runtime container, so any pretrained model weights must be included as part of the submission.

Testing your submission locally

As you develop your own submission, you'll need to know a little bit more about how your submission will be unpacked for running inference. This section contains more complete documentation for developing and testing your own submission.

Code submission format

Your final submission should be a zip archive named with the extension .zip (for example, submission.zip). The root level of the submission.zip file must contain a main.py which performs inference on the test audio clips and writes the predictions to a file named submission.csv in the same directory as main.py. Check out the main.py scripts in the example submissions.
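
As a rough illustration of that contract, here is a sketch in the spirit of the random-probabilities example. It is not the code from examples/random. The function name is hypothetical, and it assumes that the first column of submission_format.csv identifies the clip and that every remaining column holds a prediction.

```python
import csv
import random
from pathlib import Path


def write_random_submission(data_dir, out_path, seed=0):
    """Copy submission_format.csv, filling prediction columns with random values."""
    rng = random.Random(seed)
    with open(Path(data_dir) / "submission_format.csv", newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)

    # Assumption: first column is the clip identifier, the rest are predictions.
    for row in rows:
        for column in fieldnames[1:]:
            row[column] = rng.random()

    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

In a real main.py you would call this with the runtime data directory (/code_execution/data) and a submission.csv path in the same directory as main.py, then replace the random values with your model's predictions.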

Running your submission locally

This section provides instructions on how to run your submission in the code execution container from your local machine. To simplify the steps, key processes have been defined in the Makefile. Commands from the Makefile are then run with make {command_name}. The basic steps are:

make pull
make pack-submission
make test-submission

Run make help for more information about the available commands as well as information on the official and built images that are available locally.

Here's the process in a bit more detail:

  1. First, make sure you have set up the prerequisites.

  2. Download the official competition Docker image:

    make pull

Note

If you have built a local version of the runtime image with make build, that image will take precedence over the pulled image when using any make commands that run a container. You can explicitly use the pulled image by setting the SUBMISSION_IMAGE shell/environment variable to the pulled image or by deleting all locally built images.

  3. Save all of your submission files, including the required main.py script, in the submission_src folder of the runtime repository. Make sure any needed model weights and other assets are saved in submission_src as well.

  4. Create a submission/submission.zip file containing your code and model assets:

    make pack-submission
    #> mkdir -p submission/
    #> cd submission_src; zip -r ../submission/submission.zip ./*
    #>   adding: main.py (deflated 73%)
  5. Launch an instance of the competition Docker image, and run the same inference process that will take place in the official runtime:

    make test-submission

This runs the container entrypoint script. First, it unzips submission/submission.zip into /code_execution/ in the container. Then, it runs your submitted main.py. In the local testing setting, the final submission is saved out to the submission/ folder on your local machine.
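
Because the archive is unpacked as-is and main.py is expected at its root, a common packaging mistake is zipping the submission_src folder itself rather than its contents. The sketch below is one way to catch that before running the container. The helper name `has_root_main` is hypothetical and the check is not part of the Makefile.

```python
import zipfile


def has_root_main(zip_path):
    """True if the archive has main.py at its top level, not inside a folder."""
    with zipfile.ZipFile(zip_path) as archive:
        return "main.py" in archive.namelist()
```

For example, `has_root_main("submission/submission.zip")` should be true for an archive produced by make pack-submission.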

Note

Remember that /code_execution/data is just a mounted version of what you have saved locally in data so you will just be using the training files for local testing. In the official code execution platform, /code_execution/data will contain the actual test data.

When you run make test-submission the logs will be printed to the terminal and written out to submission/log.txt. If you run into errors, use the container logs written to log.txt to determine what changes you need to make for your code to execute successfully.

Smoke tests

When submitting on the platform, you will have the ability to submit "smoke tests." Smoke tests run on a reduced version of the training set data so that you can run your code and debug issues more quickly. They will not be considered for prize evaluation and are intended to let you test your code for correctness. You should test your code locally as thoroughly as possible before submitting your code for smoke tests or for full evaluation.

Updating runtime packages

If you want to use a package that is not in the environment, you are welcome to make a pull request to this repository. Remember, your submission will only have access to packages in this runtime repository. If you're new to the GitHub contribution workflow, check out this guide by GitHub.

The runtime manages dependencies using Pixi. Here is a good tutorial to get started with Pixi. The official runtime uses Python 3.12.7.

  1. Fork this repository.

  2. Install pixi. See here for installation options.

  3. Edit the runtime/pixi.toml file to add your new packages. We recommend starting without a specific pinned version, and then pinning to the version in the resolved pixi.lock file that is generated.

    • Conda-installed packages go in the dependencies section. These install from the conda-forge channel. Installing packages with conda is strongly preferred. Packages should only be installed using pip if they are not available in a conda channel.
    • Pip-installed packages go in the pypi-dependencies section.
    • GPU-specific dependencies go in the features.cuda.dependencies section, but these should be uncommon.
  4. With Docker open and running, run make update-lockfile. This will generate an updated runtime/pixi.lock from runtime/pixi.toml within a Docker container.

  5. Locally test that the Docker image builds successfully for both the CPU and GPU environment:

    CPU_OR_GPU=cpu make build
    CPU_OR_GPU=gpu make build
  6. Commit the changes to your forked repository. Ensure that your branch includes updated versions of both runtime/pixi.toml and runtime/pixi.lock.

  7. Open a pull request from your branch to the main branch of this repository. Navigate to the Pull requests tab in this repository, and click the "New pull request" button. For more detailed instructions, check out GitHub's help page.

  8. Once you open the pull request, we will use GitHub Actions to build the Docker images with your changes and run the tests in runtime/tests. For security reasons, administrators may need to approve the workflow run before it happens. Once it starts, the process can take up to 30 minutes, and may take longer if your build is queued behind others. You will see a section on the pull request page that shows the status of the tests and links to the logs ("Details"):

    Example appearance of GitHub Actions

  9. You may be asked to submit revisions to your pull request if the tests fail or if a DrivenData staff member has feedback. Pull requests won't be merged until all tests pass and the team has reviewed and approved the changes.

Make commands

A Makefile with several helpful shell recipes is included in the repository. The runtime documentation above uses it extensively. Running make by itself in your shell will list relevant Docker images and provide you the following list of available commands:

Available commands:

build               Builds the container locally
clean               Delete temporary Python cache and bytecode files
format              Format code with ruff
interact-container  Open an interactive bash shell within the running container (with network access)
pack-example        Creates a submission/submission.zip file from the source code in examples
pack-submission     Creates a submission/submission.zip file from the source code in submission_src
pull                Pulls the official container from Azure Container Registry
test-container      Ensures that your locally built image can import all the Python packages successfully when it runs
test-submission     Runs container using code from `submission/submission.zip` and data from `/code_execution/data/`
update-lockfile     Updates runtime environment lockfile using Docker
