Codechef experiments #16

Open
wants to merge 8 commits into
base: master

Conversation

where-is-paul
Collaborator

This pull request performs a set of tests on CodeChef contest data, along with Rust utilities that help parse the data we received. The notebook is within scripts/notebooks and can hopefully serve as a guideline for the sets of tests to perform on contest data for other applications. Currently we do not have the right to release the CodeChef data, but I may make future commits to provide samples of the data that the notebook requires.

let mut players = std::collections::HashMap::new();
let mut avg_perf = compute_metrics_custom(&mut players, &[]);

// Get list of contest names to compare with Codechef's rating system
let paths = std::fs::read_dir("/home/work_space/elommr-data/ratings").unwrap();
Owner

Just to keep the organization consistent, could this be moved to "../data/codechef/old_ratings"?
(all binaries currently assume that they're being run from Elo-MMR/multi-skill)

let mut players = std::collections::HashMap::new();
let mut avg_perf = compute_metrics_custom(&mut players, &[]);

// Get list of contest names to compare with Codechef's rating system
let paths = std::fs::read_dir("/home/work_space/elommr-data/ratings").unwrap();
let mut checkpoints = std::collections::HashSet::<String>::new();
for path in paths {
if let Some(contest_name) = path.unwrap().path().file_stem() {
Owner

Is the intention to abort on all failures? If so, we can do

let contest_name = path.unwrap().path().file_stem().unwrap().to_str().unwrap();
checkpoints.insert(contest_name.to_owned());

And just to avoid the path.path(), maybe path could be renamed to something like file_entry.
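
A minimal sketch of the stricter variant suggested above, assuming the intention really is to abort on any failure. The function name `collect_checkpoint_names` is hypothetical and not part of the PR. The sketch binds the `PathBuf` to a local before calling `file_stem()`, because the resulting `&str` borrows from it and a temporary would be dropped at the end of the statement.

```rust
use std::collections::HashSet;
use std::fs;
use std::path::Path;

/// Collects the file stem (file name without its extension) of every entry in `dir`.
/// Hypothetical helper for illustration: it panics on any I/O error or
/// non-UTF-8 file name instead of silently skipping the entry.
fn collect_checkpoint_names(dir: &Path) -> HashSet<String> {
    let mut checkpoints = HashSet::new();
    for file_entry in fs::read_dir(dir).expect("cannot read directory") {
        // Keep the PathBuf alive in a local so the borrowed &str below stays valid.
        let path = file_entry.expect("cannot read directory entry").path();
        let contest_name = path
            .file_stem()
            .expect("entry has no file stem")
            .to_str()
            .expect("file name is not valid UTF-8");
        checkpoints.insert(contest_name.to_owned());
    }
    checkpoints
}
```

Calling `collect_checkpoint_names(Path::new("../data/codechef/old_ratings"))` would then yield the set of contest names, provided the ratings are moved to that directory.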


 // Get list of contest names to compare with Codechef's rating system
 let paths = std::fs::read_dir("/home/work_space/elommr-data/ratings").unwrap();
 let mut checkpoints = std::collections::HashSet::<String>::new();
 for path in paths {
     if let Some(contest_name) = path.unwrap().path().file_stem() {
-        if let Some(string_name) = contest_name.to_os_string().into_string().ok() {
+        if let Ok(string_name) = contest_name.to_os_string().into_string() {
             checkpoints.insert(string_name);
         }
     }
 }

// Run the contest histories and measure
let dir = std::path::PathBuf::from("/home/work_space/elommr-data/elommr-checkpoints/codechef/");
Owner

If this is just the usual contest data, shouldn't it be at cache/codechef?

let mut players = std::collections::HashMap::new();
let mut avg_perf = compute_metrics_custom(&mut players, &[]);

// Get list of contest names to compare with Codechef's rating system
Owner

I'd like an explanation here of what the files actually contain.

Maybe for each of these binary files, we can clarify what the inputs and outputs are. Now would also be a good time if you want to revise the Rust file names.

@@ -71,13 +52,12 @@ fn main() {
}

// Now run the actual rating update
-simulate_contest(&mut players, &contest, &*system, mu_noob, sig_noob, index);
+simulate_contest(&mut players, &contest, &system, mu_noob, sig_noob, index);

if checkpoints.contains(&contest.name) {
let output_file = dir.join(contest.name.clone() + ".csv");
Owner

Output could go somewhere like
let output_dir = "../data/codechef/informative_dir_name/"
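
As a rough sketch of how a relative output directory could be combined with the contest name: the helper `checkpoint_path` below is hypothetical, and `informative_dir_name` is only the placeholder from the comment above, not an actual directory in the repository.

```rust
use std::path::{Path, PathBuf};

/// Builds `<output_dir>/<contest_name>.csv`.
/// Hypothetical helper for illustration. `format!` is used rather than
/// `set_extension`, so a contest name that itself contains a dot is not truncated.
fn checkpoint_path(output_dir: &Path, contest_name: &str) -> PathBuf {
    output_dir.join(format!("{contest_name}.csv"))
}
```

Since the binaries assume they are run from Elo-MMR/multi-skill, a relative `output_dir` such as the one suggested above resolves against that working directory.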

num_contests: Option<usize>,
}

fn make_checkpoint(players: Vec<SimplePlayer>) -> PlayersByName {
Owner

Maybe include a comment that this struct & function are duplicated in another file, so I might remember to extract them someday.

let dir =
    std::path::PathBuf::from("/home/work_space/elommr-data/elommr-checkpoints/start-from-516/");
for (index, contest) in dataset.iter().enumerate() {
    if index <= contest_cutoff {
Owner

Replace this check with .skip(contest_cutoff + 1) on the iterator.

Btw, if you prefer indexing to start after the cutoff, you can exchange the order of .enumerate() and .skip().
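
To illustrate the difference between the two orderings, here is a small sketch on a generic slice. The function names are made up for this example, and `cutoff` plays the role of `contest_cutoff`.

```rust
/// Enumerate first, then skip: indices keep their position in the full dataset.
fn indexed_from_start<T>(items: &[T], cutoff: usize) -> Vec<(usize, &T)> {
    items.iter().enumerate().skip(cutoff + 1).collect()
}

/// Skip first, then enumerate: indices restart at 0 after the cutoff.
fn indexed_after_cutoff<T>(items: &[T], cutoff: usize) -> Vec<(usize, &T)> {
    items.iter().skip(cutoff + 1).enumerate().collect()
}
```

With four items and a cutoff of 1, the first variant yields indices 2 and 3, while the second yields 0 and 1 for the same two items.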

@@ -0,0 +1,122 @@
use multi_skill::data_processing::{get_dataset_by_name, read_csv, try_write_slice_to_file};
Owner

Some of my comments from codechef_checkpoints.rs apply to this file too, so I won't copy them here.

@EbTech EbTech force-pushed the master branch 3 times, most recently from 8a0b7d7 to e3baa13 Compare March 9, 2023 07:52
@EbTech EbTech force-pushed the master branch 2 times, most recently from 1f64d34 to 82c2a1c Compare October 23, 2023 07:04