The dataset used in this project was taken from the NIH’s Integrative Human Microbiome Project (iHMP). Due to limited computational resources, we analyzed a random subset of 1000 samples drawn from the original dataset. Our dataset contains samples from 77 pre-diabetic patients. A patient is considered pre-diabetic when their fasting blood sugar level is higher than normal but not high enough to be considered diabetic, that is, a fasting blood sugar level of 100-125 mg/dL (Maruthur, 2023). Samples were collected from the patients at two body sites: the feces and the nasal cavity. Our dataset contains 465 fecal samples and 535 nasal samples. Most patients were sampled at more than one timepoint, which is why there are many more samples than patients. Microbial DNA was extracted from these samples and underwent 16S amplicon sequencing (Stanford Medicine). Because our dataset contains only pre-diabetic patients, with no healthy control group, our analysis focused on comparing the fecal and nasal microbiomes of these pre-diabetic patients.
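
The subsampling and tallying described above can be illustrated with a minimal sketch. This is not the code used in the project: the field names `patient_id` and `body_site`, the helper names `subsample` and `summarize`, the seed, and the toy metadata are all hypothetical and chosen only to show the idea of drawing a reproducible random subset and then counting samples per body site and distinct patients.

```python
import random
from collections import Counter


def subsample(sample_ids, k, seed=0):
    """Draw k sample IDs uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the same subset is drawn each run
    return rng.sample(sample_ids, k)


def summarize(metadata, chosen_ids):
    """Count samples per body site and distinct patients in the subset."""
    site_counts = Counter(metadata[s]["body_site"] for s in chosen_ids)
    n_patients = len({metadata[s]["patient_id"] for s in chosen_ids})
    return site_counts, n_patients


# Toy metadata standing in for the real sample table (all values are made up).
metadata = {
    f"S{i}": {
        "patient_id": f"P{i % 5}",
        "body_site": "feces" if i % 2 else "nasal",
    }
    for i in range(20)
}

chosen = subsample(list(metadata), k=10)
site_counts, n_patients = summarize(metadata, chosen)
print(site_counts, n_patients)
```

Applied to the real sample table with `k=1000`, the same two steps would yield per-site counts and a patient count of the kind reported above. Because patients contribute samples at several timepoints, the number of distinct patients is expected to be much smaller than the number of samples.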