Get environmental metadata for HOT1 from HOTGOG #2

Open
raissameyer opened this issue Aug 6, 2024 · 2 comments

raissameyer commented Aug 6, 2024

Following the data trail of the publication

Data Publication: Biller, S., Berube, P., Dooley, K. et al. Marine microbial metagenomes sampled across space and time. Sci Data 5, 180176 (2018). https://doi.org/10.1038/sdata.2018.176

The paper points to https://hahana.soest.hawaii.edu/hot/hot-dogs/index.html and says to use the bottle IDs to identify the data records we need.

image

Let's check out the tutorials
https://hahana.soest.hawaii.edu/hot/hot-dogs/documentation/tutorial.html
image

Looks like example 1 should fit
https://hahana.soest.hawaii.edu/hot/hot-dogs/documentation/example1.html
image

Let's look at the Data Extraction/bottle module they mention
https://hahana.soest.hawaii.edu/hot/hot-dogs/bextraction.html
image

Try for one date
image

And load into R

HOTDOG <- read.csv("data/HOT1_HOTDOG.txt", sep=",")

This is what it looks like. Note that the second row contains the units, not data:
image
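Because the units sit in the second row, a plain `read.csv` call treats them as the first data row. A minimal sketch of one way around that, using a toy stand-in for the export: only the `botid` and `date` column names come from this thread, while the `press` column, the units strings, and the values are made up for illustration.

```r
# Toy stand-in for the HOT-DOGS export: header row, units row, then data.
# Only 'botid' and 'date' are column names seen in this thread; the rest is illustrative.
raw_lines <- c(
  "botid,date,press",
  "#,mmddyy,dbar",
  "1560200314, 022404,5.1",
  "1560200308, 022404,100.3"
)

# Keep the header and the units row separately as a lookup
header <- strsplit(raw_lines[1], ",")[[1]]
units  <- setNames(strsplit(raw_lines[2], ",")[[1]], header)

# Parse only the data rows; keep IDs and dates as character so that
# leading zeros in the dates survive
hotdog <- read.csv(
  text = paste(raw_lines[-(1:2)], collapse = "\n"),
  header = FALSE,
  col.names = header,
  colClasses = c(botid = "character", date = "character"),
  strip.white = TRUE
)

str(hotdog)
units
```

Keeping the units as a named vector means they can still be looked up later, for example with `units[["press"]]`.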

If I don't want to go through this for every single date of every sample, I will have to scale up and just take a longer timeframe

  • 2003-01-01 - 2010-01-31

Note

To save the HTML output I am getting, I use the Safari menu: File -> Save As -> (select destination) -> Save.
Then I open that file in Excel, remove the header lines, save it as CSV, and import that into R.
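The Excel step could probably be skipped by dropping the header lines in R. The sketch below assumes the saved file is plain text with some preamble lines, followed by a comma-separated table whose header row starts with `botid` and is followed by a units row. The file contents here are hypothetical, so the pattern may need adjusting for the real export.

```r
# Hypothetical saved export: a few preamble lines, then the table
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Some preamble line",
  "Another preamble line",
  "botid,date,press",
  "#,mmddyy,dbar",
  "1560200314, 022404,5.1"
), path)

lines <- readLines(path)

# Locate the header row by its first column name
header_row <- grep("^\\s*botid", lines)[1]
stopifnot(!is.na(header_row))

# Drop the preamble and the units row directly below the header
keep <- setdiff(seq_along(lines), c(seq_len(header_row - 1), header_row + 1))

hotdog <- read.csv(
  text = paste(lines[keep], collapse = "\n"),
  colClasses = c(botid = "character", date = "character"),
  strip.white = TRUE
)
hotdog
```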

image
and submit

@raissameyer raissameyer added the documentation Improvements or additions to documentation label Aug 6, 2024
@raissameyer raissameyer self-assigned this Aug 6, 2024

raissameyer commented Aug 6, 2024

Load into R and combine with the existing data frame (HOT1_pub_merged)

######## Let's do the real thing. 
# Load the necessary libraries
library(dplyr)

hotdog20032010_data <- read.csv("data/hotdogs20032010.csv", sep=",")
dim(hotdog20032010_data)
# [1] 9913   16

# Rename the column in hotdog20032010_data
colnames(hotdog20032010_data)[colnames(hotdog20032010_data) == "botid"] <- "bottle_id_pub"

# Add suffix '_dog' to all columns except the key column 'bottle_id_pub'
non_key <- colnames(hotdog20032010_data) != "bottle_id_pub"
colnames(hotdog20032010_data)[non_key] <- paste0(colnames(hotdog20032010_data)[non_key], "_dog")

# Ensure the key columns are of the same type
HOT1_pub_merged$bottle_id_pub <- as.character(HOT1_pub_merged$bottle_id_pub)
hotdog20032010_data$bottle_id_pub <- as.character(hotdog20032010_data$bottle_id_pub)

# Perform a left join to keep all rows from HOT1_pub_merged.
# (merge() with its default arguments is an inner join, so it dropped every row of
# HOT1_pub_merged that had no corresponding row in the HOT-DOGS data. I don't want that.)
HOT1_pubNdog0310_merged <- left_join(HOT1_pub_merged, hotdog20032010_data, by = "bottle_id_pub")

# View the colnames of the new dataframe
colnames(HOT1_pubNdog0310_merged)

# View the first few rows of the new dataframe
head(HOT1_pubNdog0310_merged)

dim(HOT1_pubNdog0310_merged)
# 33 72

#### For some bottles there is no information (including both 2009 samples).
# I checked: these bottles are also absent from the HOT-DOGS file, so this is not
# a merge error but a data availability issue.

HOT1_pubNdog0310_merged$date_dog
HOT1_pubNdog0310_merged$bottle_id_pub
HOT1_pubNdog0310_merged$collection_date
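The check described in the comments above (which bottle IDs found no HOT-DOGS match) can be sketched with `dplyr::anti_join`. The two data frames below are small stand-ins built from values in this thread, not the real `HOT1_pub_merged` and `hotdog20032010_data` objects.

```r
library(dplyr)

# Stand-in for the publication metadata (three of the 33 rows)
pub <- data.frame(
  bottle_id_pub   = c("1560200314", "1650200409", "2140200308"),
  collection_date = c("2004-02-24", "2004-11-10", "2009-08-19")
)

# Stand-in for the HOT-DOGS export after renaming the key column
dog <- data.frame(
  bottle_id_pub = "1560200314",
  date_dog      = " 022404"
)

# Rows of 'pub' whose key has no match in 'dog'
missing_in_dog <- anti_join(pub, dog, by = "bottle_id_pub")
missing_in_dog
```

After the left join, the same rows can also be found by filtering on `is.na(date_dog)`, as long as `date_dog` is never missing in the HOT-DOGS file itself.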

Caution

For some bottles there is no information (including both 2009 samples). I checked, and these bottles are also absent from the HOT-DOGS file, so this is not a merge error but a data availability issue.

see:

> HOT1_pubNdog0310_merged$date_dog
 [1] " 022404" " 022404" " 042004" " 051804" " 061504" " 081504" " 081504" " 092804" " 103104" " 112704" NA        " 122004" " 122004" " 011703"
[15] " 022503" " 032803" " 042303" " 052003" " 061903" " 061903" " 071903" " 082003" " 110903" NA        " 122003" " 122003" " 101403" NA       
[29] " 012104" " 012104" " 031904" NA        NA       
> HOT1_pubNdog0310_merged$bottle_id_pub
 [1] "1560200314" "1560200308" "1580200313" "1590200314" "1600200414" "1620200414" "1620200408" "1630200414" "1640201117" "1650200419"
[11] "1650200409" "1660200514" "1660200508" "1440200914" "1450200318" "1460200314" "1470200314" "1480200316" "1490200314" "1490200308"
[21] "1500200314" "1510200316" "1530200314" "1530200308" "1540201018" "1540201010" "1520200320" "1520200308" "1550200314" "1550200308"
[31] "1570200323" "2140200308" "2160200304"
> HOT1_pubNdog0310_merged$collection_date
 [1] "2004-02-24" "2004-02-24" "2004-04-20" "2004-05-18" "2004-06-15" "2004-08-15" "2004-08-15" "2004-09-28" "2004-10-10" "2004-11-10"
[11] "2004-11-10" "2004-12-14" "2004-12-14" "2003-01-17" "2003-02-25" "2003-03-28" "2003-04-23" "2003-05-20" "2003-06-19" "2003-06-19"
[21] "2003-07-19" "2003-08-20" "2003-11-09" "2003-11-09" "2003-12-20" "2003-12-20" "2003-10-14" "2003-10-14" "2004-01-21" "2004-01-21"
[31] "2004-03-19" "2009-08-19" "2009-11-04"


raissameyer commented Aug 6, 2024

I noticed that for some of the samples where we did get information from HOT-DOGS, date_dog and collection_date do not match, and the date in date_dog is significantly later. So perhaps I have to widen the date range to capture the samples that are still missing. Example: collection_date = 2004-11-10 while date_dog = 112704. Retry with a wider range.
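To see the size of the mismatch for every sample at once, the two date columns can be put on the same footing. This sketch assumes `date_dog` is in `mmddyy` format with a leading space, which is what the matching pairs in the output above suggest (for example " 022404" next to 2004-02-24).

```r
# Two pairs taken from the output above
collection_date <- as.Date(c("2004-02-24", "2004-11-10"))
date_dog        <- c(" 022404", " 112704")

# Strip the leading space and parse as month-day-two-digit-year
date_dog_parsed <- as.Date(trimws(date_dog), format = "%m%d%y")

# Offset in days between the HOT-DOGS date and the published collection date
day_offset <- as.integer(date_dog_parsed - collection_date)
data.frame(collection_date, date_dog_parsed, day_offset)
```

A non-zero `day_offset` flags the rows where the two sources disagree.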

I did :) The wider date range recovered two more samples:

sample S0525, cruise HOT166, 5 m depth 1660200514	2004-12-14
sample S0526, cruise HOT166, 100 m depth 1660200508	2004-12-14

These are still missing

sample S0523, cruise HOT165, 100 m depth 1650200409	2004-11-10
sample S0553, cruise HOT153, 100 m depth 1530200308	2003-11-09
sample S0559, cruise HOT152, 100 m depth 1520200308	2003-10-14
sample S0627, cruise HOT214, 5 m depth 2140200308	2009-08-19
sample S0628, cruise HOT216, 100 m depth 2160200304	2009-11-04
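As a sanity check that these IDs are not simply mistyped, the cruise number can be compared with the start of the bottle ID. In the lists above the first three digits of each ID equal the cruise number. That pattern is only an observation from this thread, not a documented ID layout.

```r
# The five samples still missing, as listed above
missing <- data.frame(
  sample        = c("S0523", "S0553", "S0559", "S0627", "S0628"),
  cruise        = c("HOT165", "HOT153", "HOT152", "HOT214", "HOT216"),
  bottle_id_pub = c("1650200409", "1530200308", "1520200308",
                    "2140200308", "2160200304")
)

# Assumption: the first three digits of the bottle ID are the cruise number
missing$cruise_from_id <- paste0("HOT", substr(missing$bottle_id_pub, 1, 3))

# TRUE means the ID prefix agrees with the cruise label
missing$prefix_ok <- missing$cruise_from_id == missing$cruise
missing
```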

I also spoke with Sarah and she mentioned that she also could not find metadata for the others when she worked with HOT data before.
