Training Data (includes tutorial, example) #28

Open
yvanlebras opened this issue Jun 25, 2016 · 15 comments

Comments

@yvanlebras
Collaborator

yvanlebras commented Jun 25, 2016

Contributors: @jennaj @griffinp @kpoterlo @yvanlebras @BoughAida @ssander5 @devikaatgit @cschu @tnabtaf @kmurat1

This issue is dedicated to the Training Data hackathon group. The idea is to gather sample data that can be used for examples, tutorials, etc. on Galaxy instances.

Please don't hesitate to add a comment with data links and a description ;)

Example:

RADseq technology

Genetic map

-parents

female http://546969.197.189.163/datasets/bbbfa414ae315caf/display/
male http://546969.197.189.163/datasets/4467809fea030689/display/

-progeny

progeny 1 http://546969.197.189.163/datasets/ddf83cf807e6e774/display/
progeny 2 http://546969.197.189.163/datasets/30bf7a4ced2335cc/display/
....

Population genomics

barcode http://546969.197.189.163/datasets/6df0b7b066ddc4c9/display/
population map http://546969.197.189.163/datasets/d796ca8e1687a54b/display/
reference genome http://546969.197.189.163/datasets/06cf32e9aa8aad75/display/
FastQ file http://546969.197.189.163/datasets/34c3e3c01e1a37f4/display/

If the data are not reachable through the web (personal data on your laptop, ...), the best way is to upload them to a Galaxy history on https://usegalaxy.org/

The idea is to meet once the data have been gathered and discuss which datasets are good, duplicated, or too big, before proposing actions such as: data directly shareable, data that need to be reduced, ...

@ghost

ghost commented Jun 25, 2016

follow

@MoHeydarian

follow

@kkamieniecka

follow

@yvanlebras
Collaborator Author

yvanlebras commented Jun 25, 2016

RADseq technology

Genetic map

Related usegalaxy.org history

Population genomics

Related usegalaxy.org history

There is a reference genome in the shared data, so the analysis can be run through the denovo_map pipeline as well as the ref_map pipeline.

Assemble read pairs

Related usegalaxy.org history

@devikaatgit

How do I add the sample datasets that I have with me?

@yvanlebras
Collaborator Author

@devikaatgit the best way is to upload the data to a Galaxy history on https://usegalaxy.org/. Then, share your history publicly. If you don't have an account, don't hesitate to create one, it's free ;)
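
For anyone who prefers scripting the upload-then-share steps rather than clicking through the interface, here is a minimal sketch. It assumes the BioBlend Python client and an API key for your account; the function name `upload_and_publish`, the file name, and the history name are hypothetical and only for illustration. The function takes an already-created `GalaxyInstance` as `gi`, so nothing below contacts a server until you call it.

```python
from pathlib import Path


def upload_and_publish(gi, paths, history_name):
    """Upload local files into a new history and make that history public.

    Assumption: `gi` is a bioblend.galaxy.GalaxyInstance (or any object
    exposing the same `histories` and `tools` clients).
    """
    history = gi.histories.create_history(name=history_name)
    history_id = history["id"]
    for path in paths:
        # file_name sets the dataset label shown in the history
        gi.tools.upload_file(str(path), history_id, file_name=Path(path).name)
    # Mark the history as importable and published so others can reach it
    gi.histories.update_history(history_id, importable=True, published=True)
    return history_id


# Usage (requires `pip install bioblend`; the key and names are placeholders):
# from bioblend.galaxy import GalaxyInstance
# gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")
# upload_and_publish(gi, ["reads.fastq"], "Training data: RADseq")
```

Large files may still be easier to send through the regular upload tool; this is only meant for small sample datasets like the ones collected in this issue.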

@frederikcoppens
Collaborator

frederikcoppens commented Jun 25, 2016

@devikaatgit If you need help, let us know.
We can add it to our cloud instance too; this lets us put some structure in the data libraries (and share them later).

@Eduardo-Alves
Collaborator

Tutorials for RNA-seq, assembly, and variant calling using small publicly available datasets:
Galaxy_Walkthrough.pdf
Galaxy-based RNA-Seq Intro.pptx
Galaxy Variant Tutorial Mar16.pptx

@yvanlebras
Collaborator Author

yvanlebras commented Jun 25, 2016

Thank you very much @Eduardo-Alves !

In the meantime, not sure I can use your material because of:

@yvanlebras
Collaborator Author

@frederikcoppens Do you think there is a way to create shared libraries for our group on the Galaxy main server?

@frederikcoppens
Collaborator

@yvanlebras That's one of the possibilities and my personal favorite, but it needs to be discussed.

@BoughAida

Bacterial RNA-seq data are available at the following URL: http://54.158.166.52/u/aida/h/datahackathonab

@ssander5
Collaborator

ssander5 commented Jun 26, 2016

I posted some "advanced training sets" in the #30 post. They include larger datasets for RNA-seq and RADseq that all come from publicly accessible data, highlight typical issues in data analysis, and use published analyses. They might be good as a kind of second-pass training set for each of these analyses, as they may not be as straightforward as a "toy set", since they are real data with real issues.

I did not upload the data yet, because I have to transfer it from the cluster to my computer and then back up to Galaxy (unless anyone knows a faster means of doing this?).

@devikaatgit

Two-condition datasets (single replicates only) for bacterial RNA-seq can be accessed at https://usegalaxy.org/u/devikasub/h/bacterial-rna-seq-2-condition-single-replicate-datasets

@jennaj
Collaborator

jennaj commented Jul 11, 2016

Thanks everyone!

What is next? We need a ticket listing the to-do items. It can reference this ticket and others. Would someone like to draft one, or should I?

Shared Lib on Main > assign to me. Moving the data above into that, organized and labeled, is also me (in collaboration with the authors above and in the master ticket). Use the hack mailing list to sync up on this and related items?

9 participants