Various feature requests #19

Open
jielab opened this issue Jan 6, 2021 · 12 comments

Comments


jielab commented Jan 6, 2021

Hi, there:

I am trying to run "tabix my.GWAS.gz". The my.GWAS.gz file is tab-delimited and has the columns CHR POS SNP REF ALT BETA SE P N. However, I got the error message "[E::get_intv] failed to parse TBX_GENERIC, was wrong -p [type] used?".

Also, I am requesting two features:

  1. display more columns (or all columns) of the original file in the Top Loci table. Right now, it only lists rsID, CHR:POS, -logP. It would be good to display fields from my original GWAS file such as SNP, P, REF, ALT, BETA. Andy suggested using the SHA256 hash to implement this. That sounds great.

  2. display multiple GWAS Manhattan plots. For example, I ran 3 BMI GWAS on the same data: 1. for males, 2. for females, 3. for both sexes. I would love to show these 3 Manhattan plots horizontally, and also the same top loci of the 3 GWAS on the same page (such as for the FTO locus).

Thank you very much for your consideration.

Best regards,
Jie

Member

abought commented Jan 12, 2021

A few notes from the emails- though it won't be worked on immediately, I'd like to jot down notes while context remains in my memory.

Requests broken down by type:

  • The first error/ thread title is an error with the tabix utility, which is not part of locuszoom-*. See tabix user manual for details. It sounds like you need to tell it where to find the chromosome (--sequence) and position (--begin and --end, usually the same thing in a gwas file). Possibly other options depending on your data, but we don't provide user support for third-party tools.
  • To verify the uploaded file is a specific one expected: the request is to display SHA256 of the uploaded file on the summary page
  • We'd prefer not to show raw pvalues (instead of -log10p) due to numerical underflow issues. However, showing more columns would be nice. We should start by enforcing display of ref and alt alleles, and eventually extend to allowing the parser to support other user-provided data.
  • I'd love to provide a way to compare studies, esp summary stats in the region plot view. This will require rethinking our search and metadata features, to help users discover relevant datasets that can be added.
    • LocalZoom provides a comparison feature, because all the datasets are on the user's hard drive. In a website with a mix of personal and public data, finding really good comparisons takes a bit more work.
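
The underflow concern in the list above can be illustrated with a short sketch. Python is an assumption here, and nothing in this thread specifies how the site actually computes the value. A p-value far below about 1e-308 cannot be stored as a double, so a parser that converts the text to a float first can end up with 0.0 and an infinite -log10(p). The hypothetical helper `neg_log10_from_text` works on the mantissa and exponent of the text instead.

```python
import math


def neg_log10_from_text(pvalue_text: str) -> float:
    """Return -log10(p) from a p-value string without underflowing.

    Hypothetical helper for illustration only; not the LocusZoom parser.
    """
    text = pvalue_text.strip().lower()
    if "e" in text:
        # Scientific notation: split "2.5e-350" into mantissa and exponent.
        mantissa_text, exponent_text = text.split("e", 1)
        mantissa = float(mantissa_text)
        exponent = int(exponent_text)
    else:
        mantissa = float(text)
        exponent = 0
    if mantissa <= 0:
        raise ValueError("p-value must be positive")
    # log10(m * 10**e) = log10(m) + e, so the tiny number is never materialized.
    return -(math.log10(mantissa) + exponent)


naive = float("1e-400")               # underflows to 0.0 in double precision
safe = neg_log10_from_text("1e-400")  # 400.0, computed from the text
```

Storing and displaying -log10(p) sidesteps the problem entirely, which matches the preference stated above.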

Author

jielab commented Jan 15, 2021

Dear Andy:

Thank you very much!

I now made tabix work by using tabix -f -s 1 -b 2 -e 2 MY.gwas.gz
Previously, I used both -p bed and -b 2 -e 2, which are not compatible, because a bed format would mean the end position is the 3rd column and that could not be changed by -e 2.

One issue here is that I do need to add a "#" in front of the first line in order for tabix to run. So, the first row of the first column is "#CHR" instead of "CHR". This creates a problem for other software. As you know, many programs will not accept a variable whose name starts with a "#". I don't know if there is a good workaround for this.

Thank you & best regards,
jie

Member

abought commented Jan 15, 2021 via email

Author

jielab commented Jan 15, 2021

Dear Andy:

Yes, I do know of the -S option to skip the first N lines. I had thought that if I skipped the header line, locuszoom would not be able to read it. It turns out that I was wrong. I can now run tabix -f -S 1 -s 1 -b 2 -e 2 MY.gwas.gz successfully without adding a # to the header row, and locuszoom has no problem reading the header row. This is really GREAT!
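
For files whose columns are in a different order, the same command can be derived from the header instead of hard-coding the column numbers. This is a minimal sketch in Python, assuming a tab-delimited header with columns named CHR and POS as in this thread; the helper name `build_tabix_command` is hypothetical. It only builds the argument list and does not run tabix. The file itself still has to be bgzip-compressed and sorted by chromosome and position for tabix to index it.

```python
import shlex


def build_tabix_command(path, header, chrom_col="CHR", pos_col="POS", header_lines=1):
    """Build a tabix argument list from a tab-delimited header line.

    Hypothetical helper; the column names are assumptions based on this thread.
    """
    columns = header.rstrip("\n").split("\t")
    # tabix column numbers are 1-based, Python list indices are 0-based.
    seq = columns.index(chrom_col) + 1
    pos = columns.index(pos_col) + 1
    return [
        "tabix", "-f",
        "-S", str(header_lines),  # skip the header line(s) instead of adding "#"
        "-s", str(seq),           # sequence (chromosome) column
        "-b", str(pos),           # begin column
        "-e", str(pos),           # end column: same as begin for single-position rows
        path,
    ]


header = "CHR\tPOS\tSNP\tREF\tALT\tBETA\tSE\tP\tN"
command = build_tabix_command("MY.gwas.gz", header)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where tabix is installed.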

Now I am able to use LocalZoom to view my local MY.gwas.gz files. Please see the screenshot below.
[Screenshot: LocalZoom view of MY.gwas.gz]

I do have a few minor suggestions/feedback:

  1. I get an error "could not parse specified range" if I specify a range that is too big, such as 1:1000000-10000000. I thought that LocalZoom could present a Manhattan plot first, just like the upload version at my.locuszoom.org. It would be nice to have a Manhattan plot and a "Top Loci" table.

  2. when I click the "LD Population EUR" button, the popup window will not display after I made a selection.

  3. Please see the screenshot below. My input GWAS file actually has 4 columns regarding alleles: REF ALT A1 A2, where REF/ALT is based on the human reference genome, while A1 is the effect allele. In most cases, A1 is the same as ALT, but not always. On the LocusZoom data upload page, the first window "variant from columns" uses the terms "Ref allele" and "Alt allele", but the second window (shown below) uses the term "effect allele". And I can only choose "Ref" or "Alt", but not A1 or A2. So, this is a bit confusing. Should I simply ignore the REF and ALT columns in my GWAS file and use only A1 and A2 in this case?

Thank you & best regards,
Jie

Member

abought commented Jan 15, 2021

As noted in the LocalZoom instructions,

"This service is designed to efficiently fetch only the data needed for the plot region of interest. Therefore, it cannot generate summary views that would require processing the entire file (eg Manhattan plots). "

Rather than maintain two different software codebases, advanced "summarize this file" features (like Manhattan plots and top loci) are explicitly provided only in my.locuszoom.org. LocalZoom is a viewer tool and is not meant as a replacement for an analysis pipeline.

Likewise, the max region size of LocalZoom currently caps out at ~1 Mb. We may increase this to 2 Mb in the future, but the no-upload, client-side LocalZoom is not intended to be a full multiscale genome viewer.
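
As an illustration of the region cap, here is a minimal Python sketch of a check that separates "malformed region" from "region too large". The limit constant, the regular expression, and the function name `parse_region` are all assumptions for this example and are not taken from the LocalZoom source.

```python
import re

# Assumed limit, based on the roughly one-megabase cap mentioned above.
MAX_REGION_BP = 1_000_000

# Accepts "1:1000000-2000000" or "chr1:1,000,000-2,000,000" (commas are stripped first).
_REGION = re.compile(r"^(?:chr)?(\w+):(\d+)-(\d+)$")


def parse_region(text, max_size=MAX_REGION_BP):
    """Parse 'chrom:start-end' and reject regions wider than max_size base pairs."""
    match = _REGION.match(text.replace(",", "").strip())
    if match is None:
        raise ValueError(f"could not parse region: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if end < start:
        raise ValueError("region end is before region start")
    if end - start > max_size:
        raise ValueError(
            f"region spans {end - start:,} bp; the limit is {max_size:,} bp"
        )
    return chrom, start, end
```

With this check, 1:1000000-2000000 is accepted, while 1:1000000-10000000 fails with a message that names the size limit rather than a generic parse error.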

Member

abought commented Jan 15, 2021

Note to self: we do need to clarify the terminology on the "allele frequency" section; thanks for catching that!

Essentially, conventions for specifying allele frequency vary widely. Some files give the allele frequency for a specific allele of interest (eg effect allele, "major/minor", etc), which may not be the same as the variant specified in the "alt" column. (Another common convention is to specify counts instead of frequencies)

Instead of assuming that AF = "alt" frequency, we allow people to use any of ~3 different conventions, and tell the parser how to align their data with a consistent harmonized reference. Our hope is to provide advanced tools for comparing your results to other public studies in the future, but doing that requires some rather fiddly and sometimes confusing UI to ensure that all uploaded files end up harmonized so that a given column means the same thing across files.
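
To make the harmonization idea concrete, here is a minimal Python sketch that converts a frequency reported for some allele of interest into an alt-allele frequency. It assumes a biallelic variant (so the two frequencies sum to 1) and, for the count convention, diploid samples. The function names are hypothetical and this is not the parser used by the site.

```python
def alt_allele_frequency(ref, alt, freq_allele, freq):
    """Return the ALT frequency given a frequency reported for `freq_allele`.

    Assumes a biallelic variant; hypothetical helper for illustration.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    allele = freq_allele.upper()
    if allele == alt.upper():
        return freq            # already the alt frequency
    if allele == ref.upper():
        return 1.0 - freq      # frequency was given for the ref allele
    raise ValueError("frequency allele matches neither ref nor alt")


def frequency_from_counts(allele_count, n_samples):
    """Convert an allele count to a frequency, assuming diploid samples."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return allele_count / (2 * n_samples)
```

For a file like the one described earlier in the thread, where A1 is the effect allele and is usually but not always ALT, `alt_allele_frequency(REF, ALT, A1, freq)` would flip the frequency only on the rows where A1 equals REF.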

Author

jielab commented Jan 16, 2021

Dear Andy:

Thank you very much for clarification!

Maybe my.locuszoom.org could be designed similarly to PheWeb, so that users could set it up on their own local server.

For example, a group has put thousands of UK Biobank GWAS results at fastgwa.info. As shown at this link, http://fastgwa.info/ukbimp/pheno/20015, there is also a Manhattan plot followed by a "Top Loci" table. Users can also click on each locus in the Top Loci table. But of course users will get a phewas plot instead of a LocusZoom plot, since "The online tool was developed based on the source code modified from PheWeb" (http://fastgwa.info/ukbimp/about).

Since PheWeb was also developed at UMICH, you guys might know each other very well. It would be very nice to see these two tools working together. For the "Top Loci" table that I mentioned above, it would be really nice to have one link to a phewas plot and another link to a LocusZoom plot.

What do you think?

Best regards,
Jie

abought changed the title from "failed to parse TBX_GENERIC, was wrong -p [type] used?" to "Various feature requests" Jan 16, 2021
Member

abought commented Jan 16, 2021

We are indeed familiar with PheWeb- in fact, the code to prepare the manhattan and QQ plots is shared between the two projects.

However, when we built my.locuszoom.org, we consciously chose not to try to duplicate the core purpose or focus between the two services. PheWeb is aimed at presenting many different GWAS studies together in one place, whereas my.locuszoom.org is focused on letting users explore individual studies. By encouraging "bulk import" users to try PheWeb instead, we are able to provide a free and easy to use upload-your-own service to a large community: some PheWebs may involve terabytes of starting data and days of server-side processing, and I'm not sure that our research group could afford to host every pheweb for all genetics researchers in the world!

If we get enough high-quality public datasets with good metadata, I could see letting users request a phewas out of existing studies in the future. We aren't currently there yet, so we try to provide the same high quality annotations per study, but not generate a phewas from everything on the site.

If you really want to host your own my.locuszoom.org instance, code and notes are (mostly) in this repository where we are discussing and we always welcome contributions to help make deployment more streamlined. But I would absolutely start by defining the goals, as you might be able to get the customizations you want by creating a more focused tool with just the plotting code (LocusZoom.js) by itself.

Author

jielab commented Jan 17, 2021

Dear Andy:

Thank you very much again! I will not try to customize LocusZoom, because you guys are the experts and I just want to be a good user :- ). I will try to come up with bug reports and useful suggestions, while not wasting too much of your precious time reading my messages :- )

One minor feature, if I could request it: can the axis labels and the gene names on the Manhattan plot be in a bold font and a slightly larger size, just like in the LocusZoom plot? Also, it would be great to have a "save as PNG" option for the Manhattan plot, just like for the LocusZoom plot.

Your help is greatly appreciated.

Best regards,
Jie


Member

abought commented Feb 6, 2021

This ticket has a lot of different things to unpack, and I'll try to distill into a more focused checklist in the near future.

Per initial discussions from the email list, several quality-of-life improvements have been shipped in the newest release:

  1. The "top loci" table will automatically show ref and alt alleles when such information is available.
  2. Sorting the top loci table by marker will now correctly sort by chromosome and position, instead of lexicographically.
  3. To help users verify that they uploaded the correct file, a new "checksum" button has been added to the manhattan plot page (visible only to the person who uploaded the study). You can use this to view the SHA256 for what was originally uploaded. This is a hash value that distills everything in the file into a single short string that can be calculated locally.
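
For the local side of the checksum comparison, a minimal Python sketch using the standard library is shown below. The function name `sha256_of_file` is hypothetical. The hash should be computed on the exact bytes that were uploaded, so for a compressed upload it is the compressed file that gets hashed.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large GWAS files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string can then be compared by eye, or with `==`, against the value shown by the checksum button.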

Author

jielab commented Feb 7, 2021

Thank you very much, Andy!

Best regards,
jie

Member

abought commented Feb 17, 2021

I've gone through this ticket and tried to triage various actionable suggestions (which may not all get fixed or all at once). Checklist of major remaining items:

  • Easier to read font size/face for manhattan plot axis labels
  • Optionally show more columns for top loci table on manhattan plot / summary page
  • New "compare studies" view (requires many internal enhancements)
  • Clarify wording around the "effect allele" checkbox in the allele frequency "column picker" UI
