[FEATURE REQUEST] Add STAMP clusters #3

Open
cbravo93 opened this issue Sep 29, 2021 · 4 comments
Labels
enhancement New feature or request

Comments

@cbravo93
Member

cbravo93 commented Sep 29, 2021

Is your feature request related to a problem? Please describe.
In i-cisTarget (web), we perform motif clustering with STAMP, which can help to reduce redundancy.

Describe the solution you'd like
Add motif clustering with STAMP on the results as an option. The code can be adapted from: /staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/STAMP/STAMP.py. This should be straightforward for the default databases.

For the already clustered databases, should we implement it too? If so, which motif should represent each cluster? At the moment we use the STAMP consensus motif for the logo (the metaclusters in the collection come from clustering the whole collection with a Seurat-like approach and then running STAMP within each cluster), while we use all motifs in the cluster for scoring with cbust. Does it make sense to use the consensus motif here too, or would that be rather noisy? Also, these motifs have in a sense already been clustered before the analysis, so I am not sure it would add much.

@cbravo93 cbravo93 added the enhancement New feature or request label Sep 29, 2021
@cbravo93
Member Author

Also, this will require access to the cb files. I would add it as an optional step (not enabled by default); people could then either download the files, or we could read them from the web server. However, some collections are (partially) private (TRANSFAC?). Are we allowed to share their PWMs?

@SeppeDeWinter
Collaborator

SeppeDeWinter commented Sep 29, 2021

Is it possible to run this clustering once, generating a matrix containing the similarities between each of the motifs (similar to what is already done for the clustered motif collection)? Or is the result very dependent on which motifs are included in the analysis (i.e. is the similarity measurement relative to which motifs are included)?

[EDIT] Then we could simply read this matrix.

@cbravo93
Member Author

- Is it possible to run this clustering once, generating a matrix containing the similarities between each of the motifs (similar to what is already done for the clustered motif collection)?
Indeed, this would be the clustered collection. We have a df mapping motif to clusterID that we could use here too. We can already provide this now, and we can make it the default (since it requires neither input data nor calculations).
- Or is the result very dependent on which motifs are included in the analysis (i.e. is the similarity measurement relative to which motifs are included)?
The motif clustering will differ depending on which motifs you use as input. For example, if the input contains only AP-1 motifs, it will look for subclusters; if it contains AP-1 plus other motifs, it may just group all the AP-1 motifs into one cluster.

I like the idea of using the clusters from the clustered motif collections though. Thinking about the AP-1 example, it is more elegant than running the clustering per result file, and it is faster and even easier to add :).
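A minimal sketch of what using the precomputed clusters could look like, assuming the motif-to-clusterID table is available as a plain mapping and the enrichment results are (motif, score) pairs. The function name `group_by_cluster`, the motif names, and the cluster IDs are all made up for illustration; this is not existing code from the package.

```python
from collections import defaultdict


def group_by_cluster(enriched, motif_to_cluster):
    """Group (motif, score) pairs by a precomputed cluster ID.

    Motifs missing from the table are kept as singleton groups,
    keyed by their own name (an assumption of this sketch).
    Members of each group are sorted by score, best first.
    """
    groups = defaultdict(list)
    for motif, score in enriched:
        cluster_id = motif_to_cluster.get(motif, motif)
        groups[cluster_id].append((motif, score))
    return {
        cluster_id: sorted(members, key=lambda m: m[1], reverse=True)
        for cluster_id, members in groups.items()
    }


# Hypothetical motif -> clusterID table and enrichment results.
motif_to_cluster = {"AP1_a": "cluster_1", "AP1_b": "cluster_1", "GATA_a": "cluster_2"}
enriched = [("AP1_a", 5.0), ("GATA_a", 4.0), ("AP1_b", 7.5), ("orphan", 3.2)]

grouped = group_by_cluster(enriched, motif_to_cluster)
for cluster_id, members in grouped.items():
    print(cluster_id, members)
```

Because the table is computed once for the whole collection, the grouping does not depend on which motifs happen to be enriched in a given result file, which sidesteps the AP-1 subclustering effect described above.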

@SeppeDeWinter
Collaborator

SeppeDeWinter commented Oct 8, 2021

Another idea:

If we do this, we could also color-code motifs based on other measurements. For example, we could cluster motifs on the Jaccard index of their target regions, i.e. visualise which motifs are present on overlapping sets of regions.
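A rough sketch of the Jaccard idea, assuming each motif's target regions are available as a set of region identifiers. The names `jaccard`, `pairwise_jaccard`, and the example motifs and regions are hypothetical, chosen only to illustrate the computation.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; defined here as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(motif_to_regions):
    """Return {(motif_1, motif_2): jaccard} for every unordered motif pair."""
    return {
        (m1, m2): jaccard(motif_to_regions[m1], motif_to_regions[m2])
        for m1, m2 in combinations(sorted(motif_to_regions), 2)
    }


# Hypothetical target regions per motif.
motif_to_regions = {
    "motif_1": {"chr1:100-600", "chr1:900-1400", "chr2:50-550"},
    "motif_2": {"chr1:900-1400", "chr2:50-550", "chr3:10-510"},
    "motif_3": {"chr9:1-501"},
}

similarities = pairwise_jaccard(motif_to_regions)
for pair, value in similarities.items():
    print(pair, value)
```

The resulting pairwise values could then feed any clustering or color-coding step; which one to use is left open here.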
