Google Summer of Code Ideas 2018
Google Summer of Code is a summer program that offers students stipends to develop software for open source projects.
Get familiar with UCSC Xena and its codebase. Next, either develop your project proposal based on one of the ideas below or come up with your own. If you are a prospective student interested in doing your Google Summer of Code (GSoC) project with us, please contact us as soon as possible. We will do our best to assist and guide you in formulating your GSoC project proposal.
If you have any questions about any of the ideas, please join our Google Group or send us a private email.
- Refactor chart view (an improvement project)
- Transcript View enhancement (an improvement project)
- BRCAness View (a new functionality project)
- Implement more extensive Google Analytics coverage (an improvement project)
- Web interface to load data into local Xena hub on laptop (an improvement project)
- Kaplan-Meier view enhancements (an improvement project)
- GraphQL API for Xena server (a new functionality project)
- Adapt UCSC Xena Browser as an Electron desktop application (a new functionality project)
- Matrix data clustering in spreadsheet view (a new functionality project)
Refactor chart view
We have two main visualizations: our primary Visual Spreadsheet and the chart view, which draws bar charts, box plots, and scatter plots. Users select columns of data in the Visual Spreadsheet, and these become the options for the x- and y-axes in the chart view.
Currently our chart view uses Highcharts.js and Bootstrap CSS, which do not follow the architecture of the rest of our site. Maintaining these differences takes time and energy away from our team.
Refactor this code to use a React-based library and Material Design CSS instead. The functionality will remain exactly the same.
An additional goal, if there is time, would be to add a violin plot to our current chart view. This would likely require incorporating another library, which would need to be vetted. This GitHub issue has some leads on libraries.
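Each half of a violin plot is a kernel density estimate (KDE) of one group's values, so whichever library is chosen will be computing something like the curve below. This TypeScript sketch uses a Gaussian kernel and Silverman's rule-of-thumb bandwidth; both choices, the evaluation grid limited to the observed data range, and the names `kde` and `silvermanBandwidth` are our own assumptions rather than anything specified for Xena.

```typescript
// Minimal Gaussian KDE: the density curve a violin plot mirrors on each side.
// Assumptions: Gaussian kernel, Silverman's bandwidth, grid spans the data range only.

interface DensityPoint {
  x: number;
  density: number;
}

function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function sampleSd(xs: number[]): number {
  const m = mean(xs);
  return Math.sqrt(xs.reduce((a, x) => a + (x - m) ** 2, 0) / (xs.length - 1));
}

// Silverman's rule of thumb: h = 1.06 * sd * n^(-1/5).
function silvermanBandwidth(xs: number[]): number {
  return 1.06 * sampleSd(xs) * Math.pow(xs.length, -0.2);
}

// Evaluate the density at `points` evenly spaced positions from min(xs) to max(xs).
function kde(xs: number[], points = 50, h = silvermanBandwidth(xs)): DensityPoint[] {
  // A zero or NaN bandwidth (constant or too-small data) cannot produce a density.
  if (xs.length < 2 || !(h > 0)) {
    throw new Error("need at least two distinct values to estimate a density");
  }
  const lo = Math.min(...xs);
  const hi = Math.max(...xs);
  const step = points > 1 ? (hi - lo) / (points - 1) : 0;
  const norm = 1 / (xs.length * h * Math.sqrt(2 * Math.PI));
  const curve: DensityPoint[] = [];
  for (let i = 0; i < points; i++) {
    const x = lo + i * step;
    let sum = 0;
    for (const xi of xs) {
      const u = (x - xi) / h;
      sum += Math.exp(-0.5 * u * u); // Gaussian kernel, unnormalized
    }
    curve.push({ x, density: sum * norm });
  }
  return curve;
}
```

A plotting layer would then draw `density` as the half-width of the violin at height `x`, mirrored on both sides.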
Transcript View enhancement
For GSoC 2017, Akhil Kamath coded the Transcript View. This new visualization has been very well received, and users are now requesting enhancements. Specifically, users want to be able to define their own subgroups, such as patients with a certain mutation versus patients without it. They also want to see protein domain annotations on the transcripts.
- Allow users to define their own subgroups. This will most likely require integration with the Visual Spreadsheet and our filter capability.
- Display protein domain annotations. This will require loading the data (most likely from UniProt) into our Xena Hub, then developing an API to query this data and display it.
- Calculate the statistical significance of the difference between the two groups for each transcript, and display the results.
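For the statistics step, one plausible choice is Welch's t-test, which compares two group means without assuming equal variances. The TypeScript sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom for every transcript, given a per-sample group mask. The test choice, the input shape (expression keyed by transcript ID, one value per sample), and the function names are hypothetical; a p-value would then come from the Student t CDF with `df` degrees of freedom, best taken from a vetted statistics library rather than hand-rolled.

```typescript
// Welch's t-test per transcript (sketch; input shape and names are hypothetical).

interface WelchResult {
  t: number;  // Welch's t statistic (group A minus group B)
  df: number; // Welch-Satterthwaite degrees of freedom
}

function meanVar(xs: number[]): { mean: number; variance: number; n: number } {
  const n = xs.length;
  const mean = xs.reduce((a, b) => a + b, 0) / n;
  const variance = xs.reduce((a, x) => a + (x - mean) ** 2, 0) / (n - 1); // sample variance
  return { mean, variance, n };
}

function welchT(a: number[], b: number[]): WelchResult {
  if (a.length < 2 || b.length < 2) {
    throw new Error("each group needs at least two samples");
  }
  const A = meanVar(a);
  const B = meanVar(b);
  const va = A.variance / A.n; // squared standard error of mean A
  const vb = B.variance / B.n;
  const se2 = va + vb;
  // If both groups are constant, se2 is 0 and t is not finite.
  const t = (A.mean - B.mean) / Math.sqrt(se2);
  const df = (se2 * se2) / ((va * va) / (A.n - 1) + (vb * vb) / (B.n - 1));
  return { t, df };
}

// expr[transcript][s] is the value for sample s; inGroupA[s] marks the user-defined subgroup.
function welchPerTranscript(
  expr: Record<string, number[]>,
  inGroupA: boolean[],
): Record<string, WelchResult> {
  const out: Record<string, WelchResult> = {};
  for (const transcript of Object.keys(expr)) {
    const values = expr[transcript];
    const a = values.filter((_, i) => inGroupA[i]);
    const b = values.filter((_, i) => !inGroupA[i]);
    out[transcript] = welchT(a, b);
  }
  return out;
}
```

Because one test runs per transcript, a multiple-testing correction (for example Benjamini-Hochberg) is worth considering before results are displayed.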
BRCAness View
We have been developing a new visualization to help clinicians determine what counts as high BRCAness. Here is some background on BRCAness and on our visualization.
Implement this visualization as a new view, similar to the Transcript View. See the mockups.
Implement more extensive Google Analytics coverage
While we have some general knowledge of how users move through our application, we need more specifics. In particular, we need to know which features and datasets users typically use, both to decide where to focus future development and to give more detailed reports to grant agencies.
Implement more extensive Google Analytics coverage, including tracking of more application features and of which datasets are being viewed.
Web interface to load data into local Xena hub on laptop
To visualize their own data in Xena, users download a Hub onto their laptop, load their data into that Hub, and then point a web browser at the laptop Hub. Currently, users are often confused about loading data into the Hub on their laptop, since they do the visualizing in a browser. Additionally, we have little error handling when loading files.
Move data loading from the Java Hub into the browser, and add error handling.
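Most of the missing error handling amounts to rejecting malformed files early, with messages that point at the offending line. The sketch below validates a genomic-matrix upload in the browser before anything is sent to the Hub. It assumes a tab-separated file whose first row holds sample IDs and whose first column holds gene or probe IDs, with blank cells and `NA` treated as missing; those conventions, and the names `validateMatrixTsv` and `LoadError`, are assumptions to check against Xena's own file-format documentation.

```typescript
// Browser-side validation of a tab-separated genomic matrix (sketch; format is assumed).

interface LoadError {
  line: number; // 1-based line number in the uploaded file
  message: string;
}

function validateMatrixTsv(text: string): LoadError[] {
  const errors: LoadError[] = [];
  const lines = text.split(/\r?\n/);
  // A final newline leaves one empty trailing line; ignore it.
  if (lines.length > 0 && lines[lines.length - 1] === "") lines.pop();
  if (lines.length < 2) {
    errors.push({ line: 1, message: "file needs a header row and at least one data row" });
    return errors;
  }
  const header = lines[0].split("\t");
  if (header.length < 2) {
    errors.push({ line: 1, message: "header has no sample columns; is the file tab-separated?" });
    return errors;
  }
  // Duplicate sample IDs would make columns ambiguous.
  const seen: Record<string, boolean> = {};
  for (const id of header.slice(1)) {
    if (seen[id]) errors.push({ line: 1, message: `duplicate sample ID "${id}"` });
    seen[id] = true;
  }
  for (let i = 1; i < lines.length; i++) {
    const lineNo = i + 1;
    const cells = lines[i].split("\t");
    if (cells.length !== header.length) {
      errors.push({ line: lineNo, message: `expected ${header.length} columns, found ${cells.length}` });
      continue;
    }
    for (let j = 1; j < cells.length; j++) {
      const cell = cells[j].trim();
      if (cell === "" || cell === "NA") continue; // treated as missing (assumption)
      if (!isFinite(Number(cell))) {
        errors.push({ line: lineNo, message: `non-numeric value "${cells[j]}" in column ${j + 1}` });
      }
    }
  }
  return errors;
}
```

Collecting every error, rather than stopping at the first, lets the UI show the user one complete list of problems to fix before retrying the upload.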
Kaplan-Meier view enhancements
The Kaplan-Meier view allows the user to compare the survival rates of different groups of patients. This helps identify genomic or phenotypic traits that affect survival.
We currently support viewing a single KM plot at a time, via the column control drop-down. Some users need to view KM plots for a set of genes, probes, or transcripts, and viewing them one at a time is tedious.
Additionally, there are several different measures of survival. We currently show overall survival, but the user may instead need recurrence-free survival, progression-free survival, etc.
Finally, the view should show confidence intervals (CIs) for the KM curves, as in https://i.stack.imgur.com/RyDau.png.
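A KM curve is the product-limit estimate S(t) = ∏ (1 − dᵢ/nᵢ) over event times tᵢ ≤ t, where dᵢ is the number of events and nᵢ the number at risk at tᵢ. A common way to get CIs is Greenwood's variance, Var S(t) ≈ S(t)² Σ dᵢ / (nᵢ(nᵢ − dᵢ)). The TypeScript sketch below computes both from (time, event) pairs. The `Observation` shape, the plain normal-approximation interval clipped to [0, 1], and z = 1.96 for 95% are our assumptions; many survival packages default to a log or log-log transformed interval, which behaves better near 0 and 1.

```typescript
// Kaplan-Meier estimate with Greenwood confidence intervals (sketch).

interface Observation {
  time: number;
  event: boolean; // true = event observed, false = censored
}

interface KmPoint {
  time: number;
  atRisk: number;
  events: number;
  survival: number;
  lower: number;
  upper: number;
}

function kaplanMeier(obs: Observation[], z = 1.96): KmPoint[] {
  const sorted = obs.slice().sort((a, b) => a.time - b.time);
  let atRisk = sorted.length;
  let survival = 1;
  let greenwoodSum = 0;
  const curve: KmPoint[] = [];
  let i = 0;
  while (i < sorted.length) {
    const t = sorted[i].time;
    let events = 0;
    let removed = 0;
    // Group every observation at this time; censored ones still count as at risk here.
    while (i < sorted.length && sorted[i].time === t) {
      if (sorted[i].event) events++;
      removed++;
      i++;
    }
    if (events > 0) {
      survival *= 1 - events / atRisk;
      // When everyone at risk has the event, S drops to 0 and the term is infinite.
      greenwoodSum += atRisk > events ? events / (atRisk * (atRisk - events)) : Infinity;
      const se = survival > 0 ? survival * Math.sqrt(greenwoodSum) : 0;
      curve.push({
        time: t,
        atRisk,
        events,
        survival,
        lower: Math.max(0, survival - z * se),
        upper: Math.min(1, survival + z * se),
      });
    }
    atRisk -= removed;
  }
  return curve;
}
```

The view would draw `survival` as the usual step function and shade the band between `lower` and `upper`.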
Extend the Kaplan-Meier view to support generating multiple plots at once. Multiple plots will be available when the user creates a column with a gene set, or with a gene that has multiple probes or transcripts. Whether the plots are arranged in a grid or placed on several pages with "forward" and "back" controls will depend on user testing.
The view should also support PDF download with the plots arranged on multiple pages.
Extend the Kaplan-Meier view to allow selection of different survival metrics, when they are available.
GraphQL API for Xena server
Query languages can be far more expressive than REST APIs for data-heavy applications, because the client describes exactly the data it needs in a single request. The UCSC Xena Server was designed around a single query-language API based on SQL. Since then, other projects have driven the development of similar query-language APIs, including Falcor and GraphQL. Moving from the UCSC Xena query language to GraphQL would ease development, access, and collaboration by leveraging GraphQL's large support base.
Develop a GraphQL endpoint for UCSC Xena Server. Assess the existing Xena queries and data types to develop GraphQL schemas that match the functionality and allow future growth. Of particular interest is handling restriction (predicates), column-oriented responses, and aggregations.
Assess GraphQL solutions for Clojure, such as Lacinia, and implement the new endpoint in our Clojure code base.
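The endpoint itself would live in the Clojure code base, but the three behaviours named above (restriction, column-oriented responses, aggregation) can be sketched independently of the server language. The TypeScript below is a hypothetical resolver core over an in-memory, column-oriented dataset: `Predicate`, `selectColumns`, and `aggregate` are illustrative names of our own, not existing Xena, GraphQL, or Lacinia APIs, and a real schema would expose them as GraphQL input types, field arguments, and fields.

```typescript
// Hypothetical resolver core for column-oriented queries (illustrative names only).

type Value = number | string;
type Column = Value[];
type Dataset = Record<string, Column>; // field name -> one value per sample

// What a GraphQL `where` input object might deserialize to (assumption).
interface Predicate {
  field: string;
  op: "eq" | "in" | "gt" | "lt";
  value: Value | Value[];
}

type Aggregate = "count" | "mean" | "min" | "max";

function matches(v: Value, p: Predicate): boolean {
  switch (p.op) {
    case "eq":
      return v === p.value;
    case "in":
      return Array.isArray(p.value) && p.value.indexOf(v) !== -1;
    case "gt":
      return typeof v === "number" && typeof p.value === "number" && v > p.value;
    case "lt":
      return typeof v === "number" && typeof p.value === "number" && v < p.value;
    default:
      return false;
  }
}

// Restrict rows by all predicates, then answer column-wise: one array per requested field.
function selectColumns(ds: Dataset, fields: string[], where: Predicate[]): Dataset {
  const keys = Object.keys(ds);
  const n = keys.length > 0 ? ds[keys[0]].length : 0;
  for (const name of fields.concat(where.map(p => p.field))) {
    if (!(name in ds)) throw new Error(`unknown field: ${name}`);
  }
  const rows: number[] = [];
  for (let i = 0; i < n; i++) {
    if (where.every(p => matches(ds[p.field][i], p))) rows.push(i);
  }
  const out: Dataset = {};
  for (const f of fields) out[f] = rows.map(i => ds[f][i]);
  return out;
}

// Aggregations reduce a column to a single number; non-numeric values are ignored.
function aggregate(col: Column, agg: Aggregate): number {
  if (agg === "count") return col.length;
  const nums = col.filter((v): v is number => typeof v === "number");
  if (nums.length === 0) return NaN;
  if (agg === "mean") return nums.reduce((a, b) => a + b, 0) / nums.length;
  if (agg === "min") return Math.min(...nums);
  return Math.max(...nums);
}
```

Returning one array per field, rather than one object per sample, keeps large responses compact and matches how the spreadsheet consumes whole columns.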
Adapt UCSC Xena Browser as an Electron desktop application
UCSC Xena Browser is a web application that runs in a browser. This limits what the application can do: for example, it makes CPU-intensive analysis harder and restricts access to the file system.
The Electron project (electronjs) has made it possible to build cross-platform desktop applications using web technologies. Adapting UCSC Xena Browser to Electron could provide a more powerful and more secure platform for functional genomic analysis and visualization.
Create an Electron version of UCSC Xena Browser. Integrate the build with the UCSC Xena Browser code repository so that the web application and the Electron application share their core functionality.
Matrix data clustering in spreadsheet view
UCSC Xena supports viewing gene expression for sets of genes in a single column. This is useful for understanding the effects of disease on a cell signaling pathway.
Gene set columns are sorted by the expression of the leftmost gene, sub-sorted by the next-leftmost gene, and so forth. This can reveal correlations with the leftmost genes, but it cannot group samples or genes by overall similarity.
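The current ordering amounts to a lexicographic comparison across genes: two samples are compared on the leftmost gene, and the next gene is consulted only on a tie. A TypeScript sketch of that comparator follows. The matrix layout (`matrix[g][s]`), the descending direction, and placing missing values (NaN) last are assumptions, not confirmed details of the Xena code.

```typescript
// Lexicographic sample ordering over a gene set (sketch).
// Layout assumption: matrix[g][s] is the value of gene g (left to right) in sample s.

function lexicographicOrder(matrix: number[][], descending = true): number[] {
  const nSamples = matrix.length > 0 ? matrix[0].length : 0;
  const order: number[] = [];
  for (let s = 0; s < nSamples; s++) order.push(s);
  const sign = descending ? -1 : 1;
  order.sort((a, b) => {
    for (const gene of matrix) {
      const x = gene[a];
      const y = gene[b];
      const xMissing = isNaN(x);
      const yMissing = isNaN(y);
      if (xMissing !== yMissing) return xMissing ? 1 : -1; // missing values sort last
      if (!xMissing && x !== y) return sign * (x - y); // first differing gene decides
    }
    return a - b; // complete tie: keep the original sample order
  });
  return order;
}
```

Because every later gene only breaks ties, two samples that differ on the leftmost gene are never brought together by similar values elsewhere, which is the limitation clustering is meant to address.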
Hierarchical clustering could provide more insights, by grouping genes and samples by similarity.
Implement a "hierarchical clustering" sort option for matrix-oriented data (gene sets, probe sets, and probes or transcripts within a gene). This option will compute clustering over both dimensions (samples and genes), sort the samples by the clustering result (rather than by the value of the leftmost sub-column, as the current sort does), and re-order the gene list.
All the computation will be done in the client. The clustering algorithm could be a clean implementation or come from an existing JavaScript library, if the dependencies are not onerous. In either case it must be well tested and performant. Familiarity with the Chrome DevTools profiler will be helpful.
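As a starting point for a clean implementation, here is a naive agglomerative clustering in TypeScript using average linkage (UPGMA) over Euclidean distance, returning the dendrogram's leaf order as the new sort order. The linkage, the distance metric, the name `clusterOrder`, and the assumption that there are no missing values are our choices for illustration. It runs in O(n³) time, which is fine for a sketch but likely too slow for cohorts of thousands of samples; profiling would show whether a faster algorithm or a Web Worker is needed.

```typescript
// Naive agglomerative clustering, average linkage over Euclidean distance (sketch).
// Assumes no missing values; O(n^3) time and O(n^2) memory.

type Tree = { leaf: number } | { left: Tree; right: Tree; size: number };

function euclidean(a: number[], b: number[]): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += (a[i] - b[i]) ** 2;
  return Math.sqrt(s);
}

function clusterSize(t: Tree): number {
  return "leaf" in t ? 1 : t.size;
}

// Returns the indices of `vectors` in dendrogram leaf order.
function clusterOrder(vectors: number[][]): number[] {
  let clusters: Tree[] = vectors.map((_, i) => ({ leaf: i }));
  // dist[i][j]: current average-linkage distance between clusters i and j.
  let dist: number[][] = vectors.map(a => vectors.map(b => euclidean(a, b)));
  while (clusters.length > 1) {
    // Find the closest pair of clusters.
    let bi = 0;
    let bj = 1;
    for (let i = 0; i < clusters.length; i++) {
      for (let j = i + 1; j < clusters.length; j++) {
        if (dist[i][j] < dist[bi][bj]) { bi = i; bj = j; }
      }
    }
    const current = clusters;
    const si = clusterSize(current[bi]);
    const sj = clusterSize(current[bj]);
    const merged: Tree = { left: current[bi], right: current[bj], size: si + sj };
    // UPGMA update: distance to the merged cluster is the size-weighted mean.
    const toMerged = dist.map(row => (si * row[bi] + sj * row[bj]) / (si + sj));
    const keep: number[] = [];
    for (let k = 0; k < current.length; k++) if (k !== bi && k !== bj) keep.push(k);
    // New matrix: surviving clusters first, merged cluster last.
    const nextDist = keep.map(a => keep.map(b => dist[a][b]).concat(toMerged[a]));
    nextDist.push(keep.map(k => toMerged[k]).concat(0));
    clusters = keep.map(k => current[k]).concat([merged]);
    dist = nextDist;
  }
  // A depth-first walk of the dendrogram yields the leaf order.
  const order: number[] = [];
  const walk = (t: Tree): void => {
    if ("leaf" in t) order.push(t.leaf);
    else { walk(t.left); walk(t.right); }
  };
  if (clusters.length === 1) walk(clusters[0]);
  return order;
}
```

To cluster both dimensions, call `clusterOrder` once with one vector per sample (its values across the genes) and once with one vector per gene (the transposed matrix). The left/right placement within each merge is arbitrary, so two equally valid dendrograms can give different display orders.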
The UI should convey to the user that the gene list is being re-ordered, which is not the case for our current sort options.