Google Summer of Code Ideas 2018

About Google Summer of Code

Google Summer of Code is a summer program that offers students stipends to develop software for open source projects.

How to apply

Get familiar with UCSC Xena and its codebase. Next, either develop your project proposal based on one of the ideas below or come up with your own. If you are a prospective student interested in doing your Google Summer of Code (GSoC) project with us, please contact us as soon as possible. We will do our best to assist and guide you in formulating your GSoC project proposal.

If you have any questions about any of the ideas, please join our Google Group or send us a private email.

Project ideas

Refactor chart view (improvement project)

Transcript View enhancement (improvement project)

BRCAness View (new functionality project)

Implement more extensive Google Analytics coverage (improvement project)

Web interface to load data into local Xena hub on laptop (improvement project)

Kaplan-Meier view enhancements (improvement project)

GraphQL API for Xena server (new functionality project)

Adapt UCSC Xena Browser as an Electron desktop application (new functionality project)

Matrix data clustering in spreadsheet view (new functionality project)

Refactor chart view

Background

We have two main visualizations: our primary Visual Spreadsheet and the chart view, which draws bar charts, box plots, and scatter plots. Users select columns of data in the Visual Spreadsheet, which become options for the x- and y-axis in the chart view.

Currently the chart view uses highcharts.js and Bootstrap CSS, which do not follow the architecture of the rest of our site. Maintaining these differences takes time and energy from our team.

Goal

Refactor this code to use a React-based library and Material Design CSS instead. The functionality will remain exactly the same.

An additional goal, if there is time, would be to add a violin plot to the chart view. This would likely require incorporating another library, which would need to be vetted. This GitHub issue has some leads on libraries.
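A violin plot's outline is usually a kernel density estimate (KDE) of each group's values, mirrored around the category axis. The sketch below shows that computation under assumptions of our own: a Gaussian kernel, Silverman's rule-of-thumb bandwidth, and hypothetical helper names (`gaussianKde`, `silvermanBandwidth`, `linspace`) that are not part of the Xena codebase. A vetted library may compute the density differently.

```typescript
// Evenly spaced grid of `steps` points from min to max (inclusive).
function linspace(min: number, max: number, steps: number): number[] {
  if (steps < 2) return [min];
  const step = (max - min) / (steps - 1);
  return Array.from({ length: steps }, (_, i) => min + i * step);
}

// Silverman's rule of thumb: h = 1.06 * sd * n^(-1/5).
// Returns 0 when all values are equal; callers must handle that case.
function silvermanBandwidth(values: number[]): number {
  const n = values.length;
  if (n < 2) throw new Error("need at least two values");
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const variance = values.reduce((a, v) => a + (v - mean) ** 2, 0) / (n - 1);
  return 1.06 * Math.sqrt(variance) * Math.pow(n, -0.2);
}

// Gaussian KDE evaluated at each grid point:
// f(x) = 1 / (n * h * sqrt(2 * pi)) * sum_i exp(-((x - v_i) / h)^2 / 2)
function gaussianKde(values: number[], bandwidth: number, grid: number[]): number[] {
  const norm = 1 / (values.length * bandwidth * Math.sqrt(2 * Math.PI));
  return grid.map((x) => {
    let sum = 0;
    for (const v of values) {
      const u = (x - v) / bandwidth;
      sum += Math.exp(-0.5 * u * u);
    }
    return sum * norm;
  });
}
```

For example, `gaussianKde(values, silvermanBandwidth(values), linspace(min, max, 100))` gives 100 density values that could be drawn as one half of a violin.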

Difficulty: Difficult

Required Skills: Javascript, React.js, Rx.js, CSS Material Design

Mentors: Brian Craft, Mary Goldman

Transcript View enhancement

Background

For GSoC 2017, Akhil Kamath coded the Transcript View. This new visualization has been very well received and users are now requesting enhancements. Specifically, users want to be able to define their own subgroups such as patients with a certain mutation vs patients without this mutation. They also want to see protein domain annotations on the transcripts.

Goal

Allow users to define their own subgroups. This will most likely require integration with the Visual Spreadsheet and our filter capability.
Display protein domain annotations. This will require loading the data (most likely from UniProt) into our Xena Hub, and then developing an API to query this data and display it.
Calculate the statistical significance of the difference between the two groups for each transcript, and display the results.
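The ideas list does not name a statistical test. As one possible choice, the sketch below computes Welch's t statistic and its Welch-Satterthwaite degrees of freedom for each transcript; turning these into p-values would additionally need a Student's t CDF (for example from a statistics library), and with many transcripts a multiple-testing correction would be worth considering. The input shape (a map from transcript ID to the two groups' expression values) and the function names are hypothetical.

```typescript
interface WelchResult {
  t: number;  // Welch's t statistic (group A minus group B)
  df: number; // Welch-Satterthwaite degrees of freedom
}

// Sample mean and unbiased sample variance.
function meanVar(xs: number[]): [number, number] {
  const n = xs.length;
  const mean = xs.reduce((a, b) => a + b, 0) / n;
  const variance = xs.reduce((a, x) => a + (x - mean) ** 2, 0) / (n - 1);
  return [mean, variance];
}

// Welch's unequal-variance t-test statistic. If both groups have zero
// variance, t is not finite; a real implementation should report that.
function welchT(a: number[], b: number[]): WelchResult {
  if (a.length < 2 || b.length < 2) {
    throw new Error("each group needs at least two samples");
  }
  const [ma, va] = meanVar(a);
  const [mb, vb] = meanVar(b);
  const sa = va / a.length;
  const sb = vb / b.length;
  const t = (ma - mb) / Math.sqrt(sa + sb);
  const df = (sa + sb) ** 2 / (sa ** 2 / (a.length - 1) + sb ** 2 / (b.length - 1));
  return { t, df };
}

// Hypothetical input: expression values per transcript, split by subgroup.
function welchPerTranscript(
  expression: Map<string, { groupA: number[]; groupB: number[] }>,
): Map<string, WelchResult> {
  const results = new Map<string, WelchResult>();
  expression.forEach((groups, transcript) => {
    results.set(transcript, welchT(groups.groupA, groups.groupB));
  });
  return results;
}
```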

Difficulty: Difficult

Required Skills: Javascript, React.js, Rx.js, H2 database

Mentors: Angela Brooks, Brian Craft

BRCAness View

Background

We have been developing a new visualization to help clinicians determine which samples have high BRCAness. Here is some background on BRCAness and on our visualization.

Goal

Implement this visualization as a new view, similar to the Transcript View. See the mockups.

Difficulty: Difficult

Required Skills: Javascript, React.js, Rx.js

Mentors: Jing Zhu, Hannah Allegakoen, Brian Craft, Mary Goldman

Implement more extensive Google Analytics coverage

Background

While we have some general knowledge of how users move through our application, we need more specifics. In particular, we need to determine which features and datasets users typically use, so that we know where to focus future development and can give more detailed reports to grant agencies.

Goal

Implement more extensive Google Analytics coverage, covering more of the application's functionality and recording which datasets are being viewed.
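One way to keep tracking calls consistent across the code base is a small wrapper around the analytics.js `ga('send', 'event', category, action, label)` command. In the sketch below, the `send` function is injected so the wrapper can be tested without a browser (in the browser it would be the global `ga` function); the category names, the `host/dataset` label format, and reporting each dataset only once per session are assumptions for illustration.

```typescript
// Signature of the analytics.js event command, as used here.
type Send = (
  command: "send",
  hitType: "event",
  category: string,
  action: string,
  label?: string,
) => void;

// Hypothetical tracker; category names and label format are assumptions.
function makeTracker(send: Send) {
  const seenDatasets = new Set<string>();
  return {
    // Record use of a feature, e.g. opening a Kaplan-Meier plot.
    feature(action: string, label?: string): void {
      send("send", "event", "feature", action, label);
    },
    // Record a dataset view once per session, so re-renders do not inflate counts.
    datasetViewed(host: string, dataset: string): void {
      const key = `${host}/${dataset}`;
      if (seenDatasets.has(key)) return;
      seenDatasets.add(key);
      send("send", "event", "dataset", "view", key);
    },
  };
}
```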

Difficulty: Easy

Required Skills: Javascript

Mentors: Brian Craft, Mary Goldman

Web interface to load data into local Xena hub on laptop

Background

To visualize their own data in Xena, users download a Hub onto their laptop, load their data into the Hub, and then point a web browser at the Hub on their laptop. Users are often confused about how to load data into the Hub on their laptop, because they do the visualizing in a browser. Additionally, there is little error handling when loading files.

Goal

Move data loading from the Java Hub into the browser and add error handling.
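As an illustration of the error handling a browser-based loader could add, the sketch below runs client-side checks on a file before it is sent to the Hub and reports every problem with its line number instead of failing silently. The assumed file layout (a header row, then one identifier column followed by numeric values, with empty fields or `NA` treated as missing) and the function name `validateMatrixTsv` are assumptions for this example, not the Hub's file specification.

```typescript
interface ValidationError {
  line: number; // 1-based line number in the file
  message: string;
}

// Hypothetical checks for a tab-separated matrix file.
// Assumed layout: header row; each data row = identifier, then numeric values.
function validateMatrixTsv(text: string): ValidationError[] {
  // Drop one trailing newline, then split on Unix or Windows line endings.
  const lines = text.replace(/\r?\n$/, "").split(/\r?\n/);
  if (lines[0].trim() === "") {
    return [{ line: 1, message: "file is empty or has no header row" }];
  }
  const width = lines[0].split("\t").length;
  const errors: ValidationError[] = [];
  const seenIds = new Set<string>();
  for (let i = 1; i < lines.length; i++) {
    const lineNo = i + 1; // the header is line 1
    const fields = lines[i].split("\t");
    if (fields.length !== width) {
      errors.push({ line: lineNo, message: `expected ${width} fields, found ${fields.length}` });
      continue; // the remaining checks assume a well-formed row
    }
    const id = fields[0].trim();
    if (id === "") {
      errors.push({ line: lineNo, message: "missing identifier" });
    } else if (seenIds.has(id)) {
      errors.push({ line: lineNo, message: `duplicate identifier "${id}"` });
    }
    seenIds.add(id);
    for (let j = 1; j < fields.length; j++) {
      const v = fields[j].trim();
      // Empty fields and "NA" are treated as missing values (an assumption).
      if (v !== "" && v !== "NA" && !Number.isFinite(Number(v))) {
        errors.push({ line: lineNo, message: `non-numeric value "${v}" in column ${j + 1}` });
      }
    }
  }
  return errors;
}
```

Collecting all errors, rather than stopping at the first, lets the UI show users a complete list of problems to fix in one pass.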

Difficulty: Difficult

Required Skills: Javascript, React.js, Rx.js

Mentors: Brian Craft, Mary Goldman

Kaplan-Meier view enhancements

Background

The Kaplan-Meier view allows the user to compare the survival rates of different groups of patients. This helps identify genomic or phenotypic traits that affect survival.

We currently support viewing a single KM plot at a time, via the column control drop-down. Some users need to view KM plots for a set of genes, probes, or transcripts. It is tedious work to view them one at a time.

Additionally, there are several different measures of survival. We currently show overall survival, but the user may instead need recurrence-free survival, progression-free survival, etc.

Finally, users would like to see confidence intervals (CIs) for the KM curves, such as in https://i.stack.imgur.com/RyDau.png.
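For reference, the Kaplan-Meier estimate is S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i is the number of patients still at risk just before t_i. Greenwood's formula estimates its variance as S(t)^2 * sum over t_i <= t of d_i / (n_i * (n_i - d_i)). The sketch below computes the curve with pointwise confidence intervals of the plain form S +/- z * SE, clipped to [0, 1]; log-log transformed intervals are a common alternative. The function and field names are hypothetical.

```typescript
interface KmPoint {
  time: number;
  survival: number; // S(t) just after this time
  lower: number;    // lower CI bound, clipped at 0
  upper: number;    // upper CI bound, clipped at 1
  atRisk: number;   // n_i: patients at risk just before this time
  events: number;   // d_i: events observed at this time
}

// times[k]: follow-up time of patient k; events[k]: true if the event was
// observed, false if censored. z = 1.96 gives approximately 95% intervals.
function kaplanMeier(times: number[], events: boolean[], z = 1.96): KmPoint[] {
  if (times.length !== events.length) throw new Error("length mismatch");
  const order = times.map((_, i) => i).sort((a, b) => times[a] - times[b]);
  let atRisk = times.length;
  let survival = 1;
  let greenwoodSum = 0;
  const curve: KmPoint[] = [];
  let k = 0;
  while (k < order.length) {
    const t = times[order[k]];
    let d = 0; // events at time t
    let leaving = 0; // events plus censored patients at time t
    while (k < order.length && times[order[k]] === t) {
      if (events[order[k]]) d++;
      leaving++;
      k++;
    }
    if (d > 0) {
      survival *= 1 - d / atRisk;
      // The Greenwood term is undefined when all at-risk patients have the event
      // (S drops to 0 and the standard error below is then 0).
      if (atRisk > d) greenwoodSum += d / (atRisk * (atRisk - d));
      const se = survival * Math.sqrt(greenwoodSum);
      curve.push({
        time: t,
        survival,
        lower: Math.max(0, survival - z * se),
        upper: Math.min(1, survival + z * se),
        atRisk,
        events: d,
      });
    }
    // Patients censored at t still count as at risk at t, then leave.
    atRisk -= leaving;
  }
  return curve;
}
```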

Goals

Extend the Kaplan-Meier view to support generating multiple plots at once. Multiple plots will be available when the user creates a column with a gene set, or a gene with multiple probes or transcripts. Whether the plots are arranged in a grid or placed on several pages with "forward" and "back" controls will depend on user testing.

The view should also support PDF download with the plots arranged on multiple pages.

Extend the Kaplan-Meier view to allow selection of different survival metrics, when they are available.

Difficulty: Moderate

Required Skills: Javascript, React.js, Redux, CSS Material Design

Mentors: Brian Craft, Mary Goldman

GraphQL API for Xena server

Background

Query languages can be more flexible than REST APIs for data-heavy applications, because the client can describe exactly the data it needs in a single request. The UCSC Xena Server was designed around a single query-language API based on SQL. Since then, other projects have driven the development of similar query-language APIs, including Falcor and GraphQL. A lateral move from the UCSC Xena query language to GraphQL would make development, access, and collaboration easier by leveraging GraphQL's large support base.

Goals

Develop a GraphQL endpoint for UCSC Xena Server. Assess the existing Xena queries and data types, and develop GraphQL schemas that match their functionality and allow for future growth. Of particular interest are handling restrictions (predicates), column-oriented responses, and aggregations.

Assess GraphQL solutions for Clojure, such as Lacinia, and implement the new endpoint in our Clojure code base.

Difficulty: Difficult

Required Skills: clojure, JVM, functional programming, basics of query languages

Mentors: Brian Craft, Mary Goldman

Adapt UCSC Xena Browser as an Electron desktop application

Background

UCSC Xena Browser is a web application that runs in a browser. This limits its functionality: for example, it makes CPU-intensive analysis harder and restricts access to the file system.

The Electron project (electronjs) has made it possible to build cross-platform desktop applications using web technologies. Adapting UCSC Xena Browser to Electron could provide a more powerful and more secure platform for functional genomic analysis and visualization.

Goals

Create an Electron version of UCSC Xena Browser. Integrate the build with the UCSC Xena Browser code repository, so that the web application and the Electron application share their core functionality.

Difficulty: Difficult

Required Skills: javascript, reactjs, redux, node

Mentors: Brian Craft, Mary Goldman

Matrix data clustering in spreadsheet view

Background

UCSC Xena supports viewing gene expression for sets of genes in a single column. This is useful for understanding the effects of disease on a cell signaling pathway.

Gene set columns are sorted by the expression of the leftmost gene, sub-sorted by the next-leftmost gene, and so forth. This can help identify correlations, but it is limited.
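The current ordering amounts to a lexicographic comparator over the gene columns, which shows why it is limited: a gene only affects the order when all genes to its left are tied, and ties are rare for continuous expression values. In this sketch, the `values[gene][sample]` layout, descending order, and missing values (NaN) sorting last are assumptions, not necessarily Xena's exact rules.

```typescript
// values[g][s]: expression of gene g (columns left to right) in sample s.
// Returns sample indices in display order.
function lexicographicSampleOrder(values: number[][]): number[] {
  const nSamples = values.length > 0 ? values[0].length : 0;
  const samples = Array.from({ length: nSamples }, (_, s) => s);
  return samples.sort((a, b) => {
    for (const gene of values) {
      const va = gene[a];
      const vb = gene[b];
      const aMissing = Number.isNaN(va);
      const bMissing = Number.isNaN(vb);
      if (aMissing !== bMissing) return aMissing ? 1 : -1; // missing sorts last
      if (!aMissing && va !== vb) return vb - va; // descending (assumed)
    }
    return a - b; // fully tied: keep the original sample order
  });
}
```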

Hierarchical clustering could provide more insights, by grouping genes and samples by similarity.

Goals

Implement a "hierarchical clustering" sort option for matrix-oriented data (gene sets, probe sets, and probes or transcripts within a gene). This option will compute clustering over both dimensions (samples and genes), sort the samples by the clustering result (currently, the sort order is determined by the value of the leftmost sub-column), and re-order the gene list.

All computation will be done in the client. The clustering algorithm could be a clean implementation or come from an existing JavaScript library, if the dependencies are not onerous. In either case, it must be well tested and performant. Familiarity with the Chrome DevTools profiler will be helpful.
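A minimal sketch of naive average-linkage agglomerative clustering, assuming Euclidean distance (correlation-based distances are also common for expression data); the function names are hypothetical. It returns a leaf order that can be applied to genes (rows) and, after transposing, to samples. The naive algorithm takes O(n^3) time and O(n^2) memory, so it is only a starting point: with thousands of samples, the performance requirement above would call for a faster algorithm or a well-tested library.

```typescript
type Tree = { leaf: number } | { left: Tree; right: Tree };

function euclidean(a: number[], b: number[]): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += (a[i] - b[i]) ** 2;
  return Math.sqrt(s);
}

function transpose(m: number[][]): number[][] {
  return m.length === 0 ? [] : m[0].map((_, j) => m.map((row) => row[j]));
}

// Naive average-linkage clustering of rows; returns the dendrogram leaf order.
function clusterOrder(rows: number[][]): number[] {
  if (rows.length === 0) return [];
  // dist[i][j]: distance between current clusters i and j
  const dist: number[][] = rows.map((a) => rows.map((b) => euclidean(a, b)));
  const clusters: { tree: Tree; size: number }[] = rows.map((_, i) => ({ tree: { leaf: i }, size: 1 }));
  while (clusters.length > 1) {
    // Find the closest pair of clusters (bi < bj).
    let bi = 0;
    let bj = 1;
    for (let i = 0; i < clusters.length; i++) {
      for (let j = i + 1; j < clusters.length; j++) {
        if (dist[i][j] < dist[bi][bj]) { bi = i; bj = j; }
      }
    }
    const a = clusters[bi];
    const b = clusters[bj];
    // Lance-Williams update for average linkage: size-weighted mean distance.
    const newRow = clusters.map((_, k) => (a.size * dist[bi][k] + b.size * dist[bj][k]) / (a.size + b.size));
    // Remove bj before bi so that index bi stays valid.
    for (const idx of [bj, bi]) {
      clusters.splice(idx, 1);
      dist.splice(idx, 1);
      for (const row of dist) row.splice(idx, 1);
      newRow.splice(idx, 1);
    }
    // Append the merged cluster as the last row and column.
    for (let k = 0; k < dist.length; k++) dist[k].push(newRow[k]);
    dist.push(newRow.concat([0]));
    clusters.push({ tree: { left: a.tree, right: b.tree }, size: a.size + b.size });
  }
  // A depth-first traversal of the dendrogram yields the leaf order.
  const order: number[] = [];
  const visit = (t: Tree): void => {
    if ("leaf" in t) order.push(t.leaf);
    else { visit(t.left); visit(t.right); }
  };
  visit(clusters[0].tree);
  return order;
}
```

Given `values[gene][sample]`, `clusterOrder(values)` would order the genes and `clusterOrder(transpose(values))` the samples. Note that a dendrogram's leaf order is not unique (each merge can be flipped), so a production version might also need a rule for choosing between equivalent orders.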

The UI should convey to the user that the gene list is being re-ordered, which is not the case for our current sort options.

Difficulty: Moderate

Required Skills: javascript, reactjs, redux, algebra, performance analysis

Mentors: Brian Craft, Mary Goldman
