Pathway Explorer: An Interactive Web-Based Visualization for LINCS Signatures

The CMap Data Explorer allows users to explore LINCS signatures based on their similarity to a query. The query can be a specific signature of interest, or a set of up- and down-regulated genes.
by Jim McCusker on 4 December 2013

Transcript of Pathway Explorer: An Interactive Web-Based Visualization for LINCS Signatures

Using known transcription patterns to query the Connectivity Map
Pathway Explorer
An Interactive Web-Based Visualization for LINCS Signatures

James P. McCusker*, William FitzHugh*, Corey Flynn**, Rajiv Narayan**, Aravind Subramanian**, Bang Wong**

*5AM Solutions, Rockville, MD
**Broad Institute, Cambridge, MA
View connectivity score distribution of all profiles

Now see how those signatures are represented in pathways of interest

Drill down to investigate specific signatures
There are hundreds of thousands of signatures in the Connectivity Map. Each is created by perturbing a biological sample with a small molecule, inhibiting expression with shRNAs, or inducing overexpression by introducing open reading frames. One method of sifting through this large data set is to start with information about known transcription patterns.

This can be thought of as a 'query'. A query can be a set of genes known to be over- or under-expressed in some condition of interest, or it can be a complete transcription profile.

In either case, the query can be used to calculate a connectivity score against every profile in the Connectivity Map. This score quantifies how similar to (or how different from) the query each profile is.
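The scoring step can be illustrated for the complete-profile case. The sketch below is a simple stand-in, not the scoring method the Connectivity Map uses (the transcript does not describe it): it assumes the query and each profile are dictionaries mapping gene names to expression values, and it uses a Spearman rank correlation over the shared genes. The function name similarity_score and the data layout are hypothetical.

```python
from typing import Dict, List


def _ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def similarity_score(query: Dict[str, float], profile: Dict[str, float]) -> float:
    """Spearman correlation over the genes present in both inputs.

    Assumption: this stands in for a connectivity score; it is positive
    when the two expression patterns agree and negative when they oppose.
    """
    genes = sorted(set(query) & set(profile))
    if len(genes) < 2:
        return 0.0
    rq = _ranks([query[g] for g in genes])
    rp = _ranks([profile[g] for g in genes])
    mq = sum(rq) / len(rq)
    mp = sum(rp) / len(rp)
    cov = sum((a - mq) * (b - mp) for a, b in zip(rq, rp))
    vq = sum((a - mq) ** 2 for a in rq)
    vp = sum((b - mp) ** 2 for b in rp)
    if vq == 0 or vp == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / (vq * vp) ** 0.5
```

A profile that ranks the shared genes in the same order as the query scores near 1, and one that ranks them in the opposite order scores near -1.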
How it was built collaboratively
From first prototype...
To hi-fidelity example...
To interactive web visualization.
This graphic shows the distribution of connectivity scores that were calculated based on the query. A positive connectivity score indicates that gene expression in a profile correlates with the query. A negative connectivity score indicates a negative correlation. Because a large number of profiles have a connectivity score close to zero, the distribution of scores is summarized graphically in positive, negative, and zero bins.

This result gives an overall sense of how all the profiles relate to the query, and it might give the user hints about the biology of the Connectivity Map samples.
The additional rows in the Pathway Explorer allow the user to understand more of the biological relevance of their results. Each row contains a subset of all profiles. In this example, each subset is based on a biochemical pathway from sources such as MSigDB or KEGG.

The subsets could also be created from other sources, such as the classes of small molecules that were used to perturb the samples.

The color of each row indicates whether the majority of the profiles had a positive correlation with the query (blue) or a negative one (red). This will allow users to see if, for instance, a particular small molecule tends to produce similar expression patterns to the inhibition of transcription of any gene in a pathway.
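The row coloring rule can be sketched as a count of positive versus negative scores within a subset. This is a minimal illustration, assuming each row is a list of connectivity scores with None for missing values; the function name row_color and the "neutral" result for ties or empty rows are assumptions, since the transcript does not say how those cases are drawn.

```python
from typing import List, Optional


def row_color(scores: List[Optional[float]]) -> str:
    """Return 'blue' if positive scores outnumber negative ones, 'red' if
    negative scores outnumber positive ones, and 'neutral' otherwise.

    Assumption: missing (None) and exactly-zero scores are ignored.
    """
    positive = sum(1 for s in scores if s is not None and s > 0)
    negative = sum(1 for s in scores if s is not None and s < 0)
    if positive > negative:
        return "blue"
    if negative > positive:
        return "red"
    return "neutral"
```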
We are here.
A user can drill down to specific profiles by selecting a region. This will show the identity of each profile and the connectivity scores for those profiles. This will allow detailed investigation of the profiles of interest, especially in the regions of the highest and lowest scores.
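The drill-down can be thought of as a range filter over the scores. The sketch below assumes the scores are held in a dictionary keyed by profile identifier; select_region and the identifiers in the usage note are hypothetical, and the real visualization reads its data from the database rather than from an in-memory dictionary.

```python
from typing import Dict, List, Optional, Tuple


def select_region(
    scores: Dict[str, Optional[float]], low: float, high: float
) -> List[Tuple[str, float]]:
    """Return (profile_id, score) pairs with low <= score <= high,
    sorted from highest to lowest score. Missing scores are skipped."""
    hits = [
        (profile_id, score)
        for profile_id, score in scores.items()
        if score is not None and low <= score <= high
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

For example, select_region({"p1": 0.75, "p2": -0.5, "p3": 0.5}, 0.5, 1.0) keeps p1 and p3 and lists p1 first.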
Technical Details
The visualization uses data pre-binned into 2,001 bins (1,000 positive, 1,000 negative, and 1 for null values). The binning process loads all data, including query scores for each signature and gene set details, into a Virtuoso RDF graph database. Data is then queried directly from Virtuoso on demand using SPARQL.
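One way to picture the 2,001-bin layout is sketched below. The details are assumptions, since the transcript gives only the bin counts: scores are taken to lie in [-1, 1], indices 0 to 999 hold negative scores (most negative first), indices 1000 to 1999 hold zero and positive scores, index 2000 holds null values, and a score of exactly zero lands in the lowest positive bin. The names bin_index and histogram are hypothetical.

```python
import math
from typing import Iterable, List, Optional

BINS_PER_SIGN = 1000            # 1,000 negative and 1,000 positive bins
NULL_BIN = 2 * BINS_PER_SIGN    # index 2000: the single bin for null values


def bin_index(score: Optional[float]) -> int:
    """Map a score in [-1, 1] (or a null) to one of 2,001 bin indices."""
    if score is None or math.isnan(score):
        return NULL_BIN
    clipped = max(-1.0, min(1.0, score))
    # Width-0.001 bins by magnitude; a magnitude of exactly 1 joins the last bin.
    offset = min(int(abs(clipped) * BINS_PER_SIGN), BINS_PER_SIGN - 1)
    if clipped < 0:
        return BINS_PER_SIGN - 1 - offset   # 0..999, most negative at index 0
    return BINS_PER_SIGN + offset           # 1000..1999, zero at index 1000


def histogram(scores: Iterable[Optional[float]]) -> List[int]:
    """Count how many scores fall into each of the 2,001 bins."""
    counts = [0] * (2 * BINS_PER_SIGN + 1)
    for score in scores:
        counts[bin_index(score)] += 1
    return counts
```

Pre-binning like this lets the front end draw the distribution from a fixed-size array of counts, regardless of how many signatures were scored.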
Linked Data Principles:
Things, not strings.
Identify things with URLs.
Get information about things from their URLs.
Link to other things using their URLs.