Galaxy

bioinformatics presentation by Urszula Czerwinska on 26 June 2014

Transcript of Galaxy

Take more out of it...
Taverna + Galaxy = Tavaxy?
Bioinformatics tools
&
workflow environment
tools
Urszula Czerwinska
A0103915M
LSM 3241 BIOINFORMATICS & BIOCOMPUTING
what?
galaxyproject.org/
Reproducible Research System
Accessible
Reproducible
Transparent


Deploy Galaxy

Galaxy is open source for all organizations. Local Galaxy servers can be set up by downloading and customizing the Galaxy application.
Galaxy's public service web site makes analysis tools, genomic data, tutorial demonstrations, persistent workspaces, and publication services available to any scientist.
use
deploy
next
Main concept
An environment for performing and recording computational analyses and enabling the use or inclusion of these analyses when preparing documents for publication. Multiple systems provide an environment for recording and repeating computational analyses by automatically tracking the provenance of data and tool usage and enabling users to selectively run (and rerun) particular analyses, and one such system provides a means to integrate analyses in a word-processing document.
outputs : datasets, analyses, workflows, tools
interactive
web-based
free
easy
no programming
integrating tools
analyses can be run repeatedly on different data
history of used tools
adding tags and annotations
share and communicate experimental results and outputs in a meaningful way
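
The idea of a recorded history that can be rerun on different data can be illustrated with a minimal Python sketch. The History class and the two toy tools below are hypothetical and are not part of Galaxy; they only show how logging each tool call with its parameters makes an analysis repeatable.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class History:
    """Hypothetical stand-in for a Galaxy history: an ordered log of tool runs."""
    steps: List[Tuple[str, Callable[..., Any], dict]] = field(default_factory=list)

    def run(self, name, tool, data, **params):
        """Run a tool, record its name and parameters, and return its output."""
        self.steps.append((name, tool, params))
        return tool(data, **params)

    def rerun(self, data):
        """Replay every recorded step, in order, on a different dataset."""
        for _name, tool, params in self.steps:
            data = tool(data, **params)
        return data


def keep_above(values, threshold):
    """Toy filter tool: keep values greater than the threshold."""
    return [v for v in values if v > threshold]


def sort_values(values, reverse=False):
    """Toy sort tool."""
    return sorted(values, reverse=reverse)


history = History()
filtered = history.run("filter", keep_above, [5, 1, 9, 3], threshold=2)
ordered = history.run("sort", sort_values, filtered, reverse=True)
print(ordered)                      # result of the original analysis
print(history.rerun([10, 0, 7]))    # same steps, different data
```

Because the parameters are stored next to each step, the same analysis can be replayed without retyping anything, which is the property the slides attribute to Galaxy histories.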
1. sharing model for Galaxy items - datasets, histories, and workflows - and public repositories of published items

2. web-based framework for displaying shared or published Galaxy items

3. Pages - custom web-based documents that enable users to communicate their experiment at every level of detail and in such a way that readers can view, reproduce, and extend their experiment without leaving Galaxy or their web browser
analysis modules
data browser
history
overview
data
built-in converters between file formats

additional converters can be added
File format conversion
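
As an illustration of what a format converter does, here is a minimal Python sketch that turns FASTQ records into FASTA. It is not Galaxy's own converter; the function name is hypothetical and the sketch assumes simple four-line FASTQ records without wrapped sequences.

```python
def fastq_to_fasta(fastq_text):
    """Convert four-line FASTQ records to FASTA, dropping the quality strings."""
    lines = [line for line in fastq_text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain 4 lines per record")
    out = []
    for i in range(0, len(lines), 4):
        header, sequence, plus, _quality = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record number {}".format(i // 4 + 1))
        out.append(">" + header[1:])   # FASTA headers start with '>'
        out.append(sequence)
    return "\n".join(out) + "\n" if out else ""


example = "@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nHHHH\n"
print(fastq_to_fasta(example), end="")
```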
Data formats
Sequence: FASTA, FASTQ, ABI/SCF, SFF (454)
Alignments: SAM/BAM, MAF, AXT, LAV
Interval / feature formats: INTERVAL, BED, GFF, WIG
Others: *.txt, HTML, LPED / PBED
other file types can be added on local Galaxy
Uploading
FTP
URL
data library
proxy
user submits proxy request to Galaxy
Galaxy forwards request to remote service
service returns data
Galaxy infers data type and presents results
Send and Get data: upload, fetch, send, submit
Data manipulation: join, sort, filter
Format conversion: FASTA, other format operations
Statistics: regressions, simulations, model tests
NGS: BAM, FASTQ, SOLiD, 454 file operations
RNA analysis: Cufflinks, TopHat
Evolution: branch lengths, NJ, HyPhy
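
To give a feel for the data manipulation category, the following Python sketch filters and sorts interval records. It only reads the first three BED columns (chromosome, start, end); the function names are hypothetical and this is not how Galaxy's own tools are implemented.

```python
def parse_bed(text):
    """Parse the first three BED columns: chromosome, start, end."""
    records = []
    for line in text.splitlines():
        # Skip blank lines and the usual BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        records.append((chrom, int(start), int(end)))
    return records


def filter_min_length(records, min_length):
    """Keep intervals that span at least min_length bases."""
    return [r for r in records if r[2] - r[1] >= min_length]


def sort_by_position(records):
    """Sort intervals by chromosome name, then by start coordinate."""
    return sorted(records, key=lambda r: (r[0], r[1]))


bed = "chr2\t100\t200\nchr1\t500\t520\nchr1\t10\t400\n"
intervals = sort_by_position(filter_min_length(parse_bed(bed), 50))
print(intervals)
```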
visualisation
Histograms
Scatterplots
Box plots
Galaxy trackster
and many others...
Galaxy under the hood
issues command
1. Parses HTTP request
2. Identifies which tool to use
3. Reads tool description
4. Queues tool
5. Parses result
6. Returns HTML representation of result
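
The six steps above can be sketched as a toy request handler in Python. Everything here is an assumption made for illustration: the TOOLS registry, the tool ids, and the query parameter names are hypothetical, and a one-item list stands in for a real job queue. It is not Galaxy's actual code.

```python
from html import escape
from urllib.parse import parse_qs, urlparse

# Hypothetical tool registry standing in for Galaxy's tool descriptions.
TOOLS = {
    "upper": {"description": "Convert text to upper case", "run": str.upper},
    "reverse": {"description": "Reverse text", "run": lambda s: s[::-1]},
}


def handle_request(url):
    # 1. Parse the HTTP request (only the URL query string in this sketch).
    query = parse_qs(urlparse(url).query)
    tool_id = query.get("tool_id", [""])[0]
    text = query.get("input", [""])[0]
    # 2. Identify which tool to use.
    if tool_id not in TOOLS:
        return "<p>Unknown tool: {}</p>".format(escape(tool_id))
    # 3. Read the tool description.
    tool = TOOLS[tool_id]
    # 4. Queue the tool, then take the job off the queue and run it.
    queue = [(tool["run"], text)]
    run, argument = queue.pop(0)
    raw_result = run(argument)
    # 5. Parse the result (here: strip surrounding whitespace).
    result = raw_result.strip()
    # 6. Return an HTML representation of the result.
    return "<h1>{}</h1><pre>{}</pre>".format(
        escape(tool["description"]), escape(result)
    )


print(handle_request("http://localhost:8080/run?tool_id=reverse&input=ACGT"))
```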
Installing your own Galaxy
sudo apt-get install ttf-mscorefonts-installer
http://www.gmod.org/wiki/Galaxy_Tutorial_2010#Under_the_hood
Initial setup
gmod@ubuntu:~/work$ cd ~/work/galaxy-dist
gmod@ubuntu:~/work/galaxy-dist$ sh setup.sh
Running Galaxy
gmod@ubuntu:~/work/galaxy-dist$ sh run.sh
Access your Galaxy
Open a web browser and go to
http://localhost:8080
http://galaxy.sb-roscoff.fr
gmod@ubuntu:~/work/galaxy-dist$ ls database/
compiled_templates files import info.txt job_working_directory pbs tmp tools universe.sqlite
all metadata tracked by Galaxy
raw datasets
SQLite-based
Essential ingredients
True story
Once upon a time....
there was Sophie the Scientist working on LC/MS metabolomics data of brown algae
She was using XCMS Online
and an R script with the Bioconductor package xcms
not flexible enough
lines of code
library(xcms)
# Peak detection; x marks values to be chosen for the dataset
xset <- xcmsSet(method="matchedFilter", step=x, steps=x, mzdiff=x, fwhm=x, snthresh=x)
# Alternate peak grouping and retention time correction
xset <- group(xset, method="mzClust")
xset <- retcor(xset, missing=0, extra=0, span=1, family="s", plottype="m")
xset <- group(xset, method="mzClust")
xset <- retcor(xset, missing=0, extra=0, span=1, family="s", plottype="m")
xset <- group(xset, method="mzClust")
# Fill in missing peaks, then compare sample classes A and B
xset <- fillPeaks(xset)
report <- diffreport(xset, "A", "B", "A_vs_B", 500, metlin=0.15, h=480, w=640)
She got a lot of assistants
who didn't know how to use R
and she was tired of file format conversion and of integrating different software for processing after xcms
so she asked for help
who decided to solve the task on
community
Galaxy Pages are the main way of communicating your Galaxy analyses so that other people can easily view, reproduce, or extend them. Pages represent a step towards the next generation of online publication or publication supplement.
• include a mix of text and graphs describing the analyses you performed
• include embedded Galaxy items from your analyses (datasets, histories, and workflows) that readers can expand to view details, or copy into their own analysis workspace and begin using immediately

Pages make reproducing an analysis simple: a reader can import a history and rerun it, or import a workflow, supply input datasets, and run the workflow. Once a history or workflow has been imported from a Page, the reader can also modify or extend the analysis, or reuse the workflow in another analysis.
Galaxy Pages
Wiki - learn more, learn easily
tutorials, videos, interactive materials

galaxy-user
galaxy-dev
galaxy-announce
galaxy-commits
Mailing lists
Who eats who?
R script
had to be prepared for a connection with XML
bash command
all arguments at once
order parameters
little bit of Python
building a GUI
output files
should be under version control
description
R
<when value="matchedFilter">
    <param name="step" type="float" value="0.01" label="step" help="the peak detection algorithm creates extracted ion base peak chromatograms (EIBPC) on a fixed step size defined by the step argument" />
</when>
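
To show how a parameter declared in the tool XML carries a name, a type, and a default value, here is a minimal Python sketch that reads such a fragment with the standard library. The fragment is embedded in the code with a shortened help text and an assumed closing tag, and the CASTS table and function name are hypothetical; Galaxy's own parser is more elaborate.

```python
import xml.etree.ElementTree as ET

# Fragment modelled on the slide; the help text is shortened and the
# closing </when> tag is an assumption needed to make it well-formed.
TOOL_FRAGMENT = """
<when value="matchedFilter">
  <param name="step" type="float" value="0.01" label="step" help="fixed step size" />
</when>
"""

# Hypothetical mapping from declared parameter types to Python types.
CASTS = {"float": float, "integer": int, "text": str}


def default_parameters(xml_text):
    """Return {name: default value} for every <param> in the fragment."""
    root = ET.fromstring(xml_text)
    defaults = {}
    for param in root.iter("param"):
        cast = CASTS.get(param.get("type"), str)
        defaults[param.get("name")] = cast(param.get("value"))
    return defaults


print(default_parameters(TOOL_FRAGMENT))
```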
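# If an input archive was supplied, unpack it before calling the xcms function
if (!is.null(listArguments[["input"]])) {
    directory <- unzip(listArguments[["input"]])
    if (thefunction == "xcmsSet") {
        # xcmsSet takes the extracted files as its first argument
        listArguments <- append(list(directory), listArguments)
        # Subdirectories of the working directory serve as sample classes
        x <- dir(".")
        classes <- x[file_test("-d", x)]
    }
    # Drop the archive path so it is not passed on as a parameter
    listArguments[["input"]] <- NULL
}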
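
On the calling side, passing all arguments at once in a fixed order could look like the Python sketch below. The script name, the parameter order, and the name/value argument layout are assumptions made for illustration and are not taken from the actual wrapper.

```python
import shlex

# Hypothetical fixed order in which the wrapper expects its parameters.
PARAMETER_ORDER = ["method", "step", "fwhm", "snthresh"]


def build_command(script, function_name, parameters):
    """Build one Rscript command line carrying every argument at once."""
    argv = ["Rscript", script, function_name]
    for name in PARAMETER_ORDER:
        if name in parameters:
            # Each parameter is passed as a name followed by its value.
            argv += [name, str(parameters[name])]
    return argv


argv = build_command("xcms.r", "xcmsSet", {"step": 0.01, "method": "matchedFilter"})
print(" ".join(shlex.quote(a) for a in argv))
```

Keeping the order in one list means the command line stays stable however the form fields are filled in, which makes the generated commands easier to compare and rerun.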
Replace commercial solutions
PCA
Hierarchical clustering
Sharing
The Galaxy Tool Shed enables sharing of Galaxy tools across the Galaxy community.
The intent of the main Galaxy Tool Shed is to enable sharing of functionally correct (already developed) Galaxy tools between the many local Galaxy instances around the world. The Mercurial repositories that are available in the main Galaxy Tool Shed can be "hg cloned" individually or "installed" individually as a means of making their contents (Galaxy tools, workflows, data, etc.) available to your local Galaxy instance. This gives those hosting their own local Galaxy instances the flexibility to install only the tools they are interested in, rather than having to get all of them in order to get any one of them.
https://www.ohloh.net/p/galaxybx/
Community
source code
THANK YOU
any questions?
Bibliography
http://galaxyproject.org/
Giardine B, Riemer C, Hardison R, et al. Galaxy: a platform for interactive large-scale genome analysis. Genome Res 2005, 15(10):1451-1455.
Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M, Nekrutenko A, Taylor J. Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol 2010, Chapter 19: Unit 19.10.1-21.
Goecks J, Nekrutenko A, Taylor J & The Galaxy Team. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology 2010, 11:R86.
Pericard P, Le Corguille G, Czerwinska U, Landi M, Giacomoni F, Duperier C, Martin JF, Goultiquer S, Pujous-Guillot E, Caron C. A Small Step into Galaxy, a Faster Pace for Metabolomics. Galaxy and the metabolomics analysis Universe. Jobim 2013.
Abouelhoda M, Issa SA, Ghanem M. Tavaxy: Integrating Taverna and Galaxy workflows with cloud computing support. BMC Bioinformatics 2012, 13:77
https://www.ohloh.net/p/galaxybx/
http://www.slideshare.net/rvosa/the-galaxy-bioinformatics-workflow-environment-12283216
https://main.g2.bx.psu.edu/u/aun1/p/mtdemo-getting-things-mapped
Final workflow