Introduction
Class video
Introductory Slides on Galaxy
Ways to access Galaxy
- Public servers - Some are run by the Galaxy project, others by other institutions. Each may offer a different set of tools depending on its intended use.
- Install it yourself on your own hardware (difficult-ish)
- Install with CloudMan on a commercial cloud provider account (e.g., AWS, Google Cloud). It's easy point-and-click, but you need an account with the cloud provider and must pay for the cloud resources you use. The cost depends on how computationally intensive your analyses are.
Tool Shed - repository of publicly available Galaxy tools. Anyone can contribute a tool (there may be an approval process to get into the repo). Not every tool is installed on every Galaxy server; the admin of each server chooses which ones to add. CloudMan has an easy GUI for adding tools.
Developing Galaxy Tools
Tutorial
M. tuberculosis Variant Analysis - this is the tutorial we followed in class
We skipped the TB-Profiler step (I only showed the text manipulation part). This step in the tutorial is a little unclear (and partly incorrect), so here is an updated version:
- TB-Profiler Profile tool with the following parameters:
  - "Input File Type": BAM
  - "Bam": snippy on data XX, data XX, and data X mapped reads (bam)
TB-Profiler produces 3 output files: its own VCF file, a report about the sample (including its likely lineages and any antimicrobial resistance (AMR) found), and a JSON-formatted results file.
When snippy is run with GenBank-format input, it prepends GENE_ to gene names in the VCF annotation. This causes a problem for TB Variant Report, so we need to remove the prefix from the TB Variant Filter output with sed, using the Text transformation tool.
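The Text transformation edit can also be sketched outside Galaxy. Below is a minimal Python version, assuming the edit simply deletes every literal `GENE_` (the effect of a sed program such as `s/GENE_//g`); the function names and the sample record are hypothetical, for illustration only.

```python
from typing import Iterable, Iterator


def strip_gene_prefix(lines: Iterable[str], prefix: str = "GENE_") -> Iterator[str]:
    """Yield each line with every occurrence of `prefix` removed.

    Same effect as a global, literal sed substitution (s/GENE_//g):
    applied to every line, header lines included.
    """
    for line in lines:
        yield line.replace(prefix, "")


def transform_vcf(in_path: str, out_path: str) -> None:
    """Hypothetical helper: write a copy of a plain-text VCF with the prefix stripped."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(strip_gene_prefix(src))


if __name__ == "__main__":
    # Made-up VCF record with a snippy-style annotation, for illustration only.
    record = "chr\t100\t.\tC\tT\t.\t.\tANN=T|missense_variant|MODERATE|GENE_rpoB\n"
    print(next(strip_gene_prefix([record])), end="")
```

Like a global sed substitution, this removes `GENE_` anywhere on a line, not just in gene names. Restricting the edit to the `ANN` INFO field would be a stricter alternative.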
- TB Variant Report tool with the following parameters:
  - "Input SnpEff annotated M.tuberculosis VCF(s)": Text transformation on data xx
    Make sure you use the transformed TB Variant Filter data that you just made.
  - "TBProfiler Drug Resistance Report (Optional)": TB-Profiler Profile on data XX: Results.json
Analysis Notes
- Always choose the option to output a log file!!
- Click the name of a tool's output, then the (i) details icon.
- This shows the parameters and run details of the job.
- Scroll down to see what resources (CPU, memory) were requested and granted. Currently, the user does not control the resource request. Could analyses be made more efficient by requesting different resources?
- Look at the formula for FastQC: https://github.com/galaxyproject/tools-iuc/tree/master/tools/fastqc
- Switch to MultiQC version 1.9
- Make a dataset pair as input to Trimmomatic
- We used the Minikraken database for Kraken2 for speed. It works OK for just finding contaminants.
- Make sure to use snippy version 4.5.0
- String together the snippy step and the TB Variant Filter step using a Galaxy Workflow.
- Snippy and TB Variant Filter workflow I made in class. You can import it into your own workspace!
- To see the progress of running workflows (and a record of finished ones): User menu at the top > Workflow Invocations
- Under Interactive Tools on the left, try out some of the visualizations, such as bam.iobio and vcf.iobio
- You can also open an RStudio instance. When you're done, don't forget to stop it (User > Active InteractiveTools > select the tool and click Stop) and delete it from your history.
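Galaxy builds the forward/reverse dataset pair for Trimmomatic interactively, but the matching idea can be sketched in Python. This assumes a hypothetical naming convention where both mates share a sample name and end in `_1` (forward) or `_2` (reverse). `pair_reads` and the filenames are illustrative and are not part of Galaxy.

```python
import re
from typing import Dict, Iterable, Tuple

# Assumed convention: <sample>_1.<ext> is the forward read, <sample>_2.<ext> the reverse.
PAIR_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.(?P<ext>.+)$")


def pair_reads(filenames: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Group filenames into {sample: (forward, reverse)}.

    Files that don't match the pattern, and samples missing one mate,
    are left out of the result.
    """
    mates: Dict[str, Dict[str, str]] = {}
    for name in filenames:
        match = PAIR_PATTERN.match(name)
        if match is None:
            continue  # not a recognizable forward/reverse file
        mates.setdefault(match["sample"], {})[match["read"]] = name
    return {
        sample: (reads["1"], reads["2"])
        for sample, reads in sorted(mates.items())
        if "1" in reads and "2" in reads
    }


if __name__ == "__main__":
    files = ["sampleA_1.fastq.gz", "sampleA_2.fastq.gz", "notes.txt", "lonely_1.fastq"]
    print(pair_reads(files))
    # prints {'sampleA': ('sampleA_1.fastq.gz', 'sampleA_2.fastq.gz')}
```

Unmatched files are skipped rather than raising an error, similar to how datasets that don't match a filter stay unpaired when building the pair by hand.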
A work by Poorani Subramanian