class: center, middle # RNASeq analyses --- # RNASeq procedure Wang et al. Nat Rev Genetics. 2009. doi:[10.1038/nrg2484](http://dx.doi.org/10.1038/nrg2484) ![RNAseq_schema](images/RNASeq_schema.png "RNA Seq") --- # Multiple approaches 1. Genome sequenced, align RNAseq reads to genome 2. de novo Assembly of mRNA into transcripts 3. Quantify gene expression from reads aligned to genome or transcripts --- # Reads to Genome mapping ![SpliceAlign](images/dsv03901.jpeg "Spliced alignment") Tarraga et al 2017. DNA Research.[10.1093/dnares/dsv039](https://dx.doi.org/10.1093/dnares/dsv039) --- #Reads to Genome mapping Challenges: mRNA is spliced, genome contains introns Splice-aware short read aligners. Speed and accuracy tradeoffs * Tophat + Bowtie * HISAT/HISAT2 * GMAP/GSNAP * STAR --- # Quantify expression * Count reads overlapping exons * Table of total read counts per gene * Normalize counts for gene length and sequencing library depth * Gene expression then is FPKM - Fragments per Kilobase per Millions of reads * Tools: htseq-count, stringtie * BEDtools * R tools with iRanges --- # Evaluating expression differences Statistical tools for evaluating gene expression differences * Ballgown [bioconductor package](https://bioconductor.org/packages/release/bioc/html/ballgown.html) * DESeq [bioconductor package](https://bioconductor.org/packages/release/bioc/html/DESeq.html) * edgeR [bioconductor package](https://bioconductor.org/packages/release/bioc/html/edgeR.html) --- # Alternative approach for Quantifying Compare reads to __Transcripts__ instead of Genome * Kalisto and Sailfish are common tools * Bray et al 2016 "Near-optimal probabilistic RNA-seq quantification" doi:[10.1038/nbt.3519](http://dx.doi.org/10.1038/nbt.3519) * Patro et al 2014 "Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms" doi:[10.1038/nbt.2862](http://dx.doi.org/10.1038/nbt.2862) --- # Denovo assembly [Trinity Assembler](http://trinityrnaseq.github.io/) for RNASeq ```bash $ module load trinity-rnaseq $ module switch perl/5.22.0 $ Trinity --seqType fq --left reads_1.fq --right reads_2.fq --CPU 8 --max_memory 20G ``` --- #ORF identification [TransDecoder](https://github.com/TransDecoder/TransDecoder/wiki) * Finds Open Reading Frames in mRNA transcripts ```bash $ module load transdecoder $ TransDecoder.LongOrfs -t target_transcripts.fasta ``` --- #RNAseq read mapping Using HISAT2 ```bash # srun --ntasks 8 --pty bash -l $ mkdir rnaseq; cd rnaseq $ cp -r /bigdata/gen220/shared/projects/RNAseq ./ $ module load hisat2 $ cd genome $ ls -l $ hisat2-build yeast_genome.fasta yeast $ cd .. ``` --- # RNAseq read mapping ```bash $ hisat2 -x genome/yeast -1 fastq/yeast_RNASeq_1.fq -2 fastq/yeast_RNASeq_2.fq \ -S RNASeq_aln.sam -p 16 $ module load samtools $ samtools view -b RNASeq_aln.sam > RNASeq_aln.bam $ samtools sort RNASeq_aln.bam > RNASeq_aln.sort.bam $ samtools index RNASeq_aln.sort.bam $ samtools flagstat RNASeq_aln.sort.bam ``` --- # Process BAM files for other tools * give to htseq-count to get the read depth * process with stringtie ```bash $ module load stringtie GTF=genome/genes.gff $ stringtie -G $GTF -b stringtie_out -e -o stringtie.gtf -A stringtie.gene_abund.tab RNASeq_aln.sort.bam ```