[关闭]
@lakesea 2020-11-28T06:57:18.000000Z 字数 2196 阅读 473

RNA-seq analysis pipeline

Methods


Aligned the trimmed paired reads to reference genome:

  1. hisat2 -x SA_trinity_merged.fa -1 sample1_R1.fq.gz -2 sample1_R2.fq.gz -p 12 |samtools view -@ 3 -Sb - | samtools sort -@ 3 -o sample1.sort.bam'

Install R pacakge used for counting the reads:

  1. BioManager::install("Rsubread")
  2. BioManager::install("limma")
  3. BioManager::install("edgeR")

Using the featurecount pacakge to count the reads:

  1. Rscript run-featurecounts.R -b sample1.bam -g annotation.gtf -o sample1_RNA

Script of run-featurecounts.R

  1. #!/usr/bin/env Rscript
  2. # parse parameter ---------------------------------------------------------
  3. library(argparser, quietly=TRUE)
  4. # Create a parser
  5. p <- arg_parser("run featureCounts and calculate FPKM/TPM")
  6. # Add command line arguments
  7. p <- add_argument(p, "--bam", help="input: bam file", type="character")
  8. p <- add_argument(p, "--gtf", help="input: gtf file", type="character")
  9. p <- add_argument(p, "--output", help="output prefix", type="character")
  10. # Parse the command line arguments
  11. argv <- parse_args(p)
  12. library(Rsubread)
  13. library(limma)
  14. library(edgeR)
  15. bamFile <- argv$bam
  16. gtfFile <- argv$gtf
  17. nthreads <- 1
  18. outFilePref <- argv$output
  19. outStatsFilePath <- paste(outFilePref, '.log', sep = '');
  20. outCountsFilePath <- paste(outFilePref, '.count', sep = '');
  21. fCountsList = featureCounts(bamFile, annot.ext=gtfFile, isGTFAnnotationFile=TRUE, nthreads=nthreads, isPairedEnd=TRUE)
  22. dgeList = DGEList(counts=fCountsList$counts, genes=fCountsList$annotation)
  23. fpkm = rpkm(dgeList, dgeList$genes$Length)
  24. tpm = exp(log(fpkm) - log(sum(fpkm)) + log(1e6))
  25. write.table(fCountsList$stat, outStatsFilePath, sep="\t", col.names=FALSE, row.names=FALSE, quote=FALSE)
  26. featureCounts = cbind(fCountsList$annotation[,1], fCountsList$counts, fpkm, tpm)
  27. colnames(featureCounts) = c('gene_id', 'counts', 'fpkm','tpm')
  28. write.table(featureCounts, outCountsFilePath, sep="\t", col.names=TRUE, row.names=FALSE, quote=FALSE)

After couting the reads for each RNA sample, we megerd the RNA seq expression matrix from all samples:

  1. ls *.count >genes.quant_files.txt
  2. perl abundance_estimates_to_matix.pl --est_method featureCounts --quant_files genes.quant_files.txt --out_prefix genes

Using the matrix to analyse the RNA-seq exprssion under different conditions:

Here we used one of the package from trinity:

  1. conda install trinity

```
perl run_DE_analysis.pl --matrix genes.counts.matrix --method DESeq2 --samples_file samples.txt --contrasts contrasts.txt

添加新批注
在作者公开此批注前,只有你和作者可见。
回复批注