Cufflinks (http://cufflinks.cbcb.umd.edu/) is a transcript assembly program that includes a number of tools for analyzing RNA-Seq data. These tools assemble aligned RNA-Seq reads into transcripts, estimate their abundances, and test for differential expression and regulation across the transcriptome. Some of the tools in the Cufflinks package can be run individually, while others are part of a larger workflow.
The basic usage of Cufflinks is `cufflinks [options] input_alignments.[sam|bam]`, where input_alignments.[sam|bam] is a sorted file of RNA-Seq read alignments in SAM or BAM format. The RNA-Seq read mapper TopHat/TopHat2 produces output in this format and is recommended for use with Cufflinks, although SAM/BAM alignments produced by any aligner are accepted. More advanced Cufflinks options are described in the manual, http://cufflinks.cbcb.umd.edu/manual.html, and running `cufflinks` without any arguments prints the full list of options.
To run Cufflinks on Tusker with an alignment file in SAM format, the output directory cufflinks_output/, and 8 CPUs, use the -o and -p options: `cufflinks -o cufflinks_output/ -p 8 input_alignments.sam`.
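Because Cufflinks is usually launched from a job script, the command line for a run like this can also be assembled programmatically, for example when looping over several samples. The following is a minimal Python 3 sketch: the helper names cufflinks_command and run_cufflinks are hypothetical, it assumes the cufflinks executable is already on PATH in the job environment, and only the -o (output directory) and -p (number of threads) options come from Cufflinks itself.

```python
import shutil
import subprocess
from typing import List


def cufflinks_command(alignments: str, outdir: str = "cufflinks_output/",
                      threads: int = 8) -> List[str]:
    """Build the argument list for one Cufflinks run (hypothetical helper).

    -o sets the output directory and -p the number of threads; the sorted
    SAM/BAM alignment file is passed as the final positional argument.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return ["cufflinks", "-o", outdir, "-p", str(threads), alignments]


def run_cufflinks(alignments: str, outdir: str = "cufflinks_output/",
                  threads: int = 8) -> None:
    """Run Cufflinks; raises if it is not on PATH or exits with an error."""
    if shutil.which("cufflinks") is None:
        raise FileNotFoundError("cufflinks was not found on PATH")
    subprocess.run(cufflinks_command(alignments, outdir, threads), check=True)


if __name__ == "__main__":
    # Only print the command here, so the sketch runs without Cufflinks installed.
    print(" ".join(cufflinks_command("input_alignments.sam")))
```

Keeping the arguments as a Python list, rather than one shell string, avoids quoting problems when file names contain spaces.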
Cufflinks writes a number of files to the output directory, here cufflinks_output/. Some of the generated files are:
- transcripts.gtf: This GTF file contains Cufflinks' assembled isoforms, with one GTF record per line; each record represents either a transcript or an exon within a transcript
- isoforms.fpkm_tracking: This file contains the estimated isoform-level expression values in the generic FPKM Tracking Format
- genes.fpkm_tracking: This file contains the estimated gene-level expression values in the generic FPKM Tracking Format
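These output files are tab-separated text, so they can be inspected with a few lines of code. The following Python 3 sketch is an illustration under stated assumptions rather than a complete parser: it assumes transcripts.gtf uses the standard nine GTF columns with quoted key "value" pairs in the attribute column, and that the *.fpkm_tracking files have a header row containing tracking_id and FPKM columns. The function names parse_gtf and read_fpkm are hypothetical.

```python
import csv
import io
import re
from typing import Dict, Iterator, TextIO, Tuple

# key "value" pairs in the 9th (attribute) column; keys may not contain ';'.
_ATTR = re.compile(r'([^\s;]+)\s+"([^"]*)"')


def parse_gtf(handle: TextIO) -> Iterator[Tuple[str, int, int, Dict[str, str]]]:
    """Yield (feature, start, end, attributes) for each record in a GTF file."""
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 9:
            raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
        attrs = dict(_ATTR.findall(fields[8]))
        yield fields[2], int(fields[3]), int(fields[4]), attrs


def read_fpkm(handle: TextIO) -> Dict[str, float]:
    """Map tracking_id to FPKM; a later row with the same ID overwrites an earlier one."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["tracking_id"]: float(row["FPKM"]) for row in reader}


if __name__ == "__main__":
    # Tiny made-up in-memory examples; real use would open files in cufflinks_output/.
    gtf = io.StringIO('chr1\tCufflinks\texon\t100\t200\t1000\t+\t.\t'
                      'gene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1";\n')
    for feature, start, end, attrs in parse_gtf(gtf):
        print(feature, start, end, attrs["transcript_id"])
    print(read_fpkm(io.StringIO("tracking_id\tFPKM\nCUFF.1\t12.5\n")))
```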
Besides cufflinks, the Cufflinks package includes other programs, among them cuffcompare, cuffmerge, and cuffdiff.
Cuffcompare takes Cufflinks' GTF output as input and compares the assembled transcripts to a reference annotation. For example, to compare the new annotation new_annotation.gtf against the existing reference annotation known_annotation.gtf, pass the reference with the -r option: `cuffcompare -r known_annotation.gtf new_annotation.gtf`.
Cuffcompare reports various statistics about the assembled transcripts relative to the reference, and writes a GTF file containing the union of the transfrags from all input samples.
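A minimal Python 3 sketch of building the cuffcompare command line for such a comparison follows. The helper name cuffcompare_command is hypothetical; -r (reference annotation) and -o (output file prefix, cuffcmp by default) are Cuffcompare options, and the query GTF files come last.

```python
from typing import List, Optional, Sequence


def cuffcompare_command(reference_gtf: str, query_gtfs: Sequence[str],
                        outprefix: Optional[str] = None) -> List[str]:
    """Build the cuffcompare argument list: -r <reference>, then the query GTFs."""
    if not query_gtfs:
        raise ValueError("at least one query GTF file is required")
    cmd = ["cuffcompare", "-r", reference_gtf]
    if outprefix is not None:
        # -o sets the prefix of the output files (cuffcompare defaults to "cuffcmp").
        cmd += ["-o", outprefix]
    return cmd + list(query_gtfs)


if __name__ == "__main__":
    print(" ".join(cuffcompare_command("known_annotation.gtf", ["new_annotation.gtf"])))
```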
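Cuffmerge merges multiple Cufflinks GTF files into a single assembly. Its input is a text file that lists one GTF file per line; for example, to merge the GTF files whose full paths are listed in list_GTF.txt using 8 CPUs, run `cuffmerge -p 8 list_GTF.txt`.
The output of cuffmerge is a single, unified transcript file (merged.gtf, written to the merged_asm/ directory by default).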
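Since cuffmerge expects the GTF files to be listed one per line, the list file can be generated rather than written by hand. This Python 3 sketch uses the hypothetical helpers write_gtf_list and cuffmerge_command: it resolves each path to an absolute one, matching the full paths mentioned for list_GTF.txt, and builds the command with -p for the thread count. The per-sample paths in the demo are placeholders.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Sequence


def write_gtf_list(gtf_paths: Sequence[str], list_file: str = "list_GTF.txt") -> str:
    """Write one absolute GTF path per line, the list format cuffmerge reads."""
    lines = [str(Path(p).resolve()) for p in gtf_paths]
    Path(list_file).write_text("\n".join(lines) + "\n")
    return list_file


def cuffmerge_command(list_file: str, threads: int = 8) -> List[str]:
    """Build the cuffmerge argument list: -p sets the threads, the list file comes last."""
    return ["cuffmerge", "-p", str(threads), list_file]


if __name__ == "__main__":
    # Placeholder per-sample paths; the list file goes to a temporary directory here.
    with tempfile.TemporaryDirectory() as tmp:
        manifest = write_gtf_list(["sample1/transcripts.gtf", "sample2/transcripts.gtf"],
                                  os.path.join(tmp, "list_GTF.txt"))
        print(Path(manifest).read_text(), end="")
        print(" ".join(cuffmerge_command(manifest)))
```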
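Cuffdiff identifies differentially expressed transcripts. It takes the transcript annotation first, here new_annotations.gtf with the annotated transcripts for the new genome, followed by the alignment files; each space-separated alignment argument is treated as a separate sample, and replicates of the same sample are joined with commas. A run with three SAM alignment files generated by TopHat and 8 CPUs therefore passes -p 8, then new_annotations.gtf, then the three SAM files.
Cuffdiff writes multiple output files, including FPKM tracking files, count tracking files, read group tracking files, differential expression tests, differential splicing tests, differential coding output, differential promoter use, read group info, and run info.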
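The following Python 3 sketch shows how cuffdiff's positional arguments are grouped (annotation first, then one argument per sample, replicates joined by commas) and how significant results can be pulled out of a differential expression table. It is illustrative only: the helper names and sample file names are hypothetical, and it assumes the .diff files (for example gene_exp.diff) are tab-separated with a header row that includes a significant column holding yes or no.

```python
import csv
import io
from typing import Dict, List, Sequence, TextIO


def cuffdiff_command(annotation_gtf: str, samples: Sequence[Sequence[str]],
                     threads: int = 8) -> List[str]:
    """Build the cuffdiff argument list.

    The annotation comes first, then one argument per sample; replicate
    alignment files of the same sample are joined with commas.
    """
    if len(samples) < 2:
        raise ValueError("at least two samples are needed for a comparison")
    return (["cuffdiff", "-p", str(threads), annotation_gtf]
            + [",".join(replicates) for replicates in samples])


def significant_rows(handle: TextIO) -> List[Dict[str, str]]:
    """Return the rows of a cuffdiff .diff table whose 'significant' column is 'yes'."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if row.get("significant") == "yes"]


if __name__ == "__main__":
    # Hypothetical file names for three TopHat alignments, one per sample.
    print(" ".join(cuffdiff_command("new_annotations.gtf",
                                    [["sample1.sam"], ["sample2.sam"], ["sample3.sam"]])))
    # A made-up table with only three of the real columns.
    table = "test_id\tgene_id\tsignificant\nXLOC_1\tXLOC_1\tyes\nXLOC_2\tXLOC_2\tno\n"
    print([row["test_id"] for row in significant_rows(io.StringIO(table))])
```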