How should I analyze TeloPrime IsoSeq data?
TeloPrime V2 IsoSeq data generated on Sequel instruments (Pacific Biosciences) are best analyzed with a combination of two tools. The first is the IsoSeq3 pipeline from PacBio (available here: https://github.com/PacificBiosciences/IsoSeq_SA3nUP/wiki/Tutorial:-Installing-and-Running-Iso-Seq-3-using-Conda). The second is the Transcriptome Annotation by Modular Algorithms (for Iso-Seq data), or "TAMA", software package (https://github.com/GenomeRIK/tama), which has been specifically modified to handle TeloPrime IsoSeq data. TAMA runs on Python 2.7 and requires no installation.
TAMA is updated regularly, so please check the GitHub repository for the most up-to-date workflows and add-ons.
To run TAMA with IsoSeq3, the following pipeline steps are recommended:
CCS
LIMA; NOTE: do not run "isoseq3 refine --require-polya", as this is the step with issues.
bamtools convert -format fasta -in unpolished.flnc.bam > flnc.fasta
Poly-A cleanup (run as: python tama_flnc_polya_cleanup.py {fasta} {outfile}; NOTE: the current version does not handle random non-A insertions)
Minimap2 alignment to the genome
Sort the BAM file and convert it to SAM
Run TAMA collapse
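As a rough illustration, the steps after LIMA could be chained as in the following Python sketch. It only assembles the command lines and prints them; nothing is executed. The helper name build_pipeline, all file names, the location of the TAMA scripts, the minimap2 options (-ax splice -uf), the samtools calls, and the TAMA collapse options (-s, -f, -p, -x capped) are assumptions that are not part of the recommendation above, so please check them against the current minimap2, samtools and TAMA documentation before use. The bamtools and poly-A cleanup calls follow the forms given in the list.

```python
import shlex


def build_pipeline(flnc_bam, genome_fa, prefix, tama_dir="tama"):
    """Return the post-LIMA steps as shell command strings (nothing is run).

    All output file names are hypothetical and derived from `prefix`.
    """
    q = shlex.quote
    flnc_fa = f"{prefix}.flnc.fasta"
    # Assumption: the second argument of the cleanup script is the output
    # FASTA that is then passed on to minimap2.
    clean_fa = f"{prefix}.flnc.polya_clean.fasta"
    raw_sam = f"{prefix}.aln.sam"
    sorted_bam = f"{prefix}.aln.sorted.bam"
    sorted_sam = f"{prefix}.aln.sorted.sam"
    cleanup_py = f"{tama_dir}/tama_flnc_polya_cleanup.py"
    collapse_py = f"{tama_dir}/tama_collapse.py"

    return [
        # Convert the FLNC BAM to FASTA (command taken from the list above).
        f"bamtools convert -format fasta -in {q(flnc_bam)} > {q(flnc_fa)}",
        # Poly-A cleanup; "python" must point to a Python 2.7 interpreter.
        f"python {q(cleanup_py)} {q(flnc_fa)} {q(clean_fa)}",
        # Spliced alignment to the genome (options are an assumption).
        f"minimap2 -ax splice -uf {q(genome_fa)} {q(clean_fa)} > {q(raw_sam)}",
        # Sort the alignments, then convert the sorted BAM back to SAM.
        f"samtools sort -o {q(sorted_bam)} {q(raw_sam)}",
        f"samtools view -h -o {q(sorted_sam)} {q(sorted_bam)}",
        # TAMA collapse; "-x capped" is assumed here for cap-selected reads.
        f"python {q(collapse_py)} -s {q(sorted_sam)} -f {q(genome_fa)} "
        f"-p {q(prefix)} -x capped",
    ]


if __name__ == "__main__":
    for cmd in build_pipeline("unpolished.flnc.bam", "genome.fa", "sample"):
        print(cmd)
```

Printing the commands first makes it easy to review each step and adjust paths or options before running anything on real data.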