What sequences should be trimmed?
QuantSeq raw sequencing reads should be trimmed to remove adapter sequences, poly(A) / poly(T) sequences, and low quality nucleotides. Reads that are too short (i.e., <20 nt) or have generally low quality scores should also be removed prior to alignment.
For adapter trimming, use the following sequence:
5' – A{18}AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC – 3'
In general, we recommend using cutadapt (v1.18) with the following parameters for Read 1 trimming:
cutadapt -m 20 -O 20 -a "polyA=A{20}" -a "QUALITY=G{20}" -n 2 ${R1_raw} | \
cutadapt -m 20 -O 3 --nextseq-trim=10 -a "r1adapter=A{18}AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC;min_overlap=3;max_error_rate=0.100000" - | \
cutadapt -m 20 -O 20 -g "r1adapter=AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC;min_overlap=20" --discard-trimmed -o ${R1_trimmed} -
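As a quick sanity check after trimming, the number of reads still shorter than the 20 nt minimum can be counted. The following is a minimal sketch and not part of the recommended pipeline: the function name count_short_reads is hypothetical, and it assumes an uncompressed FASTQ file with exactly four lines per record.

```shell
# Hypothetical helper: count reads shorter than a minimum length.
# Assumes an uncompressed FASTQ with four lines per record (no wrapped sequences).
count_short_reads() {
    # $1: FASTQ path; $2: minimum length in nt (defaults to 20)
    awk -v min="${2:-20}" '
        NR % 4 == 2 && length($0) < min { n++ }   # line 2 of each record is the sequence
        END { print n + 0 }                       # print 0 when no short reads are found
    ' "$1"
}
```

Calling count_short_reads "${R1_trimmed}" on the output of the commands above should print 0 if the -m 20 filter was applied at every step. For gzip-compressed output, decompress to a temporary file first.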
As second strand synthesis is based on random priming, there may be a higher proportion of errors at the first nucleotides of the insert due to non-specific hybridization of the random primer to the cDNA template.
For QuantSeq FWD data we therefore recommend using an aligner that can perform soft-clipping of the read ends (e.g., STAR aligner) during alignment, or increasing the number of allowed mismatches to 14.
Alternatively, trimming the first 12 nt of Read 1 can be performed prior to alignment when using a more stringent aligner (e.g., HISAT2). While trimming the read can decrease the number of reads of suitable length for alignment, the absolute number of mapping reads may increase due to the improved read quality.
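The effect of removing the first 12 nt can be illustrated with a short sketch. It is written in plain awk for illustration only; the function name trim_leading_nt is hypothetical, the 20 nt minimum length is carried over from the trimming recommendation above, and an uncompressed FASTQ with four lines per record is assumed. In practice the same operation is available in cutadapt through its -u (--cut) option, for example by adding -u 12 alongside -m 20.

```shell
# Hypothetical helper: remove a fixed number of bases from the 5' end of each
# read and drop reads that become too short. Assumes four lines per record.
trim_leading_nt() {
    # $1: input FASTQ; $2: nt to remove (default 12); $3: minimum length kept (default 20)
    awk -v cut="${2:-12}" -v min="${3:-20}" '
        { rec[(NR - 1) % 4] = $0 }              # buffer the four lines of a record
        NR % 4 == 0 {
            seq  = substr(rec[1], cut + 1)      # drop the leading bases
            qual = substr(rec[3], cut + 1)      # keep quality string in sync
            if (length(seq) >= min)
                print rec[0] "\n" seq "\n" rec[2] "\n" qual
        }
    ' "$1"
}
```

For example, trim_leading_nt "${R1_trimmed}" > R1_trimmed_cut12.fastq writes the shortened reads to a new file (the output file name is a placeholder). Comparing read counts before and after shows how many reads fell below the length threshold.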