
Can I add new genome references for my data analysis with Kangooroo?

Yes.

There are two possible options:

  • We can add a new genome reference to the database for you. Please note that this service is subject to a fee. For more information, please contact sales@lexogen.com.

  • You can upload your genome reference of interest directly to the platform. To do so, please follow the guidelines below.

1 - Download the genome and annotation references for your species of interest from Ensembl (https://ftp.ensembl.org/).

  • Select the “pub/” directory.

  • Select the release folder of your choice (e.g., “release-110/”).

  • Select the “gtf/” folder to download the annotation file, or the “fasta/” folder to download the genome file.

  • Select your organism of interest.

  • For the annotation reference, download the main gtf.gz file (e.g., Mus_musculus.GRCm39.110.gtf.gz).

  • For the genome reference, open the “dna/” subfolder and download the toplevel.fa.gz file (e.g., Mus_musculus.GRCm39.dna.toplevel.fa.gz).
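If you prefer the command line, the same two files can be fetched directly. The sketch below assumes Ensembl's usual FTP layout and file naming, where the file prefix is "<Genus_species>.<assembly>" (e.g., Mus_musculus.GRCm39); please check the exact file names in the release folder before use. The function names are hypothetical helpers, not part of Kangooroo or Ensembl, and the script assumes bash and curl.

```shell
# Sketch only: build Ensembl FTP URLs for one release/organism and download them.
# Assumed Ensembl layout (verify in the release folder):
#   pub/release-<N>/gtf/<species_dir>/<Prefix>.<N>.gtf.gz
#   pub/release-<N>/fasta/<species_dir>/dna/<Prefix>.dna.toplevel.fa.gz
ENSEMBL_FTP="https://ftp.ensembl.org/pub"

# Print the annotation (GTF) URL. Args: release species_dir prefix
ensembl_gtf_url() {
  local release="$1" species_dir="$2" prefix="$3"
  printf '%s/release-%s/gtf/%s/%s.%s.gtf.gz\n' \
    "$ENSEMBL_FTP" "$release" "$species_dir" "$prefix" "$release"
}

# Print the genome (toplevel FASTA) URL. Args: release species_dir prefix
ensembl_fasta_url() {
  local release="$1" species_dir="$2" prefix="$3"
  printf '%s/release-%s/fasta/%s/dna/%s.dna.toplevel.fa.gz\n' \
    "$ENSEMBL_FTP" "$release" "$species_dir" "$prefix"
}

# Download both files into the current directory.
# curl -f: fail on HTTP errors, -L: follow redirects, -O: keep the remote file name.
download_ensembl_reference() {
  curl -fLO "$(ensembl_gtf_url "$@")" &&
    curl -fLO "$(ensembl_fasta_url "$@")"
}
```

For example, download_ensembl_reference 110 mus_musculus Mus_musculus.GRCm39 would fetch the two mouse files named in the example of step 2.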

Note: If your annotation is only available in GFF/GFF3 format, convert it to GTF with AGAT (https://agat.readthedocs.io/en/latest/gff_to_gtf.html#agat).

2 - Decompress the gtf and fasta files and rename them as follows:

  • gtf file - annotation_organism_ercc_sirv_biotyped.gtf.

Example: Mus_musculus.GRCm39.110.gtf.gz renamed to annotation_organism_ercc_sirv_biotyped.gtf

  • fasta file - annotation_organism_ercc_sirv.fa.

Example: Mus_musculus.GRCm39.dna.toplevel.fa.gz renamed to annotation_organism_ercc_sirv.fa
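Because the target names have no .gz extension, the downloaded files need to be decompressed as well as renamed. A minimal sketch, assuming gunzip is available; prepare_reference_files is a hypothetical helper name.

```shell
# Sketch only: decompress the Ensembl downloads into the file names expected above.
# Args: path to the .gtf.gz file, path to the .fa.gz file.
prepare_reference_files() {
  local gtf_gz="$1" fasta_gz="$2"
  # gunzip -c writes to stdout, so the original .gz files are left untouched.
  gunzip -c "$gtf_gz" > annotation_organism_ercc_sirv_biotyped.gtf &&
    gunzip -c "$fasta_gz" > annotation_organism_ercc_sirv.fa
}
```

For example: prepare_reference_files Mus_musculus.GRCm39.110.gtf.gz Mus_musculus.GRCm39.dna.toplevel.fa.gz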

Important: To ensure compatibility with our data analysis pipelines, the .gtf annotation file must include entries for the “gene”, “transcript” and “exon” features. Gene entries must include the gene_id and gene_name attributes, and transcript entries must include gene_id, transcript_id, gene_name and transcript_name. In addition, each gene_id and transcript_id must be associated with at least one exon entry; this is required for correct read assignment and counting during the analysis.

Example:

CODE
1  Source  gene        1000  5000  .  +  .  gene_id "GENE001"; gene_name "MYGENE";
1  Source  transcript  1000  5000  .  +  .  gene_id "GENE001"; transcript_id "TX001"; gene_name "MYGENE"; transcript_name "MYGENE-01";
1  Source  exon        1000  1200  .  +  .  gene_id "GENE001"; transcript_id "TX001"; exon_number "1";
1  Source  exon        1500  1800  .  +  .  gene_id "GENE001"; transcript_id "TX001"; exon_number "2";
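The requirements above can be checked locally before uploading. The following is a minimal awk-based sketch, not an official Lexogen tool: it assumes a standard GTF file with nine tab-separated columns (the example above is space-aligned for display only) and attributes written as key "value"; pairs. validate_gtf is a hypothetical helper name.

```shell
# Sketch only: check a GTF file against the gene/transcript/exon requirements.
# Assumes 9 tab-separated columns and attributes written as: key "value";
validate_gtf() {
  awk '
    # Return the value of attribute "key" in string s, or "" if absent.
    function attr(s, key,    re, v) {
      re = ";[ ]*" key " \"[^\"]*\""
      if (!match(s, re)) return ""
      v = substr(s, RSTART, RLENGTH)
      sub(/^[; ]*[^ ]+ "/, "", v)   # strip the leading separator and key
      sub(/"$/, "", v)              # strip the closing quote
      return v
    }
    BEGIN { FS = "\t"; err = 0 }
    /^#/ || NF == 0 { next }        # skip header comments and blank lines
    NF < 9 { printf "line %d: fewer than 9 tab-separated columns\n", NR; err = 1; next }
    {
      a = "; " $9                   # prefix so every key follows a semicolon
      gid = attr(a, "gene_id"); tid = attr(a, "transcript_id")
      if ($3 == "gene") {
        n_gene++
        if (gid == "" || attr(a, "gene_name") == "") {
          printf "line %d: gene entry needs gene_id and gene_name\n", NR; err = 1
        }
        if (gid != "") genes[gid] = 1
      } else if ($3 == "transcript") {
        n_tx++
        if (gid == "" || tid == "" || attr(a, "gene_name") == "" || attr(a, "transcript_name") == "") {
          printf "line %d: transcript entry needs gene_id, transcript_id, gene_name and transcript_name\n", NR; err = 1
        }
        if (tid != "") txs[tid] = 1
      } else if ($3 == "exon") {
        n_exon++
        if (gid != "") gene_has_exon[gid] = 1
        if (tid != "") tx_has_exon[tid] = 1
      }
    }
    END {
      if (!n_gene || !n_tx || !n_exon) { print "file needs gene, transcript and exon entries"; err = 1 }
      for (g in genes) if (!(g in gene_has_exon)) { printf "gene %s has no exon entry\n", g; err = 1 }
      for (t in txs) if (!(t in tx_has_exon)) { printf "transcript %s has no exon entry\n", t; err = 1 }
      exit err
    }
  ' "$1"
}
```

Running validate_gtf annotation_organism_ercc_sirv_biotyped.gtf prints each problem it finds and returns a non-zero exit status if any of the requirements is not met.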

3 - Create a directory containing both the genome and annotation files, then compress it.

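First, create the folder and copy both files into it:

CODE
mkdir -p Directory_folder  # create the folder if it does not exist yet
cp annotation_organism_ercc_sirv.fa annotation_organism_ercc_sirv_biotyped.gtf ./Directory_folder/

Then, compress the folder into a .tar.gz archive: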

CODE
tar czfv Directory_folder.tar.gz Directory_folder
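Before uploading, you can list the archive contents to confirm that both files were packed inside the folder. A minimal sketch, assuming the folder and file names used in the commands above; check_reference_archive is a hypothetical helper name.

```shell
# Sketch only: confirm both reference files are inside <folder>/ in the archive.
# Args: archive path, folder name used when the archive was created.
check_reference_archive() {
  local archive="$1" folder="$2" listing f
  # tar -tzf lists the archive members without extracting them.
  listing="$(tar -tzf "$archive")" || return 1
  for f in annotation_organism_ercc_sirv.fa annotation_organism_ercc_sirv_biotyped.gtf; do
    # -q quiet, -x whole-line match, -F fixed string (no regex interpretation)
    if ! printf '%s\n' "$listing" | grep -qxF "$folder/$f"; then
      echo "missing: $folder/$f" >&2
      return 1
    fi
  done
}
```

For example: check_reference_archive Directory_folder.tar.gz Directory_folder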

4 - Upload the compressed folder (Directory_folder.tar.gz) to Kangooroo as described in the FAQ “How do I upload my files in the Kangooroo platform?”

5 - Tag the reference as a genome file.

By default, all uploaded files are assigned the type “File”. To be used as a reference by the pipeline, the type of the uploaded file must be changed to “Genome”. Please watch our tutorial video on how to tag your file as a “Genome” type file.
