The Companion service requires input data in valid FASTA, GenBank or EMBL format. In files that contain both annotations and sequences (such as EMBL and GenBank files), only the sequences will be used; any existing annotation will be discarded and will not appear in the final result.
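Before uploading, it can be useful to confirm which of the three accepted formats a file is in. The sketch below is a heuristic pre-flight check and is not part of Companion. It assumes the file is uncompressed and only inspects the conventional first line of each format: `>` for FASTA, `LOCUS` for GenBank and `ID` followed by three spaces for EMBL. The function names `guess_format` and `sniff_file` are hypothetical.

```python
def guess_format(first_line: str) -> str:
    """Guess the format from the first non-blank line of a sequence file.

    Heuristic only: checks the conventional record openers and does not
    validate the rest of the file.
    """
    line = first_line.lstrip("\ufeff")  # tolerate a UTF-8 byte-order mark
    if line.startswith(">"):
        return "fasta"
    if line.startswith("LOCUS"):
        return "genbank"
    if line.startswith("ID   "):
        return "embl"
    return "unknown"


def sniff_file(path: str) -> str:
    """Return the guessed format of an uncompressed sequence file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip():
                return guess_format(line)
    return "unknown"  # file is empty or contains only blank lines


if __name__ == "__main__":
    import sys

    # Usage: python sniff.py genome.fasta genome.embl ...
    for path in sys.argv[1:]:
        print(f"{path}: {sniff_file(path)}")
```

A result of "unknown" does not prove the file is invalid, and "fasta" does not prove it is valid; the check only catches obvious mix-ups such as uploading the wrong file.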
Some additional requirements to consider:
Once you have prepared the input sequence on your computer, you can navigate to the Companion job submission page to create a new annotation job. You will need to provide some information:
The job name is a free-text identifier used by Companion to label your job. It will not be used in the annotated output and is solely meant to help you distinguish individual runs. In contrast, the species prefix will be used to construct identifiers throughout the final result, such as pseudochromosome and gene IDs. For example, if you pick the species prefix WXYZ, genes in your annotated genome will be assigned IDs prefixed with that string, e.g. WXYZ_00006700, with transcripts named WXYZ_00006700.1, and so on. Pseudochromosome IDs constructed by Companion will carry the same prefix, e.g. WXYZ_04, WXYZ_IV or WXYZ_00.
If you have already registered a new project with one of the public databases (such as ENA), use the 'locus tag prefix' that was assigned to you, or that you chose, as the species prefix.
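To illustrate how the species prefix propagates into identifiers, the sketch below rebuilds the example IDs given above. The eight-digit, zero-padded gene number is inferred from the single example WXYZ_00006700 and is an assumption, as are the helper names; how Companion actually assigns or spaces gene numbers is not described here.

```python
def gene_id(prefix: str, number: int, width: int = 8) -> str:
    """Build a gene ID such as WXYZ_00006700.

    width=8 is inferred from the example in the text, not a documented rule.
    """
    return f"{prefix}_{number:0{width}d}"


def transcript_id(gene: str, isoform: int = 1) -> str:
    """Append a transcript number to a gene ID, e.g. WXYZ_00006700.1."""
    return f"{gene}.{isoform}"


def pseudochromosome_id(prefix: str, name: str) -> str:
    """Prefix a chromosome name such as '04', 'IV' or '00'."""
    return f"{prefix}_{name}"


if __name__ == "__main__":
    gene = gene_id("WXYZ", 6700)
    print(gene)                               # WXYZ_00006700
    print(transcript_id(gene))                # WXYZ_00006700.1
    print(pseudochromosome_id("WXYZ", "IV"))  # WXYZ_IV
```

Because every identifier is derived from the prefix, choosing it carefully up front (for example, reusing a registered locus tag prefix) avoids having to rename features after annotation.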
Select the sequence file to annotate from your local disk. The FASTA, EMBL or GenBank file can be gzip- or bzip2-compressed; compressed files must have a .gz or .bz2 suffix, respectively.
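Because the suffix is what signals compression, a mismatch between a file's content and its name is worth catching before upload. The sketch below is a hypothetical local check, not something Companion provides. It recognises gzip and bzip2 data by their standard leading bytes (`1f 8b` and `BZh`), compares that with the file name, and opens the file with Python's `gzip` or `bz2` module according to its suffix. The function names are illustrative.

```python
import bz2
import gzip
from typing import Optional, TextIO

# Suffix required for each kind of compression, as stated in the text.
EXPECTED_SUFFIX = {"gzip": ".gz", "bzip2": ".bz2"}


def detect_compression(head: bytes) -> Optional[str]:
    """Identify gzip or bzip2 data from the first bytes of a file."""
    if head.startswith(b"\x1f\x8b"):  # gzip magic number
        return "gzip"
    if head.startswith(b"BZh"):  # bzip2 stream signature
        return "bzip2"
    return None


def suffix_matches(path: str, head: bytes) -> bool:
    """Check that the file name agrees with the file content.

    Compressed data needs the matching suffix; uncompressed data must not
    carry a compressed suffix.
    """
    kind = detect_compression(head)
    if kind is None:
        return not path.endswith(tuple(EXPECTED_SUFFIX.values()))
    return path.endswith(EXPECTED_SUFFIX[kind])


def check_file(path: str) -> bool:
    """Read the first bytes of a local file and apply suffix_matches."""
    with open(path, "rb") as handle:
        return suffix_matches(path, handle.read(3))


def open_sequence(path: str) -> TextIO:
    """Open a sequence file as text, picking the decoder from its suffix."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "rt")
```

Checking the leading bytes rather than trusting the name catches the common case of a file that was compressed but renamed, or decompressed without removing its suffix.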
The accuracy of de novo gene annotation (i.e. of genes with no reference counterpart) can be improved by adding extrinsic evidence to guide the process. Companion can use assembled transcripts in GTF format, for example as produced by Cufflinks, to improve the results. If you have prepared such a file, you can upload it to be used in gene finding (maximum size 128 MB). Please do not upload raw reads or alignments. The transcripts must also be assembled in the coordinate space of the sequences you upload, which is the case, for example, if you have run Cufflinks against the sequence you are submitting.
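Since the transcripts must be in the coordinate space of the submitted sequences, every sequence name in column 1 of the GTF file should match a sequence ID in the FASTA input. The sketch below is a hypothetical pre-upload check that lists GTF sequence names with no counterpart. It assumes FASTA IDs are the first whitespace-delimited word of each header line and reads the 128 MB limit as 128 MiB; both are assumptions, and the function names are illustrative.

```python
import os
from typing import Iterable, Set

# Assumption: the 128 MB upload limit is read as 128 MiB.
MAX_GTF_BYTES = 128 * 1024 * 1024


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect sequence IDs (first word after '>') from FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.add(words[0])
    return ids


def gtf_seqnames(lines: Iterable[str]) -> Set[str]:
    """Collect the seqname column (column 1) of GTF records, skipping comments."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 9:
            raise ValueError(f"expected 9 tab-separated GTF columns: {line!r}")
        names.add(fields[0])
    return names


def unmatched_seqnames(fasta_lines: Iterable[str], gtf_lines: Iterable[str]) -> Set[str]:
    """GTF sequence names that do not occur as IDs in the FASTA input."""
    return gtf_seqnames(gtf_lines) - fasta_ids(fasta_lines)


def gtf_within_limit(path: str) -> bool:
    """True if the GTF file is no larger than the assumed upload limit."""
    return os.path.getsize(path) <= MAX_GTF_BYTES
```

Passing the open FASTA and GTF files to `unmatched_seqnames` should yield an empty set if Cufflinks was run against the same sequences being submitted; any names it returns point to transcripts assembled against a different sequence set.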
Companion will try to transfer information, such as gene structures or product information, from a highly conserved ortholog in the reference genome. In addition, gene prediction models trained on the genes of the reference will be used for de novo gene finding. We currently offer 110 reference data sets across many parasites and related species, imported from GeneDB and EuPathDB.
If the reference chromosomes are known, it may be helpful to order and orient the input sequences against them. This makes it possible to quickly check for structural variants, allows gene IDs to be numbered by chromosome and also helps in creating useful comparative graphs.
Companion uses ABACAS2, a successor to the original ABACAS tool, to perform this contiguation step. It will create new 'pseudochromosome' sequences as well as layout files describing how the input sequences were assembled into pseudochromosomes. All unassembled input sequences, e.g. scaffolds, will be concatenated into a single 'bin' pseudochromosome. This results in a manageable number of sequences even when faced with hundreds to thousands of input sequences.
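The 'bin' step can be pictured as concatenating the unplaced sequences end to end while recording where each one landed. The sketch below is a simplified illustration and not ABACAS2's implementation: the run of N characters placed between neighbouring sequences and its default length of 100 are assumptions, and the layout is returned as (name, start, end) tuples in 1-based inclusive coordinates rather than in ABACAS2's actual layout file format.

```python
from typing import Dict, List, Tuple

Layout = List[Tuple[str, int, int]]


def build_bin(seqs: Dict[str, str], spacer_len: int = 100) -> Tuple[str, Layout]:
    """Concatenate unplaced sequences into one 'bin' sequence.

    Returns the bin sequence and a layout of (name, start, end) tuples in
    1-based inclusive coordinates. Neighbours are separated by spacer_len
    Ns; the default of 100 is a hypothetical choice.
    """
    if spacer_len < 0:
        raise ValueError("spacer_len must be non-negative")
    spacer = "N" * spacer_len
    parts: List[str] = []
    layout: Layout = []
    pos = 0  # number of bases written to the bin so far
    for i, (name, seq) in enumerate(seqs.items()):
        if i > 0:
            parts.append(spacer)
            pos += spacer_len
        layout.append((name, pos + 1, pos + len(seq)))
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), layout


if __name__ == "__main__":
    binseq, layout = build_bin({"scaf1": "ACGT", "scaf2": "GG"}, spacer_len=3)
    print(binseq)  # ACGTNNNGG
    print(layout)  # [('scaf1', 1, 4), ('scaf2', 8, 9)]
```

Keeping the layout alongside the bin sequence makes it possible to trace a feature annotated on the bin back to the input sequence it came from.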
After reviewing all chosen parameters, click the 'Submit' button; your files will then be uploaded and validated. Once all information has been validated successfully, your job will be added to the queue.
After a job has been successfully enqueued, the system will assign it an alphanumeric ID (e.g. 0f2bc4b4db3d0052d9627a60) and a URL to check its status (e.g. http://protozoacompanion.gla.ac.uk/jobs/0f2bc4b4db3d0052d9627a60). If no other jobs are queued ahead of yours, it will start processing right away; you can visit the URL regularly to check on its progress. A job can be in one of the following states:
When the job has finished successfully, the URL you were given by the system will point to the results page, which will allow you to download and browse the results of your annotation job. Please take a look at the example results page to get an idea of what the results will contain.