Biotypes
BamQuery classifies biotypes using Ensembl and RepeatMasker annotations.
Protein-coding Regions: peptide harbored in a transcript containing an open reading frame (ORF)
In this category, the following transcript types in the GENCODE annotations are considered Protein-coding Regions: Protein_coding, IG genes and TR genes.
5’UTR: peptide harbored in the 5’ untranslated region of the transcript
3’UTR: peptide harbored in the 3’ untranslated region of the transcript
In_frame: peptide harbored in the CDS of the transcript, with at least 50% of its sequence in frame with the protein
Frameshift: peptide harbored in the CDS of the transcript, with at least 50% of its sequence frameshifted relative to the protein
CDS: peptide containing a SNP harbored in the CDS of the transcript; peptide that is neither In_frame nor Frameshift with the protein
Junctions: peptide harbored in the junctions (Intron-CDS, CDS-3’UTR, 5’UTR-CDS) of the transcript
Other coding regions: peptide harbored in exons of immunoglobulin genes (IG_C_gene, IG_D_gene, IG_J_gene, IG_V_gene) or T cell receptor genes (TR_C_gene, TR_D_gene, TR_J_gene, TR_V_gene)
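The 50% rule that separates In_frame, Frameshift and CDS can be sketched as follows. This is a minimal illustration, not BamQuery's implementation: the function name `classify_cds_peptide`, the per-residue `frame_flags` encoding, and the tie-break at exactly 50% are all assumptions made for the example.

```python
from typing import Optional, Sequence


def classify_cds_peptide(frame_flags: Sequence[Optional[bool]]) -> str:
    """Label a peptide located in a CDS as In_frame, Frameshift, or CDS.

    frame_flags is a hypothetical per-residue encoding (not BamQuery's own):
      True  = residue is read in the protein's frame
      False = residue is read in a shifted frame
      None  = frame could not be determined for this residue
    """
    if not frame_flags:
        raise ValueError("frame_flags must contain at least one residue")

    total = len(frame_flags)
    in_frame_fraction = sum(flag is True for flag in frame_flags) / total
    shifted_fraction = sum(flag is False for flag in frame_flags) / total

    # Assumption: when both fractions are exactly 50%, In_frame wins.
    if in_frame_fraction >= 0.5:
        return "In_frame"
    if shifted_fraction >= 0.5:
        return "Frameshift"
    # Neither threshold is reached: fall back to the generic CDS label.
    return "CDS"
```

For example, a 9-residue peptide with 6 residues in frame would be labeled In_frame under these assumptions, while one with 3 in frame, 3 shifted and 3 undetermined would fall back to CDS.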
Non-coding RNAs: peptide harbored in a transcript that does not contain an open reading frame (ORF)
In this category, the following transcript types in the GENCODE annotations are considered Non-coding RNAs: IG_C_pseudogene, IG_J_pseudogene, IG_V_pseudogene, IG_pseudogene, Mt_rRNA, Mt_tRNA, TEC, TR_J_pseudogene, TR_V_pseudogene, lncRNA, miRNA, misc_RNA, non_stop_decay, nonsense_mediated_decay, polymorphic_pseudogene, processed_pseudogene, processed_transcript, pseudogene, rRNA, rRNA_pseudogene, retained_intron, ribozyme, sRNA, scRNA, scaRNA, snRNA, snoRNA, transcribed_processed_pseudogene, transcribed_unitary_pseudogene, transcribed_unprocessed_pseudogene, translated_processed_pseudogene, translated_unprocessed_pseudogene, unitary_pseudogene, unprocessed_pseudogene, vault_RNA.
Non_coding Exons: peptide harbored in the exons of the transcript
Non_coding Junctions: peptide harbored in the junctions (Exon-Intron, Intron-Exon) of the transcript
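The two lists of transcript types above amount to a lookup table. The sketch below shows one way to express it; the names `PROTEIN_CODING_TYPES`, `NON_CODING_TYPES` and `broad_category` are illustrative and are not taken from BamQuery's code. Matching is case-insensitive because this page writes Protein_coding while GENCODE files spell the type protein_coding.

```python
from typing import Optional

# Transcript types listed on this page under Protein-coding Regions.
PROTEIN_CODING_TYPES = {
    "protein_coding",
    "IG_C_gene", "IG_D_gene", "IG_J_gene", "IG_V_gene",
    "TR_C_gene", "TR_D_gene", "TR_J_gene", "TR_V_gene",
}

# Transcript types listed on this page under Non-coding RNAs.
NON_CODING_TYPES = {
    "IG_C_pseudogene", "IG_J_pseudogene", "IG_V_pseudogene", "IG_pseudogene",
    "Mt_rRNA", "Mt_tRNA", "TEC", "TR_J_pseudogene", "TR_V_pseudogene",
    "lncRNA", "miRNA", "misc_RNA", "non_stop_decay", "nonsense_mediated_decay",
    "polymorphic_pseudogene", "processed_pseudogene", "processed_transcript",
    "pseudogene", "rRNA", "rRNA_pseudogene", "retained_intron", "ribozyme",
    "sRNA", "scRNA", "scaRNA", "snRNA", "snoRNA",
    "transcribed_processed_pseudogene", "transcribed_unitary_pseudogene",
    "transcribed_unprocessed_pseudogene", "translated_processed_pseudogene",
    "translated_unprocessed_pseudogene", "unitary_pseudogene",
    "unprocessed_pseudogene", "vault_RNA",
}

# Lower-cased copies so that lookups ignore letter case.
_PROTEIN_CODING_LOWER = {name.lower() for name in PROTEIN_CODING_TYPES}
_NON_CODING_LOWER = {name.lower() for name in NON_CODING_TYPES}


def broad_category(transcript_type: str) -> Optional[str]:
    """Return the broad category for a transcript type, or None if unlisted."""
    key = transcript_type.lower()
    if key in _PROTEIN_CODING_LOWER:
        return "Protein-coding Regions"
    if key in _NON_CODING_LOWER:
        return "Non-coding RNAs"
    return None
```

Returning None for unlisted types is a design choice of this sketch; how BamQuery treats a transcript type outside both lists is not stated here.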
Intergenic Regions: peptide harbored in a region that is annotated neither in Ensembl nor in RepeatMasker
Intronic Regions: peptide harbored in an intronic region of any type of transcript, whether protein-coding or non-coding RNA
EREs: peptide harbored in an ERE (based on RepeatMasker annotations)
LINE: peptide harbored in an ERE of class LINE
LTR: peptide harbored in an ERE of class LTR
SINE: peptide harbored in an ERE of class SINE
Antisense_EREs: peptide harbored in the antisense strand of an ERE of any class
Other EREs: peptide harbored in an ERE of another class (DNA, RC, RNA, Satellite, Simple_repeat, Unknown, Retroposon, Low_complexity, rRNA, scRNA, snRNA, srpRNA, tRNA)
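The ERE labels above can be read as a small decision rule over the repeat class and the strand. The following sketch is illustrative only: the function `ere_biotype` and its `same_strand` parameter are hypothetical, and it assumes the class may arrive in RepeatMasker's "class/family" form (for example LINE/L1), in which case only the class part is used.

```python
MAIN_ERE_CLASSES = {"LINE", "LTR", "SINE"}

# Classes grouped under "Other EREs" on this page.
OTHER_ERE_CLASSES = {
    "DNA", "RC", "RNA", "Satellite", "Simple_repeat", "Unknown",
    "Retroposon", "Low_complexity", "rRNA", "scRNA", "snRNA",
    "srpRNA", "tRNA",
}


def ere_biotype(repeat_class: str, same_strand: bool = True) -> str:
    """Map a repeat class and strand relation to an ERE biotype label.

    same_strand is a hypothetical flag: True when the peptide lies on the
    same strand as the annotated repeat, False when it lies antisense to it.
    """
    # Keep only the class part of a "class/family" string such as "LINE/L1".
    ere_class = repeat_class.split("/", 1)[0]

    if ere_class not in MAIN_ERE_CLASSES and ere_class not in OTHER_ERE_CLASSES:
        raise ValueError(f"unrecognized repeat class: {repeat_class!r}")

    # Antisense peptides get a single label regardless of class.
    if not same_strand:
        return "Antisense_EREs"
    if ere_class in MAIN_ERE_CLASSES:
        return ere_class
    return "Other EREs"
```

Raising an error on an unrecognized class is a choice made for this sketch so that unexpected annotation values surface early; BamQuery's handling of such cases is not described on this page.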