gcell.rna.gencode.Gencode

class gcell.rna.gencode.Gencode(assembly='hg38', version=None, is_basic=True, exclude_chrs=['chrM', 'chrY'])

A class to handle GENCODE gene annotations for different genome assemblies.

GENCODE (https://www.gencodegenes.org/) provides high-quality gene annotations for human and mouse genomes. This class handles downloading, parsing, and accessing GENCODE annotation data.

The class inherits from GTF and provides lookup attributes and convenience methods for gene information such as strand, chromosome, gene type, transcription start site (TSS), and transcription end site (TES).

Attributes

DEFAULT_VERSIONS

gene_to_chrom

Dict mapping gene names to their chromosome locations.

gene_to_id

Dict mapping gene names to their GENCODE IDs.

gene_to_strand

Dict mapping gene names to their strand orientation ('+' or '-').

gene_to_strand_numeric

Dict mapping gene names to numeric strand values (0 for '+', 1 for '-').

gene_to_tes

Dict mapping gene names to their transcription end sites (TES).

gene_to_tss

Dict mapping gene names to their transcription start sites (TSS).

gene_to_type

Dict mapping gene names to their gene types (e.g., protein_coding, lincRNA).

id_to_gene

Dict mapping GENCODE IDs to gene names.

original_gtf
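
To illustrate the shape of the gene-level mappings listed above, here is a minimal sketch in plain Python. It does not use gcell and is not the library's implementation. The record fields, gene names, IDs, and coordinates are hypothetical. The strand-aware TSS/TES rule (TSS is the start coordinate on '+' and the end coordinate on '-') is the usual genomics convention and is assumed here, not confirmed from the gcell source.

```python
# Hypothetical gene-level records, roughly what one might parse from a GTF file.
# Names, IDs, and coordinates are made up for illustration.
records = [
    {"gene_name": "GENE_A", "gene_id": "ID_A.1", "chrom": "chr1",
     "start": 1000, "end": 5000, "strand": "+", "gene_type": "protein_coding"},
    {"gene_name": "GENE_B", "gene_id": "ID_B.2", "chrom": "chr1",
     "start": 8000, "end": 9000, "strand": "-", "gene_type": "lincRNA"},
]

# Simple name-keyed lookups, mirroring the attribute names documented above.
gene_to_chrom = {r["gene_name"]: r["chrom"] for r in records}
gene_to_id = {r["gene_name"]: r["gene_id"] for r in records}
gene_to_strand = {r["gene_name"]: r["strand"] for r in records}
gene_to_type = {r["gene_name"]: r["gene_type"] for r in records}

# Numeric strand encoding as documented: 0 for '+', 1 for '-'.
gene_to_strand_numeric = {
    name: 0 if strand == "+" else 1 for name, strand in gene_to_strand.items()
}

# Assumed convention: on '+', TSS = start and TES = end; on '-', the reverse.
gene_to_tss = {
    r["gene_name"]: r["start"] if r["strand"] == "+" else r["end"] for r in records
}
gene_to_tes = {
    r["gene_name"]: r["end"] if r["strand"] == "+" else r["start"] for r in records
}

# Reverse lookup from ID back to gene name.
id_to_gene = {gene_id: name for name, gene_id in gene_to_id.items()}
```

With mappings like these, a lookup such as `gene_to_tss["GENE_B"]` returns the end coordinate of the gene because it lies on the minus strand.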

Methods

from_config(config)

Create a Gencode instance from a configuration dictionary.

get_exp_feather(peaks[, extend_bp])

Get the expression data for the given peaks.

get_gene(gene_name)

Get a Gene object for the given gene name.

get_gene_id(gene_id)

Get a Gene object for the given gene ID.

get_genebodies([gene_names])

Get the gene bodies for the given gene names.

get_genes(gene_names)

Get a list of Gene objects for the given gene names.

query_region(chrom, start, end[, strand])

Query the GTF for regions matching the given parameters.
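
As a rough illustration of what a region query with an optional strand filter can look like, the sketch below filters a list of records by chromosome, interval overlap, and strand. It is a stand-alone example, not gcell code: `query_region_sketch` and the record layout are hypothetical, and the closed-interval overlap test is an assumption, since the page does not state whether `query_region` matches by overlap or containment, or which coordinate convention it uses.

```python
def query_region_sketch(records, chrom, start, end, strand=None):
    """Return records on `chrom` that overlap [start, end].

    Hypothetical helper for illustration; overlap semantics are assumed.
    """
    hits = []
    for r in records:
        if r["chrom"] != chrom:
            continue
        # Apply the strand filter only when a strand is given.
        if strand is not None and r["strand"] != strand:
            continue
        # Closed-interval overlap: the two ranges share at least one position.
        if r["start"] <= end and r["end"] >= start:
            hits.append(r)
    return hits


# Made-up records for demonstration.
records = [
    {"gene_name": "GENE_A", "chrom": "chr1", "start": 1000, "end": 5000, "strand": "+"},
    {"gene_name": "GENE_B", "chrom": "chr1", "start": 8000, "end": 9000, "strand": "-"},
    {"gene_name": "GENE_C", "chrom": "chr2", "start": 4000, "end": 6000, "strand": "+"},
]

both_strands = [r["gene_name"] for r in query_region_sketch(records, "chr1", 4000, 8500)]
minus_only = [r["gene_name"] for r in query_region_sketch(records, "chr1", 4000, 8500, strand="-")]
```

Leaving `strand` as `None` matches both strands, which mirrors the optional `strand` argument in the documented signature.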