Measuring tissue specificity from single cell data with SPICEY
Georgina Fuentes-Páez
Source:vignettes/SPICEY.Rmd
SPICEY.Rmd
Introduction
SPICEY is an R package designed to infer cell type specificity from single-cell omics data. It supports input from widely used frameworks such as Seurat (for single-cell RNA-seq) and Signac (for single-cell ATAC-seq), and computes metrics of cell-type specificity at both the regulatory and transcriptional levels.
Preamble
SPICEY is designed to operate on the results of differential analyses performed on either single-cell ATAC-seq (for chromatin accessibility) or single-cell RNA-seq (for gene expression) data. These analyses would compare each feature—either a regulatory element (e.g., peak) or gene—across cell types to identify cell-type-specific activity or expression. SPICEY takes advantage of the resulting statistics, specifically the log2 fold-change and associated adjusted p-values, which reflect both the magnitude and statistical confidence of differential activity. By leveraging these pre-computed differential results, SPICEY quantifies cell-type specificity through composite metrics that capture both the strength and selectivity of feature enrichment in individual cell types.
Cell-Type Specificity Indices: RETSI and GETSI
SPICEY introduces two complementary metrics:
- RETSI (Regulatory Element Tissue Specificity Index): Quantifies the cell-type specificity of regulatory regions between cell types (e.g., ATAC-seq peaks).
- GETSI (Gene Expression Tissue Specificity Index): Quantifies the specificity of gene expression across distinct cell types.
Both RETSI and GETSI are calculated using a weighted specificity score for each feature (regulatory region or gene), defined as:
where:
- is a specific cell type
- , with for regulatory region (RETSI) and for gene (GETSI)
- is the log fold-change of feature in cell type
- is the weight term, defined as the rescaled value of between 0 and 1
Additionally, SPICEY supports the integration of regulatory elements with their putative target genes using multiple linking strategies, including:
- Nearest-gene assignment
- Co-accessibility-based linking (when co-accessibility links are available)
These linking approaches enable SPICEY to combine regulatory specificity (RETSI) and gene expression specificity (GETSI), facilitating integrated analyses of gene regulatory architecture at single-cell resolution.
This vignette demonstrates how to use SPICEY with preprocessed GRanges input files. These input files must be prepared in advance (not provided by the package).
Entropy-based specificity indices
To complement fold change-based indices, SPICEY also quantifies the distributional skewness of features using Shannon entropy. For each regulatory region or gene, activity or expression values across all NNN cell types are converted into a probability distribution:
where:
- is the normalized activity or expression of feature across cell types
- is the activity (for RETSI) or expression (for GETSI) value in cell type $$
- is the Shannon entropy, and Normalized Entropy ranges between 0 (cell type specific) and 1 (cell type shared).
Installation
Install the development version from GitHub:
install.packages("devtools")
devtools::install_github("georginafp/SPICEY")
Input Requirements
SPICEY requires carefully prepared input files representing processed single-cell omics data, specifically from differential analyses of chromatin accessibility (ATAC-seq) and/or gene expression (RNA-seq). These inputs are not provided within the package and must be generated externally beforehand. Below is a detailed description of each required input and the necessary metadata.
1. Single cell ATAC-seq data
Format:
GRanges
ordata.frame
convertible toGRanges
-
Required columns:
seqnames
,start
,end
— genomic coordinates of the peak.avg_log2FC
— average log2 fold-change of chromatin accessibility for the peak in a specific cell type (output from differential accessibility analysis).p_val
— raw or adjusted p-value corresponding to the differential accessibility test for the peak (output from differential accessibility analysis).cell_type
— cell type or cluster label associated with each differential measurement.
The ATAC peaks input must include multiple
rows per peak representing differential statistics for each
cell type. For example, the same peak may appear with different
avg_log2FC
, p_val
, and cell_type
values reflecting its differential accessibility in each cell type.
data("atac")
2. Single cell RNA-seq data
Format:
GRanges
ordata.frame
convertible toGRanges
-
Required columns:
symbol
— identifier of the gene.avg_log2FC
— average log2 fold-change of chromatin accessibility for the peak in a specific cell type (output from differential accessibility analysis).p_val
— raw or adjusted p-value corresponding to the differential accessibility test for the peak (output from differential accessibility analysis).cell_type
— cell type or cluster label associated with each differential measurement.
The RNA input must represent differential expression results, i.e., statistics computed for each gene across cell types.
data("rna")
3. Co-accessibility Links (optional, for region-to-gene linking mode)
Purpose: To link regulatory elements based on co-accessibility inferred from tools like Signac’s
LinkPeaks()
. This supports integrative analysis of regulatory elements and gene expression.-
Format:
- Either a
GInteractions
or aGRangesList
object with metadata columns.
- Either a
-
Required columns:
Peaks1
— genomic coordinate or ID of the first peak in the pair.Peaks2
— genomic coordinate or ID of the second peak linked by co-accessibility.coaccess
— co-accessibility score or correlation value.CCAN1
,CCAN2
— CCAN (co-accessibility network) identifiers for the peaks.
The peaks referenced here must match exactly those in the ATAC peaks input (same coordinates or peak IDs).
data("cicero_links")
head(cicero_links)
#> Peak1 Peak2 coaccess
#> 1 chr2-5506813-5507737 chr2-5693250-5693911 0.093227933
#> 2 chr3-111728040-111728766 chr3-111836220-111836612 -0.000154212
#> 3 chr3-54281710-54282000 chr3-54014148-54014347 0.000000000
#> 4 chr11-128734460-128734775 chr11-128298619-128299732 -0.002326294
#> 5 chr13-93284776-93284975 chr13-93397849-93398575 0.039018033
#> 6 chr8-79879783-79881132 chr8-80122041-80123202 -0.018005821
Example: Run SPICEY step-by-step
This section demonstrates how to run SPICEY using prepared input files, as described in earlier sections.
Computing RETSI
To calculate the Regulatory Element Tissue Specificity Index (RETSI), the following input is required:
-
atac
: AGRanges
object or data frame convertible toGRanges
containing differential accessibility results from scATAC-seq data. Required columns:seqnames
,start
,end
: Genomic coordinates of peaksavg_log2FC
: Average log2 fold-change of accessibility for each peakp_val
: Raw or adjusted p-value from differential accessibility testscell_type
: Cell type or cluster label
retsi <- spicey_retsi(atac)
head(retsi)
#> seqnames start end width strand cell_type p_val_adj p_val
#> 1 chr7 142748901 142749963 1063 * Acinar 1.158316e-300 0
#> 2 chr14 66378630 66379738 1109 * Acinar 1.158316e-300 0
#> 3 chr22 27241597 27243500 1904 * Acinar 1.158316e-300 0
#> 4 chr17 82909239 82910392 1154 * Acinar 1.158316e-300 0
#> 5 chr2 79119834 79120951 1118 * Acinar 1.158316e-300 0
#> 6 chr10 116587488 116588562 1075 * Acinar 1.158316e-300 0
#> avg_log2FC region RETSI norm_entropy
#> 1 6.025791 chr7:142748901-142749963 1 0.065607360
#> 2 5.742438 chr14:66378630-66379738 1 0.198069323
#> 3 6.111680 chr22:27241597-27243500 1 0.007724027
#> 4 6.066587 chr17:82909239-82910392 1 0.018789741
#> 5 6.160286 chr2:79119834-79120951 1 0.011755466
#> 6 6.299025 chr10:116587488-116588562 1 0.002236650
Computing GETSI
To calculate the Gene Expression Tissue Specificity Index (GETSI), the required input is:
-
rna
: AGRanges
object or data frame convertible toGRanges
containing differential expression results from scRNA-seq data. Required columns are similar to those foratac
but relevant for gene expression.symbol
— identifier of the gene.avg_log2FC
: Average log2 fold-change of accessibility for each peakp_val
: Raw or adjusted p-value from differential accessibility testscell_type
: Cell type or cluster label
getsi <- spicey_getsi(rna)
head(getsi)
#> gene_id p_val avg_log2FC pct.1 pct.2 p_val_adj cell_type
#> 1 REG1A 0.000000e+00 6.446878 0.392 0.012 6.967263e-216 Acinar
#> 2 PRSS1 3.793351e-218 5.111544 0.608 0.067 6.967263e-216 Acinar
#> 3 PNLIP 5.687535e-208 5.156786 0.471 0.038 9.660474e-206 Acinar
#> 4 ANKRD62 8.944741e-189 5.186987 0.529 0.056 1.320605e-186 Acinar
#> 5 PRSS2 1.241892e-185 4.545974 0.568 0.066 1.799054e-183 Acinar
#> 6 PTF1A 3.711279e-176 5.042912 0.379 0.028 5.000185e-174 Acinar
#> GETSI norm_entropy
#> 1 1.0000000 0.004473985
#> 2 1.0000000 0.095847822
#> 3 0.9528626 0.013732911
#> 4 0.8639239 0.024782578
#> 5 0.8493565 0.376959946
#> 6 0.8054632 0.011539343
Annotating Regulatory Elements to Target Genes
SPICEY allows you to annotate regulatory elements to their putative target genes using one of two methods:
1. Nearest Gene Annotation
This mode links each regulatory region to its nearest gene TSS.
Required inputs:
retsi
: Output fromspicey_retsi()
, containing RETSI valuesannot_method
: Set to"nearest"
txdb
: ATxDb
object with gene modelsannot_dbi
: AnnotationDbi object (e.g.,org.Hs.eg.db
) to add gene metadata
Optional parameters:
filter_promoter_distal
: Exclude promoter-proximal regions (default:TRUE
)filter_protein_coding
: Limit to protein-coding genes (default:TRUE
)keep_mito
: Retain mitochondrial peaks (default:FALSE
)add_tss_annotation
: Include TSS distance and related columns (default:FALSE
)link_spicey_measures
: Join RETSI and GETSI scores by gene (default:FALSE
)verbose
: Display progress (default:TRUE
)
retsi_gene_nearest <- annotate_with_nearest(
retsi = retsi,
txdb = TxDb.Hsapiens.UCSC.hg38.knownGene::TxDb.Hsapiens.UCSC.hg38.knownGene,
annot_dbi = org.Hs.eg.db::org.Hs.eg.db
)
head(retsi_gene_nearest)
#> seqnames start end width strand cell_type p_val_adj p_val
#> 1 chr7 142748901 142749963 1063 * Acinar 1.158316e-300 0
#> 2 chr14 66378630 66379738 1109 * Acinar 1.158316e-300 0
#> 3 chr22 27241597 27243500 1904 * Acinar 1.158316e-300 0
#> 4 chr17 82909239 82910392 1154 * Acinar 1.158316e-300 0
#> 5 chr2 79119834 79120951 1118 * Acinar 1.158316e-300 0
#> 6 chr10 116587488 116588562 1075 * Acinar 1.158316e-300 0
#> avg_log2FC region RETSI norm_entropy distanceToTSS
#> 1 6.025791 chr7:142748901-142749963 1 0.065607360 0
#> 2 5.742438 chr14:66378630-66379738 1 0.198069323 29965
#> 3 6.111680 chr22:27241597-27243500 1 0.007724027 546383
#> 4 6.066587 chr17:82909239-82910392 1 0.018789741 0
#> 5 6.160286 chr2:79119834-79120951 1 0.011755466 0
#> 6 6.299025 chr10:116587488-116588562 1 0.002236650 0
#> nearestGeneSymbol annotation
#> 1 PRSS1 Promoter
#> 2 CCDC196 Distal
#> 3 MN1 Distal
#> 4 TBCD Promoter
#> 5 REG1A Promoter
#> 6 PNLIPRP1 Promoter
2. Co-accessibility-Based Annotation
This mode links regulatory regions to genes via co-accessibility interactions, which must be computed externally (e.g., using Cicero or Signac).
Required inputs:
retsi
: Output fromspicey_retsi()
, containing RETSI valuesannot_method
: Set to"coaccessibility"
links
: Adata.frame
,GRangesList
, or Cicero/ArchR-style object with co-accessibility linkscoaccess_cutoff_override
: Minimum co-accessibility score (default:0.25
)
Optional parameters:
filter_promoter_distal
: Exclude promoter-proximal regions (default:TRUE
)filter_protein_coding
: Limit to protein-coding genes (default:TRUE
)keep_mito
: Retain mitochondrial peaks (default:FALSE
)add_tss_annotation
: Include TSS distance and related columns (default:FALSE
)link_spicey_measures
: Join RETSI and GETSI scores by gene (default:FALSE
)verbose
: Display progress (default:TRUE
)
# Filter links for high coaccessibility score
coacc_links <- cicero_links |>
dplyr::filter(coaccess > 0.5)
retsi_gene_coacc <- annotate_with_coaccessibility(
links = coacc_links,
retsi = retsi,
txdb = TxDb.Hsapiens.UCSC.hg38.knownGene::TxDb.Hsapiens.UCSC.hg38.knownGene,
annot_dbi = org.Hs.eg.db::org.Hs.eg.db,
coaccess_cutoff_override = 0.25
)
#> [1] "Coaccessibility cutoff used: 0.25"
head(retsi_gene_coacc)
#> seqnames start end width strand cell_type p_val_adj p_val
#> 1 chr7 142748901 142749963 1063 * Acinar 1.158316e-300 0
#> 2 chr14 66378630 66379738 1109 * Acinar 1.158316e-300 0
#> 3 chr22 27241597 27243500 1904 * Acinar 1.158316e-300 0
#> 4 chr17 82909239 82910392 1154 * Acinar 1.158316e-300 0
#> 5 chr2 79119834 79120951 1118 * Acinar 1.158316e-300 0
#> 6 chr10 116587488 116588562 1075 * Acinar 1.158316e-300 0
#> avg_log2FC region RETSI norm_entropy distanceToTSS
#> 1 6.025791 chr7:142748901-142749963 1 0.065607360 0
#> 2 5.742438 chr14:66378630-66379738 1 0.198069323 29965
#> 3 6.111680 chr22:27241597-27243500 1 0.007724027 546383
#> 4 6.066587 chr17:82909239-82910392 1 0.018789741 0
#> 5 6.160286 chr2:79119834-79120951 1 0.011755466 0
#> 6 6.299025 chr10:116587488-116588562 1 0.002236650 0
#> annotation gene_coacc in_TSS TSS_gene
#> 1 Promoter NA FALSE <NA>
#> 2 Distal NA FALSE <NA>
#> 3 Distal NA FALSE <NA>
#> 4 Promoter NA FALSE <NA>
#> 5 Promoter NA TRUE REG1A
#> 6 Promoter NA FALSE <NA>
Linking RETSI and GETSI Scores to Build Tissue-Specific Regulatory Networks
To construct an integrated regulatory network that links RETSI and GETSI scores, SPICEY allows you to associate both scores at the gene level using either annotation strategy:
If using nearest-gene annotation:
retsi_gene_nearest
: Output fromannotate_with_nearest()
, linking peaks to nearest genes with RETSI scoresgetsi
: Output fromspicey_getsi()
, providing GETSI scores for genes
spicey_nearest <- link_spicey_nearest(retsi_gene_nearest, getsi)
head(spicey_nearest)
#> seqnames start end width strand cell_type p_val_adj p_val
#> 1 chr7 142748901 142749963 1063 * Acinar 1.158316e-300 0
#> 2 chr22 27241597 27243500 1904 * Acinar 1.158316e-300 0
#> 3 chr17 82909239 82910392 1154 * Acinar 1.158316e-300 0
#> 4 chr2 79119834 79120951 1118 * Acinar 1.158316e-300 0
#> 5 chr10 116587488 116588562 1075 * Acinar 1.158316e-300 0
#> 6 chr10 116553223 116553959 737 * Acinar 1.158316e-300 0
#> avg_log2FC region RETSI RETSI_entropy distanceToTSS
#> 1 6.025791 chr7:142748901-142749963 1 0.065607360 0
#> 2 6.111680 chr22:27241597-27243500 1 0.007724027 546383
#> 3 6.066587 chr17:82909239-82910392 1 0.018789741 0
#> 4 6.160286 chr2:79119834-79120951 1 0.011755466 0
#> 5 6.299025 chr10:116587488-116588562 1 0.002236650 0
#> 6 6.849885 chr10:116553223-116553959 1 0.001301460 5130
#> gene_nearest annotation GETSI GETSI_entropy
#> 1 PRSS1 Promoter 1.0000000000 0.095847822
#> 2 MN1 Distal 0.0001055331 0.681930617
#> 3 TBCD Promoter 0.0060596138 0.491472040
#> 4 REG1A Promoter 1.0000000000 0.004473985
#> 5 PNLIPRP1 Promoter 0.7329113167 0.017352521
#> 6 PNLIP Distal 0.9528625893 0.013732911
If using co-accessibility annotation:
retsi_gene_coacc
: Output fromannotate_with_coaccessibility()
, linking peaks to co-accessible genes with RETSI scoresgetsi
: Output fromspicey_getsi()
, providing GETSI scores for genes
By integrating RETSI and GETSI, SPICEY enables comprehensive analysis of tissue-specific regulatory logic at both the epigenetic and transcriptional levels.
spicey_coacc <- link_spicey_coaccessible(retsi_gene_coacc, getsi)
head(spicey_coacc)
#> seqnames start end width strand cell_type p_val_adj p_val avg_log2FC region
#> 1 <NA> NA NA NA <NA> Acinar NA NA NA <NA>
#> 2 <NA> NA NA NA <NA> Acinar NA NA NA <NA>
#> 3 <NA> NA NA NA <NA> Acinar NA NA NA <NA>
#> 4 <NA> NA NA NA <NA> Acinar NA NA NA <NA>
#> 5 <NA> NA NA NA <NA> Acinar NA NA NA <NA>
#> 6 <NA> NA NA NA <NA> Acinar NA NA NA <NA>
#> RETSI RETSI_entropy distanceToTSS annotation gene_coacc in_TSS TSS_gene
#> 1 NA NA NA <NA> REG1A NA <NA>
#> 2 NA NA NA <NA> PRSS1 NA <NA>
#> 3 NA NA NA <NA> PNLIP NA <NA>
#> 4 NA NA NA <NA> ANKRD62 NA <NA>
#> 5 NA NA NA <NA> PRSS2 NA <NA>
#> 6 NA NA NA <NA> PTF1A NA <NA>
#> GETSI GETSI_entropy
#> 1 1.0000000 0.004473985
#> 2 1.0000000 0.095847822
#> 3 0.9528626 0.013732911
#> 4 0.8639239 0.024782578
#> 5 0.8493565 0.376959946
#> 6 0.8054632 0.011539343
Example: Run SPICEY whole pipeline
While SPICEY provides modular functions—such as
spicey_retsi()
, spicey_getsi()
, and
annotate_with_nearest()
—for users who prefer fine-grained
control, it also includes a wrapper function, run_spicey()
,
that streamlines the full workflow into a single, reproducible step.
This wrapper automatically determines which operations to perform based on the input data and parameters provided, making it ideal for users seeking a quick, end-to-end analysis without manually managing each intermediate step.
When supplied with differential accessibility (ATAC) and/or gene
expression (RNA) data, and a gene-linking strategy (“nearest” or
“coaccessibility”), run_spicey()
performs the following
operations as needed:
RETSI computation (if ATAC data is provided)
GETSI computation (if RNA data is provided)
-
Annotation of regulatory regions to target genes, via:
"nearest"
gene TSS or"coaccessibility"
links (e.g., from Cicero or ArchR)
Linking RETSI and GETSI scores by gene (if
link_spicey_measures = TRUE
)
The function returns a structured list containing all intermediate and final outputs, making it easy to inspect and reuse individual components of the pipeline.
For users who prefer to run each step independently—for example, to apply custom filters or transformations—each module remains fully accessible and can be run separately.
Overview: run_spicey()
Depending on the inputs provided, run_spicey()
performs:
- RETSI calculation – from differential accessibility data (ATAC-seq)
GETSI calculation – from differential expression data (RNA-seq)
-
Annotation of regulatory regions to genes (optional):
"nearest"
: Assign to closest gene"coaccessibility"
: Use user-supplied co-accessibility links
Linking RETSI and GETSI scores by gene (optional)
Arguments
-
atac
: AGRanges
object or data frame convertible toGRanges
, containing scATAC-seq differential accessibility results. Required columns:seqnames
,start
,end
: Genomic coordinatesavg_log2FC
: Average log2 fold-change in accessibilityp_val
: Raw or adjusted p-value from DA analysiscell_type
: Cell type or cluster label
-
rna
: AGRanges
object or data frame convertible toGRanges
, containing scRNA-seq differential expression results. Same structure asatac
:symbol
— identifier of the gene.avg_log2FC
: Average log2 fold-change in accessibilityp_val
: Raw or adjusted p-value from DA analysiscell_type
: Cell type or cluster label
annot_method
: One of"nearest"
(default) or"coaccessibility"
, indicating the method to link peaks to genes.links
: Required ifannot_method = "coaccessibility"
. Adata.frame
,GRangesList
, or Cicero/ArchR-style object with co-accessibility links (peak-pairs and scores).coaccess_cutoff_override
: (default = 0.25) Minimum co-accessibility score to retain a link.filter_promoter_distal
: (default = TRUE) Whether to exclude promoter-proximal interactions when using co-accessibility.filter_protein_coding
: (default = TRUE) Whether to restrict gene annotations to protein-coding genes.-
txdb
: ATxDb
object with gene models used for annotation.annot_dbi
: AnAnnotationDbi
object (e.g.,org.Hs.eg.db
) for adding gene symbols or additional metadata. keep_mito
: (default = FALSE) Whether to retain mitochondrial peaks.add_tss_annotation
: (default = FALSE) Whether to include TSS-related metadata (e.g., distance to TSS) in the output.link_spicey_measures
: (default = FALSE) Whether to merge RETSI and GETSI scores by gene after annotation.verbose
: (default = TRUE) Whether to display progress messages during execution.
spicey_nearest <- run_spicey(
atac = atac,
rna = rna,
annot_method = "nearest",
txdb = TxDb.Hsapiens.UCSC.hg38.knownGene::TxDb.Hsapiens.UCSC.hg38.knownGene,
annot_dbi = org.Hs.eg.db::org.Hs.eg.db,
link_spicey_measures = TRUE
)
head(spicey_nearest)
#> seqnames start end width strand cell_type p_val_adj p_val
#> 1 chr7 142748901 142749963 1063 * Acinar 1.158316e-300 0
#> 2 chr22 27241597 27243500 1904 * Acinar 1.158316e-300 0
#> 3 chr17 82909239 82910392 1154 * Acinar 1.158316e-300 0
#> 4 chr2 79119834 79120951 1118 * Acinar 1.158316e-300 0
#> 5 chr10 116587488 116588562 1075 * Acinar 1.158316e-300 0
#> 6 chr10 116553223 116553959 737 * Acinar 1.158316e-300 0
#> avg_log2FC region RETSI RETSI_entropy distanceToTSS
#> 1 6.025791 chr7:142748901-142749963 1 0.065607360 0
#> 2 6.111680 chr22:27241597-27243500 1 0.007724027 546383
#> 3 6.066587 chr17:82909239-82910392 1 0.018789741 0
#> 4 6.160286 chr2:79119834-79120951 1 0.011755466 0
#> 5 6.299025 chr10:116587488-116588562 1 0.002236650 0
#> 6 6.849885 chr10:116553223-116553959 1 0.001301460 5130
#> gene_nearest annotation GETSI GETSI_entropy
#> 1 PRSS1 Promoter 1.0000000000 0.095847822
#> 2 MN1 Distal 0.0001055331 0.681930617
#> 3 TBCD Promoter 0.0060596138 0.491472040
#> 4 REG1A Promoter 1.0000000000 0.004473985
#> 5 PNLIPRP1 Promoter 0.7329113167 0.017352521
#> 6 PNLIP Distal 0.9528625893 0.013732911
spicey_coacc <- run_spicey(
atac = atac,
rna = rna,
annot_method = "coaccessibility",
links = coacc_links,
txdb = TxDb.Hsapiens.UCSC.hg38.knownGene::TxDb.Hsapiens.UCSC.hg38.knownGene,
annot_dbi = org.Hs.eg.db::org.Hs.eg.db,
coaccess_cutoff_override = 0.25,
link_spicey_measures = TRUE
)
#> [1] "Coaccessibility cutoff used: 0.25"
head(spicey_coacc)
#> seqnames start end width strand cell_type p_val_adj p_val avg_log2FC region
#> 1 <NA> NA NA NA <NA> Acinar NA NA NA <NA>
#> 2 <NA> NA NA NA <NA> Acinar NA NA NA <NA>
#> 3 <NA> NA NA NA <NA> Acinar NA NA NA <NA>
#> 4 <NA> NA NA NA <NA> Acinar NA NA NA <NA>
#> 5 <NA> NA NA NA <NA> Acinar NA NA NA <NA>
#> 6 <NA> NA NA NA <NA> Acinar NA NA NA <NA>
#> RETSI RETSI_entropy distanceToTSS annotation gene_coacc GETSI
#> 1 NA NA NA <NA> REG1A 1.0000000
#> 2 NA NA NA <NA> PRSS1 1.0000000
#> 3 NA NA NA <NA> PNLIP 0.9528626
#> 4 NA NA NA <NA> ANKRD62 0.8639239
#> 5 NA NA NA <NA> PRSS2 0.8493565
#> 6 NA NA NA <NA> PTF1A 0.8054632
#> GETSI_entropy
#> 1 0.004473985
#> 2 0.095847822
#> 3 0.013732911
#> 4 0.024782578
#> 5 0.376959946
#> 6 0.011539343
Output
The following output is a data frame
object containing
genomic regions, each representing a regulatory element
(e.g., enhancer or promoter) annotated with multiple metadata columns.
These columns provide statistical, biological, and functional
information derived from chromatin accessibility and gene expression
data.
Metadata Columns Description
Column | Description |
---|---|
seqnames |
Chromosome name (e.g., chr7 , chr17 ) where
the region is located. |
ranges |
Genomic coordinates (start–end) of the regulatory region. |
strand |
Strand information (* means unstranded, typical for
regulatory elements). |
width |
Length of the genomic region in base pairs. |
cell_type |
The specific cell type where the region shows significant
accessibility (e.g., "Acinar" ). |
p_val_adj |
Adjusted p-value from differential accessibility testing (multiple testing corrected). Small values indicate high significance. |
p_val |
Raw p-value from differential accessibility testing. |
avg_log2FC |
Average log2 fold-change in accessibility in the indicated cell type versus others; higher values indicate greater specificity. |
region |
Genomic interval as a string (e.g.,
"chr7:142748901-142749963" ). |
RETSI |
Regulatory Element Tissue Specificity Index; values near 1 indicate high specificity of accessibility to this cell type. |
RETSI_entropy |
Entropy-based measure of specificity, with lower values indicating more cell-type-specific accessibility. |
genes_HPAP, genes_nearest |
Names of genes linked to the regulatory region using co-accessibility or nearest-gene annotation. |
GETSI |
Gene Expression Tissue Specificity Index for linked genes, indicating how specific gene expression is to the cell type. |
GETSI_entropy |
Entropy-based measure of gene expression specificity, similar to RETSI entropy but applied at the transcript level. |
This annotated GRanges
object thus serves as a
comprehensive resource combining genomic location, cell type-specific
accessibility, gene linkage, and expression specificity — facilitating
downstream functional genomics analyses.