Skip to contents

Introduction

SPICEY is an R package designed to infer cell type specificity from single-cell omics data. It supports input from widely used frameworks such as Seurat (for single-cell RNA-seq) and Signac (for single-cell ATAC-seq), and computes metrics of cell-type specificity at both the regulatory and transcriptional levels.

Preamble

SPICEY is designed to operate on the results of differential analyses performed on either single-cell ATAC-seq (for chromatin accessibility) or single-cell RNA-seq (for gene expression) data. These analyses would compare each feature—either a regulatory element (e.g., peak) or gene—across cell types to identify cell-type-specific activity or expression. SPICEY takes advantage of the resulting statistics, specifically the log2 fold-change and associated adjusted p-values, which reflect both the magnitude and statistical confidence of differential activity. By leveraging these pre-computed differential results, SPICEY quantifies cell-type specificity through composite metrics that capture both the strength and selectivity of feature enrichment in individual cell types.

Cell-Type Specificity Indices: RETSI and GETSI

SPICEY introduces two complementary metrics:

  • RETSI (Regulatory Element Tissue Specificity Index): Quantifies the cell-type specificity of regulatory regions between cell types (e.g., ATAC-seq peaks).
  • GETSI (Gene Expression Tissue Specificity Index): Quantifies the specificity of gene expression across distinct cell types.

Both RETSI and GETSI are calculated using a weighted specificity score for each feature (regulatory region or gene), defined as:

TSIi,x=FCi,xmax(FC)xwi,x TSI_{i,x} = \frac{FC_{i,x}}{\max(FC)_x} \cdot w_{i,x}

wi,x=rescale(log10(padji,x),[0,1]) w_{i,x} = \text{rescale}\left( -\log_{10}(padj_{i,x}),\ [0, 1] \right)

where:

  • ii is a specific cell type
  • x{r,g}x \in \{r, g\}, with rr for regulatory region (RETSI) and gg for gene (GETSI)
  • FCi,x\text{FC}_{i,x} is the log fold-change of feature xx in cell type ii
  • wi,xw_{i,x} is the weight term, defined as the rescaled value of log10(adjusted p-value)-\log_{10}(\text{adjusted } p\text{-value}) between 0 and 1

Additionally, SPICEY supports the integration of regulatory elements with their putative target genes using multiple linking strategies, including:

  • Nearest-gene assignment
  • Co-accessibility-based linking (when co-accessibility links are available)

These linking approaches enable SPICEY to combine regulatory specificity (RETSI) and gene expression specificity (GETSI), facilitating integrated analyses of gene regulatory architecture at single-cell resolution.

This vignette demonstrates how to use SPICEY with preprocessed GRanges input files. These input files must be prepared in advance (not provided by the package).

Entropy-based specificity indices

To complement fold change-based indices, SPICEY also quantifies the distributional skewness of features using Shannon entropy. For each regulatory region or gene, activity or expression values across all NNN cell types are converted into a probability distribution:

H=i=1Npilog2(pi) H = - \sum_{i=1}^{N} p_i \log_2(p_i)

Hnorm=1eH H_{norm} = 1 - e^{H}

where:

  • pi,x=ai,xj=1Naj,xp_{i,x} = \frac{a_{i,x}}{\sum_{j=1}^{N} a_{j,x}} is the normalized activity or expression of feature xx across NN cell types
  • ai,xa_{i,x} is the activity (for RETSI) or expression (for GETSI) value in cell type $$
  • HxH_x is the Shannon entropy, and Normalized Entropy ranges between 0 (cell type specific) and 1 (cell type shared).

Installation

Install the development version from GitHub:

install.packages("devtools")
devtools::install_github("georginafp/SPICEY")

Input Requirements

SPICEY requires carefully prepared input files representing processed single-cell omics data, specifically from differential analyses of chromatin accessibility (ATAC-seq) and/or gene expression (RNA-seq). These inputs are not provided within the package and must be generated externally beforehand. Below is a detailed description of each required input and the necessary metadata.

1. Single cell ATAC-seq data

  • Format: GRanges or data.frame convertible to GRanges

  • Required columns:

    • seqnames, start, end — genomic coordinates of the peak.

    • avg_log2FC — average log2 fold-change of chromatin accessibility for the peak in a specific cell type (output from differential accessibility analysis).

    • p_val — raw or adjusted p-value corresponding to the differential accessibility test for the peak (output from differential accessibility analysis).

    • cell_type — cell type or cluster label associated with each differential measurement.

The ATAC peaks input must include multiple rows per peak representing differential statistics for each cell type. For example, the same peak may appear with different avg_log2FC, p_val, and cell_type values reflecting its differential accessibility in each cell type.

data("atac")

2. Single cell RNA-seq data

  • Format: GRanges or data.frame convertible to GRanges

  • Required columns:

    • symbol — identifier of the gene.

    • avg_log2FC — average log2 fold-change of chromatin accessibility for the peak in a specific cell type (output from differential accessibility analysis).

    • p_val — raw or adjusted p-value corresponding to the differential accessibility test for the peak (output from differential accessibility analysis).

    • cell_type — cell type or cluster label associated with each differential measurement.

The RNA input must represent differential expression results, i.e., statistics computed for each gene across cell types.

data("rna")
  • Purpose: To link regulatory elements based on co-accessibility inferred from tools like Signac’s LinkPeaks(). This supports integrative analysis of regulatory elements and gene expression.

  • Format:

    • Either a GInteractions or a GRangesList object with metadata columns.
  • Required columns:

    • Peaks1 — genomic coordinate or ID of the first peak in the pair.

    • Peaks2 — genomic coordinate or ID of the second peak linked by co-accessibility.

    • coaccess — co-accessibility score or correlation value.

    • CCAN1, CCAN2 — CCAN (co-accessibility network) identifiers for the peaks.

The peaks referenced here must match exactly those in the ATAC peaks input (same coordinates or peak IDs).

data("cicero_links")
head(cicero_links)
#>                       Peak1                     Peak2     coaccess
#> 1      chr2-5506813-5507737      chr2-5693250-5693911  0.093227933
#> 2  chr3-111728040-111728766  chr3-111836220-111836612 -0.000154212
#> 3    chr3-54281710-54282000    chr3-54014148-54014347  0.000000000
#> 4 chr11-128734460-128734775 chr11-128298619-128299732 -0.002326294
#> 5   chr13-93284776-93284975   chr13-93397849-93398575  0.039018033
#> 6    chr8-79879783-79881132    chr8-80122041-80123202 -0.018005821

Example: Run SPICEY step-by-step

This section demonstrates how to run SPICEY using prepared input files, as described in earlier sections.

Computing RETSI

To calculate the Regulatory Element Tissue Specificity Index (RETSI), the following input is required:

  • atac : A GRanges object or data frame convertible to GRanges containing differential accessibility results from scATAC-seq data. Required columns:

    • seqnames, start, end: Genomic coordinates of peaks

    • avg_log2FC: Average log2 fold-change of accessibility for each peak

    • p_val: Raw or adjusted p-value from differential accessibility tests

    • cell_type: Cell type or cluster label

retsi <- spicey_retsi(atac)
head(retsi)
#>   seqnames     start       end width strand cell_type     p_val_adj p_val
#> 1     chr7 142748901 142749963  1063      *    Acinar 1.158316e-300     0
#> 2    chr14  66378630  66379738  1109      *    Acinar 1.158316e-300     0
#> 3    chr22  27241597  27243500  1904      *    Acinar 1.158316e-300     0
#> 4    chr17  82909239  82910392  1154      *    Acinar 1.158316e-300     0
#> 5     chr2  79119834  79120951  1118      *    Acinar 1.158316e-300     0
#> 6    chr10 116587488 116588562  1075      *    Acinar 1.158316e-300     0
#>   avg_log2FC                    region RETSI norm_entropy
#> 1   6.025791  chr7:142748901-142749963     1  0.065607360
#> 2   5.742438   chr14:66378630-66379738     1  0.198069323
#> 3   6.111680   chr22:27241597-27243500     1  0.007724027
#> 4   6.066587   chr17:82909239-82910392     1  0.018789741
#> 5   6.160286    chr2:79119834-79120951     1  0.011755466
#> 6   6.299025 chr10:116587488-116588562     1  0.002236650

Computing GETSI

To calculate the Gene Expression Tissue Specificity Index (GETSI), the required input is:

  • rna: A GRanges object or data frame convertible to GRanges containing differential expression results from scRNA-seq data. Required columns are similar to those for atac but relevant for gene expression.

    • symbol — identifier of the gene.

    • avg_log2FC: Average log2 fold-change of accessibility for each peak

    • p_val: Raw or adjusted p-value from differential accessibility tests

    • cell_type: Cell type or cluster label

getsi <- spicey_getsi(rna)
head(getsi)
#>   gene_id         p_val avg_log2FC pct.1 pct.2     p_val_adj cell_type
#> 1   REG1A  0.000000e+00   6.446878 0.392 0.012 6.967263e-216    Acinar
#> 2   PRSS1 3.793351e-218   5.111544 0.608 0.067 6.967263e-216    Acinar
#> 3   PNLIP 5.687535e-208   5.156786 0.471 0.038 9.660474e-206    Acinar
#> 4 ANKRD62 8.944741e-189   5.186987 0.529 0.056 1.320605e-186    Acinar
#> 5   PRSS2 1.241892e-185   4.545974 0.568 0.066 1.799054e-183    Acinar
#> 6   PTF1A 3.711279e-176   5.042912 0.379 0.028 5.000185e-174    Acinar
#>       GETSI norm_entropy
#> 1 1.0000000  0.004473985
#> 2 1.0000000  0.095847822
#> 3 0.9528626  0.013732911
#> 4 0.8639239  0.024782578
#> 5 0.8493565  0.376959946
#> 6 0.8054632  0.011539343

Annotating Regulatory Elements to Target Genes

SPICEY allows you to annotate regulatory elements to their putative target genes using one of two methods:

1. Nearest Gene Annotation

This mode links each regulatory region to its nearest gene TSS.

Required inputs:

  • retsi: Output from spicey_retsi(), containing RETSI values

  • annot_method: Set to "nearest"

  • txdb: A TxDb object with gene models

  • annot_dbi: AnnotationDbi object (e.g., org.Hs.eg.db) to add gene metadata

Optional parameters:

  • filter_promoter_distal: Exclude promoter-proximal regions (default: TRUE)

  • filter_protein_coding: Limit to protein-coding genes (default: TRUE)

  • keep_mito: Retain mitochondrial peaks (default: FALSE)

  • add_tss_annotation: Include TSS distance and related columns (default: FALSE)

  • link_spicey_measures: Join RETSI and GETSI scores by gene (default: FALSE)

  • verbose: Display progress (default: TRUE)

retsi_gene_nearest <- annotate_with_nearest(
  retsi = retsi,
  txdb = TxDb.Hsapiens.UCSC.hg38.knownGene::TxDb.Hsapiens.UCSC.hg38.knownGene, 
  annot_dbi = org.Hs.eg.db::org.Hs.eg.db
)

head(retsi_gene_nearest)
#>   seqnames     start       end width strand cell_type     p_val_adj p_val
#> 1     chr7 142748901 142749963  1063      *    Acinar 1.158316e-300     0
#> 2    chr14  66378630  66379738  1109      *    Acinar 1.158316e-300     0
#> 3    chr22  27241597  27243500  1904      *    Acinar 1.158316e-300     0
#> 4    chr17  82909239  82910392  1154      *    Acinar 1.158316e-300     0
#> 5     chr2  79119834  79120951  1118      *    Acinar 1.158316e-300     0
#> 6    chr10 116587488 116588562  1075      *    Acinar 1.158316e-300     0
#>   avg_log2FC                    region RETSI norm_entropy distanceToTSS
#> 1   6.025791  chr7:142748901-142749963     1  0.065607360             0
#> 2   5.742438   chr14:66378630-66379738     1  0.198069323         29965
#> 3   6.111680   chr22:27241597-27243500     1  0.007724027        546383
#> 4   6.066587   chr17:82909239-82910392     1  0.018789741             0
#> 5   6.160286    chr2:79119834-79120951     1  0.011755466             0
#> 6   6.299025 chr10:116587488-116588562     1  0.002236650             0
#>   nearestGeneSymbol annotation
#> 1             PRSS1   Promoter
#> 2           CCDC196     Distal
#> 3               MN1     Distal
#> 4              TBCD   Promoter
#> 5             REG1A   Promoter
#> 6          PNLIPRP1   Promoter
2. Co-accessibility-Based Annotation

This mode links regulatory regions to genes via co-accessibility interactions, which must be computed externally (e.g., using Cicero or Signac).

Required inputs:

  • retsi: Output from spicey_retsi(), containing RETSI values

  • annot_method: Set to "coaccessibility"

  • links: A data.frame, GRangesList, or Cicero/ArchR-style object with co-accessibility links

  • coaccess_cutoff_override: Minimum co-accessibility score (default: 0.25)

Optional parameters:

  • filter_promoter_distal: Exclude promoter-proximal regions (default: TRUE)

  • filter_protein_coding: Limit to protein-coding genes (default: TRUE)

  • keep_mito: Retain mitochondrial peaks (default: FALSE)

  • add_tss_annotation: Include TSS distance and related columns (default: FALSE)

  • link_spicey_measures: Join RETSI and GETSI scores by gene (default: FALSE)

  • verbose: Display progress (default: TRUE)


# Filter links for high coaccessibility score
coacc_links <- cicero_links |> 
  dplyr::filter(coaccess > 0.5)

retsi_gene_coacc <- annotate_with_coaccessibility(
  links = coacc_links,
  retsi = retsi,
  txdb = TxDb.Hsapiens.UCSC.hg38.knownGene::TxDb.Hsapiens.UCSC.hg38.knownGene, 
  annot_dbi = org.Hs.eg.db::org.Hs.eg.db,
  coaccess_cutoff_override = 0.25
  )
#> [1] "Coaccessibility cutoff used: 0.25"

head(retsi_gene_coacc)
#>   seqnames     start       end width strand cell_type     p_val_adj p_val
#> 1     chr7 142748901 142749963  1063      *    Acinar 1.158316e-300     0
#> 2    chr14  66378630  66379738  1109      *    Acinar 1.158316e-300     0
#> 3    chr22  27241597  27243500  1904      *    Acinar 1.158316e-300     0
#> 4    chr17  82909239  82910392  1154      *    Acinar 1.158316e-300     0
#> 5     chr2  79119834  79120951  1118      *    Acinar 1.158316e-300     0
#> 6    chr10 116587488 116588562  1075      *    Acinar 1.158316e-300     0
#>   avg_log2FC                    region RETSI norm_entropy distanceToTSS
#> 1   6.025791  chr7:142748901-142749963     1  0.065607360             0
#> 2   5.742438   chr14:66378630-66379738     1  0.198069323         29965
#> 3   6.111680   chr22:27241597-27243500     1  0.007724027        546383
#> 4   6.066587   chr17:82909239-82910392     1  0.018789741             0
#> 5   6.160286    chr2:79119834-79120951     1  0.011755466             0
#> 6   6.299025 chr10:116587488-116588562     1  0.002236650             0
#>   annotation gene_coacc in_TSS TSS_gene
#> 1   Promoter         NA  FALSE     <NA>
#> 2     Distal         NA  FALSE     <NA>
#> 3     Distal         NA  FALSE     <NA>
#> 4   Promoter         NA  FALSE     <NA>
#> 5   Promoter         NA   TRUE    REG1A
#> 6   Promoter         NA  FALSE     <NA>

Linking RETSI and GETSI Scores to Build Tissue-Specific Regulatory Networks

To construct an integrated regulatory network that links RETSI and GETSI scores, SPICEY allows you to associate both scores at the gene level using either annotation strategy:

If using nearest-gene annotation:
spicey_nearest <- link_spicey_nearest(retsi_gene_nearest, getsi)
head(spicey_nearest)
#>   seqnames     start       end width strand cell_type     p_val_adj p_val
#> 1     chr7 142748901 142749963  1063      *    Acinar 1.158316e-300     0
#> 2    chr22  27241597  27243500  1904      *    Acinar 1.158316e-300     0
#> 3    chr17  82909239  82910392  1154      *    Acinar 1.158316e-300     0
#> 4     chr2  79119834  79120951  1118      *    Acinar 1.158316e-300     0
#> 5    chr10 116587488 116588562  1075      *    Acinar 1.158316e-300     0
#> 6    chr10 116553223 116553959   737      *    Acinar 1.158316e-300     0
#>   avg_log2FC                    region RETSI RETSI_entropy distanceToTSS
#> 1   6.025791  chr7:142748901-142749963     1   0.065607360             0
#> 2   6.111680   chr22:27241597-27243500     1   0.007724027        546383
#> 3   6.066587   chr17:82909239-82910392     1   0.018789741             0
#> 4   6.160286    chr2:79119834-79120951     1   0.011755466             0
#> 5   6.299025 chr10:116587488-116588562     1   0.002236650             0
#> 6   6.849885 chr10:116553223-116553959     1   0.001301460          5130
#>   gene_nearest annotation        GETSI GETSI_entropy
#> 1        PRSS1   Promoter 1.0000000000   0.095847822
#> 2          MN1     Distal 0.0001055331   0.681930617
#> 3         TBCD   Promoter 0.0060596138   0.491472040
#> 4        REG1A   Promoter 1.0000000000   0.004473985
#> 5     PNLIPRP1   Promoter 0.7329113167   0.017352521
#> 6        PNLIP     Distal 0.9528625893   0.013732911

If using co-accessibility annotation:

By integrating RETSI and GETSI, SPICEY enables comprehensive analysis of tissue-specific regulatory logic at both the epigenetic and transcriptional levels.

spicey_coacc <- link_spicey_coaccessible(retsi_gene_coacc, getsi)
head(spicey_coacc)
#>   seqnames start end width strand cell_type p_val_adj p_val avg_log2FC region
#> 1     <NA>    NA  NA    NA   <NA>    Acinar        NA    NA         NA   <NA>
#> 2     <NA>    NA  NA    NA   <NA>    Acinar        NA    NA         NA   <NA>
#> 3     <NA>    NA  NA    NA   <NA>    Acinar        NA    NA         NA   <NA>
#> 4     <NA>    NA  NA    NA   <NA>    Acinar        NA    NA         NA   <NA>
#> 5     <NA>    NA  NA    NA   <NA>    Acinar        NA    NA         NA   <NA>
#> 6     <NA>    NA  NA    NA   <NA>    Acinar        NA    NA         NA   <NA>
#>   RETSI RETSI_entropy distanceToTSS annotation gene_coacc in_TSS TSS_gene
#> 1    NA            NA            NA       <NA>      REG1A     NA     <NA>
#> 2    NA            NA            NA       <NA>      PRSS1     NA     <NA>
#> 3    NA            NA            NA       <NA>      PNLIP     NA     <NA>
#> 4    NA            NA            NA       <NA>    ANKRD62     NA     <NA>
#> 5    NA            NA            NA       <NA>      PRSS2     NA     <NA>
#> 6    NA            NA            NA       <NA>      PTF1A     NA     <NA>
#>       GETSI GETSI_entropy
#> 1 1.0000000   0.004473985
#> 2 1.0000000   0.095847822
#> 3 0.9528626   0.013732911
#> 4 0.8639239   0.024782578
#> 5 0.8493565   0.376959946
#> 6 0.8054632   0.011539343

Example: Run SPICEY whole pipeline

While SPICEY provides modular functions—such as spicey_retsi(), spicey_getsi(), and annotate_with_nearest()—for users who prefer fine-grained control, it also includes a wrapper function, run_spicey(), that streamlines the full workflow into a single, reproducible step.

This wrapper automatically determines which operations to perform based on the input data and parameters provided, making it ideal for users seeking a quick, end-to-end analysis without manually managing each intermediate step.

When supplied with differential accessibility (ATAC) and/or gene expression (RNA) data, and a gene-linking strategy (“nearest” or “coaccessibility”), run_spicey() performs the following operations as needed:

  • RETSI computation (if ATAC data is provided)

  • GETSI computation (if RNA data is provided)

  • Annotation of regulatory regions to target genes, via:

    • "nearest" gene TSS or

    • "coaccessibility" links (e.g., from Cicero or ArchR)

Linking RETSI and GETSI scores by gene (if link_spicey_measures = TRUE)

The function returns a structured list containing all intermediate and final outputs, making it easy to inspect and reuse individual components of the pipeline.

For users who prefer to run each step independently—for example, to apply custom filters or transformations—each module remains fully accessible and can be run separately.

Overview: run_spicey()

Depending on the inputs provided, run_spicey() performs:

  • RETSI calculation – from differential accessibility data (ATAC-seq)
  • GETSI calculation – from differential expression data (RNA-seq)

  • Annotation of regulatory regions to genes (optional):

    • "nearest": Assign to closest gene

    • "coaccessibility": Use user-supplied co-accessibility links

  • Linking RETSI and GETSI scores by gene (optional)

Arguments

  • atac: A GRanges object or data frame convertible to GRanges, containing scATAC-seq differential accessibility results. Required columns:

    • seqnames, start, end: Genomic coordinates

    • avg_log2FC: Average log2 fold-change in accessibility

    • p_val: Raw or adjusted p-value from DA analysis

    • cell_type: Cell type or cluster label

  • rna: A GRanges object or data frame convertible to GRanges, containing scRNA-seq differential expression results. Same structure as atac:

    • symbol — identifier of the gene.

    • avg_log2FC: Average log2 fold-change in accessibility

    • p_val: Raw or adjusted p-value from DA analysis

    • cell_type: Cell type or cluster label

  • annot_method: One of "nearest" (default) or "coaccessibility", indicating the method to link peaks to genes.

  • links: Required if annot_method = "coaccessibility". A data.frame, GRangesList, or Cicero/ArchR-style object with co-accessibility links (peak-pairs and scores).

  • coaccess_cutoff_override: (default = 0.25) Minimum co-accessibility score to retain a link.

  • filter_promoter_distal: (default = TRUE) Whether to exclude promoter-proximal interactions when using co-accessibility.

  • filter_protein_coding: (default = TRUE) Whether to restrict gene annotations to protein-coding genes.

  • txdb: A TxDb object with gene models used for annotation.

    annot_dbi: An AnnotationDbi object (e.g., org.Hs.eg.db) for adding gene symbols or additional metadata.

  • keep_mito: (default = FALSE) Whether to retain mitochondrial peaks.

  • add_tss_annotation: (default = FALSE) Whether to include TSS-related metadata (e.g., distance to TSS) in the output.

  • link_spicey_measures: (default = FALSE) Whether to merge RETSI and GETSI scores by gene after annotation.

  • verbose: (default = TRUE) Whether to display progress messages during execution.


spicey_nearest <- run_spicey(
  atac = atac, 
  rna = rna, 
  annot_method = "nearest", 
  txdb = TxDb.Hsapiens.UCSC.hg38.knownGene::TxDb.Hsapiens.UCSC.hg38.knownGene, 
  annot_dbi = org.Hs.eg.db::org.Hs.eg.db,
  link_spicey_measures = TRUE
)
head(spicey_nearest)
#>   seqnames     start       end width strand cell_type     p_val_adj p_val
#> 1     chr7 142748901 142749963  1063      *    Acinar 1.158316e-300     0
#> 2    chr22  27241597  27243500  1904      *    Acinar 1.158316e-300     0
#> 3    chr17  82909239  82910392  1154      *    Acinar 1.158316e-300     0
#> 4     chr2  79119834  79120951  1118      *    Acinar 1.158316e-300     0
#> 5    chr10 116587488 116588562  1075      *    Acinar 1.158316e-300     0
#> 6    chr10 116553223 116553959   737      *    Acinar 1.158316e-300     0
#>   avg_log2FC                    region RETSI RETSI_entropy distanceToTSS
#> 1   6.025791  chr7:142748901-142749963     1   0.065607360             0
#> 2   6.111680   chr22:27241597-27243500     1   0.007724027        546383
#> 3   6.066587   chr17:82909239-82910392     1   0.018789741             0
#> 4   6.160286    chr2:79119834-79120951     1   0.011755466             0
#> 5   6.299025 chr10:116587488-116588562     1   0.002236650             0
#> 6   6.849885 chr10:116553223-116553959     1   0.001301460          5130
#>   gene_nearest annotation        GETSI GETSI_entropy
#> 1        PRSS1   Promoter 1.0000000000   0.095847822
#> 2          MN1     Distal 0.0001055331   0.681930617
#> 3         TBCD   Promoter 0.0060596138   0.491472040
#> 4        REG1A   Promoter 1.0000000000   0.004473985
#> 5     PNLIPRP1   Promoter 0.7329113167   0.017352521
#> 6        PNLIP     Distal 0.9528625893   0.013732911

spicey_coacc <- run_spicey(
  atac = atac,
  rna = rna,
  annot_method = "coaccessibility",
  links = coacc_links,
  txdb = TxDb.Hsapiens.UCSC.hg38.knownGene::TxDb.Hsapiens.UCSC.hg38.knownGene, 
  annot_dbi = org.Hs.eg.db::org.Hs.eg.db,
  coaccess_cutoff_override = 0.25,
  link_spicey_measures = TRUE
)
#> [1] "Coaccessibility cutoff used: 0.25"
head(spicey_coacc)
#>   seqnames start end width strand cell_type p_val_adj p_val avg_log2FC region
#> 1     <NA>    NA  NA    NA   <NA>    Acinar        NA    NA         NA   <NA>
#> 2     <NA>    NA  NA    NA   <NA>    Acinar        NA    NA         NA   <NA>
#> 3     <NA>    NA  NA    NA   <NA>    Acinar        NA    NA         NA   <NA>
#> 4     <NA>    NA  NA    NA   <NA>    Acinar        NA    NA         NA   <NA>
#> 5     <NA>    NA  NA    NA   <NA>    Acinar        NA    NA         NA   <NA>
#> 6     <NA>    NA  NA    NA   <NA>    Acinar        NA    NA         NA   <NA>
#>   RETSI RETSI_entropy distanceToTSS annotation gene_coacc     GETSI
#> 1    NA            NA            NA       <NA>      REG1A 1.0000000
#> 2    NA            NA            NA       <NA>      PRSS1 1.0000000
#> 3    NA            NA            NA       <NA>      PNLIP 0.9528626
#> 4    NA            NA            NA       <NA>    ANKRD62 0.8639239
#> 5    NA            NA            NA       <NA>      PRSS2 0.8493565
#> 6    NA            NA            NA       <NA>      PTF1A 0.8054632
#>   GETSI_entropy
#> 1   0.004473985
#> 2   0.095847822
#> 3   0.013732911
#> 4   0.024782578
#> 5   0.376959946
#> 6   0.011539343

Output

The following output is a data frame object containing genomic regions, each representing a regulatory element (e.g., enhancer or promoter) annotated with multiple metadata columns. These columns provide statistical, biological, and functional information derived from chromatin accessibility and gene expression data.

Metadata Columns Description

Column Description
seqnames Chromosome name (e.g., chr7, chr17) where the region is located.
ranges Genomic coordinates (start–end) of the regulatory region.
strand Strand information (* means unstranded, typical for regulatory elements).
width Length of the genomic region in base pairs.
cell_type The specific cell type where the region shows significant accessibility (e.g., "Acinar").
p_val_adj Adjusted p-value from differential accessibility testing (multiple testing corrected). Small values indicate high significance.
p_val Raw p-value from differential accessibility testing.
avg_log2FC Average log2 fold-change in accessibility in the indicated cell type versus others; higher values indicate greater specificity.
region Genomic interval as a string (e.g., "chr7:142748901-142749963").
RETSI Regulatory Element Tissue Specificity Index; values near 1 indicate high specificity of accessibility to this cell type.
RETSI_entropy Entropy-based measure of specificity, with lower values indicating more cell-type-specific accessibility.
genes_HPAP, genes_nearest Names of genes linked to the regulatory region using co-accessibility or nearest-gene annotation.
GETSI Gene Expression Tissue Specificity Index for linked genes, indicating how specific gene expression is to the cell type.
GETSI_entropy Entropy-based measure of gene expression specificity, similar to RETSI entropy but applied at the transcript level.

This annotated GRanges object thus serves as a comprehensive resource combining genomic location, cell type-specific accessibility, gene linkage, and expression specificity — facilitating downstream functional genomics analyses.

Visualization example

References