Advances in single-cell sequencing and CRISPR technologies have enabled detailed case-control comparisons and experimental perturbations at single-cell resolution. However, uncovering causal relationships in observational genomic data remains challenging due to selection bias and inadequate adjustment for unmeasured confounders, particularly in heterogeneous datasets. To address these challenges, we introduce causarray [Du25], a doubly robust causal inference framework for analyzing array-based genomic data at both bulk-cell and single-cell levels. causarray integrates a generalized confounder adjustment method to account for unmeasured confounders and employs semiparametric inference with flexible machine learning techniques to ensure robust statistical estimation of treatment effects.
We recommend using causarray in a conda environment:
```bash
# create a new conda environment and install the necessary packages
conda create -n causarray python=3.12 -y
# activate the environment
conda activate causarray
```

The module can be installed via PyPI:

```bash
pip install causarray
```

For optimal parallel performance, we recommend installing llvm-openmp if using conda:

```bash
conda install -c conda-forge llvm-openmp
```

For R users, reticulate can be used to call causarray from R.
The documentation and tutorials using both Python and R are available at causarray.readthedocs.io.
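Before working through the tutorials, it can help to confirm that the package is importable in the active environment. The snippet below is a minimal sketch using only the Python standard library; `check_install` is a hypothetical helper written for this illustration and is not part of causarray.

```python
import importlib.util
from importlib import metadata


def check_install(package: str = "causarray") -> str:
    """Return a short status string for `package` (hypothetical helper)."""
    # find_spec returns None when the top-level package cannot be located.
    if importlib.util.find_spec(package) is None:
        return f"{package} is not importable in this environment"
    try:
        # Version as recorded by the installer (pip or conda) metadata.
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = "unknown version"
    return f"{package} {version} is installed"


print(check_install())
```

If the message reports that the package is not importable, the most likely cause is that the `causarray` conda environment is not the one currently activated.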
For screens with hundreds to thousands of perturbations, use the batch API so that peak memory is bounded by one batch at a time:
```python
from causarray import gcate_lfc_batch

df_res = gcate_lfc_batch(
    Y, X, A, r,
    batch_size=10,            # perturbations per batch (or use n_batches= for a fixed count)
    max_cells=2000,           # max pert cells per batch (ctrl added on top)
    n_ctrl=2000,              # fixed ctrl subsample shared across batches
    cache_path='results.h5',  # resume if interrupted
    verbose=True,
)
```

See the Replogle-E-K562 tutorial for a demonstration on 200 perturbations from a genome-wide CRISPRi screen.
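To make the memory argument concrete, the following sketch shows one way such a batching plan could be laid out with NumPy. It is an illustration of the general idea only, not causarray's implementation: `plan_batches` and the `labels` array are hypothetical, the parameter names simply mirror the call above, and how the library actually subsamples cells is an assumption here.

```python
import numpy as np


def plan_batches(labels, ctrl_label, batch_size=10, max_cells=2000,
                 n_ctrl=2000, seed=0):
    """Yield (perturbations, cell_indices) per batch (hypothetical helper).

    labels: 1-D array holding one perturbation label per cell.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)

    # Draw one control subsample up front and reuse it in every batch.
    ctrl_idx = np.flatnonzero(labels == ctrl_label)
    if ctrl_idx.size > n_ctrl:
        ctrl_idx = rng.choice(ctrl_idx, size=n_ctrl, replace=False)

    perts = [p for p in np.unique(labels) if p != ctrl_label]
    for start in range(0, len(perts), batch_size):
        batch = perts[start:start + batch_size]
        pert_idx = np.flatnonzero(np.isin(labels, batch))
        # Cap the perturbed cells in this batch; controls are added on top.
        if pert_idx.size > max_cells:
            pert_idx = rng.choice(pert_idx, size=max_cells, replace=False)
        yield batch, np.sort(np.concatenate([ctrl_idx, pert_idx]))


# Toy data: 50 control cells and 25 perturbations with 4 cells each.
labels = np.array(["ctrl"] * 50 + [f"g{i}" for i in range(25) for _ in range(4)])
batches = list(plan_batches(labels, "ctrl", batch_size=10, max_cells=30, n_ctrl=20))
for batch, idx in batches:
    print(len(batch), idx.size)
```

Under this scheme no batch ever holds more than `max_cells + n_ctrl` cells, so peak memory depends on the batch settings rather than on the total number of perturbations in the screen.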
- (2025-01-30) Python package released on PyPI
- (2025-02-01) Code for reproducing figures in paper
- (2025-02-02) Tutorial for Python and R
- (2026-05-31) Batch fitting API (gcate_lfc_batch) for large-scale screens
- (2026-05-31) Documentation at causarray.readthedocs.io
[Du25] Jin-Hong Du, Maya Shen, Hansruedi Mathys, and Kathryn Roeder (2025). Causal differential expression analysis under unmeasured confounders with causarray. bioRxiv, 2025-01.