jaydu1/causarray

causarray

Advances in single-cell sequencing and CRISPR technologies have enabled detailed case-control comparisons and experimental perturbations at single-cell resolution. However, uncovering causal relationships in observational genomic data remains challenging due to selection bias and inadequate adjustment for unmeasured confounders, particularly in heterogeneous datasets. To address these challenges, we introduce causarray [Du25], a doubly robust causal inference framework for analyzing array-based genomic data at both bulk-cell and single-cell levels. causarray integrates a generalized confounder adjustment method to account for unmeasured confounders and employs semiparametric inference with flexible machine learning techniques to ensure robust statistical estimation of treatment effects.

Usage

We recommend using causarray in a conda environment:

# create a new conda environment with Python 3.12
conda create -n causarray python=3.12 -y

# activate the environment
conda activate causarray

The module can be installed via PyPI:

pip install causarray
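
To confirm that the install landed in the active environment, a quick check like the following can help. It is a minimal sketch that relies only on the Python standard library; `installed_version` is a hypothetical helper written for this example and is not part of causarray.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not part of causarray.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "causarray" is the distribution name used in the pip command above.
print("causarray:", installed_version("causarray") or "not installed")
```

If this prints "not installed", the most likely cause is that pip ran outside the activated conda environment.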

For optimal parallel performance, we recommend also installing llvm-openmp when working in a conda environment:

conda install -c conda-forge llvm-openmp

R users can call causarray from R through reticulate. Documentation and tutorials for both Python and R are available at causarray.readthedocs.io.

Batch fitting for large-scale screens

For screens with hundreds to thousands of perturbations, use the batch API so that peak memory is bounded by one batch at a time:

from causarray import gcate_lfc_batch

df_res = gcate_lfc_batch(
    Y, X, A, r,
    batch_size=10,    # perturbations per batch (or use n_batches= for a fixed count)
    max_cells=2000,   # max perturbed cells per batch (control cells are added on top)
    n_ctrl=2000,      # size of the fixed control subsample shared across batches
    cache_path='results.h5',   # cache file; lets an interrupted run resume
    verbose=True,
)
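
To make the roles of batch_size, max_cells, and n_ctrl concrete, the sketch below shows one way such a batching plan could be built with NumPy. It is an illustration of the idea only, not causarray's implementation: `plan_batches` is a hypothetical helper, and it assumes that A is a binary cell-by-perturbation indicator matrix in which control cells have all-zero rows.

```python
import numpy as np


def plan_batches(A, batch_size=10, max_cells=2000, n_ctrl=2000, seed=0):
    """Split perturbations into batches of cell indices (hypothetical helper).

    Assumes A is an (n_cells, n_perts) binary indicator matrix whose
    all-zero rows are control cells. Not causarray's actual implementation.
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A)
    n_perts = A.shape[1]

    # Subsample the control cells once and reuse them in every batch.
    ctrl_idx = np.flatnonzero(A.sum(axis=1) == 0)
    if ctrl_idx.size > n_ctrl:
        ctrl_idx = np.sort(rng.choice(ctrl_idx, size=n_ctrl, replace=False))

    batches = []
    for start in range(0, n_perts, batch_size):
        pert_cols = np.arange(start, min(start + batch_size, n_perts))
        # Cells that received any perturbation belonging to this batch.
        pert_idx = np.flatnonzero(A[:, pert_cols].sum(axis=1) > 0)
        if pert_idx.size > max_cells:
            pert_idx = np.sort(rng.choice(pert_idx, size=max_cells, replace=False))
        # Control cells are added on top of the capped perturbed cells.
        batches.append((pert_cols, np.concatenate([pert_idx, ctrl_idx])))
    return batches


# Toy data: 300 control cells plus 25 perturbations with 40 cells each.
n_ctrl_cells, n_perts, cells_per_pert = 300, 25, 40
A = np.zeros((n_ctrl_cells + n_perts * cells_per_pert, n_perts), dtype=int)
for j in range(n_perts):
    first = n_ctrl_cells + j * cells_per_pert
    A[first:first + cells_per_pert, j] = 1

batches = plan_batches(A, batch_size=10, max_cells=250, n_ctrl=100)
for pert_cols, cell_idx in batches:
    print(len(pert_cols), "perturbations,", len(cell_idx), "cells")
```

In this toy setting each batch holds at most 250 perturbed cells plus the same 100 control cells, so the number of cells held at once stays fixed no matter how many perturbations the screen contains.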

See the Replogle-E-K562 tutorial for a demonstration on 200 perturbations from a genome-wide CRISPRi screen.

Changelog

  • (2025-01-30) Python package released on PyPI
  • (2025-02-01) Code for reproducing figures in paper
  • (2025-02-02) Tutorial for Python and R
  • (2026-05-31) Batch fitting API (gcate_lfc_batch) for large-scale screens
  • (2026-05-31) Documentation at causarray.readthedocs.io

References

[Du25] Jin-Hong Du, Maya Shen, Hansruedi Mathys, and Kathryn Roeder (2025). Causal differential expression analysis under unmeasured confounders with causarray. bioRxiv, 2025-01.

About

causarray is a Python module for simultaneous causal inference with an array of outcomes.
