Similar to #3 but wondering if things have changed.
Running cellSNP v0.1.7 as
cellSNP --samFile ${CELLRANGERDIR}/"${SAMPLE}"/outs/possorted_genome_bam.bam \
--outDir ${OUTDIR} \
--regionsVCF genome1K.phase3.SNP_AF5e4.chr1toX.hg38.vcf.gz \
--barcodeFile ${PROJECT_ROOT}/data/emptyDrops/"${SAMPLE}".barcodes.txt \
--nproc 20 \
--minMAF 0.1 \
--minCOUNT 20
with 31,707 barcodes on a 25G BAM file has been going for > 18 days!
It's still writing output, too (as of 2020-02-03 5PM):
% ll -t data/cellSNP/cellSNP.cells.vcf.gz.temp_*
-rw-r----- 1 hickey grpu_mritchie_1 2.0G Feb 3 17:02 data/cellSNP/cellSNP.cells.vcf.gz.temp_17_
-rw-r----- 1 hickey grpu_mritchie_1 2.4G Feb 3 17:02 data/cellSNP/cellSNP.cells.vcf.gz.temp_3_
-rw-r----- 1 hickey grpu_mritchie_1 2.3G Feb 3 17:02 data/cellSNP/cellSNP.cells.vcf.gz.temp_11_
-rw-r----- 1 hickey grpu_mritchie_1 2.2G Feb 3 17:02 data/cellSNP/cellSNP.cells.vcf.gz.temp_15_
-rw-r----- 1 hickey grpu_mritchie_1 2.0G Feb 3 17:01 data/cellSNP/cellSNP.cells.vcf.gz.temp_16_
-rw-r----- 1 hickey grpu_mritchie_1 1.1G Feb 3 16:59 data/cellSNP/cellSNP.cells.vcf.gz.temp_19_
-rw-r----- 1 hickey grpu_mritchie_1 1.9G Feb 3 16:56 data/cellSNP/cellSNP.cells.vcf.gz.temp_12_
-rw-r----- 1 hickey grpu_mritchie_1 2.2G Feb 3 16:55 data/cellSNP/cellSNP.cells.vcf.gz.temp_8_
-rw-r----- 1 hickey grpu_mritchie_1 2.4G Feb 3 16:54 data/cellSNP/cellSNP.cells.vcf.gz.temp_6_
-rw-r----- 1 hickey grpu_mritchie_1 2.0G Feb 3 16:54 data/cellSNP/cellSNP.cells.vcf.gz.temp_9_
-rw-r----- 1 hickey grpu_mritchie_1 2.0G Feb 3 16:51 data/cellSNP/cellSNP.cells.vcf.gz.temp_10_
-rw-r----- 1 hickey grpu_mritchie_1 2.1G Feb 3 16:50 data/cellSNP/cellSNP.cells.vcf.gz.temp_1_
-rw-r----- 1 hickey grpu_mritchie_1 2.2G Feb 3 16:46 data/cellSNP/cellSNP.cells.vcf.gz.temp_2_
-rw-r----- 1 hickey grpu_mritchie_1 2.3G Feb 3 16:38 data/cellSNP/cellSNP.cells.vcf.gz.temp_14_
-rw-r----- 1 hickey grpu_mritchie_1 2.3G Feb 3 16:37 data/cellSNP/cellSNP.cells.vcf.gz.temp_13_
-rw-r----- 1 hickey grpu_mritchie_1 2.0G Feb 3 16:32 data/cellSNP/cellSNP.cells.vcf.gz.temp_4_
-rw-r----- 1 hickey grpu_mritchie_1 2.5G Feb 3 16:19 data/cellSNP/cellSNP.cells.vcf.gz.temp_7_
-rw-r----- 1 hickey grpu_mritchie_1 2.1G Feb 3 15:04 data/cellSNP/cellSNP.cells.vcf.gz.temp_18_
-rw-r----- 1 hickey grpu_mritchie_1 1.9G Feb 3 13:44 data/cellSNP/cellSNP.cells.vcf.gz.temp_0_
-rw-r----- 1 hickey grpu_mritchie_1 1.8G Feb 2 23:52 data/cellSNP/cellSNP.cells.vcf.gz.temp_5_
I've run cellSNP before and although it took a few days it certainly didn't take this long.
I'm wondering:
- What particular parts of this (e.g., size of BAM, number of barcodes, number of loci in --regionsVCF, ...) might be causing this huge runtime?
- What might I do to speed cellSNP up for subsequent datasets (I'm anticipating several datasets, many larger than this, over the course of the year)?
- How can I estimate how much longer this particular process has to run?
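On the last question, one rough thing that can be measured from the outside is how fast the temp files are growing. The sketch below is only an illustration: the function names (`total_size`, `growth_rate`, `sample_growth`) are made up here and are not part of cellSNP. It assumes the `cellSNP.cells.vcf.gz.temp_*` files grow more or less steadily while the job runs. It gives a write rate only, not a time remaining, since the final output size isn't known in advance.

```python
import glob
import os
import time


def total_size(pattern):
    """Return the combined size in bytes of all files matching a glob pattern."""
    return sum(os.path.getsize(path) for path in glob.glob(pattern))


def growth_rate(samples):
    """Return bytes per second between the first and last (timestamp, size) samples."""
    (t0, s0), (t1, s1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (s1 - s0) / (t1 - t0)


def sample_growth(pattern, interval_s=600, n=3):
    """Snapshot the total size n times, interval_s seconds apart, and return the rate."""
    samples = []
    for i in range(n):
        samples.append((time.time(), total_size(pattern)))
        if i < n - 1:
            time.sleep(interval_s)
    return growth_rate(samples)


# Example call (hypothetical; blocks for about 20 minutes with the defaults):
# rate = sample_growth("data/cellSNP/cellSNP.cells.vcf.gz.temp_*")
# print(f"{rate / 1e6:.3f} MB/s written across all temp files")
```

A rate that has dropped to near zero for some workers but not others would at least suggest which chunks are still active.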
Thanks,
Pete