Releases: sanger-tol/genomeassembly
v0.50.0 - Threadtail
[0.50.0] - Threadtail - [2025-12-16]
This is an interrim development release, tagging all the major changes to the pipeline since the last release, and
is likely the final release before the release of version 1.0.
In this release, the main codebase of the pipeline has been heavily overhauled to be more stable, and to meet newer
Nextflow standards, while trying to maintain general compatibility with the interface of the previous version, where possible.
However, note that the input YAML format has been changed heavily, so old YAML input files will need to be re-written.
The full 1.0 release will encompass further changes, particularly another overhaul of the input schema such that the data
input schema is separated from the assembly input schema, allowing multiple assemblies to be parameterised simultaneously
using specific data resources.
The full changelog is as follows:
Added
General:
- Genericise file output prefixies to
asmorasm.{hap}(by @prototaxites) - Update nf-core template to 3.5.1 (by @prototaxites).
Overhaul pipeline input YAML structure (by @prototaxites):
- Validate the input YAML file with nf-schema
- Re-structure the input YAML a bit to be clearer
- Move all species-level metadata to a "metadata" subsection
- Move all input sequence data to a "sequencing_data" subsection
- Each sequence data readset has an optional FastK input section (directory, basename, kmer size)
- The long reads have an optional "coverage" input allowing pre-specification of the 1n coverage
- Move the database stuff (busco + oatk) to a "databases" subsection
Overhaul kmer subworkfow (by @prototaxites):
- Pass any read set (currently long reads, maternal + paternal reads) without a pre-supplied FastK db through FastK
- For long reads, estimate coverage with Genomescope2 only if coverage is not supplied
- For maternal and paternal reads, build Yak DBs and make the hapmer dbs using merquryfk/hapmaker
Overhaul raw assembly subworkflow (by @prototaxites):
- Hifiasm is first run in --bin-only mode to generate .bin files of error corrected reads
- These .bin files are then used to populate all assemblies
- Add a --hifiasm_options parameter to tune hifiasm command line arguments
- Pipeline then uses channel logic to decide
- run base assembly (always) - if the assembly is run without --primary, then hap1/hap2 are selected - otherwise pri/alt are used
- if hic_phasing is enabled AND hic data is available - run hic_phased assembly
- if trio_binning is enabled AND trio data is available - run trio_binned assembly
- Replace GFA_TO_FASTA with GAWK
Overhaul purging subworkflow (by @prototaxites):
- Use sanger-tol/fasta_purge_retained_haplotype subworkflow
- Add parameter purging_reads_chunk_size which defines how many reads to map in a single chunked job
- Add parameter purging_assemblytypes which takes a comma-separated list of types of assembly ("primary", "hic_phased", "trio_binned") to define which assembly types to purge. (default "primary")
- Add parameter purging_purge_middle which if set removes the -e arg from the purgedups/getseqs command, allowing purging in the middle of contigs
- Add parameter purging_cutoffs which if set overrides the automatic setting of the purge_dups calcuts cutoff parameters. Must be a comma-separated string with three entries - e.g. "5,20,100".
- Remove GET_CALCUTS_PARAMS and calculate these args directly from the coverage meta value
Overhaul polishing subworkflow (by @prototaxites):
- Subworkflow can take multiple assemblies for polishing
- Tidy up code to use modern Nextflow practices; avoid name collisions using meta objects for prefix setting
- Replace BED_CHUNKS module with GAWK
- Remove LONGRANGER_COVERAGE and just pull this value out the summary file with some Groovy
- Move the hap1/hap2 splitting inside the subworkflow and accomplish this with Seqkit grep + known regular expressions
- Update all nf-core modules used by the subworkflow
Overhaul Hi-C mapping subworkflow (by @prototaxites):
- Use new sanger-tol shared Hi-C mapping subworkflow
- This can map with both bwa-mem2 or minimap + implements CRAM chunking
- only runs if --enable_scaffolding set
- use nf-core bam_stats_samtools subworkflow for mapping statistics
Overhaul scaffolding subworkflow (by @prototaxites):
- use new sanger-tol scaffolding subworkflow
- Update YaHS to newest version
- new module YAHS_MAKEPAIRSFILE which creates a pairs file of reads mapping to scaffolds - replaces PREPARE_PRETEXTMAP_INPUT
- Update PretextMap + PretextSnapshot to latest versions
- Update Cooler modules, overhauling their inputs
- Use new JUICERTOOLS_PRE nf-core module, so we don't have to ship the jar any more
- Supports multi-threading
- This module has
errorStragegy = "ignore"set to skip failures.
Overhaul genome statistics subworkflow (by @prototaxites):
- Use new sanger-tol shared genome statistics subworkflow
- Update to BUSCO 6.0
Overhaul organelle assembly subworkflow (by @prototaxites)
- Added parameters to control behaviour of oatk and mitohifi
- sets
errorStragegy = "ignore"for mitohifi/findmitoreference and mitohifi/mitohifi itself --oatk_kmer_sizeand--oatk_coverage--mitohifi_min_ref_len,--mitohifi_reads_argsand--mitohifi_contigs_args
- sets
Fixed
Dependencies
| Module | Tool | Old version | New version |
|---|---|---|---|
| bcftools/concat | bcftools | 1.16 | 1.21 |
| bcftools/consensus | bcftools | 1.16 | 1.21 |
| bcftools/index | bcftools | 1.16 | 1.21 |
| bcftools/norm | bcftools | 1.16 | 1.21 |
| bcftools/sort | bcftools | 1.16 | 1.21 |
| bcftools/view | bcftools | 1.16 | 1.21 |
| busco/busco | busco | 5.4.3 | 6.0.0 |
| bwamem2/index | bwa-mem2 | 2.2.1 | 2.3 |
| cooler/cload | cooler | 0.8.11 | 0.10.4 |
| cooler/zoomify | cooler | 0.8.11 | 0.10.4 |
| fastk/fastk | fastk | f18a4e6d2207539f7b84461daebc54530a9559b0 | 1.1.0 |
| fastk/histex | fastk | f18a4e6d2207539f7b84461daebc54530a9559b0 | 1.1.0 |
| gatk4/mergevcfs | gatk4 | 4.4.0.0 | 4.6.1.0 |
| gatk4/mergevcfs | gcnvkernel | 0.9 | |
| gawk | gawk | - | 5.3.0 |
| genescopefk | genescopefk | 380815c420f50171f9234a0fd1ff426b39829b91 | - |
| genomescope2 | genomescope2 | - | 2.0 |
| gfastats | gfastats | 1.3.5 | 1.3.10 |
| juicertools/pre | juicertools | 1.9.9 | 2.20.00 |
| merquryfk/hapmaker | merquryfk | - | 1.1.1 |
| merquryfk/merquryfk | merquryfk | 8ae344092df5dcaf83cfb7f90f662597a9b1fc61 | 1.1.1 |
| minimap2/align | minimap2 | 2.24 | 2.30 |
| minimap2/align | samtools | 1.14 | 1.21 |
| minimap2/index | minimap2 | 2.24 | 2.30 |
| mitohifi/findmitoreference | mitohifi | 3.2.0 | 3.2.3 |
| mitohifi/mitohifi | mitohifi | 3.2.0 | 3.2.3 |
| pretextmap | pretextmap | 0.1.8 | 0.1.9 |
| pretextmap | samtools | 1.14 | 1.17 |
| purgedups/histplot | purgedups | - | 1.2.6 |
| samtools/faidx | samtools | 1.17 | 1.22.1 |
| samtools/flagstat | samtools | 1.17 | 1.22.1 |
| samtools/idxstats | samtools | 1.17 | 1.22.1 |
| samtools/index | samtools | 1.17 | 1.22.1 |
| samtools/merge | samtools | 1.17 | 1.22.1 |
| samtools/splitheader | samtools | 1.17 | 1.22.1 |
| samtools/stats | samtools | 1.17 | 1.22.1 |
| seqkit/grep | seqkit | - | 2.9.0 |
| tabix/bgzip | tabix | - | 1.21 |
| yahs | yahs | 1.2a.2 | 1.2.2 |
| yak/count | yak | - ... |
v0.10.0 – Hideous Zippleback
Enhancements & fixes
- OATK module is added into the ORGANELLES subworkflow
- ORGANELLES subworkflow is now called once in the main workflow and runs MITOHIFI in read and assembly mode along with OATK
- ORGANELLES module is now tested in github CI
- NCBI API secret introduced to run MITOHIFI_FINDMITOREFERENCE module
- hifiasm haplotigs are not purged anymore
- Longranger container version is updated
Software dependencies
Note, since the pipeline is using Nextflow DSL2, each process will be run with its own Biocontainer. This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. However, the overall software dependency changes compared to the last release have been listed below for reference.
| Dependency | Old version | New version |
|---|---|---|
| mitohifi | 3.0.0 | 3.1.1 |
| oatk | 1.0 |
NB: Dependency has been added if just the new version information is present.
v0.9.0 – Night Fury
Initial release of sanger-tol/genomeassembly, created with the nf-core template.
This pipeline performs genome and organelles assembly following Tree of Life assembly practices.
What's changed:
Enhancements & fixes
- Created with nf-core/tools template v2.8.0.
- Implements the basic genome assembly pipeline used in ToL
- Subworkflow to generate Genomescope model
- Subworkflow to generate a raw genome assembly
- Subworkflow to purge haplotigs from primary
- Subworkflow for polishing with 10X data
- Subworkflow for HiC mapping
- Subworkflow for scaffolding
- Subworkflow to run organelles assembly
- Subworkflow to estimate statistics using gfastats, BUSCO, MerquryFK.
Parameters
| Old parameter | New parameter |
|---|---|
| --input | |
| --bed_chunks_polishing | |
| --cool_bin | |
| --polishing_on | |
| --hifiasm_hic_on | |
| --organelles_on |
NB: Parameter has been added if just the new parameter information is present.
Software dependencies
Note, since the pipeline is using Nextflow DSL2, each process will be run with its own Biocontainer. This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. However, the overall software dependency changes compared to the last release have been listed below for reference.
| Dependency | Old version | New version |
|---|---|---|
| bcftools | 1.17, 1.16 | |
| bed_chunks.sh | 1.0 | |
| bedtools | 2.31.0 | |
| busco | 5.4.3 | |
| bwamem2 | 2.2.1 | |
| coverage.sh | 1.0 | |
| cooler | 0.8.11 | |
| cut | 8.30 | |
| fastk | f18a4e6d2207539f7b84461daebc54530a9559b0 | |
| freebayes | 1.3.6 | |
| gatk4 | 4.4.0.0 | |
| gfastats | 1.3.5 | |
| GNU Awk | 5.1.0 | |
| hifiasm | 0.19.3-r572 | |
| juicer | 1.1 | |
| juicer tools | 1.9.9 | |
| longranger | 2.2.2 | |
| merquryfk | 8ae344092df5dcaf83cfb7f90f662597a9b1fc61 | |
| minimap2 | 2.24-r1122 | |
| mitohifi | 3.0.0 | |
| pigz | 2.3.4 | |
| pretextmap | 0.1.9 | |
| pretextsnapshot | 0.0.4 | |
| purge_dups | 1.2.5 | |
| python | 3.11.0, 3.8.1 | |
| R | 4.2.0 | |
| samtools | 1.17, 1.16.1 | |
| seqtk | 1.3-r106 | |
| yahs | 1.2a.1 | |
| yaml | 6.0 |
NB: Dependency has been added if just the new version information is present.