Pipeline Integration Tutorial
This tutorial walks through running siRNAforge's embedded Nextflow pipeline for large-scale off-target analysis and explains how it maps back to the Python entrypoints documented in API Reference.
When to Use the Pipeline
Use the Nextflow workflow when you need to:
Score hundreds of candidate genes or transcripts in batch
Run transcriptome + miRNA off-target analysis with BWA-MEM2
Leverage Docker/Conda environments reproducibly on HPC or cloud infrastructure
For small jobs or interactive exploration, the CLI (sirnaforge workflow ...) is sufficient.
Prerequisites
Docker (recommended) or a local environment with Nextflow ≥ 25.04
4+ CPU cores and enough RAM for your transcriptomes (human can require 32 GB+ during indexing)
siRNA candidate FASTA/CSV generated by sirnaforge workflow or the Python API
Tip: The Docker image ghcr.io/austin-s-h/sirnaforge:latest already contains Nextflow, BWA-MEM2, SAMtools, and ViennaRNA.
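For a local (non-Docker) environment, the Nextflow version requirement can be checked before launching anything. The sketch below is a hypothetical helper, not part of siRNAforge; it assumes the version text contains the word "version" followed by a dotted number, as in the output of nextflow -version.

```python
import re

# Minimum taken from the prerequisite "Nextflow ≥ 25.04".
MIN_NEXTFLOW = (25, 4)


def parse_nextflow_version(text: str) -> tuple[int, int]:
    """Extract (major, minor) from text such as 'version 25.04.6 build 5954'.

    The 'version X.Y' layout is an assumption about Nextflow's output format.
    """
    match = re.search(r"version\s+(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(text: str, minimum: tuple[int, int] = MIN_NEXTFLOW) -> bool:
    """Return True when the parsed version is at least `minimum`."""
    return parse_nextflow_version(text) >= minimum
```

To use it, capture the output of nextflow -version (for example with subprocess.run(..., capture_output=True, text=True)) and pass the captured text to meets_minimum.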
Step 1: Prepare Candidates
Use the CLI to generate candidates for one or more genes:
uv run sirnaforge workflow TP53 --output-dir results/tp53 --top-n 50
Collect the candidate FASTA (sirna_candidates.fasta) from each workflow output directory and concatenate them if you are processing multiple genes.
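The concatenation step can be scripted. This is a minimal sketch, assuming each gene was written to its own directory under a common root such as results/; the function name and the combined file name are illustrative, and the search is recursive because the exact location of sirna_candidates.fasta inside an output directory is not stated here.

```python
from pathlib import Path


def concatenate_candidates(
    results_root: Path,
    combined: Path,
    name: str = "sirna_candidates.fasta",
) -> int:
    """Merge every `name` file found under `results_root` into `combined`.

    Returns the number of FASTA records (header lines) written.
    """
    # Collect sources first and skip the output file itself if it matches.
    sources = sorted(
        p for p in results_root.rglob(name) if p.resolve() != combined.resolve()
    )
    records = 0
    with combined.open("w") as out:
        for fasta in sources:
            text = fasta.read_text()
            if not text:
                continue
            records += sum(1 for line in text.splitlines() if line.startswith(">"))
            # Ensure a trailing newline so a header never fuses with a sequence.
            out.write(text if text.endswith("\n") else text + "\n")
    return records
```

For example, concatenate_candidates(Path("results"), Path("results/all_candidates.fasta")) would gather the candidates of every gene processed under results/.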
Step 2: Configure References
siRNAforge manages reference transcriptomes and miRNA databases via its cache.
Common knobs:
--species: drives both transcriptome fetching and miRNA seed matching defaults.
--mirna-db / --mirna-species: control miRNA reference selection.
--transcriptome-fasta: override/extend the transcriptome reference.
--offtarget-indices: override alignment indices for specific species (species:/abs/path/index_prefix).
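The species:/abs/path/index_prefix value for --offtarget-indices is easy to get wrong when it is assembled by hand. The helper below is a hypothetical sketch that only formats and validates a single entry; how several overrides are combined on the command line is not covered here and should be checked against the CLI help.

```python
from pathlib import PurePosixPath


def format_index_override(species: str, index_prefix: str) -> str:
    """Return 'species:/abs/path/index_prefix' for --offtarget-indices."""
    species = species.strip()
    if not species or ":" in species:
        raise ValueError(f"invalid species name: {species!r}")
    # The documented format calls for an absolute index prefix.
    if not PurePosixPath(index_prefix).is_absolute():
        raise ValueError(f"index prefix must be an absolute path: {index_prefix!r}")
    return f"{species}:{index_prefix}"
```

The species name and index path in any call (for example "human" and "/data/indices/human_tx") are placeholders for your own values.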
Step 3: Run Off-Target Analysis (embedded Nextflow)
For most users, run the off-target step via the CLI; it executes the embedded Nextflow workflow under the hood.
# Full workflow (includes embedded Nextflow off-target analysis)
uv run sirnaforge workflow TP53 --output-dir results/tp53
# Off-target only (if you already have a candidates FASTA)
uv run sirnaforge offtarget \
--input-candidates-fasta results/tp53/off_target/input_candidates.fasta \
--output-dir results/tp53/off_target \
--species "human,rat"
Important parameters:
--candidates: FASTA file with siRNA sequences (one per record)
--outdir: Output directory for per-species TSV/JSON summaries
--genome_species: Comma-separated list for miRNA genome lookups (not genomic DNA alignment)
--max_hits, --bwa_k, --bwa_T, --seed_start, --seed_end: Override alignment sensitivity (see pipeline config)
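When the off-target step is part of a larger Python script, the off-target-only CLI invocation from this step can be assembled and launched with subprocess. This is a sketch that uses only the flags shown in that command; the function names are illustrative and not part of the siRNAforge API.

```python
import subprocess


def build_offtarget_command(
    candidates: str, output_dir: str, species: list[str]
) -> list[str]:
    """Build the argument list for `uv run sirnaforge offtarget`."""
    return [
        "uv", "run", "sirnaforge", "offtarget",
        "--input-candidates-fasta", candidates,
        "--output-dir", output_dir,
        # The CLI example passes species as one comma-separated value.
        "--species", ",".join(species),
    ]


def run_offtarget(candidates: str, output_dir: str, species: list[str]) -> None:
    """Run the command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(
        build_offtarget_command(candidates, output_dir, species), check=True
    )
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.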
Running Nextflow Directly (advanced)
If you need to integrate the pipeline into an external Nextflow setup (HPC, custom profiles), you can discover the embedded main.nf path and run it directly:
PIPELINE_NF=$(uv run python -c "from sirnaforge.pipeline.nextflow.runner import NextflowRunner; print(NextflowRunner().get_main_workflow())")
nextflow run "$PIPELINE_NF" --help
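Once the main.nf path is known, a direct run can be composed programmatically. The sketch below is a hypothetical helper: it assumes the parameters listed under "Important parameters" (such as candidates, outdir, and genome_species) are accepted by main.nf as double-dash pipeline parameters, which is worth confirming with the --help call above before relying on it.

```python
from typing import Optional


def build_nextflow_command(
    main_nf: str,
    params: dict[str, str],
    profile: Optional[str] = None,
) -> list[str]:
    """Build a `nextflow run` argument list.

    Nextflow's own options use a single dash (-profile); pipeline
    parameters use a double dash (--outdir).
    """
    cmd = ["nextflow", "run", main_nf]
    if profile:
        cmd += ["-profile", profile]
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True).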
Docker + local profile (no Docker-in-Docker)
Inside the ghcr.io/austin-s-h/sirnaforge:latest image, use the local profile so Nextflow does not try to start nested containers.
The minimum mounts are:
workspace:
-v $(pwd):/workspace -w /workspace
persistent cache:
-v ~/.cache/sirnaforge:/home/sirnauser/.cache/sirnaforge
You can also use the Makefile helper:
make docker-nextflow-help
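The two mounts listed in this section can be combined into a reusable docker run prefix. This is a minimal sketch: the function name is illustrative, and the --rm flag is an addition for cleanup rather than something this tutorial prescribes.

```python
IMAGE = "ghcr.io/austin-s-h/sirnaforge:latest"


def docker_prefix(workspace: str, cache_dir: str) -> list[str]:
    """Return the `docker run` arguments up to and including the image name."""
    return [
        "docker", "run", "--rm",  # --rm is an assumption, not from the tutorial
        # Workspace mount, also used as the working directory.
        "-v", f"{workspace}:/workspace", "-w", "/workspace",
        # Persistent siRNAforge cache so references survive between runs.
        "-v", f"{cache_dir}:/home/sirnauser/.cache/sirnaforge",
        IMAGE,
    ]
```

Append the command to run inside the container to the returned list, for example a nextflow run invocation that selects the local profile, then hand the full list to subprocess.run.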
Step 4: Inspect Outputs
The pipeline mirrors the CLI output structure:
nextflow_results/
├── logs/
├── off_target/
│   ├── TP53_human_analysis.tsv
│   └── TP53_rat_analysis.tsv
├── sirnaforge/
│   └── manifest.json
└── workflow_summary.json
Each TSV/JSON pair originates from the corresponding Python entrypoints in sirnaforge.core.off_target, invoked by the embedded workflow.
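A quick sanity check after a run is to count the rows in each per-species TSV. The sketch below assumes the off_target/ layout and *_analysis.tsv naming from the tree above, and that the first row of each file is a header; the column names are not documented here, so the code deliberately does not depend on them.

```python
import csv
from pathlib import Path


def summarize_off_target(results_dir: Path) -> dict[str, int]:
    """Map each *_analysis.tsv under off_target/ to its number of data rows."""
    counts: dict[str, int] = {}
    for tsv in sorted((results_dir / "off_target").glob("*_analysis.tsv")):
        with tsv.open(newline="") as handle:
            reader = csv.reader(handle, delimiter="\t")
            next(reader, None)  # skip the header row (assumed present)
            counts[tsv.name] = sum(1 for row in reader if row)
    return counts
```

A file with zero data rows for a species you expected hits in is a useful early signal that a reference or index was not found.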
Troubleshooting
Missing genome files: verify paths in genomes.yaml and that containers can reach the host paths (bind mount the directories when using Docker).
Slow throughput: pass fewer species to --genome_species, reduce --top-n during candidate generation, or increase parallelism with -process.maxForks in nextflow.config.
Tool not found: ensure you are using the provided Docker image or run make docker-build to rebuild locally.
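For the "Tool not found" case outside Docker, a small check can report which executables are missing from PATH. This is a sketch under assumptions: the executable names below (with RNAfold standing in for ViennaRNA) are the usual names for these tools and are not taken from the pipeline configuration.

```python
import shutil

# Assumed executable names for Nextflow, BWA-MEM2, SAMtools, and ViennaRNA.
REQUIRED_TOOLS = ("nextflow", "bwa-mem2", "samtools", "RNAfold")


def missing_tools(tools: tuple[str, ...] = REQUIRED_TOOLS) -> list[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from missing_tools() means every listed executable was found; anything else names what to install or rebuild.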
Additional Resources
docs/developer/testing_guide.md: Guidance on pipeline-related test markers
With these steps you can orchestrate the full off-target pipeline at scale while reusing the same validated Python components exposed in the API reference.