Pipeline Integration Tutorial

This tutorial walks through running siRNAforge’s embedded Nextflow pipeline for large-scale off-target analysis and explains how it maps back to the Python entrypoints documented in the API Reference.

When to Use the Pipeline

Use the Nextflow workflow when you need to:

Score hundreds of candidate genes or transcripts in batch
Run transcriptome + miRNA off-target analysis with BWA-MEM2
Leverage Docker/Conda environments reproducibly on HPC or cloud infrastructure

For small jobs or interactive exploration, the CLI (sirnaforge workflow ...) is sufficient.

Prerequisites

Docker (recommended) or a local environment with Nextflow ≥ 25.04
At least 4 CPU cores and enough RAM for your transcriptomes (indexing the human transcriptome can require 32 GB or more)
siRNA candidate FASTA/CSV generated by sirnaforge workflow or the Python API

Tip: The Docker image ghcr.io/austin-s-h/sirnaforge:latest already contains Nextflow, BWA-MEM2, SAMtools, and ViennaRNA.

Step 1 – Prepare Candidates

Use the CLI to generate candidates for one or more genes:

uv run sirnaforge workflow TP53 --output-dir results/tp53 --top-n 50

Collect the candidate FASTA (sirna_candidates.fasta) from each workflow output directory and concatenate them if you are processing multiple genes.
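The collection step can be scripted. The sketch below is one minimal way to do it with the standard library only; it assumes each workflow output directory lives under a common root and that the candidate file is named sirna_candidates.fasta as stated above. The function name and the combined output path are illustrative, not part of siRNAforge.

```python
from pathlib import Path


def concatenate_candidates(results_root: Path, combined: Path,
                           name: str = "sirna_candidates.fasta") -> int:
    """Merge every candidate FASTA found under results_root into one file.

    Returns the number of FASTA records written. The file name and the
    directory layout are assumptions based on the tutorial text.
    """
    records = 0
    with combined.open("w") as out:
        # sorted() keeps the merge order stable between runs
        for fasta in sorted(results_root.rglob(name)):
            if fasta.resolve() == combined.resolve():
                continue  # never read the file we are writing
            text = fasta.read_text()
            if not text:
                continue
            # Guarantee a trailing newline so records from different
            # files cannot run together on one line.
            if not text.endswith("\n"):
                text += "\n"
            out.write(text)
            records += sum(1 for line in text.splitlines() if line.startswith(">"))
    return records
```

For example, concatenate_candidates(Path("results"), Path("results/all_candidates.fasta")) would merge results/tp53/..., results/brca1/..., and so on. The sketch does not deduplicate record IDs, so check that IDs stay unique across genes before feeding the merged file to the pipeline.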

Step 2 – Configure References

siRNAforge manages reference transcriptomes and miRNA databases via its cache.

Common knobs:

--species: drives both transcriptome fetching and miRNA seed matching defaults.
--mirna-db / --mirna-species: control miRNA reference selection.
--transcriptome-fasta: override/extend the transcriptome reference.
--offtarget-indices: override alignment indices for specific species (species:/abs/path/index_prefix).
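The species:/abs/path/index_prefix format for --offtarget-indices is easy to get wrong by hand. The helper below is a small sketch that builds and validates such entries from a mapping; the function name is hypothetical, and how several entries are passed to the CLI (repeated flag, comma-separated value, or something else) is not stated here, so the sketch simply returns one string per species.

```python
from pathlib import PurePosixPath


def format_index_overrides(indices: dict[str, str]) -> list[str]:
    """Return 'species:/abs/path/index_prefix' strings, one per species.

    Hypothetical helper: it only enforces the format described in the
    tutorial (a species name, a colon, an absolute index prefix).
    """
    entries = []
    for species, prefix in sorted(indices.items()):
        if not species or ":" in species:
            raise ValueError(f"invalid species name: {species!r}")
        if not PurePosixPath(prefix).is_absolute():
            raise ValueError(f"index prefix for {species} must be absolute: {prefix}")
        entries.append(f"{species}:{prefix}")
    return entries
```

The absolute-path check matters mostly for containerized runs, where a relative prefix would resolve against the container's working directory rather than the host's.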

Step 3 – Run Off-Target Analysis (embedded Nextflow)

For most users, run the off-target step via the CLI; it executes the embedded Nextflow workflow under the hood.

# Full workflow (includes embedded Nextflow off-target analysis)
uv run sirnaforge workflow TP53 --output-dir results/tp53

# Off-target only (if you already have a candidates FASTA)
uv run sirnaforge offtarget \
  --input-candidates-fasta results/tp53/off_target/input_candidates.fasta \
  --output-dir results/tp53/off_target \
  --species "human,rat"
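When the off-target-only command has to run for many candidate sets, it can be convenient to drive it from Python. The following sketch wraps the exact flags shown above in a subprocess call; the function names are illustrative, and it assumes uv and sirnaforge are available on the PATH.

```python
import subprocess
from pathlib import Path


def offtarget_command(candidates: Path, output_dir: Path, species: list[str]) -> list[str]:
    """Build the argv for the off-target-only invocation shown above."""
    if not species:
        raise ValueError("at least one species is required")
    return [
        "uv", "run", "sirnaforge", "offtarget",
        "--input-candidates-fasta", str(candidates),
        "--output-dir", str(output_dir),
        # the CLI example passes species as one comma-separated value
        "--species", ",".join(species),
    ]


def run_offtarget(candidates: Path, output_dir: Path, species: list[str]) -> None:
    """Run the command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(offtarget_command(candidates, output_dir, species), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with paths that contain spaces.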

Important pipeline parameters (these are the option names used by the embedded Nextflow workflow, and they differ from the CLI flags shown above):

--candidates: FASTA file with siRNA sequences (one per record)
--outdir: Output directory for per-species TSV/JSON summaries
--genome_species: Comma-separated list of species used for miRNA genome lookups (not for genomic DNA alignment)
--max_hits, --bwa_k, --bwa_T, --seed_start, --seed_end: Override alignment sensitivity (see pipeline config)

Running Nextflow Directly (advanced)

If you need to integrate the pipeline into an external Nextflow setup (HPC, custom profiles), you can discover the embedded main.nf path and run it directly:

PIPELINE_NF=$(uv run python -c "from sirnaforge.pipeline.nextflow.runner import NextflowRunner; print(NextflowRunner().get_main_workflow())")
nextflow run "$PIPELINE_NF" --help

Docker + local profile (no Docker-in-Docker)

Inside the ghcr.io/austin-s-h/sirnaforge:latest image, use the local profile so Nextflow does not try to start nested containers.

The minimum mounts are:

workspace: -v $(pwd):/workspace -w /workspace
persistent cache: -v ~/.cache/sirnaforge:/home/sirnauser/.cache/sirnaforge
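The two mounts can be assembled programmatically, which helps when the workspace or cache location varies between machines. This is a sketch only: the function name is hypothetical, the --rm flag is an addition not mentioned above, and it assumes the image accepts the command to run as trailing arguments.

```python
from pathlib import Path

IMAGE = "ghcr.io/austin-s-h/sirnaforge:latest"


def docker_command(inner: list[str], workspace: Path, cache: Path) -> list[str]:
    """Build a `docker run` argv with the minimum mounts listed above.

    `inner` is the command to execute inside the container; it is
    supplied by the caller and not prescribed by this sketch.
    """
    return [
        "docker", "run", "--rm",  # --rm is an assumption, not from the tutorial
        "-v", f"{workspace}:/workspace", "-w", "/workspace",
        "-v", f"{cache}:/home/sirnauser/.cache/sirnaforge",
        IMAGE, *inner,
    ]
```

A typical call would pass Path.cwd() as the workspace and Path.home() / ".cache" / "sirnaforge" as the cache, then hand the resulting list to subprocess.run.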

You can also use the Makefile helper:

make docker-nextflow-help

Step 4 – Inspect Outputs

The pipeline mirrors the CLI output structure:

nextflow_results/
├── logs/
├── off_target/
│   ├── TP53_human_analysis.tsv
│   └── TP53_rat_analysis.tsv
├── sirnaforge/
│   └── manifest.json
└── workflow_summary.json

Each TSV/JSON pair is produced by the corresponding Python entrypoint in sirnaforge.core.off_target, which the embedded workflow invokes.
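A quick way to sanity-check a run is to count the rows in each per-species TSV. The sketch below assumes the <gene>_<species>_analysis.tsv naming seen in the tree above and that each file starts with a single header row; neither the column names nor the header are documented here, so the code does not interpret individual fields.

```python
import csv
from pathlib import Path


def summarize_offtarget(results_dir: Path) -> dict[tuple[str, str], int]:
    """Count data rows in each <gene>_<species>_analysis.tsv file.

    The naming pattern and the one-line header are assumptions inferred
    from the example output tree.
    """
    counts = {}
    for tsv in sorted((results_dir / "off_target").glob("*_analysis.tsv")):
        stem = tsv.name[: -len("_analysis.tsv")]
        # split on the last underscore so gene names may contain underscores
        gene, _, species = stem.rpartition("_")
        with tsv.open(newline="") as handle:
            rows = list(csv.reader(handle, delimiter="\t"))
        counts[(gene, species)] = max(len(rows) - 1, 0)  # minus the header
    return counts
```

An unexpectedly empty file for one species is often the first visible sign of a missing or misconfigured reference.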

Troubleshooting

Missing genome files → verify paths in genomes.yaml and that containers can reach the host paths (bind mount the directories when using Docker).
Slow throughput → list fewer species in --genome_species, reduce --top-n during candidate generation, or raise the process.maxForks setting in nextflow.config (on the command line the same option is written -process.maxForks=N).
Tool not found → ensure you are using the provided Docker image or run make docker-build to rebuild locally.
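For the "Tool not found" case, a preflight check can report which executables are missing before a long run starts. The sketch below uses the usual executable names for the tools bundled in the Docker image (RNAfold for ViennaRNA); whether the pipeline calls each of them as a command-line program is an assumption, so adjust the list to your setup.

```python
import shutil

# Executable names are assumptions based on the tools listed for the image.
REQUIRED_TOOLS = ("nextflow", "bwa-mem2", "samtools", "RNAfold")


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list means every tool was found; anything else names the executables to install or the reason to switch to the provided image.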

Additional Resources

docs/developer/testing_guide.md: Guidance on pipeline-related test markers

With these steps you can orchestrate the full off-target pipeline at scale while reusing the same validated Python components exposed in the API reference.