Pipeline Integration Tutorial

This tutorial walks through running siRNAforge’s embedded Nextflow pipeline for large-scale off-target analysis and explains how it maps back to the Python entrypoints documented in the API Reference.

When to Use the Pipeline

Use the Nextflow workflow when you need to:

  • Score hundreds of candidate genes or transcripts in batch

  • Run transcriptome + miRNA off-target analysis with BWA-MEM2

  • Leverage Docker/Conda environments reproducibly on HPC or cloud infrastructure

For small jobs or interactive exploration, the CLI (sirnaforge workflow ...) is sufficient.

Prerequisites

  • Docker (recommended) or a local environment with Nextflow ≥ 25.04

  • 4+ CPU cores and enough RAM for your transcriptomes (human can require 32GB+ during indexing)

  • siRNA candidate FASTA/CSV generated by sirnaforge workflow or the Python API

Tip: The Docker image ghcr.io/austin-s-h/sirnaforge:latest already contains Nextflow, BWA-MEM2, SAMtools, and ViennaRNA.
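
As a quick sanity check of the prerequisites above, the following minimal sketch reports whether `docker` and `nextflow` are on the PATH and whether the machine has at least 4 CPU cores. It is not part of siRNAforge; the function name `preflight` is hypothetical, and the sketch does not check the Nextflow version or available RAM.

```python
import os
import shutil


def preflight() -> dict[str, bool]:
    """Report which prerequisites appear to be satisfied on this machine."""
    cores = os.cpu_count() or 0  # cpu_count() may return None on some platforms
    return {
        "docker": shutil.which("docker") is not None,
        "nextflow": shutil.which("nextflow") is not None,
        "cpu_cores_ok": cores >= 4,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

Only one of Docker or Nextflow needs to be present, so a single "missing" entry is not necessarily a problem.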

Step 1 – Prepare Candidates

  1. Use the CLI to generate candidates for one or more genes:

    uv run sirnaforge workflow TP53 --output-dir results/tp53 --top-n 50
    
  2. Collect the candidate FASTA (sirna_candidates.fasta) from each workflow output directory and concatenate them if you are processing multiple genes.
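
The concatenation in step 2 can be sketched in Python as follows. This is a minimal illustration, not a siRNAforge API: the helper name `concatenate_fastas` is hypothetical, and it assumes plain-text FASTA files whose record headers start with `>`. It does not deduplicate record IDs, so check that IDs are unique across genes before merging.

```python
from pathlib import Path


def concatenate_fastas(inputs: list[Path], output: Path) -> int:
    """Concatenate FASTA files into one file and return the number of records written."""
    records = 0
    with output.open("w", encoding="utf-8") as out:
        for fasta in inputs:
            text = fasta.read_text(encoding="utf-8")
            if not text.strip():
                continue  # skip empty files
            # Guarantee a trailing newline so the next header starts on its own line.
            if not text.endswith("\n"):
                text += "\n"
            out.write(text)
            records += sum(1 for line in text.splitlines() if line.startswith(">"))
    return records
```

For example, `concatenate_fastas(sorted(Path("results").rglob("sirna_candidates.fasta")), Path("all_candidates.fasta"))` would merge every candidate file found under `results/`, assuming that directory layout matches your runs.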

Step 2 – Configure References

siRNAforge manages reference transcriptomes and miRNA databases via its cache.

Common knobs:

  • --species: drives both transcriptome fetching and miRNA seed matching defaults.

  • --mirna-db / --mirna-species: control miRNA reference selection.

  • --transcriptome-fasta: override/extend the transcriptome reference.

  • --offtarget-indices: override alignment indices for specific species (species:/abs/path/index_prefix).
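
To illustrate the `species:/abs/path/index_prefix` format expected by --offtarget-indices, here is a small sketch that builds one such entry and rejects relative paths. The helper `format_index_override` is hypothetical and not part of siRNAforge, and the example paths are placeholders. How multiple entries are combined on the command line is not covered here; check the CLI help for that.

```python
from pathlib import PurePosixPath


def format_index_override(species: str, index_prefix: str) -> str:
    """Return a 'species:/abs/path/index_prefix' string, rejecting relative paths."""
    species = species.strip()
    if not species or ":" in species:
        raise ValueError(f"invalid species name: {species!r}")
    if not PurePosixPath(index_prefix).is_absolute():
        raise ValueError(f"index prefix must be an absolute path: {index_prefix!r}")
    return f"{species}:{index_prefix}"
```

Validating the path up front gives a clearer error than a failed alignment step later in the run.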

Step 3 – Run Off-Target Analysis (embedded Nextflow)

For most users, run the off-target step via the CLI; it executes the embedded Nextflow workflow under the hood.

# Full workflow (includes embedded Nextflow off-target analysis)
uv run sirnaforge workflow TP53 --output-dir results/tp53

# Off-target only (if you already have a candidates FASTA)
uv run sirnaforge offtarget \
  --input-candidates-fasta results/tp53/off_target/input_candidates.fasta \
  --output-dir results/tp53/off_target \
  --species "human,rat"
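
When scripting many runs, the off-target-only command above can be assembled and launched from Python. The sketch below uses only the flags shown in that command; the function names `build_offtarget_command` and `run_offtarget` are hypothetical wrappers, not siRNAforge APIs, and the sketch assumes `uv` is on the PATH.

```python
import subprocess
from pathlib import Path


def build_offtarget_command(candidates: Path, output_dir: Path, species: list[str]) -> list[str]:
    """Assemble the argv for the off-target-only CLI invocation."""
    return [
        "uv", "run", "sirnaforge", "offtarget",
        "--input-candidates-fasta", str(candidates),
        "--output-dir", str(output_dir),
        "--species", ",".join(species),
    ]


def run_offtarget(candidates: Path, output_dir: Path, species: list[str]) -> None:
    """Run the command, failing early if the candidates FASTA is missing."""
    if not candidates.is_file():
        raise FileNotFoundError(candidates)
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(build_offtarget_command(candidates, output_dir, species), check=True)
```

Passing the arguments as a list avoids shell quoting issues with the comma-separated species value.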

Important pipeline parameters (note that these names differ from the CLI flags shown above):

  • --candidates: FASTA file with siRNA sequences (one per record)

  • --outdir: Output directory for per-species TSV/JSON summaries

  • --genome_species: Comma-separated list for miRNA genome lookups (not genomic DNA alignment)

  • --max_hits, --bwa_k, --bwa_T, --seed_start, --seed_end: Override alignment sensitivity (see pipeline config)

Running Nextflow Directly (advanced)

If you need to integrate the pipeline into an external Nextflow setup (HPC, custom profiles), you can discover the embedded main.nf path and run it directly:

PIPELINE_NF=$(uv run python -c "from sirnaforge.pipeline.nextflow.runner import NextflowRunner; print(NextflowRunner().get_main_workflow())")
nextflow run "$PIPELINE_NF" --help

Docker + local profile (no Docker-in-Docker)

Inside the ghcr.io/austin-s-h/sirnaforge:latest image, use the local profile so Nextflow does not try to start nested containers.

The minimum mounts are:

  • workspace: -v $(pwd):/workspace -w /workspace

  • persistent cache: -v ~/.cache/sirnaforge:/home/sirnauser/.cache/sirnaforge
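
The two mounts above and the local profile can be combined into a single `docker run` invocation. The sketch below only assembles the argument list. It assumes the image accepts `nextflow` as its command and that `main_nf` is the in-container path to the embedded main.nf (for example, as printed by the NextflowRunner snippet earlier); both are assumptions to verify against your image. The function name is hypothetical.

```python
from pathlib import Path

IMAGE = "ghcr.io/austin-s-h/sirnaforge:latest"


def build_docker_nextflow_command(
    workspace: Path,
    cache_dir: Path,
    main_nf: str,
    extra_args: tuple[str, ...] = (),
) -> list[str]:
    """Assemble a docker run argv with the workspace and cache mounts and the local profile."""
    return [
        "docker", "run", "--rm",
        "-v", f"{workspace}:/workspace",   # workspace mount
        "-w", "/workspace",                # run from the mounted workspace
        "-v", f"{cache_dir}:/home/sirnauser/.cache/sirnaforge",  # persistent cache
        IMAGE,
        "nextflow", "run", main_nf,
        "-profile", "local",               # avoid nested containers
        *extra_args,
    ]
```

A typical call would pass `Path.cwd()` as the workspace and `Path.home() / ".cache" / "sirnaforge"` as the cache directory, then hand the result to `subprocess.run(..., check=True)`.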

You can also use the Makefile helper:

make docker-nextflow-help

Step 4 – Inspect Outputs

The pipeline mirrors the CLI output structure:

nextflow_results/
├── logs/
├── off_target/
│   ├── TP53_human_analysis.tsv
│   └── TP53_rat_analysis.tsv
├── sirnaforge/
│   └── manifest.json
└── workflow_summary.json

Each TSV/JSON pair originates from the corresponding Python entrypoints in sirnaforge.core.off_target, invoked by the embedded workflow.
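
For a first look at a finished run, the following sketch counts the data rows in each per-species TSV and loads workflow_summary.json. It relies only on the directory layout shown above and makes no assumption about column names or JSON keys. It does assume each TSV begins with a header row, and the helper names are hypothetical.

```python
import csv
import json
from pathlib import Path
from typing import Any


def count_tsv_rows(results_dir: Path) -> dict[str, int]:
    """Map each off_target/*.tsv file name to its number of data rows (header excluded)."""
    counts: dict[str, int] = {}
    for tsv in sorted((results_dir / "off_target").glob("*.tsv")):
        with tsv.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle, delimiter="\t"))
        # Assumption: the first row is a header, so it is not counted.
        counts[tsv.name] = max(len(rows) - 1, 0)
    return counts


def load_summary(results_dir: Path) -> Any:
    """Parse workflow_summary.json and return its contents unchanged."""
    path = results_dir / "workflow_summary.json"
    return json.loads(path.read_text(encoding="utf-8"))
```

Calling `count_tsv_rows(Path("nextflow_results"))` gives a quick per-species hit count before opening the files in a spreadsheet or DataFrame.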

Troubleshooting

  • Missing genome files → verify the paths in genomes.yaml and that containers can reach the host paths (bind mount the directories when using Docker).

  • Slow throughput → list fewer species in --genome_species, reduce --top-n during candidate generation, or raise process.maxForks in nextflow.config (or pass -process.maxForks on the command line).

  • Tool not found → ensure you are using the provided Docker image, or run make docker-build to rebuild it locally.

Additional Resources

With these steps you can orchestrate the full off-target pipeline at scale while reusing the same validated Python components exposed in the API reference.