Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased

No unreleased changes yet.

[0.3.4] - 2025-12-31

Added

  • Transcript Annotation Provider Layer: New data provider interface for fetching genomic transcript annotations

    • Added AbstractTranscriptAnnotationClient interface in src/sirnaforge/data/base.py

    • Implemented EnsemblTranscriptModelClient using Ensembl REST API (lookup/id and overlap/region endpoints)

    • Added VepConsequenceClient stub for optional VEP enrichment (behind config flag, placeholder implementation)

    • New Pydantic models: Interval, TranscriptAnnotation, and TranscriptAnnotationBundle in src/sirnaforge/models/transcript_annotation.py

    • In-memory LRU cache with TTL for transcript annotations

    • Support for fetching by stable IDs or genomic regions

    • Comprehensive unit tests with mocked REST responses

    • Integration tests for real Ensembl REST API (gated by @pytest.mark.requires_network)

Improvements

  • Extensible Architecture: Transcript annotation provider follows the same layered pattern as existing data providers (gene search, ORF analysis, transcriptome management)

  • Reference Tracking: Annotations include provenance metadata (provider, endpoint, reference choice) for reproducibility

  • Error Handling: Robust handling of unresolved IDs and network errors with fallback to unresolved list

Documentation

  • Added comprehensive docstrings for all new transcript annotation classes and methods

  • Unit and integration tests serve as usage examples

[0.3.3] - 2025-12-15

Bug Fixes

  • Docker Login Shell PATH: Fixed issue where login shells reset PATH and dropped /opt/conda/bin, making sirnaforge and nextflow unavailable

    • Added /etc/profile.d/conda-path.sh to preserve conda toolchain paths in login shells

    • Added regression test test_docker_login_shell_path() to container test suite

    • Added standalone test script scripts/test-docker-login-shell.sh for manual verification

  • Nextflow Off-target Aggregation: Fixed a Groovy/DSL2 runtime crash during final aggregation (No signature of method ... call(LinkedList)) by correcting channel collection/defaulting semantics in the embedded workflow

    • Replaced invalid ifEmpty([])/ifEmpty('') usage with ifEmpty { [] }/ifEmpty { '' }

    • Switched from collect() to toList() for explicit channel materialization before combining genome + miRNA result lists

[0.3.1] - 2025-12-04

Added

  • Dirty Control Candidates: workflow.py now reuses the harshest rejected guides as β€œdirty controls” (see sirnaforge/utils/control_candidates.py) so every workflow/Nextflow run emits at least one known-bad sequence for health monitoring.

Improvements

  • Resilient Aggregation: Embedded Nextflow modules (src/sirnaforge/pipeline/nextflow/workflows/modules/local/aggregate_results.nf, mirna_seed_analysis.nf, etc.) plus src/sirnaforge/pipeline/nextflow_cli.py keep TSV/JSON artefacts in sync even when some per-genome or miRNA analyses are skipped.

  • Deterministic Cache & Defaults: Centralized cache helpers document where transcriptome/miRNA assets live, and the CLI now enforces valid GC/length ranges with automatic transcriptome fallbacks when --transcriptome is omitted.

Bug Fixes

  • Pipeline Reliability: Aggregation guards against missing TSVs, writes explicit workdir breadcrumbs, and standardizes BWA-MEM2 index prep plus scoped retries to eliminate intermittent Nextflow crashes.

Documentation

  • Docs v2 Launch: Introduced docs_v2/ with a new Sphinx build, autogenerated CLI/API pages, and live command output so the published docs always match the Typer surface.

  • Nextflow Tutorial & Guides: docs/getting_started.md, docs/usage_examples.md, and the expanded Nextflow tutorial now explain dirty controls, cache directories, Docker execution, and expected artefacts end-to-end.

[0.3.0] - 2025-11-21

Improvements

  • Documentation Standardization: Added sphinx-design tab sets across all guides so every example shows UV vs Docker commands side-by-side, reducing copy/paste errors.

  • GC Content Defaults: Raised the default --gc-max to 60% (from 52%) and documented the change across CLI help and tutorials to match wet-lab guidance.

  • CI/CD Enhancements: Release workflow now runs test-release, publishes coverage summaries, and properly sequences lint β†’ test tiers for consistent PR validation.

Bug Fixes

  • Off-target/miRNA Search: Fixed the regression that prevented miRNA seed matches from being emitted in combined reports when Nextflow was invoked from the CLI.

  • Docker Test Environment: Updated make docker-test docs and scripts to avoid uv sync conflicts inside the container image.

Testing

  • Test Tier Documentation: make test-release now produces a single narrative across dev/ci/release markers with expected run times and coverage outputs, making it easier for contributors to reproduce CI locally.

Build & Infrastructure

  • Make Targets: Documented the revamped test-dev, test-ci, and test-release targets along with their coverage/JUnit outputs so users know which tier to run before opening a PR.

0.2.2 - 2025-10-26

New Features

  • miRNA Design Mode: New --design-mode mirna option for microRNA-specific siRNA design

    • Specialized MiRNADesigner subclass with miRNA-biogenesis-aware scoring

    • Enhanced CSV schema with miRNA-specific columns (strand_role, biogenesis_score)

    • CLI support via --design-mode flag with automatic parameter adjustment

  • miRNA Seed Match Analysis: Integrated miRNA off-target screening in Nextflow pipeline

    • Lightweight seed region matching (positions 2-8) against miRNA databases

    • Automatic miRNA database download and caching from MirGeneDB

    • Per-candidate and aggregated miRNA hit reports in TSV/JSON formats

    • Configurable via --mirna-db and --mirna-species flags

  • Species Registry System: Canonical species name mapping and normalization

    • Unified species identifiers across genome and miRNA databases

    • Automatic species alias resolution (e.g., β€œhuman” β†’ β€œHomo sapiens” β†’ mirgenedb slug)

    • Support for multi-species analysis with consistent naming

Improvements

  • Nextflow Pipeline Enhancements:

    • Reduced memory requirements for Docker-constrained environments (2GB β†’ 1GB for most processes)

    • Added miRNA seed analysis module with BWA-based matching

    • Improved error handling and progress reporting

    • Better resource allocation with memory/CPU buffers

  • Data Validation: Extended Pandera schemas for miRNA-specific columns

  • CSV Output: New columns transcript_hit_count and transcript_hit_fraction track guide specificity

  • miRNA Database Manager: Enhanced with species normalization and canonical name mapping

Bug Fixes

  • Fixed Nextflow Docker configuration for resource-constrained CI environments

  • Resolved schema validation errors for miRNA columns in mixed-mode workflows

  • Fixed typing issues in pipeline CLI functions

Documentation

  • Major Documentation Consolidation: Reorganized structure for improved user experience

    • Simplified navigation from 4 to 3 main sections (Getting Started, User Guide, Reference, Developer)

    • Consolidated getting_started.md and quick_reference.md into comprehensive guide

    • Streamlined tutorials to 2 focused guides (pipeline integration, custom scoring)

    • Created dedicated developer section for advanced documentation

  • Complete API Reference: Added 18 previously missing modules

    • Comprehensive coverage of all 27 sirnaforge modules

    • Auto-generated Sphinx documentation with proper cross-references

  • Quality Improvements: Configured ruff D rules for docstring validation

    • Fixed 116 docstring formatting issues automatically

    • Clean Sphinx builds with no warnings

  • Usage Examples: Added miRNA seed analysis workflow documentation

Testing

  • New Test Coverage: 232 new tests for miRNA design mode

    • Comprehensive unit tests for MiRNADesigner scoring

    • Schema validation tests for miRNA-specific columns

    • Integration tests for miRNA database functionality

  • Test Organization: Normalized test markers for consistent CI/CD workflows

  • Documentation Tests: Verified all doc builds and cross-references work correctly

Dependencies

  • No new runtime dependencies (leverages existing httpx, pydantic, pandera)

  • Enhanced development dependencies for documentation generation


0.2.1 - 2025-10-24

New Features

  • Chemical Modification System: Comprehensive infrastructure for siRNA chemical modifications

    • Default modification patterns automatically applied to designed siRNAs (standard_2ome, minimal_terminal, maximal_stability)

    • New --modifications and --overhang CLI flags for workflow and design commands

    • FDA-approved Patisiran (Onpattro) pattern included in example library

  • Modification Metadata Models: Pydantic models for StrandMetadata, ChemicalModification, Provenance tracking

  • FASTA Annotation System: Merge modification metadata into FASTA headers with full roundtrip support

  • Remote FASTA Inputs: Workflow supports --input-fasta with automatic HTTP download and caching

  • Enhanced Pandera Schemas: Runtime DataFrame validation with @pa.check_types decorators, automatic addition of modification columns

Improvements

  • Modification columns (guide/passenger overhangs and modifications) now included in CSV outputs

  • CLI sequences show command with JSON/FASTA/table output formats

  • CLI sequences annotate command for merging metadata into FASTA files

  • Standardized + separators in modification headers (backward compatible with |)

  • Resource resolver for flexible input handling (local files, HTTP URLs)

  • Improved type safety with Pandera schema validation on DesignResult.save_csv() and _generate_orf_report()

Bug Fixes

  • Fixed JSON metadata loading regression with StrandMetadata subscripting

  • Resolved mypy typing issues for optional FASTA descriptions

  • Fixed CLI output handling for modification metadata

Documentation

  • Chemical Modification Review (551 lines): Comprehensive analysis and integration guide

  • Modification Integration Guide (543 lines): Developer documentation with code examples

  • Modification Annotation Spec (381 lines): Complete FASTA header specification

  • Example Patterns Library: 4 production-ready modification patterns with usage guide

  • Updated README with chemical modifications feature documentation

  • Remote FASTA usage documented in CLI and gene search guides

Testing

  • 18 new tests for chemical modifications (100% passing):

    • 11 integration tests for workflow roundtrip validation

    • 7 tests validating example pattern files

  • Added resource resolver unit tests (local paths, HTTP downloads, schemes)

  • Extended modification metadata tests for delimiter compatibility

  • All 164 tests passing with enhanced Pandera validation

Dependencies

  • No new runtime dependencies added (uses existing Pydantic, Pandera, httpx)

Performance

  • Removed Bowtie indexing (standardized on BWA-MEM2)

  • Streamlined off-target analysis pipeline configuration


0.2.0 - 2025-09-27

New Features

  • miRNA Database Cache System (sirnaforge cache) - Local caching and management of miRNA databases from multiple sources with automatic updates

  • Comprehensive Data Validation - Pandera DataFrameSchemas for type-safe output validation ensuring consistent CSV/TSV report formatting

  • Enhanced Thermodynamic Scoring - Modified composite score to heavily favor (90%) duplex binding energy for improved siRNA selection accuracy

  • Workflow Input Flexibility - Added FASTA file input support for custom transcript analysis workflows

  • Embedded Nextflow Pipeline - Integrated Nextflow execution directly within Python API for scalable processing

Improvements

  • Performance Optimization - Parallelized off-target analysis and improved memory efficiency for large transcript sets

  • CLI Enhancement - Better Unicode support, cleaner help text, and improved error reporting

  • Data Schema Validation - Robust output validation with detailed error messages using modern Pandera 0.26.1 patterns

  • Documentation Overhaul - Comprehensive testing guide, thermodynamic documentation, and improved API references

  • Development Workflow - Enhanced Makefile with Docker testing categories, release validation, and conda environment support

οΏ½ Bug Fixes

  • Security Improvements - Resolved security linting issues and improved dependency management

  • Off-target Analysis - Fixed alignment indexing and improved multi-species database handling

  • CI/CD Pipeline - Resolved build failures, improved test categorization, and enhanced release automation

  • Unicode Handling - Fixed CLI display issues in various terminal environments

πŸ“Š Performance

  • 10-100x Faster Dependencies - Full migration to uv package manager for ultra-fast installs and environment management

  • Optimized Algorithms - Improved thermodynamic calculation efficiency with better filtering strategies

  • Parallel Processing - Enhanced concurrent execution for off-target analysis across multiple genomes

Testing & Infrastructure

  • Enhanced Test Categories - Smoke tests (256MB), integration tests (2GB), and full CI validation

  • Docker Improvements - Multi-stage builds, intelligent entrypoint, and resource-aware testing

  • Release Automation - Comprehensive GitHub Actions workflow with quality gates and artifact management

Documentation

  • Testing Guide - Comprehensive documentation for all test categories and Docker workflows

  • Thermodynamic Guide - Detailed explanation of scoring algorithms and parameter optimization

  • CLI Reference - Auto-generated command documentation with examples

  • Development Setup - Streamlined onboarding with conda environment and uv integration

Dependencies & Architecture

  • Modern Python Support - Maintained compatibility across Python 3.9-3.12 with improved type safety

  • Pydantic Integration - Enhanced data models with validation middleware and error handling

  • Containerization - Production-ready Docker images with conda bioinformatics stack

  • Package Management - Full uv adoption for dependency resolution and virtual environment management

0.1.0 - 2025-09-06

Added

  • Initial release of siRNAforge toolkit

  • Core siRNA design algorithms with thermodynamic scoring

  • Multi-database gene search (Ensembl, RefSeq, GENCODE)

  • Rich command-line interface with Typer and Rich

  • Comprehensive siRNA candidate scoring system

  • Off-target prediction framework

  • Nextflow pipeline integration for scalable analysis

  • Docker containerization for reproducible environments

  • Python API with Pydantic data models

  • Comprehensive test suite with unit and integration tests

  • Modern development tooling with uv, black, ruff, mypy

Core Features

  • Gene Search: Multi-database transcript retrieval

  • siRNA Design: Algorithm-driven candidate generation

  • Quality Control: GC content, structure, and specificity filters

  • Scoring System: Composite scoring with multiple components

  • Workflow Orchestration: End-to-end gene-to-siRNA pipeline

  • CLI Interface: Rich, user-friendly command-line tools

  • Python API: Programmatic access for automation

Supported Operations

  • sirnaforge workflow: Complete gene-to-siRNA analysis

  • sirnaforge search: Gene and transcript search

  • sirnaforge design: siRNA candidate generation

  • sirnaforge validate: Input file validation

  • sirnaforge config: Configuration display

  • sirnaforge version: Version information

Technical Stack

  • Language: Python 3.9-3.12

  • Package Management: uv for fast dependency resolution

  • Data Models: Pydantic for type-safe data handling

  • CLI Framework: Typer with Rich for beautiful output

  • Testing: pytest with comprehensive coverage

  • Code Quality: black, ruff, mypy for consistency

  • Containerization: Multi-stage Docker builds

  • Pipeline: Nextflow integration for scalability

  • Documentation: Sphinx with MyST parser, Read the Docs theme