Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased
No unreleased changes yet.
[0.3.4] - 2025-12-31
Added
Transcript Annotation Provider Layer: New data provider interface for fetching genomic transcript annotations
Added AbstractTranscriptAnnotationClient interface in src/sirnaforge/data/base.py
Implemented EnsemblTranscriptModelClient using the Ensembl REST API (lookup/id and overlap/region endpoints)
Added VepConsequenceClient stub for optional VEP enrichment (behind a config flag; placeholder implementation)
New Pydantic models: Interval, TranscriptAnnotation, and TranscriptAnnotationBundle in src/sirnaforge/models/transcript_annotation.py
In-memory LRU cache with TTL for transcript annotations
Support for fetching by stable IDs or genomic regions
Comprehensive unit tests with mocked REST responses
Integration tests for the real Ensembl REST API (gated by @pytest.mark.requires_network)
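A minimal sketch of an in-memory LRU cache with a per-entry TTL, of the kind listed above. The class name TTLLRUCache, its maxsize/ttl_seconds parameters, and the injectable clock are illustrative assumptions, not the sirnaforge API.

```python
from __future__ import annotations

import time
from collections import OrderedDict
from typing import Callable, Generic, Hashable, Optional, Tuple, TypeVar

V = TypeVar("V")


class TTLLRUCache(Generic[V]):
    """Hypothetical bounded cache: LRU eviction plus a per-entry time-to-live."""

    def __init__(
        self,
        maxsize: int = 256,
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.maxsize = maxsize
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        # key -> (expiry timestamp, value); ordering tracks recency of use
        self._data: OrderedDict[Hashable, Tuple[float, V]] = OrderedDict()

    def get(self, key: Hashable) -> Optional[V]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the stale entry and report a miss
            del self._data[key]
            return None
        # Hit: mark as most recently used
        self._data.move_to_end(key)
        return value

    def set(self, key: Hashable, value: V) -> None:
        self._data[key] = (self._clock() + self.ttl_seconds, value)
        self._data.move_to_end(key)
        # Evict least recently used entries until within capacity
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)

    def __len__(self) -> int:
        return len(self._data)
```

Injecting the clock keeps expiry testable without sleeping; in normal use the default time.monotonic is unaffected by wall-clock changes.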
Improvements
Extensible Architecture: Transcript annotation provider follows the same layered pattern as existing data providers (gene search, ORF analysis, transcriptome management)
Reference Tracking: Annotations include provenance metadata (provider, endpoint, reference choice) for reproducibility
Error Handling: Robust handling of unresolved IDs and network errors, with failed lookups falling back to an unresolved list
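A sketch of ID-based fetching with an unresolved fallback, assuming one Ensembl REST lookup/id request per stable ID. The helpers lookup_url and fetch_by_ids and the injected fetch_json callable are hypothetical; only the endpoint path, its content-type/expand query parameters, and the urllib.parse functions are real.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple
from urllib.parse import quote, urlencode

ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_url(stable_id: str, base: str = ENSEMBL_REST) -> str:
    """Build an Ensembl REST lookup/id URL for one stable ID (JSON, expanded)."""
    query = urlencode({"content-type": "application/json", "expand": 1})
    return f"{base}/lookup/id/{quote(stable_id)}?{query}"


def fetch_by_ids(
    stable_ids: Iterable[str],
    fetch_json: Callable[[str], Optional[dict]],
) -> Tuple[Dict[str, dict], List[str]]:
    """Return (resolved payloads by ID, IDs that could not be resolved)."""
    resolved: Dict[str, dict] = {}
    unresolved: List[str] = []
    for stable_id in stable_ids:
        try:
            payload = fetch_json(lookup_url(stable_id))
        except OSError:
            # Network errors (urllib's URLError subclasses OSError) are not fatal
            payload = None
        if payload:
            resolved[stable_id] = payload
        else:
            # Unknown ID, empty response, or network failure
            unresolved.append(stable_id)
    return resolved, unresolved
```

Passing the fetcher in keeps the fallback logic testable with mocked responses, matching the unit-test style described for this release.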
Documentation
Added comprehensive docstrings for all new transcript annotation classes and methods
Unit and integration tests serve as usage examples
[0.3.3] - 2025-12-15
Bug Fixes
Docker Login Shell PATH: Fixed an issue where login shells reset PATH and dropped /opt/conda/bin, making sirnaforge and nextflow unavailable
Added /etc/profile.d/conda-path.sh to preserve conda toolchain paths in login shells
Added regression test test_docker_login_shell_path() to the container test suite
Added standalone test script scripts/test-docker-login-shell.sh for manual verification
Nextflow Off-target Aggregation: Fixed a Groovy/DSL2 runtime crash during final aggregation (No signature of method ... call(LinkedList)) by correcting channel collection/defaulting semantics in the embedded workflow
Replaced invalid ifEmpty([])/ifEmpty('') usage with ifEmpty { [] }/ifEmpty { '' }
Switched from collect() to toList() for explicit channel materialization before combining genome + miRNA result lists
[0.3.1] - 2025-12-04
Added
Dirty Control Candidates: workflow.py now reuses the harshest rejected guides as “dirty controls” (see sirnaforge/utils/control_candidates.py) so every workflow/Nextflow run emits at least one known-bad sequence for health monitoring.
Improvements
Resilient Aggregation: Embedded Nextflow modules (src/sirnaforge/pipeline/nextflow/workflows/modules/local/aggregate_results.nf, mirna_seed_analysis.nf, etc.) plus src/sirnaforge/pipeline/nextflow_cli.py keep TSV/JSON artefacts in sync even when some per-genome or miRNA analyses are skipped.
Deterministic Cache & Defaults: Centralized cache helpers document where transcriptome/miRNA assets live, and the CLI now enforces valid GC/length ranges with automatic transcriptome fallbacks when --transcriptome is omitted.
Bug Fixes
Pipeline Reliability: Aggregation guards against missing TSVs, writes explicit workdir breadcrumbs, and standardizes BWA-MEM2 index prep plus scoped retries to eliminate intermittent Nextflow crashes.
Documentation
Docs v2 Launch: Introduced docs_v2/ with a new Sphinx build, autogenerated CLI/API pages, and live command output so the published docs always match the Typer surface.
Nextflow Tutorial & Guides: docs/getting_started.md, docs/usage_examples.md, and the expanded Nextflow tutorial now explain dirty controls, cache directories, Docker execution, and expected artefacts end-to-end.
[0.3.0] - 2025-11-21
Improvements
Documentation Standardization: Added sphinx-design tab sets across all guides so every example shows uv vs Docker commands side-by-side, reducing copy/paste errors.
GC Content Defaults: Raised the default --gc-max to 60% (from 52%) and documented the change across CLI help and tutorials to match wet-lab guidance.
CI/CD Enhancements: Release workflow now runs test-release, publishes coverage summaries, and properly sequences lint → test tiers for consistent PR validation.
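A sketch of a GC-content gate using the new 60% --gc-max default. The function names and the 30% lower bound are placeholders rather than documented sirnaforge defaults; the range check reflects the idea of rejecting invalid GC ranges up front.

```python
def gc_content(seq: str) -> float:
    """Percentage of G/C bases in a DNA or RNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def passes_gc(seq: str, gc_min: float = 30.0, gc_max: float = 60.0) -> bool:
    """Keep guides whose GC% lies in [gc_min, gc_max]; gc_min=30.0 is a placeholder."""
    if not 0.0 <= gc_min <= gc_max <= 100.0:
        raise ValueError(f"invalid GC range: {gc_min}-{gc_max}")
    return gc_min <= gc_content(seq) <= gc_max
```

A 20-nt guide with 11 G/C bases (55% GC) now passes under the 60% ceiling but would have been rejected under the old 52% default.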
Bug Fixes
Off-target/miRNA Search: Fixed the regression that prevented miRNA seed matches from being emitted in combined reports when Nextflow was invoked from the CLI.
Docker Test Environment: Updated make docker-test docs and scripts to avoid uv sync conflicts inside the container image.
Testing
Test Tier Documentation: make test-release now produces a single narrative across dev/ci/release markers with expected run times and coverage outputs, making it easier for contributors to reproduce CI locally.
Build & Infrastructure
Make Targets: Documented the revamped test-dev, test-ci, and test-release targets along with their coverage/JUnit outputs so users know which tier to run before opening a PR.
[0.2.2] - 2025-10-26
New Features
miRNA Design Mode: New --design-mode mirna option for microRNA-specific siRNA design
Specialized MiRNADesigner subclass with miRNA-biogenesis-aware scoring
Enhanced CSV schema with miRNA-specific columns (strand_role, biogenesis_score)
CLI support via --design-mode flag with automatic parameter adjustment
miRNA Seed Match Analysis: Integrated miRNA off-target screening into the Nextflow pipeline
Lightweight seed region matching (positions 2-8) against miRNA databases
Automatic miRNA database download and caching from MirGeneDB
Per-candidate and aggregated miRNA hit reports in TSV/JSON formats
Configurable via --mirna-db and --mirna-species flags
Species Registry System: Canonical species name mapping and normalization
Unified species identifiers across genome and miRNA databases
Automatic species alias resolution (e.g., “human” → “Homo sapiens” → mirgenedb slug)
Support for multi-species analysis with consistent naming
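A sketch of seed matching over guide positions 2-8 (1-based, inclusive), assuming a hit means the guide's seed is identical to a mature miRNA's seed. seed and seed_hits are hypothetical helpers and the sequences are toy inputs; the 0.2.2 notes describe a BWA-based module, not this plain string comparison.

```python
from typing import Dict, List


def seed(seq: str, start: int = 2, end: int = 8) -> str:
    """Return 1-based inclusive positions start..end as RNA (T converted to U)."""
    rna = seq.upper().replace("T", "U")
    if len(rna) < end:
        raise ValueError("sequence shorter than the seed region")
    return rna[start - 1 : end]


def seed_hits(guide: str, mirnas: Dict[str, str]) -> List[str]:
    """Names of mature miRNAs whose 2-8 seed is identical to the guide's."""
    guide_seed = seed(guide)
    return sorted(name for name, mature in mirnas.items() if seed(mature) == guide_seed)
```

Normalizing T to U lets DNA-encoded guides be compared directly with RNA miRNA records.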
Improvements
Nextflow Pipeline Enhancements:
Reduced memory requirements for Docker-constrained environments (2GB → 1GB for most processes)
Added miRNA seed analysis module with BWA-based matching
Improved error handling and progress reporting
Better resource allocation with memory/CPU buffers
Data Validation: Extended Pandera schemas for miRNA-specific columns
CSV Output: New columns transcript_hit_count and transcript_hit_fraction track guide specificity
miRNA Database Manager: Enhanced with species normalization and canonical name mapping
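A sketch of species alias normalization feeding a miRNA database slug. The ALIASES table, the hsa/mmu slugs, and both function names are illustrative assumptions; the actual registry contents are not listed in this changelog.

```python
from typing import Dict

# Illustrative registry contents; not the project's actual table
ALIASES: Dict[str, str] = {
    "human": "Homo sapiens",
    "hsa": "Homo sapiens",
    "homo_sapiens": "Homo sapiens",
    "mouse": "Mus musculus",
    "mmu": "Mus musculus",
    "mus_musculus": "Mus musculus",
}
MIRNA_DB_SLUGS: Dict[str, str] = {"Homo sapiens": "hsa", "Mus musculus": "mmu"}


def canonical_species(name: str) -> str:
    """Lowercase, collapse whitespace to underscores, then resolve aliases."""
    key = "_".join(name.strip().lower().split())
    try:
        return ALIASES[key]
    except KeyError:
        raise KeyError(f"unknown species: {name!r}") from None


def mirna_db_slug(name: str) -> str:
    """Map any known alias to its (assumed) miRNA database slug."""
    return MIRNA_DB_SLUGS[canonical_species(name)]
```

Routing every lookup through one canonical name keeps genome and miRNA resources consistently keyed, whichever alias the user typed.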
Bug Fixes
Fixed Nextflow Docker configuration for resource-constrained CI environments
Resolved schema validation errors for miRNA columns in mixed-mode workflows
Fixed typing issues in pipeline CLI functions
Documentation
Major Documentation Consolidation: Reorganized structure for improved user experience
Simplified navigation from 4 to 3 main sections (Getting Started, User Guide, Reference, Developer)
Consolidated getting_started.md and quick_reference.md into a comprehensive guide
Streamlined tutorials to 2 focused guides (pipeline integration, custom scoring)
Created dedicated developer section for advanced documentation
Complete API Reference: Added 18 previously missing modules
Comprehensive coverage of all 27 sirnaforge modules
Auto-generated Sphinx documentation with proper cross-references
Quality Improvements: Configured ruff D rules for docstring validation
Fixed 116 docstring formatting issues automatically
Clean Sphinx builds with no warnings
Usage Examples: Added miRNA seed analysis workflow documentation
Testing
New Test Coverage: 232 new tests for miRNA design mode
Comprehensive unit tests for MiRNADesigner scoring
Schema validation tests for miRNA-specific columns
Integration tests for miRNA database functionality
Test Organization: Normalized test markers for consistent CI/CD workflows
Documentation Tests: Verified all doc builds and cross-references work correctly
Dependencies
No new runtime dependencies (leverages existing httpx, pydantic, pandera)
Enhanced development dependencies for documentation generation
[0.2.1] - 2025-10-24
New Features
Chemical Modification System: Comprehensive infrastructure for siRNA chemical modifications
Default modification patterns automatically applied to designed siRNAs (standard_2ome, minimal_terminal, maximal_stability)
New --modifications and --overhang CLI flags for workflow and design commands
FDA-approved Patisiran (Onpattro) pattern included in the example library
Modification Metadata Models: Pydantic models for StrandMetadata, ChemicalModification, Provenance tracking
FASTA Annotation System: Merge modification metadata into FASTA headers with full roundtrip support
Remote FASTA Inputs: Workflow supports --input-fasta with automatic HTTP download and caching
Enhanced Pandera Schemas: Runtime DataFrame validation with @pa.check_types decorators and automatic addition of modification columns
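A sketch of resolving an --input-fasta value to a local path, assuming http/https sources are downloaded once into a cache directory keyed by a hash of the URL. resolve_fasta, is_remote, and cached_path_for are hypothetical names; the urllib, hashlib, and shutil calls are standard library.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def is_remote(source: str) -> bool:
    return urlparse(source).scheme in {"http", "https"}


def cached_path_for(url: str, cache_dir: Path) -> Path:
    """Stable cache location: URL hash prefix plus the original file name."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    name = Path(urlparse(url).path).name or "download.fasta"
    return cache_dir / f"{digest}_{name}"


def resolve_fasta(source: str, cache_dir: Path) -> Path:
    """Return a local path for a file path or an http(s) URL."""
    if not is_remote(source):
        path = Path(source).expanduser()
        if not path.is_file():
            raise FileNotFoundError(path)
        return path
    target = cached_path_for(source, cache_dir)
    if not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        partial = target.parent / (target.name + ".part")
        # Download under a temporary name, then rename, so a failed download never looks cached
        with urllib.request.urlopen(source, timeout=60) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        partial.replace(target)
    return target
```

Hashing the full URL avoids collisions between remote files that share a name, while keeping the name in the cached file helps when inspecting the cache by hand.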
Improvements
Modification columns (guide/passenger overhangs and modifications) now included in CSV outputs
CLI sequences show command with JSON/FASTA/table output formats
CLI sequences annotate command for merging metadata into FASTA files
Standardized + separators in modification headers (backward compatible with |)
Resource resolver for flexible input handling (local files, HTTP URLs)
Improved type safety with Pandera schema validation on DesignResult.save_csv() and _generate_orf_report()
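A sketch of delimiter-tolerant header parsing, assuming a hypothetical ">id key=value+key=value" layout; the real grammar is defined in the Modification Annotation Spec and may differ. Headers are written with +, legacy | headers still parse, and values are assumed not to contain either character.

```python
import re
from typing import Dict

# Accept the standardized "+" separator and the legacy "|" separator
_SEPARATORS = re.compile(r"[+|]")


def parse_header(header: str) -> Dict[str, str]:
    """Parse a hypothetical '>id key=value+key=value' FASTA header into a dict."""
    text = header.lstrip(">").strip()
    seq_id, _, metadata = text.partition(" ")
    fields = {"id": seq_id}
    for token in _SEPARATORS.split(metadata):
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


def format_header(fields: Dict[str, str]) -> str:
    """Write metadata back using the standardized "+" separator."""
    metadata = "+".join(f"{key}={value}" for key, value in fields.items() if key != "id")
    return f">{fields['id']} {metadata}" if metadata else f">{fields['id']}"
```

Reading both separators but writing only + lets old files roundtrip into the new convention without a separate migration step.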
Bug Fixes
Fixed JSON metadata loading regression with StrandMetadata subscripting
Resolved mypy typing issues for optional FASTA descriptions
Fixed CLI output handling for modification metadata
Documentation
Chemical Modification Review (551 lines): Comprehensive analysis and integration guide
Modification Integration Guide (543 lines): Developer documentation with code examples
Modification Annotation Spec (381 lines): Complete FASTA header specification
Example Patterns Library: 4 production-ready modification patterns with usage guide
Updated README with chemical modifications feature documentation
Remote FASTA usage documented in CLI and gene search guides
Testing
18 new tests for chemical modifications (100% passing):
11 integration tests for workflow roundtrip validation
7 tests validating example pattern files
Added resource resolver unit tests (local paths, HTTP downloads, schemes)
Extended modification metadata tests for delimiter compatibility
All 164 tests passing with enhanced Pandera validation
Dependencies
No new runtime dependencies added (uses existing Pydantic, Pandera, httpx)
Performance
Removed Bowtie indexing (standardized on BWA-MEM2)
Streamlined off-target analysis pipeline configuration
[0.2.0] - 2025-09-27
New Features
miRNA Database Cache System (sirnaforge cache) - Local caching and management of miRNA databases from multiple sources with automatic updates
Comprehensive Data Validation - Pandera DataFrameSchemas for type-safe output validation ensuring consistent CSV/TSV report formatting
Enhanced Thermodynamic Scoring - Composite score now weights duplex binding energy at 90% for improved siRNA selection accuracy
Workflow Input Flexibility - Added FASTA file input support for custom transcript analysis workflows
Embedded Nextflow Pipeline - Integrated Nextflow execution directly within Python API for scalable processing
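A sketch of a composite score in which duplex binding energy carries 90% of the weight. It assumes every component is already normalized to [0, 1] with higher meaning better, and that the remaining 10% is split evenly across the other components; neither the normalization nor the split is specified in this changelog.

```python
from typing import Mapping

DUPLEX_WEIGHT = 0.9  # share of the composite score given to duplex binding energy


def composite_score(duplex_score: float, other_scores: Mapping[str, float]) -> float:
    """Weighted composite of component scores pre-normalized to [0, 1]."""
    for name, value in {"duplex": duplex_score, **other_scores}.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score out of range: {value}")
    if not other_scores:
        return duplex_score
    # Remaining weight is split evenly across the other components
    rest = sum(other_scores.values()) / len(other_scores)
    return DUPLEX_WEIGHT * duplex_score + (1.0 - DUPLEX_WEIGHT) * rest
```

With this weighting, a candidate scoring 0.5 on duplex energy and 1.0 on every other component lands at 0.55, so the other components can only shift the ranking at the margins.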
Improvements
Performance Optimization - Parallelized off-target analysis and improved memory efficiency for large transcript sets
CLI Enhancement - Better Unicode support, cleaner help text, and improved error reporting
Data Schema Validation - Robust output validation with detailed error messages using modern Pandera 0.26.1 patterns
Documentation Overhaul - Comprehensive testing guide, thermodynamic documentation, and improved API references
Development Workflow - Enhanced Makefile with Docker testing categories, release validation, and conda environment support
Bug Fixes
Security Improvements - Resolved security linting issues and improved dependency management
Off-target Analysis - Fixed alignment indexing and improved multi-species database handling
CI/CD Pipeline - Resolved build failures, improved test categorization, and enhanced release automation
Unicode Handling - Fixed CLI display issues in various terminal environments
Performance
10-100x Faster Dependencies - Full migration to uv package manager for ultra-fast installs and environment management
Optimized Algorithms - Improved thermodynamic calculation efficiency with better filtering strategies
Parallel Processing - Enhanced concurrent execution for off-target analysis across multiple genomes
Testing & Infrastructure
Enhanced Test Categories - Smoke tests (256MB), integration tests (2GB), and full CI validation
Docker Improvements - Multi-stage builds, intelligent entrypoint, and resource-aware testing
Release Automation - Comprehensive GitHub Actions workflow with quality gates and artifact management
Documentation
Testing Guide - Comprehensive documentation for all test categories and Docker workflows
Thermodynamic Guide - Detailed explanation of scoring algorithms and parameter optimization
CLI Reference - Auto-generated command documentation with examples
Development Setup - Streamlined onboarding with conda environment and uv integration
Dependencies & Architecture
Modern Python Support - Maintained compatibility across Python 3.9-3.12 with improved type safety
Pydantic Integration - Enhanced data models with validation middleware and error handling
Containerization - Production-ready Docker images with conda bioinformatics stack
Package Management - Full uv adoption for dependency resolution and virtual environment management
[0.1.0] - 2025-09-06
Added
Initial release of siRNAforge toolkit
Core siRNA design algorithms with thermodynamic scoring
Multi-database gene search (Ensembl, RefSeq, GENCODE)
Rich command-line interface with Typer and Rich
Comprehensive siRNA candidate scoring system
Off-target prediction framework
Nextflow pipeline integration for scalable analysis
Docker containerization for reproducible environments
Python API with Pydantic data models
Comprehensive test suite with unit and integration tests
Modern development tooling with uv, black, ruff, mypy
Core Features
Gene Search: Multi-database transcript retrieval
siRNA Design: Algorithm-driven candidate generation
Quality Control: GC content, structure, and specificity filters
Scoring System: Composite scoring with multiple components
Workflow Orchestration: End-to-end gene-to-siRNA pipeline
CLI Interface: Rich, user-friendly command-line tools
Python API: Programmatic access for automation
Supported Operations
sirnaforge workflow: Complete gene-to-siRNA analysis
sirnaforge search: Gene and transcript search
sirnaforge design: siRNA candidate generation
sirnaforge validate: Input file validation
sirnaforge config: Configuration display
sirnaforge version: Version information
Technical Stack
Language: Python 3.9-3.12
Package Management: uv for fast dependency resolution
Data Models: Pydantic for type-safe data handling
CLI Framework: Typer with Rich for beautiful output
Testing: pytest with comprehensive coverage
Code Quality: black, ruff, mypy for consistency
Containerization: Multi-stage Docker builds
Pipeline: Nextflow integration for scalability
Documentation: Sphinx with MyST parser, Read the Docs theme