Comprehensive Review: Chemical Modification Handling in siRNAforge

Date: 2025-10-24 Status: Review Complete Purpose: Evaluate current chemical modification infrastructure and recommend integration strategies


Executive Summary

siRNAforge has a well-designed, production-ready chemical modification system that is currently underutilized in the main workflow. This review outlines the existing infrastructure, identifies integration opportunities, and provides recommendations for incorporating modification metadata throughout the design pipeline.

Key Strengths

βœ… Robust Data Models - Pydantic-based validation with comprehensive error handling βœ… Flexible Storage - JSON sidecar files and FASTA header encoding βœ… Standards Compliant - Position numbering (1-based), provenance tracking βœ… Well Tested - 35+ unit tests with 100% pass rate βœ… Documented - Complete specification in modification_annotation_spec.md

Current Gaps

⚠️ No Workflow Integration - Modifications not included in pipeline outputs ⚠️ Manual Annotation Required - No automated addition to designed siRNAs ⚠️ Limited Discoverability - Features exist but aren’t in primary workflows


Current Infrastructure

1. Data Models (src/sirnaforge/models/modifications.py)

ChemicalModification

ChemicalModification(
    type: str,              # e.g., "2OMe", "2F", "PS", "LNA"
    positions: list[int]    # 1-based positions
)

Validation Features:

  • Non-empty modification type (whitespace stripped)

  • Positions must be >= 1 (1-based indexing)

  • Duplicate positions automatically removed and sorted

  • Position validation against sequence length

StrandMetadata

StrandMetadata(
    id: str,
    sequence: str,
    overhang: Optional[str] = None,           # e.g., "dTdT", "UU"
    chem_mods: list[ChemicalModification] = [],
    notes: Optional[str] = None,
    provenance: Optional[Provenance] = None,
    confirmation_status: ConfirmationStatus = PENDING
)

Key Features:

  • Sequence validation (ATCGU only, case-normalized to uppercase)

  • Modification positions validated against sequence length

  • FASTA header generation with metadata encoding

  • Dict-like access for backward compatibility

Provenance

Provenance(
    source_type: SourceType,  # patent, publication, clinical_trial, database, designed, other
    identifier: str,           # Patent #, DOI, PubMed ID, etc.
    url: Optional[str] = None
)

Use Cases:

  • Track FDA-approved siRNAs (e.g., Patisiran from patent US10060921B2)

  • Link to publications/clinical trials

  • Maintain audit trail for designed sequences

SequenceRecord

SequenceRecord(
    target_gene: str,
    strand_role: StrandRole,  # guide, sense, antisense, passenger
    metadata: StrandMetadata
)

2. Utility Functions (src/sirnaforge/modifications.py)

Function

Purpose

Input

Output

load_metadata()

Load JSON metadata with validation

JSON file path

dict[str, StrandMetadata]

parse_header()

Extract metadata from FASTA header

BioPython SeqRecord

Metadata dict

parse_chem_mods()

Parse modification string

β€œ2OMe(1,4,6)+2F()”

List of ChemicalModifications

parse_provenance()

Parse provenance string

β€œPatent:US10060921B2”

Provenance object

merge_metadata_into_fasta()

Merge JSON into FASTA headers

FASTA + JSON paths

Updated FASTA file

save_metadata_json()

Save metadata to JSON

Metadata dict

JSON file

3. Integration Points in SiRNACandidate

The SiRNACandidate model includes optional metadata fields:

class SiRNACandidate(BaseModel):
    # ... standard fields ...

    # Optional chemical modification metadata
    guide_metadata: Optional[StrandMetadata] = None
    passenger_metadata: Optional[StrandMetadata] = None

    def to_fasta(self, include_metadata: bool = False) -> str:
        """Generate FASTA with optional metadata in header"""

Current Usage: Fields exist but are not populated during design workflow.

4. FASTA Header Encoding Format

Standard key-value pairs separated by semicolons:

>sirna_001 Target=TTR; Role=guide; Confirmed=pending; Overhang=dTdT; ChemMods=2OMe(1,4,6,11)+2F(); Provenance=Patent:US10060921B2; URL=https://example.com
AUCGAUCGAUCGAUCGAUCGA

Features:

  • Human-readable and machine-parseable

  • Supports multiple modification types

  • Optional fields (gracefully handles missing data)

  • Backward compatible (works with unmodified FASTA files)

5. JSON Sidecar Format

Metadata can be stored separately for easier curation:

{
  "sirna_001": {
    "id": "sirna_001",
    "sequence": "AUCGAUCGAUCGAUCGAUCGA",
    "target_gene": "TTR",
    "strand_role": "guide",
    "overhang": "dTdT",
    "chem_mods": [
      {"type": "2OMe", "positions": [1, 4, 6, 11]}
    ],
    "provenance": {
      "source_type": "designed",
      "identifier": "sirnaforge_v1.0_TP53_candidate_001"
    },
    "confirmation_status": "pending",
    "notes": "Top candidate for TP53 targeting"
  }
}

Integration Opportunities

A. Workflow Output Integration

Current State: Workflow generates CSV and FASTA outputs without modification metadata.

Integration Points:

  1. Post-Design Annotation (workflow.py::step4_generate_reports)

    • After candidate generation, annotate top candidates with recommended modifications

    • Save metadata JSON alongside CSV outputs

    • Generate FASTA with modification headers for synthesis

  2. CSV Export Enhancement

    • Add columns for modification metadata in candidate CSV

    • Include provenance information for traceability

Recommended Approach:

# In workflow.py after candidate selection
def _annotate_candidates_with_modifications(
    candidates: list[SiRNACandidate],
    modification_strategy: str = "standard_2ome"
) -> dict[str, StrandMetadata]:
    """Apply standard modification patterns to candidates."""
    metadata = {}
    for candidate in candidates:
        # Example: Standard 2'-O-methyl pattern for stability
        guide_mods = apply_modification_pattern(
            candidate.guide_sequence,
            pattern=modification_strategy
        )

        metadata[candidate.id] = StrandMetadata(
            id=f"{candidate.id}_guide",
            sequence=candidate.guide_sequence,
            overhang="dTdT",  # Standard DNA overhang
            chem_mods=guide_mods,
            provenance=Provenance(
                source_type=SourceType.DESIGNED,
                identifier=f"sirnaforge_v{__version__}_{candidate.transcript_id}_{candidate.id}"
            ),
            confirmation_status=ConfirmationStatus.PENDING
        )
    return metadata

B. Annotation as Optional Output

User Control: Allow users to specify modification patterns via CLI/config.

# In DesignParameters model
modification_pattern: Optional[str] = Field(
    default=None,
    description="Chemical modification pattern to apply (standard_2ome, minimal, maximal, custom)"
)
modification_config: Optional[Path] = Field(
    default=None,
    description="Path to custom modification configuration JSON"
)

CLI Integration:

# Design with automatic modification annotation
sirnaforge design input.fasta -o output/ --modifications standard_2ome

# Custom modification pattern
sirnaforge design input.fasta -o output/ --modifications custom --modification-config mods.json

C. Separate Annotation Command

Keep design and annotation separate for flexibility:

# Design workflow (no modifications)
sirnaforge design input.fasta -o output/

# Annotate top candidates afterward
sirnaforge sequences annotate \
  output/sirnaforge/TP53_pass.fasta \
  modifications.json \
  -o output/sirnaforge/TP53_pass_annotated.fasta

Advantages:

  • Maintains separation of concerns

  • Users can experiment with different modification patterns

  • No workflow slowdown for users who don’t need modifications



Implementation Patterns

Pattern 2: Integrated Scoring (Advanced)

Description: Modification metadata influences candidate scoring.

Pros:

  • Holistic optimization (sequence + chemistry)

  • Can optimize for synthesis feasibility

  • Reflects real-world constraints

Cons:

  • Increased complexity

  • Slower computation

  • May over-constrain design space

Use When:

  • Synthesis costs are critical

  • Specific modification chemistry is required

  • Integration with synthesis platforms

Pattern 3: Hybrid Approach (Flexible)

Description: Core workflow generates unmodified designs; optional post-processing adds modifications.

Implementation:

# Step 1: Design (fast, no modifications)
results = workflow.run_design(...)

# Step 2: Optional annotation (if requested)
if config.export_modifications:
    metadata = annotate_with_modifications(
        results.top_candidates,
        pattern=config.modification_pattern
    )
    save_modification_files(metadata, output_dir)

Pros:

  • Best of both worlds

  • No performance impact for basic usage

  • Advanced users get full features

Cons:

  • More code paths to maintain

  • Slight API complexity


Modification Pattern Library

Standard Patterns (Examples)

1. Minimal (Cost-Optimized)

{
  "name": "minimal",
  "description": "Minimum modifications for basic stability",
  "guide_pattern": {
    "2OMe": {"positions": "terminal_3"}  # Only 3' end protection
  },
  "passenger_pattern": {
    "2OMe": {"positions": "terminal_5"}  # Only 5' end protection
  }
}

2. Standard (Balanced)

{
  "name": "standard",
  "description": "Standard pattern for most applications",
  "guide_pattern": {
    "2OMe": {"positions": "alternating"},  # Positions 1,3,5,7,...
    "PS": {"positions": "terminal_dinucleotides"}  # First 2 and last 2
  },
  "passenger_pattern": {
    "2OMe": {"positions": "alternating"}
  }
}

3. Maximal (High Stability)

{
  "name": "maximal",
  "description": "Maximum stability for difficult targets",
  "guide_pattern": {
    "2OMe": {"positions": "all"},
    "PS": {"positions": "all_internucleotide"}
  },
  "passenger_pattern": {
    "2OMe": {"positions": "all"}
  }
}

Testing Strategy

Test Categories

  1. Unit Tests (Existing - All Pass βœ“)

    • Data model validation

    • Serialization/deserialization

    • Header parsing

    • Edge cases

  2. Integration Tests (New - Recommended)

    • Workflow with modification export

    • FASTA annotation pipeline

    • Pattern application

  3. Example Tests (New - Recommended)

    • Validate example JSON files

    • Tutorial notebook execution

    • CLI command examples

New Test Coverage Needed

# tests/integration/test_modification_workflow.py
def test_design_with_modification_export():
    """Test complete workflow with modification metadata export."""

def test_annotation_pipeline():
    """Test annotation of existing candidates with modifications."""

def test_pattern_library():
    """Test built-in modification patterns."""

def test_custom_pattern():
    """Test user-defined custom modification pattern."""

Recommendations Summary

1. Immediate Actions (Low Effort, High Impact)

βœ… Create Example Files

  • Standard modification patterns in examples/modification_patterns/

  • Real-world examples (FDA-approved siRNAs)

  • Custom pattern template

βœ… Add to Documentation

  • Quick start section on modifications

  • API reference for modification functions

  • Best practices guide

βœ… Add Integration Tests

  • Test modification annotation workflow

  • Validate pattern application

  • Ensure FASTA header roundtrip

2. Short-term Enhancements (Moderate Effort)

πŸ”§ Optional Workflow Integration

  • Add export_modifications flag to DesignParameters

  • Generate modification JSON alongside CSV outputs

  • Create annotated FASTA for synthesis

πŸ”§ Pattern Library

  • Implement built-in patterns (minimal, standard, maximal)

  • Support custom pattern loading from JSON

  • Validate patterns against sequence constraints

3. Long-term Vision (High Effort, High Value)

πŸš€ Intelligent Recommendations

  • Modification predictor based on sequence features

  • Synthesis cost estimation

  • Stability prediction models

πŸš€ Synthesis Integration

  • Export formats for synthesis platforms

  • Cost optimization

  • Batch ordering support


Conclusion

siRNAforge has a solid foundation for chemical modification handling. The data models are well-designed, thoroughly tested, and production-ready. The main opportunity is to make these features more discoverable and optionally integrate them into the main workflow.

Recommended Approach:

  1. Keep modifications as optional annotations (Pattern 3: Hybrid)

  2. Add examples and documentation (immediate win)

  3. Provide built-in patterns for common use cases

  4. Let users experiment without workflow overhead

This approach maintains the tool’s flexibility while providing advanced users with the features they need for production siRNA design and synthesis planning.


Appendix: Common Modification Types

Type

Full Name

Position Preference

Stability

Cost

Notes

2OMe

2’-O-methyl

Alternating, all

++

$

Most common, good balance

2F

2’-fluoro

Pyrimidines

+++

$$

Enhanced binding affinity

PS

Phosphorothioate

Terminal internucleotides

+++

$

Nuclease resistance

LNA

Locked Nucleic Acid

Sparse (every 3-4nt)

++++

$$$

Very high affinity, costly

MOE

2’-O-methoxyethyl

Alternating

++

$$

Improved PK/PD

Key:

  • Stability: + (low) to ++++ (very high)

  • Cost: \( (standard) to \)$$ (premium)


Document Version: 1.0 Last Updated: 2025-10-24 Author: siRNAforge Review System