Comprehensive Review: Chemical Modification Handling in siRNAforge
Date: 2025-10-24
Status: Review Complete
Purpose: Evaluate current chemical modification infrastructure and recommend integration strategies
Executive Summary
siRNAforge has a well-designed, production-ready chemical modification system that is currently underutilized in the main workflow. This review outlines the existing infrastructure, identifies integration opportunities, and provides recommendations for incorporating modification metadata throughout the design pipeline.
Key Strengths
✅ Robust Data Models - Pydantic-based validation with comprehensive error handling
✅ Flexible Storage - JSON sidecar files and FASTA header encoding
✅ Standards Compliant - Position numbering (1-based), provenance tracking
✅ Well Tested - 35+ unit tests with 100% pass rate
✅ Documented - Complete specification in modification_annotation_spec.md
Current Gaps
⚠️ No Workflow Integration - Modifications not included in pipeline outputs
⚠️ Manual Annotation Required - No automated addition to designed siRNAs
⚠️ Limited Discoverability - Features exist but aren't in primary workflows
Current Infrastructure
1. Data Models (src/sirnaforge/models/modifications.py)
ChemicalModification
ChemicalModification(
    type: str,             # e.g., "2OMe", "2F", "PS", "LNA"
    positions: list[int]   # 1-based positions
)
Validation Features:
Non-empty modification type (whitespace stripped)
Positions must be >= 1 (1-based indexing)
Duplicate positions automatically removed and sorted
Position validation against sequence length
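The four validation rules above can be illustrated with a small standard-library sketch. This is not the Pydantic model from src/sirnaforge/models/modifications.py: the class name `ModificationSketch` and the method `check_against` are hypothetical and exist only to show the rules in order.

```python
from dataclasses import dataclass, field


@dataclass
class ModificationSketch:
    """Illustrative stand-in for ChemicalModification (not the real model)."""

    type: str
    positions: list[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Rule 1: the type must be non-empty once whitespace is stripped.
        self.type = self.type.strip()
        if not self.type:
            raise ValueError("modification type must not be empty")
        # Rule 2: positions are 1-based, so anything below 1 is rejected.
        if any(p < 1 for p in self.positions):
            raise ValueError("positions must be >= 1 (1-based indexing)")
        # Rule 3: duplicates are dropped and the result is sorted.
        self.positions = sorted(set(self.positions))

    def check_against(self, sequence: str) -> None:
        # Rule 4: no position may point past the end of the sequence.
        too_large = [p for p in self.positions if p > len(sequence)]
        if too_large:
            raise ValueError(f"positions {too_large} exceed length {len(sequence)}")


mod = ModificationSketch(type=" 2OMe ", positions=[6, 1, 4, 1])
mod.check_against("AUCGAUCGAUCGAUCGAUCGA")
print(mod.type, mod.positions)  # 2OMe [1, 4, 6]
```

Keeping the length check in a separate method reflects that a modification alone does not know its sequence; in the real models that check presumably happens where both are available.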
StrandMetadata
StrandMetadata(
    id: str,
    sequence: str,
    overhang: Optional[str] = None,   # e.g., "dTdT", "UU"
    chem_mods: list[ChemicalModification] = [],
    notes: Optional[str] = None,
    provenance: Optional[Provenance] = None,
    confirmation_status: ConfirmationStatus = ConfirmationStatus.PENDING
)
Key Features:
Sequence validation (ATCGU only, case-normalized to uppercase)
Modification positions validated against sequence length
FASTA header generation with metadata encoding
Dict-like access for backward compatibility
Provenance
Provenance(
    source_type: SourceType,    # patent, publication, clinical_trial, database, designed, other
    identifier: str,            # Patent #, DOI, PubMed ID, etc.
    url: Optional[str] = None
)
Use Cases:
Track FDA-approved siRNAs (e.g., Patisiran from patent US10060921B2)
Link to publications/clinical trials
Maintain audit trail for designed sequences
SequenceRecord
SequenceRecord(
    target_gene: str,
    strand_role: StrandRole,    # guide, sense, antisense, passenger
    metadata: StrandMetadata
)
2. Utility Functions (src/sirnaforge/modifications.py)
| Function | Purpose | Input | Output |
|---|---|---|---|
| | Load JSON metadata with validation | JSON file path | |
| | Extract metadata from FASTA header | BioPython SeqRecord | Metadata dict |
| | Parse modification string | "2OMe(1,4,6)+2F()" | List of ChemicalModifications |
| | Parse provenance string | "Patent:US10060921B2" | Provenance object |
| | Merge JSON into FASTA headers | FASTA + JSON paths | Updated FASTA file |
| | Save metadata to JSON | Metadata dict | JSON file |
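The two string formats listed in the table, "2OMe(1,4,6)+2F()" and "Patent:US10060921B2", are simple enough to parse with the standard library. The sketch below is one possible reading of those formats, not the code in src/sirnaforge/modifications.py; the function names `parse_chem_mods_sketch` and `parse_provenance_sketch` are hypothetical, and plain dicts stand in for the real model objects.

```python
import re

# One "Type(pos,pos,...)" token; the position list may be empty, as in "2F()".
_TOKEN = re.compile(r"^([^()+]+)\(([\d,\s]*)\)$")


def parse_chem_mods_sketch(text: str) -> list[dict]:
    """Parse '2OMe(1,4,6)+2F()' into a list of {'type', 'positions'} dicts."""
    mods = []
    for token in (part.strip() for part in text.split("+")):
        if not token:
            continue  # tolerate stray '+' separators
        match = _TOKEN.match(token)
        if match is None:
            raise ValueError(f"malformed modification token: {token!r}")
        mod_type, raw_positions = match.groups()
        positions = sorted({int(p) for p in raw_positions.split(",") if p.strip()})
        mods.append({"type": mod_type.strip(), "positions": positions})
    return mods


def parse_provenance_sketch(text: str) -> dict:
    """Split 'Patent:US10060921B2' at the first colon only."""
    source_type, sep, identifier = text.partition(":")
    if not sep or not identifier.strip():
        raise ValueError(f"expected 'SourceType:identifier', got {text!r}")
    return {
        "source_type": source_type.strip().lower(),
        "identifier": identifier.strip(),
    }


print(parse_chem_mods_sketch("2OMe(1,4,6)+2F()"))
print(parse_provenance_sketch("Patent:US10060921B2"))
```

Splitting the provenance string at the first colon only keeps identifiers that themselves contain colons intact.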
3. Integration Points in SiRNACandidate
The SiRNACandidate model includes optional metadata fields:
class SiRNACandidate(BaseModel):
    # ... standard fields ...

    # Optional chemical modification metadata
    guide_metadata: Optional[StrandMetadata] = None
    passenger_metadata: Optional[StrandMetadata] = None

    def to_fasta(self, include_metadata: bool = False) -> str:
        """Generate FASTA with optional metadata in header"""
Current Usage: The fields exist but are not populated during the design workflow.
4. FASTA Header Encoding Format
Standard key-value pairs separated by semicolons:
>sirna_001 Target=TTR; Role=guide; Confirmed=pending; Overhang=dTdT; ChemMods=2OMe(1,4,6,11)+2F(); Provenance=Patent:US10060921B2; URL=https://example.com
AUCGAUCGAUCGAUCGAUCGA
Features:
Human-readable and machine-parseable
Supports multiple modification types
Optional fields (gracefully handles missing data)
Backward compatible (works with unmodified FASTA files)
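As an illustration of the encoding described above, the following sketch builds a header line from a set of optional fields. It is a minimal example under stated assumptions, not the to_fasta implementation: the helper names `format_chem_mods` and `build_header_sketch` are hypothetical, and the field order simply follows the example header.

```python
from typing import Optional


def format_chem_mods(mods: list[dict]) -> str:
    """Render [{'type': '2OMe', 'positions': [1, 4]}] as '2OMe(1,4)'."""
    rendered = []
    for mod in mods:
        positions = ",".join(str(p) for p in mod["positions"])
        rendered.append(f"{mod['type']}({positions})")
    return "+".join(rendered)


def build_header_sketch(record_id: str, fields: dict[str, Optional[str]]) -> str:
    """Join the non-empty fields as 'Key=value' pairs separated by '; '."""
    # Fields that are None or empty are skipped, so a record without
    # metadata produces an ordinary, unannotated FASTA header.
    pairs = [f"{key}={value}" for key, value in fields.items() if value]
    if not pairs:
        return f">{record_id}"
    return f">{record_id} " + "; ".join(pairs)


header = build_header_sketch(
    "sirna_001",
    {
        "Target": "TTR",
        "Role": "guide",
        "Confirmed": "pending",
        "Overhang": "dTdT",
        "ChemMods": format_chem_mods(
            [
                {"type": "2OMe", "positions": [1, 4, 6, 11]},
                {"type": "2F", "positions": []},
            ]
        ),
        "Provenance": "Patent:US10060921B2",
        "URL": None,  # omitted from the header because it is empty
    },
)
print(header)
```

Dropping empty fields is what makes the format backward compatible: a reader that ignores everything after the record ID still sees a valid FASTA header.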
5. JSON Sidecar Format
Metadata can be stored separately for easier curation:
{
  "sirna_001": {
    "id": "sirna_001",
    "sequence": "AUCGAUCGAUCGAUCGAUCGA",
    "target_gene": "TP53",
    "strand_role": "guide",
    "overhang": "dTdT",
    "chem_mods": [
      {"type": "2OMe", "positions": [1, 4, 6, 11]}
    ],
    "provenance": {
      "source_type": "designed",
      "identifier": "sirnaforge_v1.0_TP53_candidate_001"
    },
    "confirmation_status": "pending",
    "notes": "Top candidate for TP53 targeting"
  }
}
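A sidecar in this shape can be sanity-checked before it is merged into FASTA headers. The sketch below is a hypothetical checker (`check_sidecar_sketch` is not part of siRNAforge) that takes the JSON text directly so it stays self-contained; it applies only the two rules stated earlier in this review, the allowed alphabet and positions within the sequence length.

```python
import json

ALLOWED_BASES = set("ACGTU")


def check_sidecar_sketch(raw_json: str) -> dict[str, list[str]]:
    """Return {record_id: [problem, ...]} for entries that fail basic checks."""
    problems: dict[str, list[str]] = {}
    for record_id, entry in json.loads(raw_json).items():
        issues = []
        sequence = entry.get("sequence", "").upper()
        if not sequence or set(sequence) - ALLOWED_BASES:
            issues.append("sequence missing or contains characters outside ACGTU")
        for mod in entry.get("chem_mods", []):
            # Positions are 1-based and must fall inside the sequence.
            bad = [p for p in mod["positions"] if p < 1 or p > len(sequence)]
            if bad:
                issues.append(
                    f"{mod['type']}: positions {bad} outside 1..{len(sequence)}"
                )
        if issues:
            problems[record_id] = issues
    return problems


sidecar = json.dumps(
    {
        "ok": {
            "sequence": "AUCGAUCGAUCGAUCGAUCGA",
            "chem_mods": [{"type": "2OMe", "positions": [1, 4, 6, 11]}],
        },
        "bad": {
            "sequence": "AUCG",
            "chem_mods": [{"type": "PS", "positions": [0, 5]}],
        },
    }
)
report = check_sidecar_sketch(sidecar)
print(report)
```

Collecting all problems per record, rather than raising on the first one, suits hand-curated files where several entries may need fixing at once.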
Integration Opportunities
A. Workflow Output Integration
Current State: The workflow generates CSV and FASTA outputs without modification metadata.
Integration Points:
Post-Design Annotation (workflow.py::step4_generate_reports)
After candidate generation, annotate top candidates with recommended modifications
Save metadata JSON alongside CSV outputs
Generate FASTA with modification headers for synthesis
CSV Export Enhancement
Add columns for modification metadata in candidate CSV
Include provenance information for traceability
Recommended Approach:
# In workflow.py after candidate selection
def _annotate_candidates_with_modifications(
    candidates: list[SiRNACandidate],
    modification_strategy: str = "standard_2ome"
) -> dict[str, StrandMetadata]:
    """Apply standard modification patterns to candidates."""
    metadata = {}
    for candidate in candidates:
        # Example: Standard 2'-O-methyl pattern for stability
        guide_mods = apply_modification_pattern(
            candidate.guide_sequence,
            pattern=modification_strategy
        )
        metadata[candidate.id] = StrandMetadata(
            id=f"{candidate.id}_guide",
            sequence=candidate.guide_sequence,
            overhang="dTdT",  # Standard DNA overhang
            chem_mods=guide_mods,
            provenance=Provenance(
                source_type=SourceType.DESIGNED,
                identifier=f"sirnaforge_v{__version__}_{candidate.transcript_id}_{candidate.id}"
            ),
            confirmation_status=ConfirmationStatus.PENDING
        )
    return metadata
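The CSV export enhancement listed under Integration Points could look roughly like the sketch below. The column names (`overhang`, `chem_mods`, `provenance`) and the function `candidates_to_csv_sketch` are assumptions made for illustration; the existing candidate CSV layout is not described in this review.

```python
import csv
import io


def candidates_to_csv_sketch(rows: list[dict]) -> str:
    """Write candidate rows with extra modification and provenance columns."""
    columns = ["id", "guide_sequence", "overhang", "chem_mods", "provenance"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Candidates without metadata get empty cells instead of being dropped.
        writer.writerow({col: row.get(col, "") for col in columns})
    return buffer.getvalue()


csv_text = candidates_to_csv_sketch(
    [
        {
            "id": "sirna_001",
            "guide_sequence": "AUCGAUCGAUCGAUCGAUCGA",
            "overhang": "dTdT",
            "chem_mods": "2OMe(1,4,6,11)",
            "provenance": "designed:sirnaforge_example",
        },
        {"id": "sirna_002", "guide_sequence": "UAGCUAGCUAGCUAGCUAGCU"},
    ]
)
print(csv_text)
```

Because the modification string contains commas, the csv module quotes that cell automatically; hand-built string joins would corrupt the column layout here.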
B. Annotation as Optional Output
User Control: Allow users to specify modification patterns via CLI/config.
# In DesignParameters model
modification_pattern: Optional[str] = Field(
    default=None,
    description="Chemical modification pattern to apply (standard_2ome, minimal, maximal, custom)"
)
modification_config: Optional[Path] = Field(
    default=None,
    description="Path to custom modification configuration JSON"
)
CLI Integration:
# Design with automatic modification annotation
sirnaforge design input.fasta -o output/ --modifications standard_2ome
# Custom modification pattern
sirnaforge design input.fasta -o output/ --modifications custom --modification-config mods.json
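The two proposed flags depend on each other, which is worth checking at parse time. The sketch below uses argparse purely for illustration; it does not reflect how the siRNAforge CLI is actually built, and the long option name `--output` and the function `parse_design_args_sketch` are assumptions.

```python
import argparse
from pathlib import Path

# Pattern names taken from the proposed DesignParameters description.
PATTERNS = ("standard_2ome", "minimal", "maximal", "custom")


def parse_design_args_sketch(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="sirnaforge design")
    parser.add_argument("input_fasta", type=Path)
    parser.add_argument("-o", "--output", type=Path, required=True)
    parser.add_argument("--modifications", choices=PATTERNS, default=None)
    parser.add_argument("--modification-config", type=Path, default=None)
    args = parser.parse_args(argv)
    # A custom pattern is meaningless without a file describing it.
    if args.modifications == "custom" and args.modification_config is None:
        parser.error("--modifications custom requires --modification-config")
    # A config file without 'custom' would otherwise be silently ignored.
    if args.modification_config is not None and args.modifications != "custom":
        parser.error("--modification-config is only used with --modifications custom")
    return args


args = parse_design_args_sketch(
    [
        "input.fasta",
        "-o",
        "output/",
        "--modifications",
        "custom",
        "--modification-config",
        "mods.json",
    ]
)
print(args.modifications, args.modification_config)
```

Rejecting inconsistent combinations up front avoids a design run that completes and then silently skips the annotation the user expected.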
C. Separate Annotation Command
Keep design and annotation separate for flexibility:
# Design workflow (no modifications)
sirnaforge design input.fasta -o output/
# Annotate top candidates afterward
sirnaforge sequences annotate \
    output/sirnaforge/TP53_pass.fasta \
    modifications.json \
    -o output/sirnaforge/TP53_pass_annotated.fasta
Advantages:
Maintains separation of concerns
Users can experiment with different modification patterns
No workflow slowdown for users who don't need modifications
Recommended Integration Strategy
Phase 1: Documentation and Examples (Immediate)
Goal: Make existing features more discoverable.
Add to Quick Start Guide
Section on "Adding Chemical Modification Metadata"
Example workflow with modification annotation
Create Example Files
examples/modification_patterns/standard_2ome.json
examples/modification_patterns/fda_approved_onpattro.json
examples/modification_patterns/custom_pattern.json
Add Tutorial Notebook
examples/notebooks/chemical_modifications_tutorial.ipynb
Shows complete workflow with modifications
Phase 2: Workflow Integration (Short-term)
Goal: Provide optional modification annotation in main workflow.
Add to DesignParameters
export_modifications: bool = Field(
    default=False,
    description="Generate modification metadata files with outputs"
)
Enhance Report Generation
When export_modifications=True:
Generate metadata JSON for top candidates
Create annotated FASTA with modification headers
Add modification columns to CSV output
Modification Pattern Library
Built-in patterns: standard, minimal, maximal
Based on literature (e.g., 2'-O-methyl at alternating positions)
User-definable custom patterns
Phase 3: Advanced Features (Long-term)
Goal: Intelligent modification recommendations.
Modification Predictor
Analyze sequence composition
Recommend modifications based on:
GC content (high GC → less modification needed)
Off-target risk (high risk → more specificity modifications)
Target tissue (liver → specific delivery modifications)
Synthesis Cost Estimation
Calculate approximate synthesis cost based on modifications
Help users balance efficacy vs. cost
Stability Prediction
Estimate serum stability based on modification pattern
Link to pharmacokinetics models
Implementation Patterns
Pattern 1: Keep as Annotations (Recommended)
Description: Modifications remain as optional metadata annotations, not part of core design logic.
Pros:
Maintains clean separation of design vs. synthesis concerns
No workflow slowdown for users who don't need modifications
Flexible - users can apply different patterns to same candidates
Cons:
Requires manual annotation step
Not integrated into scoring/ranking
Use When:
Users want flexibility in modification strategies
Modification patterns are experiment-specific
Focus is on sequence design, not synthesis planning
Pattern 2: Integrated Scoring (Advanced)
Description: Modification metadata influences candidate scoring.
Pros:
Holistic optimization (sequence + chemistry)
Can optimize for synthesis feasibility
Reflects real-world constraints
Cons:
Increased complexity
Slower computation
May over-constrain design space
Use When:
Synthesis costs are critical
Specific modification chemistry is required
Integration with synthesis platforms
Pattern 3: Hybrid Approach (Flexible)
Description: Core workflow generates unmodified designs; optional post-processing adds modifications.
Implementation:
# Step 1: Design (fast, no modifications)
results = workflow.run_design(...)

# Step 2: Optional annotation (if requested)
if config.export_modifications:
    metadata = annotate_with_modifications(
        results.top_candidates,
        pattern=config.modification_pattern
    )
    save_modification_files(metadata, output_dir)
Pros:
Best of both worlds
No performance impact for basic usage
Advanced users get full features
Cons:
More code paths to maintain
Slight API complexity
Modification Pattern Library
Standard Patterns (Examples)
1. Minimal (Cost-Optimized)
Protects only the 3' end of the guide strand and only the 5' end of the passenger strand.
{
  "name": "minimal",
  "description": "Minimum modifications for basic stability",
  "guide_pattern": {
    "2OMe": {"positions": "terminal_3"}
  },
  "passenger_pattern": {
    "2OMe": {"positions": "terminal_5"}
  }
}
2. Standard (Balanced)
Here "alternating" means positions 1, 3, 5, 7, ... and "terminal_dinucleotides" means the first two and last two positions.
{
  "name": "standard",
  "description": "Standard pattern for most applications",
  "guide_pattern": {
    "2OMe": {"positions": "alternating"},
    "PS": {"positions": "terminal_dinucleotides"}
  },
  "passenger_pattern": {
    "2OMe": {"positions": "alternating"}
  }
}
3. Maximal (High Stability)
{
  "name": "maximal",
  "description": "Maximum stability for difficult targets",
  "guide_pattern": {
    "2OMe": {"positions": "all"},
    "PS": {"positions": "all_internucleotide"}
  },
  "passenger_pattern": {
    "2OMe": {"positions": "all"}
  }
}
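The position keywords used in the three patterns have to be expanded into concrete 1-based positions before they can be stored as modification metadata. The sketch below shows one way to do that. It is hypothetical throughout: the function names are invented, the number of residues covered by "terminal_3" and "terminal_5" is not specified in this review (two is assumed here via `n_terminal`), and "all_internucleotide" is assumed to number each linkage by the nucleotide that precedes it.

```python
def resolve_positions_sketch(keyword: str, length: int, n_terminal: int = 2) -> list[int]:
    """Expand a pattern keyword into 1-based positions for a strand of `length` nt."""
    if length < 1:
        raise ValueError("length must be at least 1")
    n = min(n_terminal, length)
    if keyword == "all":
        return list(range(1, length + 1))
    if keyword == "alternating":
        return list(range(1, length + 1, 2))  # 1, 3, 5, 7, ...
    if keyword == "terminal_5":
        return list(range(1, n + 1))
    if keyword == "terminal_3":
        return list(range(length - n + 1, length + 1))
    if keyword == "terminal_dinucleotides":
        # First two and last two positions; the set merges overlap on short strands.
        head = range(1, min(2, length) + 1)
        tail = range(max(1, length - 1), length + 1)
        return sorted(set(head) | set(tail))
    if keyword == "all_internucleotide":
        # Assumed convention: position i marks the linkage after nucleotide i.
        return list(range(1, length))
    raise ValueError(f"unknown pattern keyword: {keyword!r}")


def apply_pattern_sketch(strand_pattern: dict, length: int) -> list[dict]:
    """Turn {'2OMe': {'positions': 'alternating'}} into explicit position lists."""
    return [
        {"type": mod_type, "positions": resolve_positions_sketch(spec["positions"], length)}
        for mod_type, spec in strand_pattern.items()
    ]


standard_guide = {
    "2OMe": {"positions": "alternating"},
    "PS": {"positions": "terminal_dinucleotides"},
}
resolved = apply_pattern_sketch(standard_guide, length=21)
print(resolved)
```

Resolving keywords against the strand length at annotation time means the same pattern file can be applied to 19-mers and 21-mers alike, with the usual position-range validation applied to the result.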
Testing Strategy
Test Categories
Unit Tests (Existing - All Pass ✅)
Data model validation
Serialization/deserialization
Header parsing
Edge cases
Integration Tests (New - Recommended)
Workflow with modification export
FASTA annotation pipeline
Pattern application
Example Tests (New - Recommended)
Validate example JSON files
Tutorial notebook execution
CLI command examples
New Test Coverage Needed
# tests/integration/test_modification_workflow.py
def test_design_with_modification_export():
    """Test complete workflow with modification metadata export."""

def test_annotation_pipeline():
    """Test annotation of existing candidates with modifications."""

def test_pattern_library():
    """Test built-in modification patterns."""

def test_custom_pattern():
    """Test user-defined custom modification pattern."""
Recommendations Summary
1. Immediate Actions (Low Effort, High Impact)
✅ Create Example Files
Standard modification patterns in examples/modification_patterns/
Real-world examples (FDA-approved siRNAs)
Custom pattern template
✅ Add to Documentation
Quick start section on modifications
API reference for modification functions
Best practices guide
✅ Add Integration Tests
Test modification annotation workflow
Validate pattern application
Ensure FASTA header roundtrip
2. Short-term Enhancements (Moderate Effort)
🔧 Optional Workflow Integration
Add export_modifications flag to DesignParameters
Generate modification JSON alongside CSV outputs
Create annotated FASTA for synthesis
🔧 Pattern Library
Implement built-in patterns (minimal, standard, maximal)
Support custom pattern loading from JSON
Validate patterns against sequence constraints
3. Long-term Vision (High Effort, High Value)
Intelligent Recommendations
Modification predictor based on sequence features
Synthesis cost estimation
Stability prediction models
Synthesis Integration
Export formats for synthesis platforms
Cost optimization
Batch ordering support
Conclusion
siRNAforge has a solid foundation for chemical modification handling. The data models are well-designed, thoroughly tested, and production-ready. The main opportunity is to make these features more discoverable and optionally integrate them into the main workflow.
Recommended Approach:
Keep modifications as optional annotations (Pattern 3: Hybrid)
Add examples and documentation (immediate win)
Provide built-in patterns for common use cases
Let users experiment without workflow overhead
This approach maintains the tool's flexibility while providing advanced users with the features they need for production siRNA design and synthesis planning.
Appendix: Common Modification Types
| Type | Full Name | Position Preference | Stability | Cost | Notes |
|---|---|---|---|---|---|
| 2OMe | 2'-O-methyl | Alternating, all | ++ | $ | Most common, good balance |
| 2F | 2'-fluoro | Pyrimidines | +++ | $$ | Enhanced binding affinity |
| PS | Phosphorothioate | Terminal internucleotides | +++ | $ | Nuclease resistance |
| LNA | Locked Nucleic Acid | Sparse (every 3-4 nt) | ++++ | $$$ | Very high affinity, costly |
| MOE | 2'-O-methoxyethyl | Alternating | ++ | $$ | Improved PK/PD |
Key:
Stability: + (low) to ++++ (very high)
Cost: $ (standard) to $$$ (premium)
Document Version: 1.0
Last Updated: 2025-10-24
Author: siRNAforge Review System