Custom Scoring Tutorial
This tutorial shows how to develop and integrate custom scoring functions for specialized siRNA applications.
Understanding Thermodynamic Asymmetry Scoring
What is Thermodynamic Asymmetry?
The thermodynamic asymmetry score is a critical component in siRNA prediction tools that measures the stability difference between the 5’ and 3’ ends of the guide and passenger strands of the siRNA duplex. This scoring mechanism is based on the fundamental principle that siRNAs function more effectively when the guide strand (antisense) is preferentially incorporated into the RNA-induced silencing complex (RISC).
The Science Behind Asymmetry Scoring
The asymmetry score predicts which strand of the siRNA duplex will be preferentially loaded into RISC. According to thermodynamic principles:
The 5’ end of the guide strand should be less thermodynamically stable than the 3’ end
This asymmetry promotes correct orientation in RISC
Proper strand selection is crucial for successful RNA interference
The underlying mechanism works because RISC preferentially incorporates the strand whose 5’ end is less stable, leading to unwinding of the duplex from the less stable end.
Research Foundation
The importance of thermodynamic asymmetry in siRNA design is supported by extensive research:
Key Publications
Khvorova A et al. (2003) - “Functional siRNAs and miRNAs exhibit strand bias”
Early identification of thermodynamic asymmetry importance for guide strand selection into RISC
Demonstrated that RISC incorporation is mediated by thermodynamic asymmetry at position 1 of the duplex
Showed that the less stable 5′ nucleotide is preferentially selected
Naito Y et al. (2009) - “Thermodynamic Stability and Watson-Crick Base Pairing in the Seed Duplex Are Major Determinants of the Efficiency of the siRNA-Based Off-Target Effect”
Strong evidence for the relevance of thermodynamic stability in siRNA efficacy
Underscores the importance of considering thermodynamic parameters in predicting siRNA efficiency
Amarzguioui M and Prydz H (2004) - “Designing siRNA That Distinguish between Genes That Differ by a Single Nucleotide”
Emphasized the relevance of thermodynamic asymmetry between sense and antisense siRNA strands
Identified asymmetry as a critical parameter in designing effective siRNAs
Ichihara M et al. (2017) - “Simple principles for predicting siRNA efficacy”
Comprehensive criteria for effectively predicting siRNA efficacy
Includes thermodynamic asymmetry as a key predictive feature
Implementation in siRNAforge
In siRNAforge, thermodynamic asymmetry scoring is implemented as part of the comprehensive scoring system:
class ThermodynamicAsymmetryScorer:
"""Calculate thermodynamic asymmetry score for siRNA duplex"""
def calculate_asymmetry_score(self, guide_sequence: str, passenger_sequence: str) -> float:
"""
Calculate asymmetry score based on 5' vs 3' end stability differences
Args:
guide_sequence: Guide strand sequence (antisense)
passenger_sequence: Passenger strand sequence (sense)
Returns:
Asymmetry score (0-1, higher indicates better asymmetry)
"""
# Calculate 5' end stability (positions 1-4)
guide_5_prime = guide_sequence[:4]
passenger_5_prime = passenger_sequence[:4]
# Calculate 3' end stability (positions -4 to -1)
guide_3_prime = guide_sequence[-4:]
passenger_3_prime = passenger_sequence[-4:]
# Calculate ΔG for each end
delta_g_5_prime = self._calculate_duplex_energy(guide_5_prime, passenger_5_prime)
delta_g_3_prime = self._calculate_duplex_energy(guide_3_prime, passenger_3_prime)
# Asymmetry favors less stable 5' end
asymmetry = delta_g_3_prime - delta_g_5_prime
# Normalize to 0-1 scale
return self._normalize_asymmetry_score(asymmetry)
Scoring Components
The thermodynamic asymmetry score in siRNAforge incorporates several key measurements:
5’ End Stability: Free energy of the first 4 nucleotides of the duplex
3’ End Stability: Free energy of the last 4 nucleotides of the duplex
Asymmetry Ratio: Difference in stability between 3’ and 5’ ends
Strand Bias Prediction: Likelihood of correct guide strand selection
Comprehensive Thermodynamic Metrics in siRNAforge
siRNAforge evaluates multiple thermodynamic parameters to predict siRNA efficacy. Each metric contributes to the overall assessment of siRNA quality:
GC Content (gc_content)
Optimal Range: 35% to 60% (ideally 40-55%)
Rationale: Balance between duplex stability (enhanced by G:C base pairs) and target accessibility (enabled by A:U base pairs)
Too Low (<35%): Insufficient duplex stability, poor RISC loading
Too High (>60%): Overly stable duplex, reduced target accessibility
Implementation: Calculated as percentage of G and C nucleotides in guide sequence
Asymmetry Score (asymmetry_score)
Optimal Range: 0.65 to 1.0 (higher is better)
Definition: Measure of thermodynamic stability difference between 5’ and 3’ ends of siRNA duplex
Mechanism: Higher asymmetry directs guide strand into RISC and prevents passenger strand loading
Calculation: Based on ΔG difference between duplex ends (ΔG₃’ - ΔG₅’)
Research Basis: Khvorova et al. (2003) demonstrated this is critical for proper strand selection
Paired Fraction (paired_fraction)
Optimal Range: 0.4 to 0.8 (moderate pairing preferred)
Definition: Fraction of nucleotides in guide strand paired with passenger strand in predicted secondary structure
Considerations:
High values (>0.8): Overly stable duplex, reduced strand separation in RISC
Low values (<0.4): Insufficient duplex stability for effective loading
Moderate values: Optimal balance for RISC processing
Minimum Free Energy (MFE) (mfe)
Optimal Range: -2 to -8 kcal/mol (moderate stability)
Definition: Free energy of the most stable RNA secondary structure
Interpretation:
Too negative (<-10 kcal/mol): Overly stable, resistant to RISC processing
Too positive (>0 kcal/mol): Unstable duplex, poor RNAi activity
Moderate values: Optimal for effective strand separation and target binding
Duplex Stability (duplex_stability_dg)
Optimal Range: -15 to -25 kcal/mol
Definition: Total free energy change (ΔG) for siRNA duplex formation
Balance Required:
Too stable (very negative ΔG): Difficult RISC loading and strand separation
Too unstable (less negative ΔG): Poor duplex formation and activity
Normalized Score:
duplex_stability_scoreconverts ΔG to 0-1 scale
End Stability Differences
5’ End ΔG (
dg_5p): Free energy at guide strand 5’ end3’ End ΔG (
dg_3p): Free energy at guide strand 3’ endEnd Difference (
delta_dg_end): Stability difference (ΔG₃’ - ΔG₅’)
Optimal Characteristics:
Positive
delta_dg_end: 5’ end less stable than 3’ end (preferred)Target Range: +2 to +6 kcal/mol difference
Mechanism: Promotes correct guide strand orientation in RISC
Melting Temperature (melting_temp_c)
Optimal Range: 60-78°C for human cells
Definition: Temperature at which 50% of duplexes dissociate
Considerations:
High Tm (>80°C): Overly stable duplexes resist strand separation
Low Tm (<50°C): Unstable duplexes with poor activity
System-specific: Optimal values vary by cell type and organism
Off-target Metrics
Off-target Count (
off_target_count): Number of potential off-target sitesTranscript Hits (
transcript_hit_count,transcript_hit_fraction): Target specificity measuresGoal: Minimize off-target binding while maintaining on-target efficacy
Acceptable Range: <5 high-confidence off-targets for most applications
Interpreting Asymmetry Scores
Score 0.8-1.0: Excellent asymmetry, strong guide strand bias
Score 0.6-0.8: Good asymmetry, likely correct strand selection
Score 0.4-0.6: Moderate asymmetry, may have mixed strand loading
Score 0.0-0.4: Poor asymmetry, risk of passenger strand incorporation
Integrated Scoring Guidelines
When evaluating siRNA candidates, consider the combined profile of all thermodynamic metrics:
High-Quality siRNA Profile
gc_content: 40-55%
asymmetry_score: 0.7-1.0
paired_fraction: 0.5-0.7
mfe: -4 to -7 kcal/mol
duplex_stability_dg: -18 to -22 kcal/mol
delta_dg_end: +2 to +5 kcal/mol
melting_temp_c: 65-75°C
off_target_count: 0-3
Acceptable siRNA Profile
gc_content: 35-60%
asymmetry_score: 0.6-0.7
paired_fraction: 0.4-0.8
mfe: -2 to -8 kcal/mol
duplex_stability_dg: -15 to -25 kcal/mol
delta_dg_end: +1 to +6 kcal/mol
melting_temp_c: 60-78°C
off_target_count: 0-5
Poor Quality Indicators
GC content <30% or >65%
Asymmetry score <0.5
Paired fraction >0.9 (overly stable) or <0.3 (unstable)
MFE <-10 kcal/mol (too stable) or >0 kcal/mol (unstable)
Delta_dg_end <0 (wrong asymmetry direction)
High off-target count (>10)
Context-Dependent Optimization
Important Note: Optimal values can vary based on:
Cell Type: Different cellular environments affect RNA stability
Organism: Species-specific RNA processing differences
Delivery Method: Transfection efficiency affects required stability
Target Gene: Secondary structure and accessibility vary
Experimental Conditions: Temperature, buffer conditions, etc.
Cell Type Considerations
# Example: Adjust parameters for different cell types
def get_cell_specific_thresholds(cell_type: str) -> dict:
"""Get optimized thresholds for specific cell types"""
thresholds = {
'human_hela': {
'gc_min': 40, 'gc_max': 55,
'asymmetry_min': 0.7,
'tm_min': 65, 'tm_max': 75
},
'mouse_3t3': {
'gc_min': 35, 'gc_max': 60,
'asymmetry_min': 0.65,
'tm_min': 60, 'tm_max': 78
},
'plant_arabidopsis': {
'gc_min': 45, 'gc_max': 65,
'asymmetry_min': 0.6,
'tm_min': 55, 'tm_max': 70
}
}
return thresholds.get(cell_type, thresholds['human_hela'])
Custom Scoring Framework
Understanding the Scoring System
siRNAforge uses a modular scoring framework where multiple components contribute to the final composite score:
class CompositeScorer:
"""Combines multiple scoring components"""
def __init__(self):
self.scorers = {
'thermodynamic_asymmetry': ThermodynamicAsymmetryScorer(),
'gc_content': GCContentScorer(),
'secondary_structure': SecondaryStructureScorer(),
'off_target': OffTargetScorer()
}
self.weights = {
'thermodynamic_asymmetry': 0.3, # 30% of total score
'gc_content': 0.2, # 20% of total score
'secondary_structure': 0.2, # 20% of total score
'off_target': 0.3 # 30% of total score
}
Implementing Custom Scorers
You can create custom scoring functions by extending the base scorer class:
from sirnaforge.core.scoring import BaseScorer
from sirnaforge.models.sirna import SiRNACandidate
class CustomThermodynamicScorer(BaseScorer):
"""Custom thermodynamic scoring with specialized parameters"""
def calculate_score(self, candidate: SiRNACandidate) -> float:
"""
Calculate custom thermodynamic score
Args:
candidate: siRNA candidate to score
Returns:
Score between 0-1 (higher is better)
"""
# Access sequence information
guide_seq = candidate.guide_sequence
passenger_seq = candidate.passenger_sequence
# Custom asymmetry calculation
asymmetry_score = self._calculate_custom_asymmetry(guide_seq, passenger_seq)
# Additional thermodynamic factors
melting_temp_score = self._score_melting_temperature(guide_seq)
duplex_stability_score = self._score_duplex_stability(guide_seq, passenger_seq)
# Combine components with custom weights
final_score = (
0.5 * asymmetry_score +
0.3 * melting_temp_score +
0.2 * duplex_stability_score
)
return final_score
def _calculate_custom_asymmetry(self, guide: str, passenger: str) -> float:
"""Implement custom asymmetry calculation"""
# Your custom algorithm here
pass
Integrating Machine Learning Models
Advanced users can integrate machine learning models for sophisticated scoring:
import joblib
from sklearn.ensemble import RandomForestRegressor
class MLThermodynamicScorer(BaseScorer):
"""Machine learning-based thermodynamic scoring"""
def __init__(self, model_path: str):
"""Load pre-trained ML model"""
self.model = joblib.load(model_path)
def calculate_score(self, candidate: SiRNACandidate) -> float:
"""Score using ML model"""
# Extract features
features = self._extract_thermodynamic_features(candidate)
# Predict using ML model
score = self.model.predict([features])[0]
# Ensure score is in valid range
return max(0.0, min(1.0, score))
def _extract_thermodynamic_features(self, candidate: SiRNACandidate) -> list:
"""Extract features for ML model"""
features = []
# Basic sequence features
features.extend([
candidate.gc_content,
len(candidate.guide_sequence),
candidate.position
])
# Thermodynamic features
features.extend([
self._calculate_base_asymmetry(candidate),
self._calculate_duplex_stability(candidate),
self._calculate_end_stabilities(candidate)
])
return features
Validation and Performance
Validating Custom Scores
Always validate your custom scoring functions:
def validate_scorer(scorer, test_candidates):
"""Validate custom scorer performance"""
scores = []
for candidate in test_candidates:
score = scorer.calculate_score(candidate)
# Check score validity
assert 0.0 <= score <= 1.0, f"Invalid score: {score}"
scores.append(score)
# Check score distribution
import numpy as np
print(f"Score range: {np.min(scores):.3f} - {np.max(scores):.3f}")
print(f"Score mean: {np.mean(scores):.3f}")
print(f"Score std: {np.std(scores):.3f}")
Performance Optimization
For large-scale analysis, optimize your scoring functions:
from functools import lru_cache
class OptimizedAsymmetryScorer(BaseScorer):
"""Performance-optimized asymmetry scorer"""
@lru_cache(maxsize=10000)
def _calculate_duplex_energy(self, seq1: str, seq2: str) -> float:
"""Cached energy calculation"""
# Expensive calculation cached for repeated sequences
return self._compute_energy(seq1, seq2)
def calculate_score_batch(self, candidates: list) -> list:
"""Batch processing for better performance"""
# Process multiple candidates efficiently
return [self.calculate_score(c) for c in candidates]
Advanced Topics
Research Integration
When developing scoring functions, always reference current research:
Literature Review: Stay current with siRNA efficacy research
Experimental Validation: Test against known effective siRNAs
Benchmarking: Compare against established scoring methods
Publication: Consider publishing novel scoring approaches
Future Directions
The field of siRNA scoring continues to evolve:
Deep Learning Models: Neural networks for complex pattern recognition
Multi-Modal Scoring: Integration of sequence, structure, and expression data
Species-Specific Models: Tailored scoring for different organisms
Real-Time Optimization: Adaptive scoring based on experimental feedback
For more advanced implementations, refer to:
API Reference for complete API documentation
siRNAforge Architecture for scoring system design
src/sirnaforge/core/design.pyfor current scoring implementation