Python API Tutorialο
This tutorial demonstrates how to use siRNAforge programmatically through its Python API.
Learning Objectivesο
Use siRNAforge classes and functions directly in Python
Automate siRNA design workflows
Customize scoring and filtering algorithms
Integrate with existing Python data analysis pipelines
Prerequisitesο
siRNAforge installed in development mode
Basic Python programming experience
Familiarity with basic siRNAforge concepts
For complete class and method documentation, see the API Reference.
Getting Startedο
Import Core Componentsο
# Core design functionality
from sirnaforge.core.design import SiRNADesigner
from sirnaforge.models.sirna import DesignParameters, FilterCriteria
# Data access
from sirnaforge.data.gene_search import GeneSearcher, DatabaseType
from sirnaforge.workflow import run_sirna_workflow
# Utilities
from pathlib import Path
import asyncio
import pandas as pd
Tip
Hover over class names in your IDE to see docstrings, or check the API Reference for full documentation.
Basic API Usageο
1. Simple siRNA Designο
from sirnaforge.core.design import SiRNADesigner
from sirnaforge.models.sirna import DesignParameters
# Configure design parameters
# See API docs for all DesignParameters options:
# https://sirnaforge.readthedocs.io/en/latest/api_reference.html#sirnaforge.models.sirna.DesignParameters
params = DesignParameters(
sirna_length=21,
top_candidates=20,
gc_content_range=(30.0, 60.0)
)
# Create designer instance
designer = SiRNADesigner(params)
# Design from sequences
sequences = [
"AUGCGCGAUCUCGAUGCAUGUGCGAUCGAUGCGUAUCGAU",
"GCCAUGCGAUCGAUGCGUAUCAUCGAUGCGCGAUGCUGUGAU"
]
results = designer.design_from_sequences(sequences)
# Access results (SiRNADesignResult object)
print(f"Generated {len(results.candidates)} siRNA candidates")
for candidate in results.top_candidates:
print(f"{candidate.sirna_id}: {candidate.guide_sequence} (score: {candidate.composite_score:.1f})")
See the API Reference for complete documentation of DesignParameters, SiRNADesigner, and SiRNACandidate.
2. Design from FASTA Fileο
from pathlib import Path
# Design from file
input_file = Path("transcripts.fasta")
results = designer.design_from_file(input_file)
# Save results
output_file = Path("sirna_results.tsv")
results.save_tsv(output_file)
print(f"Results saved to {output_file}")
Gene Search APIο
The gene search system retrieves transcripts from genomic databases. See the API Reference for complete GeneSearcher documentation.
Synchronous Gene Searchο
from sirnaforge.data.gene_search import search_gene_sync, DatabaseType
# Simple gene search
result = search_gene_sync("TP53", database=DatabaseType.ENSEMBL)
if result.success:
print(f"Found {len(result.transcripts)} transcripts for {result.gene_info.gene_name}")
# Access transcript information
for transcript in result.transcripts:
print(f" {transcript.transcript_id}: {len(transcript.sequence)} bp")
if transcript.is_canonical:
print(" (Canonical transcript)")
else:
print(f"Search failed: {result.error}")
See the API Reference for details on GeneSearchResult, TranscriptInfo, and GeneInfo.
Asynchronous Gene Searchο
import asyncio
from sirnaforge.data.gene_search import GeneSearcher
async def search_multiple_genes():
searcher = GeneSearcher()
genes = ["TP53", "BRCA1", "EGFR", "MYC"]
results = {}
for gene in genes:
print(f"Searching for {gene}...")
result = await searcher.search_gene(gene, DatabaseType.ENSEMBL)
results[gene] = result
return results
# Run the async search
results = asyncio.run(search_multiple_genes())
# Process results
for gene, result in results.items():
if result.success:
canonical_count = sum(1 for t in result.transcripts if t.is_canonical)
print(f"{gene}: {len(result.transcripts)} transcripts ({canonical_count} canonical)")
See the API Reference for details on async methods like GeneSearcher.search_gene and GeneSearcher.search_multiple_databases.
Advanced API Usageο
Custom Scoringο
siRNAforge allows you to implement custom scoring algorithms by extending the base scorer class. This is useful for research-specific requirements or novel siRNA design principles.
See the API Reference for complete documentation of the BaseScorer interface and built-in scorer implementations.
from sirnaforge.core.design import BaseScorer, SiRNADesigner
from sirnaforge.models.sirna import SiRNACandidate, DesignParameters
class CustomScorer(BaseScorer):
"""Custom scoring algorithm for specific research needs.
This example prioritizes GC content and 5' position over structure.
Extend BaseScorer and implement calculate_score() to create your own.
"""
def __init__(self, weight_gc=0.3, weight_position=0.2, weight_structure=0.5):
self.weight_gc = weight_gc
self.weight_position = weight_position
self.weight_structure = weight_structure
def calculate_score(self, candidate: SiRNACandidate) -> float:
"""Calculate custom composite score.
Args:
candidate: SiRNACandidate with thermodynamic properties
Returns:
Composite score between 0-10
"""
# GC content scoring (optimal around 45%)
gc_score = 1.0 - abs(candidate.gc_content - 45.0) / 45.0
# Position scoring (prefer 5' region)
position_score = max(0, 1.0 - candidate.position / 1000)
# Structure scoring (from thermodynamics)
structure_score = candidate.thermodynamic_score
# Weighted combination
composite = (
self.weight_gc * gc_score +
self.weight_position * position_score +
self.weight_structure * structure_score
)
return composite * 10.0 # Scale to 0-10
# Use custom scorer
custom_scorer = CustomScorer(weight_gc=0.4, weight_structure=0.6)
designer = SiRNADesigner(DesignParameters(), scorer=custom_scorer)
For more complex scoring, see the Custom Scoring Tutorial. The API Reference provides details on ThermodynamicAnalyzer for thermodynamic calculations.
Configuration and Customizationο
Configuration in siRNAforge is handled through Pydantic models for type safety and validation. See the API Reference for complete documentation of DesignParameters and FilterCriteria with all available options and validation rules.
Using Configuration Modelsο
from sirnaforge.models.sirna import DesignParameters, FilterCriteria
# Create parameters with validation
params = DesignParameters(
sirna_length=21,
top_candidates=20,
filters=FilterCriteria(
gc_min=30.0,
gc_max=60.0,
min_composite_score=5.0,
max_poly_runs=4
)
)
# Parameters are validated automatically
designer = SiRNADesigner(params)
Environment-Based Configurationο
import os
from sirnaforge.models.sirna import DesignParameters
# Read from environment variables
gc_min = float(os.getenv("SIRNAFORGE_GC_MIN", "30.0"))
gc_max = float(os.getenv("SIRNAFORGE_GC_MAX", "60.0"))
top_n = int(os.getenv("SIRNAFORGE_TOP_N", "20"))
params = DesignParameters(
sirna_length=21,
top_candidates=top_n,
gc_content_range=(gc_min, gc_max)
)
Validation: All parameters are validated by Pydantic. Invalid values raise
ValidationErrorwith clear messages.
Testing Your Codeο
Unit Testing API Usageο
import pytest
from unittest.mock import patch, MagicMock
from sirnaforge.core.design import SiRNADesigner
from sirnaforge.models.sirna import DesignParameters
def test_custom_api_usage():
"""Test custom API usage patterns."""
# Test parameters
params = DesignParameters(
sirna_length=21,
top_candidates=10,
gc_content_range=(35.0, 50.0)
)
designer = SiRNADesigner(params)
# Test with known sequences
test_sequences = [
"AUGCGCGAUCUCGAUGCAUGUGCGAUCGAUGCGUAUCGAU" * 2 # 80 nt
]
results = designer.design_from_sequences(test_sequences)
# Assertions
assert len(results.candidates) > 0
assert all(35.0 <= c.gc_content <= 50.0 for c in results.candidates)
assert all(len(c.guide_sequence) == 21 for c in results.candidates)
@patch('sirnaforge.data.gene_search.httpx.AsyncClient')
async def test_mock_api_integration(mock_client):
"""Test API integration with mocked responses."""
# Mock successful response
mock_response = MagicMock()
mock_response.json.return_value = {
"id": "ENSG00000141510",
"display_name": "TP53"
}
mock_response.raise_for_status.return_value = None
mock_client.return_value.__aenter__.return_value.get.return_value = mock_response
# Test search
from sirnaforge.data.gene_search import GeneSearcher
searcher = GeneSearcher()
result = await searcher.search_gene("TP53")
# Verify mock was called
assert mock_client.called
Best Practicesο
1. Resource Managementο
import asyncio
from contextlib import asynccontextmanager
@asynccontextmanager
async def sirna_session():
"""Managed session for siRNAforge operations."""
searcher = GeneSearcher()
try:
yield searcher
finally:
await searcher.close() # Clean up resources
# Usage
async def main():
async with sirna_session() as searcher:
result = await searcher.search_gene("TP53")
# Process result
2. Progress Monitoringο
from tqdm import tqdm
import time
def batch_design_with_progress(genes, output_dir):
"""Batch design with progress monitoring."""
results = {}
for gene in tqdm(genes, desc="Processing genes"):
try:
# Add small delay to show progress
time.sleep(0.1)
result = robust_sirna_design(gene)
results[gene] = result
# Update progress description
tqdm.write(f"β
Completed {gene}")
except Exception as e:
tqdm.write(f"β Failed {gene}: {e}")
results[gene] = None
return results
3. Memory-Efficient Processingο
def stream_large_fasta(file_path, chunk_size=1000):
"""Process large FASTA files in chunks."""
from Bio import SeqIO
with open(file_path, 'r') as handle:
sequences = []
for record in SeqIO.parse(handle, "fasta"):
sequences.append(str(record.seq))
if len(sequences) >= chunk_size:
yield sequences
sequences = []
# Yield remaining sequences
if sequences:
yield sequences
# Usage
designer = SiRNADesigner(DesignParameters())
all_results = []
for chunk in stream_large_fasta("large_transcripts.fasta", chunk_size=100):
chunk_results = designer.design_from_sequences(chunk)
all_results.extend(chunk_results.candidates)
# Optional: Save intermediate results
print(f"Processed chunk, total candidates: {len(all_results)}")
Next Stepsο
After mastering the Python API:
Usage Examples - Complex multi-step analyses and automation
Pipeline Integration - Nextflow pipeline development
Custom Scoring - Advanced algorithm development
API Reference - Complete class and method documentation
Developer Guide - Contributing to siRNAforge
Key API Modulesο
For complete module documentation, see the API Reference:
Core Design:
sirnaforge.core.design- Main design engineModels:
sirnaforge.models.sirna- Data models and schemasGene Search:
sirnaforge.data.gene_search- Database accessThermodynamics:
sirnaforge.core.thermodynamics- RNA folding analysisWorkflow:
sirnaforge.workflow- High-level orchestration
The Python API provides the flexibility to create sophisticated, automated siRNA design workflows tailored to your specific research needs.