Benchmarking Overview
Purpose
Understanding the computational requirements of spaceprime simulations helps users:
- Plan resources: Estimate time and memory needed for different simulation scenarios
- Choose parameters wisely: Understand trade-offs between model complexity and compute cost
- Identify bottlenecks: See which operations dominate runtime for different configurations
- Scale appropriately: Know when to use HPC resources vs. local machines
Key Parameters Affecting Performance
spaceprime performance is primarily affected by:
Grid Size (Number of Demes)
The number of demes is the product of grid rows × columns. This affects:
- Migration matrix: O(n²) memory and computation, where n = number of demes
- Demography setup: Linear to quadratic scaling with deme count
- Simulation time: Depends on population structure and sample sizes
| Grid | Demes |
|---|---|
| 5×5 | 25 |
| 10×10 | 100 |
| 20×20 | 400 |
| 30×30 | 900 |
| 60×60 | 3,600 |
| 100×100 | 10,000 |
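To make the quadratic growth concrete, the sketch below tabulates the deme count and the size of a dense migration matrix for the grid sizes in the table. It assumes a dense n × n matrix with 8 bytes per float64 entry; the helper names `n_demes` and `migration_matrix_bytes` are illustrative and not part of spaceprime.

```python
def n_demes(rows: int, cols: int) -> int:
    """Number of demes in a rows x cols grid."""
    return rows * cols


def migration_matrix_bytes(n: int, bytes_per_entry: int = 8) -> int:
    """Size of a dense n x n matrix (assumes float64 entries, 8 bytes each)."""
    return n * n * bytes_per_entry


for side in (5, 10, 20, 30, 60, 100):
    n = n_demes(side, side)
    mib = migration_matrix_bytes(n) / 2**20
    print(f"{side}x{side}: {n:>6} demes, ~{mib:,.1f} MiB")
```

Under these assumptions, a 100×100 grid needs 800,000,000 bytes (roughly 763 MiB) for the matrix alone, which is one indication of when a local machine may stop being enough.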
Maximum Local Size
The max_local_size parameter sets the upper bound for deme population sizes. Larger populations:
- Increase coalescence times
- May require more simulation time
- Have minimal effect on setup time
Migration Rate
Migration rates affect simulation dynamics but have minimal impact on setup time. Higher rates lead to:
- More migration events during simulation
- Potentially faster coalescence (less population structure)
Time Slices
Multiple time slices (3D deme arrays) add temporal dynamics:
- Each time slice adds demographic events
- Migration matrices may be recalculated
- Linear scaling with number of time slices
Timesteps
The timesteps parameter controls the time between demographic events:
- Mainly affects simulation dynamics rather than the cost of setting up each event
- For a fixed total time span, shorter timesteps = more events = longer setup
Methodology
Our benchmarks measure three key operations:
1. Migration Matrix Calculation
from spaceprime.utilities import calc_migration_matrix
mig_matrix = calc_migration_matrix(demes, rate=0.01, scale=True)
Time complexity: O(n²) where n = number of demes
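The quadratic cost comes from the matrix holding one entry for every ordered pair of demes. The following is a minimal sketch of a dense 2D stepping-stone matrix built with NumPy. It is not spaceprime's `calc_migration_matrix`: the function name `stepping_stone_matrix`, the row-major deme indexing, and the constant unscaled rate are all assumptions made for illustration.

```python
import numpy as np


def stepping_stone_matrix(rows: int, cols: int, rate: float) -> np.ndarray:
    """Dense n x n matrix with `rate` between 4-neighbour demes, 0 elsewhere.

    Illustrative only: this is not spaceprime's calc_migration_matrix.
    """
    n = rows * cols
    mig = np.zeros((n, n), dtype=np.float64)  # allocation is O(n^2)
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c  # row-major deme index (an assumption)
            if c + 1 < cols:  # neighbour to the right
                j = i + 1
                mig[i, j] = mig[j, i] = rate
            if r + 1 < rows:  # neighbour below
                j = i + cols
                mig[i, j] = mig[j, i] = rate
    return mig


m = stepping_stone_matrix(3, 3, rate=0.01)
print(m.shape)   # (9, 9)
print(m.nbytes)  # 648 bytes = 81 entries x 8 bytes
```

In this sketch only neighbouring pairs are non-zero, so most of the allocated matrix is zeros; the O(n²) memory is a property of the dense representation rather than of the number of migration routes.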
2. Demography Setup
from spaceprime import spDemography
demo = spDemography()
demo.stepping_stone_2d(d=demes, rate=0.01, scale=True, timesteps=100)
demo.add_ancestral_populations(anc_sizes=[10000], merge_time=10000)
This includes:
- Creating population objects
- Setting up migration relationships
- Adding demographic events (for multi-slice scenarios)
- Adding ancestral populations
3. Ancestry Simulation
import msprime
ts = msprime.sim_ancestry(
    samples=sample_dict,
    demography=demo,
    sequence_length=10000,
    ploidy=2,
)
Simulation time depends on:
- Number of samples
- Population sizes and structure
- Sequence length
- Migration dynamics
Memory Tracking
We use Python’s tracemalloc to measure peak memory usage:
import tracemalloc
tracemalloc.start()
# ... run benchmark ...
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
Key memory consumers:
- Migration matrix: n² × 8 bytes (float64)
- Deme arrays: rows × cols × time_slices × 8 bytes
- msprime internals: Proportional to population count
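Timing and peak-memory tracking can be wrapped in one helper so that every benchmarked operation is measured the same way. The sketch below is one possible arrangement using only the standard library; the `measure` function is hypothetical and is not taken from `run_benchmarks.py`.

```python
import time
import tracemalloc


def measure(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # always stop tracing, even if fn raises
    return result, elapsed, peak


# Stand-in workload: build a list of one million Python ints.
result, elapsed, peak = measure(lambda: list(range(1_000_000)))
print(f"{elapsed:.3f} s, peak {peak / 2**20:.1f} MiB")
```

Two caveats apply to this approach: tracemalloc only counts allocations made through Python's memory allocators, so memory that a C library allocates directly may not appear in the peak, and tracing adds overhead, so timings taken with tracing enabled tend to be slower than untraced runs.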
Running Benchmarks
Quick Test
cd benchmarking
python run_benchmarks.py --config quick
Full Suite
python run_benchmarks.py --config default
Custom Output
python run_benchmarks.py --config full --output results/my_benchmark.csv
Analyzing Results
Generate visualizations from benchmark results:
python analyze_results.py results/benchmark_default_*.csv --output figures/
This produces:
- scaling_by_demes.png: Log-log scaling plot
- heatmap_grid_time.png: Grid size × time slices heatmap
- parameter_effects.png: Individual parameter effects
- time_breakdown.png: Time by component
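On a log-log plot, a power law such as time ∝ n^k appears as a straight line with slope k, so the scaling exponent can be estimated with a least-squares fit on the log-transformed values. The sketch below shows that calculation in plain Python. The function `loglog_slope` is illustrative and not part of `analyze_results.py`, and the input values are synthetic placeholders generated from an exact n² law, not benchmark measurements.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Synthetic placeholder values following an exact n^2 law, not measurements.
demes = [25, 100, 400, 900]
times = [1e-6 * n**2 for n in demes]
print(round(loglog_slope(demes, times), 3))  # 2.0
```

With real results, a slope near 1 would suggest roughly linear scaling in the number of demes and a slope near 2 roughly quadratic scaling.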
Configuration Options
Three benchmark configurations are available:
Quick (--config quick)
Fast validation run:
- Grid sizes: 5×5, 10×10
- Deme sizes: 1000
- Migration rates: 0.01
- Time slices: 1, 5
- 1 replicate
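If the listed values are fully crossed, the number of benchmark runs is the product of the option counts times the number of replicates. The sketch below enumerates the quick configuration that way. The full-factorial crossing, the dictionary layout, and the key names are assumptions for illustration and may not match how `run_benchmarks.py` stores or combines its settings.

```python
from itertools import product

# Values taken from the quick configuration listed above;
# the dictionary layout and key names are illustrative assumptions.
quick = {
    "grid_size": [(5, 5), (10, 10)],
    "deme_size": [1000],
    "migration_rate": [0.01],
    "time_slices": [1, 5],
}
replicates = 1

# One dict per run: every combination of values, repeated per replicate.
runs = [
    dict(zip(quick, combo), replicate=rep)
    for combo in product(*quick.values())
    for rep in range(replicates)
]
print(len(runs))  # 4
```

The same counting explains why the larger configurations take much longer: each additional value for any parameter multiplies the total number of runs.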
Default (--config default)
Comprehensive benchmarking:
- Grid sizes: 5×5 to 30×30
- Deme sizes: 100 to 10000
- Migration rates: 0.001, 0.01, 0.1
- Time slices: 1, 5, 10, 20
- 3 replicates
Full (--config full)
Exhaustive analysis:
- More grid sizes and parameter values
- 5 replicates for statistical power
See the Results page for detailed findings.