CLI
The spaceprime command-line interface (CLI) provides a streamlined way to run large batches of spatially explicit coalescent simulations without writing Python code. It is designed for two primary workflows: prior predictive simulation (drawing random parameter combinations for likelihood-free inference) and fixed-parameter replication (running many replicates under the same model).
Use cases
Running many simulations for ABC or ML-based inference
Likelihood-free methods such as Approximate Bayesian Computation (ABC) and machine learning require hundreds to thousands of simulations drawn from prior distributions. The CLI supports this directly: arguments marked in the Argument reference as accepting a [min, max] range are sampled uniformly from that range, and --num_param_combos controls how many random draws to make. Each draw produces an independent demographic model whose parameters are recorded in the metadata CSV.
Parallel execution on a workstation or cluster
Use --cpu to distribute parameter combinations across multiple cores. On an HPC cluster, combine --cpu with a job array: split the total number of parameter combinations into chunks, submit one job per chunk with its share passed as --num_param_combos, and merge the output CSVs afterward.
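The job-array workflow can be sketched in Python as two small helpers: one that splits a total number of parameter combinations into per-job chunk sizes, and one that concatenates the per-job CSVs. Both `chunk_sizes` and `merge_csvs` are hypothetical helpers, not part of spaceprime, and the sketch assumes each job wrote its CSV to a distinct path (for example via a job-specific --out_prefix) and that all CSVs share the same header.

```python
import csv
from pathlib import Path


def chunk_sizes(total_combos: int, n_jobs: int) -> list[int]:
    """Split total_combos into n_jobs near-equal chunk sizes."""
    base, extra = divmod(total_combos, n_jobs)
    # The first `extra` jobs take one additional combination each.
    return [base + (1 if i < extra else 0) for i in range(n_jobs)]


def merge_csvs(paths: list[Path], merged_path: Path) -> int:
    """Concatenate CSVs that share a header; return the number of data rows."""
    n_rows = 0
    writer = None
    with merged_path.open("w", newline="") as out:
        for path in paths:
            with path.open(newline="") as f:
                reader = csv.DictReader(f)
                if writer is None:
                    # Take the header from the first file (assumed identical in all).
                    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                    writer.writeheader()
                for row in reader:
                    writer.writerow(row)
                    n_rows += 1
    return n_rows
```

For example, `chunk_sizes(500, 8)` gives four jobs of 63 combinations and four of 62; each value would be passed to one array job as --num_param_combos.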
Diversity mapping across the landscape
The --map flag simulates genetic diversity for every deme in the landscape and writes the result as a GeoTIFF raster, instead of per-sample outputs. This is useful for visualizing expected patterns of diversity under different demographic scenarios.
Reproducible runs via YAML configuration
Pass --params path/to/config.yaml to read all arguments from a configuration file. This makes runs reproducible, version-controllable, and easy to share. See Configuration file for a template.
Quickstart
This example runs a single simulation with a linear habitat-to-deme transformation, a fixed migration rate, and all three output types (tree sequence, VCF, and summary statistics).
You need two input files:
- habitat.tif: a single-band GeoTIFF of habitat suitability values (0–1)
- samples.csv: a CSV with columns longitude and latitude
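If your sampling localities live in a script rather than a spreadsheet, a coordinates file with the expected column names could be written as in the sketch below. `write_coords` is a hypothetical helper, and the sketch assumes the coordinates use the same coordinate reference system as the raster, which this page does not state explicitly.

```python
import csv


def write_coords(path, localities):
    """Write (longitude, latitude) pairs to a CSV with the column names the CLI expects."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["longitude", "latitude"])
        writer.writerows(localities)
```

Calling `write_coords("samples.csv", [(-122.42, 37.77), (-121.89, 37.34)])` with your own coordinates produces a two-sample file; the values shown here are placeholders.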
spaceprime \
--raster habitat.tif \
--coords samples.csv \
--max_local_size 1000 \
--mig_rate 0.01 \
--merge_time 10000 \
--mutation_rate 1e-8 \
--seq_length 1000000 \
--out_type 3 \
--out_folder results/ \
--out_prefix my_sim
This produces:
| File | Contents |
|---|---|
| `my_sim_ancestry_<seed>.trees` | msprime tree sequence |
| `my_sim_vcf_<seed>.vcf` | VCF of simulated variants |
| `my_sim_sumstats.csv` | Genetic summary statistics |
| `my_sim_metadata.csv` | Parameters and seeds for each replicate |
| `spaceprime_<timestamp>.log` | Run log |
Tree sequence and VCF file names include the ancestry seed used for that replicate, and my_sim_metadata.csv records the seeds for every replicate, so you can reproduce any individual simulation later.
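To match tree sequence and VCF files back to rows in the metadata CSV, the seed can be pulled out of each file name. The sketch below follows the naming pattern in the table above and assumes `<seed>` is a non-negative integer; `seed_from_name` is a hypothetical helper, not part of spaceprime.

```python
import re
from pathlib import Path

# Matches "<prefix>_ancestry_<seed>.trees" and "<prefix>_vcf_<seed>.vcf".
SEED_PATTERN = re.compile(r"_(?:ancestry|vcf)_(\d+)\.(?:trees|vcf)$")


def seed_from_name(path):
    """Return the integer seed embedded in an output file name, or None if absent."""
    match = SEED_PATTERN.search(Path(path).name)
    return int(match.group(1)) if match else None
```

Files without a seed in their name, such as the summary statistics and metadata CSVs, return `None`.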
Advanced model setup
Prior-based simulation for inference
This example draws 500 random parameter combinations from prior ranges and distributes them across 4 CPUs. This is a typical setup for building an ABC reference table or a training set for a neural network.
spaceprime \
--raster habitat.tif \
--coords samples.csv \
--max_local_size 500 5000 \
--mig_rate 0.001 0.1 \
--merge_time 1000 100000 \
--mutation_rate 1e-9 1e-7 \
--recombination_rate 0 1e-8 \
--num_param_combos 500 \
--num_coalescent_sims 1 \
--seq_length 500000 \
--out_type 2 \
--out_folder results/ \
--out_prefix abc_run \
--cpu 4
Any argument that accepts a [min, max] pair (see Argument reference) treats that pair as a uniform prior. One random value is drawn from the range for each of the 500 combinations. The exact value used for every replicate is recorded in abc_run_metadata.csv, so you can reconstruct parameter-output pairs for training.
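As an illustration of what a uniform prior draw means here, the sketch below samples one value per parameter per combination using Python's standard library. It is not spaceprime's implementation: `draw_uniform_priors` is a hypothetical helper, it always returns floats, and how the CLI handles integer-typed arguments such as --max_local_size is not covered.

```python
import random


def draw_uniform_priors(priors, n_draws, seed=None):
    """Draw n_draws parameter combinations, each value uniform on its (min, max) range."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in priors.items()}
        for _ in range(n_draws)
    ]


# Ranges copied from the command above.
priors = {
    "mig_rate": (0.001, 0.1),
    "merge_time": (1000, 100000),
    "mutation_rate": (1e-9, 1e-7),
}
draws = draw_uniform_priors(priors, n_draws=500, seed=1)
```

Each element of `draws` corresponds to one parameter combination, analogous to one row of the metadata CSV.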
Model with ancestral populations
When sampling spans historically isolated lineages (e.g., glacial refugia), add ancestral populations that merge into the present-day landscape model at a specified time.
spaceprime \
--raster habitat.tif \
--coords samples.csv \
--anc_pop_id anc_pop_ids.csv \
--max_local_size 1000 \
--mig_rate 0.01 \
--merge_time 10000 \
--anc_sizes 5000 5000 \
--anc_merge_time 50000 \
--anc_merge_size 10000 \
--anc_mig_rate 0.001 \
--out_type 2 \
--out_folder results/ \
--out_prefix anc_pop_run
anc_pop_ids.csv must have a column named anc_pop_id with one row per sample coordinate, assigning each sample to a numbered ancestral population.
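A minimal sketch for producing this file and checking that it has one row per sample is shown below. Both helpers are hypothetical, and the sketch assumes rows are matched to the coordinates CSV by order; whether population numbering should start at 0 or 1 is not stated here, so treat the IDs you pass in as placeholders.

```python
import csv


def write_anc_pop_ids(path, anc_pop_ids):
    """Write one ancestral population ID per sample under the 'anc_pop_id' column."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["anc_pop_id"])
        for pop_id in anc_pop_ids:
            writer.writerow([pop_id])


def count_data_rows(path):
    """Count the rows in a CSV file, excluding the header."""
    with open(path, newline="") as f:
        return sum(1 for _ in csv.DictReader(f))
```

Before launching a run, comparing `count_data_rows("samples.csv")` with `count_data_rows("anc_pop_ids.csv")` catches a length mismatch early.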
Diversity map
To generate a per-deme diversity raster instead of per-sample outputs:
spaceprime \
--raster habitat.tif \
--coords samples.csv \
--max_local_size 1000 \
--mig_rate 0.01 \
--merge_time 10000 \
--map true \
--map_sample_num 3 \
--out_folder results/ \
--out_prefix diversity_run
The output is a GeoTIFF (diversity_run_diversity_map_<seed>.tif) aligned to the input raster grid. --map overrides --out_type.
Using a YAML configuration file
For reproducible runs, write all arguments to a YAML file and pass it with --params:
spaceprime --params config.yaml
See the Configuration file section for a full template.
Configuration file
The templates/config.yaml file in the spaceprime repository provides a full template for all available parameters. Copy it and fill in your paths and values:
# spaceprime configuration file
# List entries: [entry1, entry2]
# Ranges (uniform priors): [min, max]
# Paths must be quoted: "path/to/file"
# Booleans: true or false
# --- global ---
raster: "path/to/habitat.tif"
coords: "path/to/samples.csv"
individuals: null # optional: list of IDs or path to CSV with 'individual_id' column
# --- demography ---
normalize: false
transformation: "linear" # linear | threshold | sigmoid
max_local_size: [1000] # single value or [min, max] range
threshold: null # required when transformation is 'threshold'
inflection_point: [0.5] # used with sigmoid transformation
slope: [0.05] # used with sigmoid transformation
mig_rate: [0.01] # global migration rate, single value or [min, max]
scale: true
anc_pop_id: null # path to CSV with 'anc_pop_id' column, or null
timesteps: 1
anc_sizes: null # list of ints or list of [min, max] pairs, one per ancestral pop
merge_time: null # generations; single value or [min, max]
anc_merge_time: null
anc_merge_size: null
anc_mig_rate: null
# --- simulation ---
seq_length: 1000000
mutation_rate: [1e-8] # single value or [min, max]
recombination_rate: [0]
ploidy: 2
num_param_combos: 1
num_coalescent_sims: 1
# --- analysis ---
missing_data_perc: 0
r2_thresh: 0.1
filter_monomorphic: true
filter_singletons: true
sumstats: "all" # pi | tajima_d | sfs_h | fst | dxy | ibd | all
within_anc_pop_sumstats: false
between_anc_pop_sumstats: false
# --- output ---
out_type: 3 # 0=trees, 1=VCF, 2=sumstats CSV, 3=all
map: false
map_sample_num: 2
out_folder: null # defaults to current working directory
out_prefix: "spaceprime"
log_level: "INFO" # DEBUG | INFO | WARNING | ERROR
cpu: 1
When --params is provided, all command-line arguments are ignored in favour of the YAML file.
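Because a long run can fail late on a mistyped range, it may help to sanity-check the range-valued entries before launching. The sketch below works on a plain Python dictionary that mirrors the configuration with values already parsed as numbers; `find_bad_ranges` is a hypothetical helper, not part of spaceprime, and the key list is taken from the arguments documented below as accepting a single value or [min, max].

```python
# Keys documented in the Argument reference as "single value or [min, max]".
RANGE_KEYS = {
    "max_local_size", "threshold", "inflection_point", "slope", "mig_rate",
    "merge_time", "anc_merge_time", "anc_merge_size", "anc_mig_rate",
    "mutation_rate", "recombination_rate",
}


def find_bad_ranges(config: dict) -> list[str]:
    """Return range keys whose value is not null, a scalar, [value], or an ordered [min, max]."""
    bad = []
    for key in sorted(RANGE_KEYS & config.keys()):
        value = config[key]
        if value is None or isinstance(value, (int, float)):
            continue  # unset, or a fixed scalar
        if isinstance(value, list) and len(value) == 1:
            continue  # fixed value written as a one-element list
        if isinstance(value, list) and len(value) == 2 and value[0] <= value[1]:
            continue  # well-ordered [min, max]
        bad.append(key)
    return bad
```

An empty return value means every range-valued entry has a plausible shape; it does not validate paths, types, or any other constraint the CLI may enforce.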
Argument reference
Global
| Flag | Short | Type | Default | Description |
|---|---|---|---|---|
| `--params` | `-p` | str | `null` | Path to YAML config file. When provided, all other CLI arguments are ignored. |
| `--raster` | `-r` | str | (none) | Path to habitat suitability raster (any format readable by rasterio). |
| `--coords` | `-co` | str | (none) | Path to CSV of sampling coordinates. Must have `longitude` and `latitude` columns. |
| `--individuals` | `-i` | str/list | `null` | Individual IDs: a comma-separated list or path to a CSV with an `individual_id` column. Length must match `--coords`. Used to label VCF samples. |
Demography setup
| Flag | Short | Type | Default | Description |
|---|---|---|---|---|
| `--normalize` | `-n` | bool | `false` | Normalise raster values to [0, 1] before conversion to deme sizes. |
| `--transformation` | `-t` | str | `linear` | Function mapping habitat values to deme sizes. Options: `linear`, `threshold`, `sigmoid`. |
| `--max_local_size` | `-mls` | int | `1000` | Maximum deme size. Accepts a single int or a [min, max] range. |
| `--threshold` | `-th` | float | `null` | Habitat value below which demes are set to zero (threshold transformation). Single value or [min, max]. |
| `--inflection_point` | `-ip` | float | `0.5` | Inflection point of the sigmoid transformation. Single value or [min, max]. |
| `--slope` | `-s` | float | `0.05` | Slope of the sigmoid transformation. Single value or [min, max]. |
| `--mig_rate` | `-m` | float | `1e-8` | Global migration rate between adjacent demes. Single value or [min, max]. |
| `--scale` | `-sc` | bool | `true` | Scale migration by donor/recipient deme size: `m = (N_donor / N_recipient) * m_global`. |
| `--anc_pop_id` | `-a` | str/list | `null` | Ancestral population assignments. Path to a CSV with column `anc_pop_id`, or a comma-separated list. Length must equal the number of sampling coordinates. |
| `--timesteps` | `-ts` | int | `1` | Generations between demographic events (for multi-time-slice rasters). |
| `--anc_sizes` | `-as` | list | `null` | Sizes of ancestral populations, one per population. Each entry can be a single int or a [min, max] pair. |
| `--merge_time` | `-mt` | int | `null` | Generation at which demes collapse into ancestral populations. Single value or [min, max]. |
| `--anc_merge_time` | `-amt` | int | `null` | Generation at which ancestral populations merge into a root. Single value or [min, max]. |
| `--anc_merge_size` | `-ams` | int | `null` | Size of the merged ancestral root population. Single value or [min, max]. |
| `--anc_mig_rate` | `-amr` | float | `null` | Migration rate between ancestral populations. Single value or [min, max]. |
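The scaling formula given for --scale can be worked through with a short function. This is only a direct transcription of `m = (N_donor / N_recipient) * m_global` from the table above, not spaceprime's code; `scaled_migration_rate` is a hypothetical name, and the guard against a non-positive recipient size is an added assumption.

```python
def scaled_migration_rate(n_donor, n_recipient, m_global):
    """Migration rate scaled by the ratio of donor to recipient deme size."""
    if n_recipient <= 0:
        raise ValueError("recipient deme size must be positive")
    return (n_donor / n_recipient) * m_global


# A deme of 500 sending migrants into a deme of 1000 at m_global = 0.01:
small_to_large = scaled_migration_rate(500, 1000, 0.01)
# The reverse direction, from the deme of 1000 into the deme of 500:
large_to_small = scaled_migration_rate(1000, 500, 0.01)
```

With these numbers the rate into the larger deme is half the global rate and the rate into the smaller deme is double it, so scaling makes migration asymmetric wherever neighbouring demes differ in size.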
Simulation setup
| Flag | Short | Type | Default | Description |
|---|---|---|---|---|
| `--seq_length` | `-sl` | int | `1000000` | Simulated sequence length in base pairs. Accepts scientific notation (e.g. `1e6`). |
| `--mutation_rate` | `-mu` | float | `1e-8` | Mutation rate per base pair per generation. Single value or [min, max]. |
| `--recombination_rate` | `-rr` | float | `0` | Recombination rate per base pair per generation. Single value or [min, max]. |
| `--ploidy` | `-pl` | int | `2` | Ploidy of simulated individuals. |
| `--num_param_combos` | `-npc` | int | `1` | Number of random parameter combinations to draw. When > 1, any [min, max] argument is treated as a prior and sampled uniformly. |
| `--num_coalescent_sims` | `-ncs` | int | `1` | Coalescent replicates per parameter combination. |
Analysis setup
| Flag | Short | Type | Default | Description |
|---|---|---|---|---|
| `--missing_data_perc` | `-mdp` | float | `0` | Fraction of genotype data to mask as missing (0–1). |
| `--r2_thresh` | `-rt` | float | `0.1` | LD pruning threshold. Sites with R² above this value are removed. |
| `--filter_monomorphic` | `-fm` | bool | `true` | Remove monomorphic sites before computing summary statistics. |
| `--filter_singletons` | `-fs` | bool | `true` | Remove singleton sites before computing summary statistics. |
| `--sumstats` | `-ss` | list | `all` | Summary statistics to compute. Options: `pi`, `tajima_d`, `sfs_h`, `fst`, `dxy`, `ibd`, or `all`. |
| `--within_anc_pop_sumstats` | `-wap` | bool | `false` | Compute summary statistics separately within each ancestral population. |
| `--between_anc_pop_sumstats` | `-bap` | bool | `false` | Compute Fst and/or Dxy between ancestral populations. |
Output
| Flag | Short | Type | Default | Description |
|---|---|---|---|---|
| `--out_type` | `-ot` | int | `3` | Output format: 0 = tree sequences only, 1 = VCFs only, 2 = summary statistics CSV only, 3 = all outputs. |
| `--map` | `-map` | bool | `false` | Output a per-deme diversity GeoTIFF instead of per-sample files. Overrides `--out_type`. |
| `--map_sample_num` | `-msn` | int | `2` | Individuals sampled per deme when generating a diversity map. Higher values improve accuracy at the cost of speed. |
| `--out_folder` | `-of` | str | CWD | Directory for output files. Must already exist. |
| `--out_prefix` | `-op` | str | `spaceprime` | Prefix applied to all output file names. |
| `--log_level` | `-ll` | str | `INFO` | Logging verbosity: `DEBUG`, `INFO`, `WARNING`, or `ERROR`. Log is written to `<out_folder>/spaceprime_<timestamp>.log`. |
| `--cpu` | `-c` | int | `1` | Number of CPUs for parallel execution. Each CPU processes one parameter combination at a time. |
Any argument listed as accepting a [min, max] range behaves as a fixed value when --num_param_combos 1 (the default). A range only has effect when --num_param_combos > 1, in which case one value is sampled uniformly from [min, max] for each combination.
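When sizing a run, it can help to work out how many simulations the chosen settings imply. The sketch below assumes the total is the product of --num_param_combos and --num_coalescent_sims, and that combinations are spread evenly over the CPUs; `run_size` is a hypothetical helper and the even split is an assumption about scheduling, not documented behaviour.

```python
import math


def run_size(num_param_combos, num_coalescent_sims, cpu):
    """Return (total simulations, maximum parameter combinations handled by one CPU)."""
    total_sims = num_param_combos * num_coalescent_sims
    combos_per_cpu = math.ceil(num_param_combos / cpu)
    return total_sims, combos_per_cpu


# Settings from the prior-based example: 500 combinations, 1 replicate each, 4 CPUs.
total_sims, combos_per_cpu = run_size(500, 1, 4)
```

For the prior-based example this gives 500 simulations in total, with each CPU working through at most 125 parameter combinations.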