CLI

The spaceprime command-line interface (CLI) provides a streamlined way to run large batches of spatially explicit coalescent simulations without writing Python code. It is designed for two primary workflows: prior predictive simulation (drawing random parameter combinations for likelihood-free inference) and fixed-parameter replication (running many replicates under the same model).

Use cases

Running many simulations for ABC or ML-based inference
Likelihood-free methods such as Approximate Bayesian Computation (ABC) and machine learning require hundreds to thousands of simulations drawn from prior distributions. The CLI supports this directly: every argument marked in the Argument reference as accepting a [min, max] range can take one, and --num_param_combos controls how many random draws to make. Each draw produces an independent demographic model whose parameters are recorded in the metadata CSV.

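Parallel execution on a workstation or cluster
Use --cpu to distribute parameter combinations across multiple cores. On an HPC cluster, combine --cpu with a job array: have each job run its own chunk of the total number of combinations (set per job with --num_param_combos), give each job a distinct --out_prefix or --out_folder so that their CSV files do not share a name, and merge the output CSVs afterward.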
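The two bookkeeping steps around a job array can be sketched in plain Python: splitting a total number of parameter combinations into per-job chunks, and concatenating the per-job CSVs afterward. This is a minimal sketch, not part of spaceprime. The helper names `split_combos` and `merge_csvs` are hypothetical, the `job*_sumstats.csv` pattern assumes you gave the jobs prefixes such as `job0`, `job1`, and the merge assumes every CSV shares the same header.

```python
import csv
import glob


def split_combos(total, n_jobs):
    """Split `total` parameter combinations into `n_jobs` near-equal chunks."""
    base, extra = divmod(total, n_jobs)
    # The first `extra` jobs take one additional combination each.
    return [base + 1 if i < extra else base for i in range(n_jobs)]


def merge_csvs(pattern, out_path):
    """Concatenate every CSV matching `pattern` into `out_path`; return the data row count.

    Assumes all input files share the same header row.
    """
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    rows_written = 0
    writer = None
    with open(out_path, "w", newline="") as out:
        for path in paths:
            with open(path, newline="") as f:
                reader = csv.DictReader(f)
                if writer is None:
                    # Take the header from the first file.
                    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                    writer.writeheader()
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


if __name__ == "__main__":
    # 500 combinations over 8 array tasks: pass chunk i to --num_param_combos in task i.
    print(split_combos(500, 8))  # [63, 63, 63, 63, 62, 62, 62, 62]
    # After all tasks finish (hypothetical prefixes job0, job1, ...):
    # merge_csvs("results/job*_sumstats.csv", "merged_sumstats.csv")
```

Write the merged file to a path outside the glob pattern so that a rerun does not pick up the previous merged output as an input.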

Diversity mapping across the landscape
The --map flag simulates genetic diversity for every deme in the landscape and writes the result as a GeoTIFF raster instead of per-sample outputs. This is useful for visualizing expected patterns of diversity under different demographic scenarios.

Reproducible runs via YAML configuration
Pass --params path/to/config.yaml to read all arguments from a configuration file. This makes runs reproducible, version-controllable, and easy to share. See Configuration file for a template.

Quickstart

This example runs a single simulation with the default linear habitat-to-deme transformation, a fixed migration rate, and all three output types (tree sequence, VCF, and summary statistics).

You need two input files:

habitat.tif — a single-band GeoTIFF of habitat suitability values (0–1)
samples.csv — a CSV with columns longitude and latitude
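If your sampling localities are already in Python, one minimal way to produce samples.csv with the two required column names is the standard-library `csv` module. The helper name `write_sample_coords` and the coordinates are placeholders. The points presumably need to fall inside the extent of habitat.tif; that is an assumption about how spaceprime matches samples to demes.

```python
import csv


def write_sample_coords(path, coords):
    """Write (longitude, latitude) pairs to a CSV with the column names the CLI expects."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["longitude", "latitude"])
        for lon, lat in coords:
            writer.writerow([lon, lat])


if __name__ == "__main__":
    # Placeholder localities (hypothetical); replace them with your own sampling points.
    write_sample_coords("samples.csv", [(-122.42, 37.77), (-121.89, 37.34), (-122.27, 37.80)])
```

With both files in place, the command below runs the simulation.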

spaceprime \
  --raster habitat.tif \
  --coords samples.csv \
  --max_local_size 1000 \
  --mig_rate 0.01 \
  --merge_time 10000 \
  --mutation_rate 1e-8 \
  --seq_length 1000000 \
  --out_type 3 \
  --out_folder results/ \
  --out_prefix my_sim

This produces:

File	Contents
`my_sim_ancestry_<seed>.trees`	msprime tree sequence
`my_sim_vcf_<seed>.vcf`	VCF of simulated variants
`my_sim_sumstats.csv`	Genetic summary statistics
`my_sim_metadata.csv`	Parameters and seeds for each replicate
`spaceprime_<timestamp>.log`	Run log

Output seeds

Each tree sequence and VCF file name includes the ancestry seed used for that replicate, and the seeds are also recorded in the metadata CSV, so you can reproduce any individual simulation later.
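To find the seed for a given output file programmatically, a small regular expression is enough. This sketch assumes the `<prefix>_ancestry_<seed>.trees` and `<prefix>_vcf_<seed>.vcf` naming shown in the Quickstart table and that seeds are integers; the function name `seed_from_filename` is hypothetical.

```python
import re
from pathlib import Path

# Assumed naming, taken from the Quickstart table:
#   <prefix>_ancestry_<seed>.trees  and  <prefix>_vcf_<seed>.vcf
SEED_RE = re.compile(r"_(?:ancestry|vcf)_(\d+)\.(?:trees|vcf)$")


def seed_from_filename(path):
    """Return the integer seed embedded in a tree-sequence or VCF file name, or None."""
    match = SEED_RE.search(Path(path).name)
    return int(match.group(1)) if match else None
```

For example, `seed_from_filename("results/my_sim_ancestry_12345.trees")` gives `12345`; you can then look up that seed in my_sim_metadata.csv to recover the parameter values it was run with.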

Advanced model setup

Prior-based simulation for inference

This example draws 500 random parameter combinations from prior ranges and distributes them across 4 CPUs, which process the combinations in parallel. This is a typical setup for ABC or for training a neural network.

spaceprime \
  --raster habitat.tif \
  --coords samples.csv \
  --max_local_size 500 5000 \
  --mig_rate 0.001 0.1 \
  --merge_time 1000 100000 \
  --mutation_rate 1e-9 1e-7 \
  --recombination_rate 0 1e-8 \
  --num_param_combos 500 \
  --num_coalescent_sims 1 \
  --seq_length 500000 \
  --out_type 2 \
  --out_folder results/ \
  --out_prefix abc_run \
  --cpu 4

Any argument that accepts a [min, max] pair (see Argument reference) treats that pair as a uniform prior. One value is drawn at random from each range for each of the 500 combinations. The exact value used for every replicate is recorded in abc_run_metadata.csv, so you can reconstruct parameter-output pairs for training.
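The sampling rule can be illustrated in a few lines of Python. This sketch reuses the ranges from the command above and draws one value per argument per combination with `random.uniform`. It illustrates the semantics only and is not spaceprime's internal code. Rounding the integer-valued arguments (`max_local_size`, `merge_time`) is an assumption about how spaceprime handles them, and `draw_param_combos` is a hypothetical name.

```python
import random

# [min, max] ranges from the command above, treated as independent uniform priors.
PRIORS = {
    "max_local_size": (500, 5000),
    "mig_rate": (0.001, 0.1),
    "merge_time": (1000, 100000),
    "mutation_rate": (1e-9, 1e-7),
    "recombination_rate": (0.0, 1e-8),
}
INTEGER_ARGS = {"max_local_size", "merge_time"}  # assumed to be rounded to ints


def draw_param_combos(priors, n, seed=None):
    """Draw `n` parameter combinations, one uniform value per argument per combination."""
    rng = random.Random(seed)
    combos = []
    for _ in range(n):
        combo = {}
        for name, (low, high) in priors.items():
            value = rng.uniform(low, high)
            combo[name] = round(value) if name in INTEGER_ARGS else value
        combos.append(combo)
    return combos


combos = draw_param_combos(PRIORS, n=500, seed=1)
```

Because the prior is uniform on the raw scale, roughly 91% of mutation_rate draws from [1e-9, 1e-7] fall above 1e-8; keep this in mind when choosing ranges that span orders of magnitude.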

Model with ancestral populations

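When sampling spans historically isolated lineages (e.g., glacial refugia), add ancestral populations. Looking backward in time, the landscape demes collapse into their assigned ancestral populations at --merge_time, and the ancestral populations then merge into a single root population at --anc_merge_time.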

spaceprime \
  --raster habitat.tif \
  --coords samples.csv \
  --anc_pop_id anc_pop_ids.csv \
  --max_local_size 1000 \
  --mig_rate 0.01 \
  --merge_time 10000 \
  --anc_sizes 5000 5000 \
  --anc_merge_time 50000 \
  --anc_merge_size 10000 \
  --anc_mig_rate 0.001 \
  --out_type 2 \
  --out_folder results/ \
  --out_prefix anc_pop_run

anc_pop_ids.csv must have a column named anc_pop_id with one row per sample coordinate, assigning each sample to a numbered ancestral population.
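Before a long run it can help to sanity-check the ancestral population file against the sample file and the other arguments. This is a minimal sketch with a hypothetical function name. The row-count and column checks follow the requirements stated above; two further checks are assumptions: that every ancestral population has at least one sample (so the number of distinct IDs equals the number of --anc_sizes entries), and that --anc_merge_time must be older than --merge_time.

```python
import csv


def check_anc_pop_ids(coords_csv, anc_csv, n_anc_sizes, merge_time=None, anc_merge_time=None):
    """Run basic consistency checks; return a list of problems (empty if none found)."""
    with open(coords_csv, newline="") as f:
        n_samples = sum(1 for _ in csv.DictReader(f))

    with open(anc_csv, newline="") as f:
        reader = csv.DictReader(f)
        if "anc_pop_id" not in (reader.fieldnames or []):
            return ["anc_pop_ids file has no 'anc_pop_id' column"]
        ids = [row["anc_pop_id"].strip() for row in reader]

    problems = []
    if len(ids) != n_samples:
        problems.append(f"{len(ids)} anc_pop_id rows vs {n_samples} sample coordinates")
    # Assumption: every ancestral population has at least one sample.
    n_pops = len(set(ids))
    if n_pops != n_anc_sizes:
        problems.append(f"{n_pops} distinct anc_pop_id values vs {n_anc_sizes} --anc_sizes entries")
    # Assumption: demes collapse (merge_time) before ancestral pops merge (anc_merge_time).
    if merge_time is not None and anc_merge_time is not None and anc_merge_time <= merge_time:
        problems.append("--anc_merge_time should be older (larger) than --merge_time")
    return problems


if __name__ == "__main__":
    # Values from the command above: two ancestral populations, merge at 10000 and 50000.
    print(check_anc_pop_ids("samples.csv", "anc_pop_ids.csv", n_anc_sizes=2,
                            merge_time=10000, anc_merge_time=50000))
```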

Diversity map

To generate a per-deme diversity raster instead of per-sample outputs:

spaceprime \
  --raster habitat.tif \
  --coords samples.csv \
  --max_local_size 1000 \
  --mig_rate 0.01 \
  --merge_time 10000 \
  --map true \
  --map_sample_num 3 \
  --out_folder results/ \
  --out_prefix diversity_run

The output is a GeoTIFF (diversity_run_diversity_map_<seed>.tif) aligned to the input raster grid. --map overrides --out_type.

Using a YAML configuration file

For reproducible runs, write all arguments to a YAML file and pass it with --params:

spaceprime --params config.yaml

See the Configuration file section for a full template.

Configuration file

The templates/config.yaml file in the spaceprime repository provides a full template for all available parameters. Copy it and fill in your paths and values:

# spaceprime configuration file
# List entries: [entry1, entry2]
# Ranges (uniform priors): [min, max]
# Paths must be quoted: "path/to/file"
# Booleans: true or false

# --- global ---
raster: "path/to/habitat.tif"
coords: "path/to/samples.csv"
individuals: null  # optional: list of IDs or path to CSV with 'individual_id' column

# --- demography ---
normalize: false
transformation: "linear"   # linear | threshold | sigmoid
max_local_size: [1000]     # single value or [min, max] range
threshold: null            # required when transformation is 'threshold'
inflection_point: [0.5]    # used with sigmoid transformation
slope: [0.05]              # used with sigmoid transformation
mig_rate: [0.01]           # global migration rate, single value or [min, max]
scale: true
anc_pop_id: null           # path to CSV with 'anc_pop_id' column, or null
timesteps: 1
anc_sizes: null            # list of ints or list of [min, max] pairs, one per ancestral pop
merge_time: null           # generations; single value or [min, max]
anc_merge_time: null
anc_merge_size: null
anc_mig_rate: null

# --- simulation ---
seq_length: 1000000
mutation_rate: [1e-8]      # single value or [min, max]
recombination_rate: [0]
ploidy: 2
num_param_combos: 1
num_coalescent_sims: 1

# --- analysis ---
missing_data_perc: 0
r2_thresh: 0.1
filter_monomorphic: true
filter_singletons: true
sumstats: "all"            # pi | tajima_d | sfs_h | fst | dxy | ibd | all
within_anc_pop_sumstats: false
between_anc_pop_sumstats: false

# --- output ---
out_type: 3     # 0=trees, 1=VCF, 2=sumstats CSV, 3=all
map: false
map_sample_num: 2
out_folder: null  # defaults to current working directory
out_prefix: "spaceprime"
log_level: "INFO"   # DEBUG | INFO | WARNING | ERROR
cpu: 1

Note

When --params is provided, all other command-line arguments are ignored in favour of the YAML file.

Argument reference

Global

Flag	Short	Type	Default	Description
`--params`	`-p`	str	`null`	Path to YAML config file. When provided, all other CLI arguments are ignored.
`--raster`	`-r`	str	—	Path to habitat suitability raster (any format readable by rasterio).
`--coords`	`-co`	str	—	Path to CSV of sampling coordinates. Must have `longitude` and `latitude` columns.
`--individuals`	`-i`	str/list	`null`	Individual IDs: a comma-separated list or path to a CSV with an `individual_id` column. Length must match `--coords`. Used to label VCF samples.

Demography setup

Flag	Short	Type	Default	Description
`--normalize`	`-n`	bool	`false`	Normalise raster values to [0, 1] before conversion to deme sizes.
`--transformation`	`-t`	str	`linear`	Function mapping habitat values to deme sizes. Options: `linear`, `threshold`, `sigmoid`.
`--max_local_size`	`-mls`	int	`1000`	Maximum deme size. Accepts a single int or a `[min, max]` range.
`--threshold`	`-th`	float	`null`	Habitat value below which demes are set to zero (threshold transformation). Single value or `[min, max]`.
`--inflection_point`	`-ip`	float	`0.5`	Inflection point of the sigmoid transformation. Single value or `[min, max]`.
`--slope`	`-s`	float	`0.05`	Slope of the sigmoid transformation. Single value or `[min, max]`.
`--mig_rate`	`-m`	float	`1e-8`	Global migration rate between adjacent demes. Single value or `[min, max]`.
`--scale`	`-sc`	bool	`true`	Scale migration by donor/recipient deme size: `m = (N_donor / N_recipient) * m_global`.
`--anc_pop_id`	`-a`	str/list	`null`	Ancestral population assignments. Path to a CSV with column `anc_pop_id`, or a comma-separated list. Length must equal the number of sampling coordinates.
`--timesteps`	`-ts`	int	`1`	Generations between demographic events (for multi-time-slice rasters).
`--anc_sizes`	`-as`	list	`null`	Sizes of ancestral populations, one per population. Each entry can be a single int or a `[min, max]` pair.
`--merge_time`	`-mt`	int	`null`	Generation at which demes collapse into ancestral populations. Single value or `[min, max]`.
`--anc_merge_time`	`-amt`	int	`null`	Generation at which ancestral populations merge into a root. Single value or `[min, max]`.
`--anc_merge_size`	`-ams`	int	`null`	Size of the merged ancestral root population. Single value or `[min, max]`.
`--anc_mig_rate`	`-amr`	float	`null`	Migration rate between ancestral populations. Single value or `[min, max]`.
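The --scale formula in the table above can be written out directly to see its effect on a pair of adjacent demes. This sketch only evaluates `m = (N_donor / N_recipient) * m_global`; the function name is hypothetical, and the guard against a zero-size recipient is an addition here, not documented spaceprime behaviour.

```python
def scaled_migration_rate(m_global, n_donor, n_recipient):
    """Scaled rate from the --scale formula: (N_donor / N_recipient) * m_global."""
    if n_recipient <= 0:
        # Added guard (not from spaceprime): the ratio is undefined for an empty recipient.
        raise ValueError("recipient deme must have a positive size")
    return (n_donor / n_recipient) * m_global


# Two adjacent demes with --mig_rate 0.01 (deme sizes are hypothetical):
print(scaled_migration_rate(0.01, n_donor=500, n_recipient=1000))  # 0.005
print(scaled_migration_rate(0.01, n_donor=1000, n_recipient=500))  # 0.02
```

Demes of equal size exchange migrants at exactly m_global; a small deme next to a large one receives proportionally more.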

Simulation setup

Flag	Short	Type	Default	Description
`--seq_length`	`-sl`	int	`1000000`	Simulated sequence length in base pairs. Accepts scientific notation (e.g. `1e6`).
`--mutation_rate`	`-mu`	float	`1e-8`	Mutation rate per base pair per generation. Single value or `[min, max]`.
`--recombination_rate`	`-rr`	float	`0`	Recombination rate per base pair per generation. Single value or `[min, max]`.
`--ploidy`	`-pl`	int	`2`	Ploidy of simulated individuals.
`--num_param_combos`	`-npc`	int	`1`	Number of random parameter combinations to draw. When > 1, any `[min, max]` argument is treated as a prior and sampled uniformly.
`--num_coalescent_sims`	`-ncs`	int	`1`	Coalescent replicates per parameter combination.

Analysis setup

Flag	Short	Type	Default	Description
`--missing_data_perc`	`-mdp`	float	`0`	Fraction of genotype data to mask as missing (0–1).
`--r2_thresh`	`-rt`	float	`0.1`	LD pruning threshold. Where a pair of sites has r² above this value, one site of the pair is removed.
`--filter_monomorphic`	`-fm`	bool	`true`	Remove monomorphic sites before computing summary statistics.
`--filter_singletons`	`-fs`	bool	`true`	Remove singleton sites before computing summary statistics.
`--sumstats`	`-ss`	list	`all`	Summary statistics to compute. Options: `pi`, `tajima_d`, `sfs_h`, `fst`, `dxy`, `ibd`, or `all`.
`--within_anc_pop_sumstats`	`-wap`	bool	`false`	Compute summary statistics separately within each ancestral population.
`--between_anc_pop_sumstats`	`-bap`	bool	`false`	Compute Fst and/or Dxy between ancestral populations.

Output

Flag	Short	Type	Default	Description
`--out_type`	`-ot`	int	`3`	Output format: `0` = tree sequences only, `1` = VCFs only, `2` = summary statistics CSV only, `3` = all outputs.
`--map`	`-map`	bool	`false`	Output a per-deme diversity GeoTIFF instead of per-sample files. Overrides `--out_type`.
`--map_sample_num`	`-msn`	int	`2`	Individuals sampled per deme when generating a diversity map. Higher values improve accuracy at the cost of speed.
`--out_folder`	`-of`	str	CWD	Directory for output files. Must already exist.
`--out_prefix`	`-op`	str	`spaceprime`	Prefix applied to all output file names.
`--log_level`	`-ll`	str	`INFO`	Logging verbosity: `DEBUG`, `INFO`, `WARNING`, or `ERROR`. Log is written to `<out_folder>/spaceprime_<timestamp>.log`.
`--cpu`	`-c`	int	`1`	Number of CPUs for parallel execution. Each CPU processes one parameter combination at a time.

Range arguments

Any argument listed as accepting a [min, max] range is treated as a fixed value when --num_param_combos is 1 (the default). A range only takes effect when --num_param_combos is greater than 1, in which case one value is sampled uniformly from [min, max] for each combination.