calc_sumstats

calc_sumstats(ac, coords_dict, anc_demes_dict=None, ac_demes=None, ac_anc=None, between_anc_pop_sumstats=False, return_df=False, precision=6)

Calculates a suite of genetic summary statistics from allele counts matrices generated by filter_gt or otherwise produced with scikit-allel. The required inputs are an allele counts matrix for all individuals/demes and a dictionary mapping deme IDs to their coordinates, generated by coords_to_deme_dict. Optional inputs are a dictionary mapping ancestral population IDs to their constituent demes (anc_demes_dict), a dictionary of allele counts matrices for each deme (ac_demes), and a dictionary of allele counts matrices for each ancestral population (ac_anc).

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| ac | np.ndarray | An allele counts matrix for all individuals/demes. | required |
| coords_dict | dict | A dictionary mapping deme IDs to their coordinates, generated by [coords_to_deme_dict][utilities.coords_to_deme_dict]. | required |
| anc_demes_dict | dict | A dictionary mapping ancestral population IDs to their constituent demes. | None |
| ac_demes | dict | A dictionary mapping deme IDs to their allele counts matrices. Necessary if you want to calculate Fst or Dxy between demes. | None |
| ac_anc | dict | A dictionary mapping ancestral population IDs to their allele counts matrices. If provided, summary statistics are calculated within ancestral populations and not among them. | None |
| between_anc_pop_sumstats | bool | Whether to calculate Fst or Dxy between ancestral populations. | False |
| return_df | bool | Whether to return the summary statistics as a pandas DataFrame. | False |
| precision | int | The number of decimal places to round the summary statistics to. | 6 |
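
To make the relationship between the optional dictionaries concrete, here is a minimal sketch of how ac_demes and anc_demes_dict could be combined into ac_anc and a species-wide ac. It rests on several assumptions that this page does not confirm: each allele counts matrix has one row per variant and one column per allele (the scikit-allel convention), anc_demes_dict maps each ancestral population ID to a list of deme IDs, and the helper sum_allele_counts is hypothetical. Plain nested lists stand in for NumPy arrays so the sketch runs without dependencies.

```python
# Hypothetical helper, not part of the documented API.
def sum_allele_counts(matrices):
    """Element-wise sum of equally shaped (n_variants, n_alleles) count matrices."""
    n_variants = len(matrices[0])
    n_alleles = len(matrices[0][0])
    return [
        [sum(m[v][a] for m in matrices) for a in range(n_alleles)]
        for v in range(n_variants)
    ]


# Assumed layout: deme ID -> allele counts (rows are variants, columns are alleles).
ac_demes = {
    0: [[4, 0], [2, 2], [1, 3]],
    1: [[3, 1], [4, 0], [2, 2]],
    2: [[0, 4], [1, 3], [4, 0]],
}

# Assumed layout: ancestral population ID -> list of constituent deme IDs.
anc_demes_dict = {1: [0, 1], 2: [2]}

# Pool the demes of each ancestral population into one counts matrix.
ac_anc = {
    anc_id: sum_allele_counts([ac_demes[d] for d in demes])
    for anc_id, demes in anc_demes_dict.items()
}

# Pool every deme to obtain the species-wide counts matrix.
ac = sum_allele_counts(list(ac_demes.values()))

print(ac_anc)
print(ac)
```

In practice these matrices would more likely come from filter_gt or scikit-allel; the point of the sketch is only that the three inputs describe the same counts at different levels of pooling.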

Returns

| Type | Description |
| --- | --- |
| dict | A dictionary of summary statistics (returned as a pandas DataFrame instead when return_df is True). |

Notes

This function calculates the following summary statistics, either species-wide or per ancestral population if ancestral populations are provided:

- Site frequency spectrum Hill numbers (q1 and q2), corrected for the number of sites
- Pi (nucleotide diversity)
- Tajima’s D
- Pairwise Dxy
    - If between_anc_pop_sumstats is True, Dxy is also calculated between ancestral populations
- Pairwise Hudson’s FST
    - If between_anc_pop_sumstats is True, Hudson’s FST is also calculated between ancestral populations
- Isolation-by-distance slope and R²
- Moran’s I
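
As an illustration of three of the statistics listed above, the following sketch computes Pi, Dxy, and Hudson’s FST from allele counts using the standard pairwise-difference formulation. It is not the code calc_sumstats runs: the function names are hypothetical, the (n_variants, n_alleles) layout is assumed, nested lists stand in for NumPy arrays, and n_sites (the number of sites used as the per-site denominator) is an assumed argument.

```python
def within_diff(counts):
    """Mean pairwise difference at one site within a population.

    Requires at least two sampled alleles at the site.
    """
    n = sum(counts)
    n_pairs = n * (n - 1) / 2
    n_same = sum(c * (c - 1) / 2 for c in counts)
    return (n_pairs - n_same) / n_pairs


def between_diff(counts1, counts2):
    """Mean pairwise difference at one site between two populations."""
    n_pairs = sum(counts1) * sum(counts2)
    n_same = sum(c1 * c2 for c1, c2 in zip(counts1, counts2))
    return (n_pairs - n_same) / n_pairs


def pi(ac, n_sites):
    """Nucleotide diversity: summed within-population differences per site."""
    return sum(within_diff(c) for c in ac) / n_sites


def dxy(ac1, ac2, n_sites):
    """Absolute divergence: summed between-population differences per site."""
    return sum(between_diff(c1, c2) for c1, c2 in zip(ac1, ac2)) / n_sites


def hudson_fst(ac1, ac2):
    """Hudson's FST as a ratio of sums over sites (ratio of averages)."""
    num = 0.0
    den = 0.0
    for c1, c2 in zip(ac1, ac2):
        between = between_diff(c1, c2)
        within = (within_diff(c1) + within_diff(c2)) / 2
        num += between - within
        den += between
    return num / den


# Two populations, two biallelic sites, four sampled alleles each.
pop1 = [[4, 0], [2, 2]]
pop2 = [[0, 4], [2, 2]]

print(pi(pop1, n_sites=2))
print(dxy(pop1, pop2, n_sites=2))
print(hudson_fst(pop1, pop2))
```

The first site is a fixed difference (no diversity within either population, complete divergence between them), while the second is shared at equal frequency, so FST falls between 0 and 1 for this toy input.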