calc_sumstats
calc_sumstats(ac, coords_dict, anc_demes_dict=None, ac_demes=None, ac_anc=None, between_anc_pop_sumstats=False, return_df=False, precision=6)
Calculates a suite of genetic summary statistics on allele counts matrices generated by filter_gt or otherwise generated through scikit-allel. The required inputs are an allele counts matrix for all individuals/demes and a dictionary mapping deme IDs to their coordinates, generated by coords_to_deme_dict. Optional inputs are a dictionary mapping ancestral population IDs to their constituent demes (anc_demes_dict), a dictionary of allele counts matrices for each deme (ac_demes), and a dictionary of allele counts matrices for each ancestral population (ac_anc).
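As a rough illustration of the inputs described above, the sketch below builds an allele counts matrix and a per-deme dictionary from a toy genotype array using NumPy only. The helper count_alleles, the deme assignment, and the exact key and value formats of coords_dict, ac_demes, and anc_demes_dict are assumptions made for illustration; in practice these objects would come from filter_gt, coords_to_deme_dict, or scikit-allel (for example GenotypeArray.count_alleles).

```python
import numpy as np

# Toy genotypes: 3 variants x 4 diploid individuals (0 = reference, 1 = alternate).
gt = np.array([
    [[0, 0], [0, 1], [1, 1], [0, 0]],
    [[0, 1], [0, 1], [0, 0], [1, 1]],
    [[0, 0], [0, 0], [0, 0], [0, 1]],
])

def count_alleles(gt, n_alleles=2):
    """Hypothetical helper: count each allele per variant.

    Returns an array of shape (n_variants, n_alleles).
    """
    return np.stack(
        [(gt == allele).sum(axis=(1, 2)) for allele in range(n_alleles)],
        axis=1,
    )

# Allele counts across all individuals (the `ac` argument).
ac = count_alleles(gt)

# Assumed deme membership: individuals 0-1 in deme 0, individuals 2-3 in deme 1.
deme_of_ind = np.array([0, 0, 1, 1])

# Per-deme allele counts (the optional `ac_demes` argument), keyed by deme ID.
ac_demes = {
    int(d): count_alleles(gt[:, deme_of_ind == d])
    for d in np.unique(deme_of_ind)
}

# Assumed shapes for the dictionary arguments; the real formats may differ.
coords_dict = {0: (0.0, 0.0), 1: (1.0, 0.0)}  # deme ID -> coordinates
anc_demes_dict = {0: [0, 1]}                  # ancestral population ID -> deme IDs

# With the package imported, a call following the documented signature would be:
# stats = calc_sumstats(ac, coords_dict, anc_demes_dict=anc_demes_dict, ac_demes=ac_demes)
```

The per-deme matrices sum to the overall matrix, which is a quick sanity check before passing them in together.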
Parameters
Name | Type | Description | Default |
---|---|---|---|
ac | np.ndarray | An allele counts matrix for all individuals/demes. | required |
coords_dict | dict | A dictionary mapping deme IDs to their coordinates, generated by coords_to_deme_dict. | required |
anc_demes_dict | dict | A dictionary mapping ancestral population IDs to their constituent demes. Defaults to None. | None |
ac_demes | dict | A dictionary mapping deme IDs to their allele counts matrices. Necessary if you want to calculate Fst or Dxy between demes. Defaults to None. | None |
ac_anc | dict | A dictionary mapping ancestral population IDs to their allele counts matrices. If provided, summary statistics are calculated within ancestral populations and not among them. Defaults to None. | None |
between_anc_pop_sumstats | bool | Whether to calculate Fst or Dxy between ancestral populations. Defaults to False. | False |
return_df | bool | Whether to return the summary statistics as a pandas DataFrame. Defaults to False. | False |
precision | int | The number of decimal places to round the summary statistics to. Defaults to 6. | 6 |
Returns
Type | Description |
---|---|
dict | A dictionary of summary statistics. |
Notes
This function calculates the following summary statistics, either species-wide or, if ancestral populations are provided, per ancestral population:
- Site Frequency Spectrum Hill numbers (q1 and q2), corrected for the number of sites
- Pi (nucleotide diversity)
- Tajima’s D
- Pairwise Dxy
  - If between_anc_pop_sumstats is True, also calculated between ancestral populations
- Pairwise Hudson’s FST
  - If between_anc_pop_sumstats is True, also calculated between ancestral populations
- Isolation-by-distance slope and R²
- Moran’s I
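To make three of the statistics above concrete, here is a minimal NumPy sketch of Pi, Dxy, and Hudson’s FST computed from allele counts matrices. It is an illustration under stated assumptions, not the implementation used by calc_sumstats: the function names are hypothetical, FST is taken as a ratio of sums over sites, and the averages run over the variant sites supplied rather than over all accessible bases. scikit-allel offers comparable routines such as allel.mean_pairwise_difference and allel.hudson_fst.

```python
import numpy as np

def mean_pairwise_difference(ac):
    """Per-site probability that two alleles drawn without replacement differ."""
    ac = np.asarray(ac, dtype=float)
    n = ac.sum(axis=1)                        # alleles sampled per site
    n_pairs = n * (n - 1) / 2                 # all possible pairs
    n_same = (ac * (ac - 1) / 2).sum(axis=1)  # pairs carrying the same allele
    return (n_pairs - n_same) / n_pairs

def mean_pairwise_difference_between(ac1, ac2):
    """Per-site probability that one allele from each population differs."""
    ac1 = np.asarray(ac1, dtype=float)
    ac2 = np.asarray(ac2, dtype=float)
    n_pairs = ac1.sum(axis=1) * ac2.sum(axis=1)
    n_same = (ac1 * ac2).sum(axis=1)
    return (n_pairs - n_same) / n_pairs

def hudson_fst(ac1, ac2):
    """Hudson's FST as a ratio of sums over sites (an assumed averaging scheme)."""
    within = (mean_pairwise_difference(ac1) + mean_pairwise_difference(ac2)) / 2
    between = mean_pairwise_difference_between(ac1, ac2)
    return (between - within).sum() / between.sum()

# Toy allele counts for two demes: 2 biallelic sites, 4 alleles sampled per deme.
ac_a = np.array([[4, 0], [2, 2]])
ac_b = np.array([[0, 4], [2, 2]])

# Pi over the pooled sample, averaged over the sites provided.
pi = mean_pairwise_difference(ac_a + ac_b).mean()

# Dxy between the two demes, averaged over the sites provided.
dxy = mean_pairwise_difference_between(ac_a, ac_b).mean()

# Hudson's FST between the two demes.
fst = hudson_fst(ac_a, ac_b)

# Round as the precision parameter would (default 6 decimal places).
print(round(pi, 6), round(dxy, 6), round(fst, 6))
```

The first site is fixed for different alleles in the two demes and the second is shared at equal frequency, so the toy data give an intermediate FST between 0 and 1.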