filter_gt

filter_gt(gt, deme_dict_inds=None, deme_dict_anc=None, missing_data_perc=0, r2_thresh=0.1, filter_monomorphic=True, filter_singletons=True)

Filter genotype matrices output by ts.genotype_matrix() to filter out monomorphic sites, loci in linkage disequilibrium, and recreate missing data patterns common to empirical genotype data. Returns the genotype matrix and allele counts matrix for the filtered loci, and optionally allele counts matrices for demes and ancestral populations.

Parameters

Name Type Description Default
gt np.ndarray The genotype matrix. required
deme_dict_inds dict A dictionary containing the indices of individuals in each deme. Defaults to None. None
deme_dict_anc dict A dictionary containing the indices of individuals in each ancestral population. Defaults to None. None
missing_data_perc float The percentage of missing data allowed. Defaults to 0. 0
r2_thresh float The threshold for linkage disequilibrium. Defaults to 0.1. 0.1
filter_monomorphic bool Whether to filter out monomorphic sites, keeping only segregating sites. Defaults to True. True
filter_singletons bool Whether to filter out singletons. Defaults to True. True

Returns

Type Description
Tuple[allel.GenotypeArray, allel.AlleleCountsArray, Optional[Dict[str, allel.AlleleCountsArray]], Optional[Dict[str, allel.AlleleCountsArray]]] A tuple containing the filtered genotype matrix, the allele counts matrix, a dictionary of allele counts matrices for demes (if deme_dict_inds is provided), and a dictionary of allele counts matrices for ancestral populations (if deme_dict_anc is provided).

Notes

This function uses a random mask to simulate missing data in the genotype matrix. For reproducibility it’s advised to set a np.random.seed() before calling this function.