ancIBD.loaddata

Module containing classes to load the data needed for an HMM IBD run. All data relevant for the HMM are returned in a standardized format:
1) Haplotype probabilities (for the ancestral allele)
2) Allele frequencies
3) Genetic map

Author: Harald Ringbauer, September 2020

Module Contents

Classes

LoadData

Class to load data in a uniform format.

LoadSimulated

Class to load simulated data saved in the standard format.

LoadHDF5

Class to load HDF5 data for two individuals in the standard format.

LoadHDF5Multi

Class to load HDF5 data for multiple individuals.

LoadH5Multi2

Update with a more accurate conversion of diploid genotype probabilities to haploid probabilities.

Functions

load_loaddata([l_model, path])

Factory method that returns the appropriate loading model.

class ancIBD.loaddata.LoadData(path='')

Bases: object

Class to load data in a uniform format.

path = ''
output = True
abstract return_p(**kwargs)

Return an array of allele frequencies [l], where l is the number of loci.

abstract return_map(**kwargs)

Return the genetic map [l] in Morgan.

abstract return_haplotypes_ll(**kwargs)

Return haplotype likelihoods [4,l,2]

load_all_data(**kwargs)

Load all data. Returns haplotype likelihoods [2*n,l,2], derived allele frequencies [l], and the genetic map in Morgan [l].
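The abstract methods above suggest a template-method design, in which load_all_data gathers the three standardized outputs from hooks that each subclass implements. The sketch below assumes that design; the classes SketchLoader and ToyLoader and the order of the calls are hypothetical and are not ancIBD's actual implementation.

```python
from abc import ABC, abstractmethod

import numpy as np


class SketchLoader(ABC):
    """Hypothetical stand-in for LoadData: subclasses supply the three pieces."""

    @abstractmethod
    def return_haplotypes_ll(self, **kwargs):
        """Haplotype likelihoods, shape [2*n, l, 2]."""

    @abstractmethod
    def return_p(self, **kwargs):
        """Derived allele frequencies, shape [l]."""

    @abstractmethod
    def return_map(self, **kwargs):
        """Genetic map in Morgan, shape [l]."""

    def load_all_data(self, **kwargs):
        # Template method (assumed design): collect the standardized outputs.
        htsl = self.return_haplotypes_ll(**kwargs)
        p = self.return_p(**kwargs)
        m = self.return_map(**kwargs)
        return htsl, p, m


class ToyLoader(SketchLoader):
    """Toy subclass returning constant data for 2 individuals and 3 SNPs."""

    def return_haplotypes_ll(self, **kwargs):
        return np.full((4, 3, 2), 0.5)

    def return_p(self, **kwargs):
        return np.full(3, 0.5)

    def return_map(self, **kwargs):
        return np.array([0.0, 0.01, 0.02])


htsl, p, m = ToyLoader().load_all_data()
print(htsl.shape, p.shape, m.shape)  # (4, 3, 2) (3,) (3,)
```

Because the base class declares the hooks abstract, instantiating SketchLoader directly raises a TypeError, which forces each concrete loader to provide all three pieces.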

check_valid_data(htsl, p, m)

Check whether the data are in a valid format.
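A minimal sketch of the kind of checks check_valid_data might perform, assuming it verifies array shapes, probability ranges, and the ordering of the genetic map. The function name check_valid_data_sketch and the specific rules are illustrative assumptions; the real method's checks are not documented here.

```python
import numpy as np


def check_valid_data_sketch(htsl, p, m):
    """Hypothetical checks on the standardized HMM input; raises ValueError."""
    htsl, p, m = np.asarray(htsl), np.asarray(p), np.asarray(m)
    n_loci = len(p)
    # Assumed shapes: haplotype likelihoods [k, l, 2], frequencies [l], map [l].
    if htsl.ndim != 3 or htsl.shape[1] != n_loci or htsl.shape[2] != 2:
        raise ValueError(f"htsl has shape {htsl.shape}, expected [k, {n_loci}, 2]")
    if m.shape != (n_loci,):
        raise ValueError(f"map has shape {m.shape}, expected ({n_loci},)")
    # Probabilities and frequencies must lie in [0, 1].
    if np.any((htsl < 0) | (htsl > 1)) or np.any((p < 0) | (p > 1)):
        raise ValueError("probabilities or frequencies outside [0, 1]")
    # A genetic map should not decrease along the chromosome.
    if np.any(np.diff(m) < 0):
        raise ValueError("genetic map is not sorted")
    return True


ok = check_valid_data_sketch(np.full((4, 3, 2), 0.5), np.full(3, 0.5),
                             np.array([0.0, 0.1, 0.2]))
print(ok)  # True
```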

set_params(**kwargs)

Set the parameters. Takes keyword arguments.
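A common way to implement a keyword-argument setter like set_params is to assign each keyword as an instance attribute. The class ParamsSketch below is hypothetical and only illustrates that pattern; it is not taken from ancIBD.

```python
class ParamsSketch:
    """Hypothetical class showing a keyword-argument setter like set_params."""

    path = ""
    output = True

    def set_params(self, **kwargs):
        # Overwrite (or create) one instance attribute per keyword argument.
        for key, value in kwargs.items():
            setattr(self, key, value)


loader = ParamsSketch()
loader.set_params(path="/data/run1/", output=False)
print(loader.path, loader.output)  # /data/run1/ False
```

This version accepts any keyword, so a misspelled name silently creates a new attribute; a stricter variant could reject names for which hasattr(self, key) is False.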

class ancIBD.loaddata.LoadSimulated(path='')

Bases: LoadData

Class to load simulated data saved in the standard format.

r_gap = 1.0
r_path = ''
load_all_data(**kwargs)

Load all data. Returns haplotype likelihoods [2*n,l,2], derived allele frequencies [l], and the genetic map in Morgan [l].

class ancIBD.loaddata.LoadHDF5(path='')

Bases: LoadData

Class to load HDF5 data for two individuals in the standard format.

path_h5 = ''
iids = []
ch = 3
min_error = 1e-05
p_col = 'variants/AF_ALL'
return_map(f)

Return the recombination map

return_p(f)

Return an array of allele frequencies [l], read from the HDF5 field self.p_col. If self.p_col is not given, the default p=0.5 is used.
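The fallback behavior can be sketched as follows, assuming a plain dict stands in for an open HDF5 file so the example stays self-contained. The function return_p_sketch and its n_loci parameter are hypothetical; the real method presumably infers the number of loci from the file itself.

```python
import numpy as np


def return_p_sketch(f, p_col="", n_loci=None):
    """Hypothetical: read allele frequencies from f[p_col], else use p=0.5.

    f is any mapping from dataset names to arrays (a dict stands in for an
    open HDF5 file here, as an assumption).
    """
    if p_col:
        return np.asarray(f[p_col], dtype=float)
    # No field given: use the default frequency 0.5 at every locus.
    return np.full(n_loci, 0.5)


f = {"variants/AF_ALL": np.array([0.1, 0.25, 0.9])}
print(return_p_sketch(f, "variants/AF_ALL").tolist())  # [0.1, 0.25, 0.9]
print(return_p_sketch(f, "", n_loci=3).tolist())       # [0.5, 0.5, 0.5]
```

The field name variants/AF_ALL matches the documented p_col default of LoadHDF5; LoadHDF5Multi defaults p_col to an empty string.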

get_individual_idx(f, iid='', f_col='samples')

Return the index of individual iid, looked up in the sample field f_col.
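A possible lookup, assuming the sample IDs are stored in f[f_col] and may be byte strings (h5py often returns fixed-length string datasets as bytes). The function name get_individual_idx_sketch, the dict standing in for an HDF5 file, and the KeyError on a missing ID are illustrative assumptions.

```python
import numpy as np


def get_individual_idx_sketch(f, iid="", f_col="samples"):
    """Hypothetical: return the position of iid in the sample array f[f_col]."""
    # Decode byte strings so they compare equal to a str iid.
    samples = [s.decode() if isinstance(s, bytes) else str(s) for s in f[f_col]]
    matches = np.flatnonzero(np.array(samples) == iid)
    if len(matches) == 0:
        raise KeyError(f"individual {iid!r} not found in {f_col!r}")
    return int(matches[0])


f = {"samples": np.array([b"I1", b"I2", b"I3"])}
print(get_individual_idx_sketch(f, "I2"))  # 1
```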

get_haplo_prob(f, idx)

Get the haploid ancestral-allele probabilities for an individual [2,l].

load_all_data(**kwargs)

Return haplotype likelihoods [4,l] for the ancestral allele, derived allele frequencies [l], and the genetic map in Morgan [l].

class ancIBD.loaddata.LoadHDF5Multi(path='')

Bases: LoadHDF5

Class to load HDF5 data for multiple individuals. This is now also the default when loading only a pair of individuals.

path_h5 = ''
iids = []
ch = 3
min_error = 1e-05
p_col = ''
ploidy = 2
get_haplo_prob(f, idcs)

Get the haploid ancestral probabilities for n individuals. Returns an [n,l,2] array.

filter_valid_data(hts, p, m, bp)

Filter to SNPs with fully valid data. Return filtered data.
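What counts as "fully valid" is not specified here. The sketch below assumes a SNP is valid when every input value at that locus is finite (no NaN or infinity); the function name filter_valid_data_sketch is hypothetical.

```python
import numpy as np


def filter_valid_data_sketch(hts, p, m, bp):
    """Hypothetical: keep only loci where every input value is finite.

    hts has the locus axis at position 1 (e.g. [2*n, l] or [2*n, l, 2]);
    p, m and bp are per-locus arrays of length l.
    """
    hts = np.asarray(hts, dtype=float)
    # Reduce over every axis except the locus axis.
    other_axes = tuple(ax for ax in range(hts.ndim) if ax != 1)
    ok = np.all(np.isfinite(hts), axis=other_axes)
    for arr in (p, m, bp):
        ok &= np.isfinite(np.asarray(arr, dtype=float))
    return hts[:, ok], np.asarray(p)[ok], np.asarray(m)[ok], np.asarray(bp)[ok]


hts = np.array([[0.9, np.nan, 0.2], [0.8, 0.7, 0.1]])
p = np.array([0.3, 0.4, np.nan])
m = np.array([0.0, 0.01, 0.02])
bp = np.array([100, 200, 300])
hts_f, p_f, m_f, bp_f = filter_valid_data_sketch(hts, p, m, bp)
print(bp_f.tolist())  # [100]
```

Locus 2 is dropped because of a NaN likelihood and locus 3 because of a NaN allele frequency, so only the first SNP survives.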

load_all_data(**kwargs)

Return haplotype likelihoods [n*2,l] for the ancestral allele, ordered along the first axis so that rows 2*i and 2*i+1 hold the two haplotypes of individual i; derived allele frequencies [l]; the genetic map in Morgan [l]; and bp positions [l].
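The ordering convention can be illustrated with a toy array: the two haplotypes of individual i sit in consecutive rows 2*i and 2*i+1, so a plain reshape regroups the [2*n, l] array by individual. The helper haplotype_rows and the toy data are hypothetical.

```python
import numpy as np


def haplotype_rows(i):
    """Hypothetical helper: rows of a [2*n, l] array holding individual i."""
    return 2 * i, 2 * i + 1


n, l = 3, 4
htsl = np.arange(2 * n * l).reshape(2 * n, l)  # toy stand-in for likelihoods
# Regroup to [n, 2, l] so that htsl_by_ind[i] holds both haplotypes of ind i.
htsl_by_ind = htsl.reshape(n, 2, l)
a, b = haplotype_rows(1)
assert (htsl_by_ind[1, 0] == htsl[a]).all() and (htsl_by_ind[1, 1] == htsl[b]).all()
print(haplotype_rows(1))  # (2, 3)
```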

get_p(htsl)

Compute allele frequencies from the haplotype probabilities. Returns an array of derived allele frequencies [l].
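One natural estimator, assumed here for illustration, takes the mean derived-allele probability across all 2*n haplotypes at each locus. It assumes the input holds ancestral-allele probabilities of shape [2*n, l]; the function name get_p_sketch is hypothetical, and the real method may differ, for example in how it treats missing data.

```python
import numpy as np


def get_p_sketch(anc_probs):
    """Hypothetical: derived allele frequency per locus.

    anc_probs: array [2*n, l], entry = P(haplotype carries the ancestral allele).
    The derived-allele probability is 1 - anc_probs, averaged over haplotypes.
    """
    anc_probs = np.asarray(anc_probs, dtype=float)
    return 1.0 - anc_probs.mean(axis=0)


anc = np.array([[1.0, 0.0, 0.5],
                [1.0, 0.0, 0.5],
                [0.0, 0.0, 0.5],
                [1.0, 0.0, 0.5]])
print(get_p_sketch(anc).tolist())  # [0.25, 1.0, 0.5]
```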

get_p_hdf5(f, col)

Get allele frequencies from dataset col of the HDF5 file f. Returns an array of derived allele frequencies [l].

class ancIBD.loaddata.LoadH5Multi2(path='')

Bases: LoadHDF5Multi

Update of LoadHDF5Multi with a more accurate conversion of diploid genotype probabilities to haploid probabilities.

path_h5 = ''
iids = []
ch = 3
min_error = 0.001
pph_error = 0.01
p_col = ''
get_haplo_prob(f, idcs, ploidy=2)

Get the haploid ancestral probabilities for n individuals. Returns an [n,l,2] array, calculated from both the GP and GT fields.

ploidy: either an integer or an array of integers. If it is an integer, it must be 1 or 2, and all individuals are assumed to have that ploidy. If it is an array, it must have the same length as idcs, and it specifies the ploidy of each individual.
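The rules for the ploidy argument translate into a small normalization step. This sketch expands ploidy to one value per individual; the helper normalize_ploidy is hypothetical, and the check that every array entry is 1 or 2 is an added assumption.

```python
import numpy as np


def normalize_ploidy(ploidy, idcs):
    """Hypothetical: expand ploidy to one value per individual in idcs."""
    n = len(idcs)
    if np.isscalar(ploidy):
        # An integer must be 1 or 2 and applies to every individual.
        if ploidy not in (1, 2):
            raise ValueError("integer ploidy must be 1 or 2")
        return np.full(n, int(ploidy))
    ploidy = np.asarray(ploidy, dtype=int)
    # An array gives one ploidy per individual, so lengths must match.
    if ploidy.shape != (n,):
        raise ValueError(f"ploidy array has length {ploidy.size}, expected {n}")
    # Assumption: each entry must itself be 1 or 2.
    if not np.isin(ploidy, (1, 2)).all():
        raise ValueError("every ploidy value must be 1 or 2")
    return ploidy


print(normalize_ploidy(2, [0, 5, 7]).tolist())   # [2, 2, 2]
print(normalize_ploidy([1, 2], [0, 5]).tolist())  # [1, 2]
```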

ancIBD.loaddata.load_loaddata(l_model='simulated', path='', **kwargs)

Factory method that returns the appropriate loading model, selected by l_model (default 'simulated').
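A factory like load_loaddata is often implemented as a lookup from a model name to a class. In the sketch below only the default name "simulated" comes from the signature above; the "hdf5" key, the sketch classes, the NotImplementedError for unknown names, and applying **kwargs as attribute settings are all assumptions.

```python
class LoaderSketch:
    """Hypothetical base loader that just remembers its path."""

    def __init__(self, path=""):
        self.path = path


class SimulatedSketch(LoaderSketch):
    pass


class HDF5Sketch(LoaderSketch):
    pass


# Hypothetical registry: only "simulated" is the documented default name.
LOADERS = {"simulated": SimulatedSketch, "hdf5": HDF5Sketch}


def load_loaddata_sketch(l_model="simulated", path="", **kwargs):
    """Return a loader instance for l_model, then apply extra keyword settings."""
    try:
        cls = LOADERS[l_model]
    except KeyError:
        raise NotImplementedError(f"loading model {l_model!r} not available") from None
    loader = cls(path=path)
    for key, value in kwargs.items():
        setattr(loader, key, value)  # mirrors a set_params-style call (assumption)
    return loader


ld = load_loaddata_sketch("hdf5", path="/data/", ch=3)
print(type(ld).__name__, ld.path, ld.ch)  # HDF5Sketch /data/ 3
```

A dictionary registry keeps the factory short and makes adding a new loader a one-line change.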