ancIBD.run

Function for running hapBLOCK on a single chromosome, with all relevant keywords. Function to run hapBLOCK on a full individual and full reference dataset, with all relevant keywords. @Author: Harald Ringbauer, 2020

Module Contents

Functions

hapBLOCK_chrom([folder_in, iids, ch, folder_out, ...])

Run IBD for ONE pair of Individuals.

prep_param_list_chrom(folder_in[, iids, ch, ...])

Prepare parameter lists for multirun of hapBLOCK_chrom. Ideal for multi-processing, as it gives a list of parameters, one for each iid pair.

hapBLOCK_chroms([folder_in, iids, run_iids, ch, ...])

Run IBD for a list of individuals and save their IBD csv into a single output folder.

get_sample_index(iids, sample)

Get the index of sample in iids, checking that it is really there.

hapBLOCK_times([folder_in, iids, run_iids, ch, ...])

Run IBD for a list of individuals and return runtimes. Same as hapBLOCK_chroms (see docstring there), but also returns runtimes. USED ONLY FOR BENCHMARKING.

run_plot_pair([path_h5, iids, ch, xlim, folder_out, ...])

Run and plot IBD for pair of Individuals.

run_plot_pair_IBD2([path_h5, iids, ch, xlim, ...])

Run and plot IBD for pair of Individuals.

hapBLOCK_chrom_mixedPloidy([folder_in, iids, ploidy, ...])

Run IBD for one pair of samples, and return a dataframe of the called IBD, the posterior prob vector and r_vec.

hapBLOCK_chroms_mixedPloidy([folder_in, iids, ploidy, ...])

Run IBD for a list of individuals and save their IBD csv into a single output folder.

run_plot_pair_X([folder_in, iids, ploidy, xlim, plot, ...])

Run and plot IBD for pair of Individuals.

ancIBD.run.hapBLOCK_chrom(folder_in='./data/hdf5/1240k_v43/ch', iids=['', ''], ch=2, folder_out='', output=False, prefix_out='', logfile=False, l_model='h5', e_model='haploid_gl2', h_model='FiveStateScaled', t_model='standard', p_model='hapROH', p_col='variants/AF_ALL', ibd_in=1, ibd_out=10, ibd_jump=400, ibd_jump2=0.5, min_cm=2, min_error=0.001, cutoff_post=0.99, max_gap=0.0075, IBD2=False, cutoff_post2=0.975, min_cm2_init=1.0, min_cm2_after_merge=2.0)

Run IBD for ONE pair of Individuals.
folder_in: hdf5 path up to chromosome.
iids: List of IIDs to compare [length 2]
folder_out: Where to save the hapBLOCK output to
min_cm: Minimal block length to call and save [cM]
savepath: Where to save the IBD plot to.
p_col: The dataset to use in hdf5 for derived allele frequency (AF). If default use p=0.5. If empty use in-sample AF.
min_error: Caps min/max prob. of haplotype being derived to min_error/1-min_error when loading data
Return df_ibd, posterior, map, tot_ll
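As a usage sketch, the call below passes only keywords that appear in the signature above. The sample IDs and the helper names build_pair_kwargs and run_pair are hypothetical, and the import inside run_pair assumes the ancIBD package and the hdf5 data are available; the four return values are taken from the docstring.

```python
def build_pair_kwargs(iid1, iid2, ch=2,
                      folder_in="./data/hdf5/1240k_v43/ch"):
    """Collect keyword arguments for one hapBLOCK_chrom call (hypothetical helper)."""
    return {
        "folder_in": folder_in,  # hdf5 path up to chromosome
        "iids": [iid1, iid2],    # exactly two IIDs to compare
        "ch": ch,
        "min_cm": 2,             # minimal block length to call [cM]
        "output": False,
    }


def run_pair(iid1, iid2, ch=2):
    """Run one pair (hypothetical wrapper; needs ancIBD and its data)."""
    from ancIBD.run import hapBLOCK_chrom  # imported lazily on purpose

    kwargs = build_pair_kwargs(iid1, iid2, ch=ch)
    # Return values as listed in the docstring above
    df_ibd, posterior, r_map, tot_ll = hapBLOCK_chrom(**kwargs)
    return df_ibd


# "SampleA" and "SampleB" are placeholder IIDs, not real samples
kwargs = build_pair_kwargs("SampleA", "SampleB", ch=3)
print(kwargs["iids"], kwargs["ch"])
```

Keeping the keyword dictionary separate from the call makes it easy to inspect or log the settings before starting a run.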

ancIBD.run.prep_param_list_chrom(folder_in, iids=[], ch=3, folder_out='', output=True, logfile=False, prefix_out='default/', l_model='h5', e_model='haploid_gl2', h_model='FiveStateScaled', t_model='standard', p_col='variants/AF_ALL', ibd_in=1, ibd_out=1, ibd_jump=500, min_cm=2, cutoff_post=0.99, max_gap=0.0)

Prepare parameter lists for multirun of hapBLOCK_chrom. Ideal for multi-processing, as it gives a list of parameters, one for each iid pair.
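The idea of "one parameter list per iid pair" can be sketched as follows. This is not the library's implementation: make_param_lists and run_one are hypothetical stand-ins, the parameter order is chosen only for illustration, and a thread pool is used so the sketch runs anywhere (a process pool would consume the same lists).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def make_param_lists(folder_in, iids, ch=3, min_cm=2):
    """Build one parameter list per unordered iid pair (illustrative only)."""
    return [[folder_in, [iid1, iid2], ch, min_cm]
            for iid1, iid2 in combinations(iids, 2)]


def run_one(params):
    """Stand-in for a per-pair run; it only echoes which pair it received."""
    folder_in, pair, ch, min_cm = params
    return (pair[0], pair[1], ch)


params = make_param_lists("./data/hdf5/1240k_v43/ch", ["A", "B", "C"])

# Each worker receives exactly one parameter list
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run_one, params))

print(len(params), results)
```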

ancIBD.run.hapBLOCK_chroms(folder_in='./data/hdf5/1240k_v43/ch', iids=[], run_iids=[], ch=2, folder_out='', output=False, prefix_out='', logfile=False, l_model='h5', e_model='haploid_gl2', h_model='FiveStateScaled', t_model='standard', p_model='hapROH', p_col='variants/AF_ALL', ibd_in=1, ibd_out=10, ibd_jump=400, ibd_jump2=0.5, min_cm=2, cutoff_post=0.99, max_gap=0.0075, IBD2=False, cutoff_post2=0.975, min_cm2_init=1.0, min_cm2_after_merge=2.0, mask='')

Run IBD for a list of individuals and save their IBD csv into a single output folder.
folder_in: hdf5 path up to chromosome.
iids: List of IIDs to load [k individuals]
run_iids: If given: list of IID pairs to run. If not, run all pairs
folder_out: Where to save the hapBLOCK output to
min_cm: Minimal block length to call and save [cM]
savepath: Where to save the IBD plot to.
Return df_ibd, posterior, map, tot_ll
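The run_iids rule described above (use the given pairs, otherwise run all pairs) might look like the sketch below. resolve_pairs is a hypothetical helper written for illustration, not a function of ancIBD.

```python
from itertools import combinations


def resolve_pairs(iids, run_iids=()):
    """Return the pairs to run: run_iids if given, else all pairs of iids."""
    if len(run_iids) > 0:
        return [tuple(pair) for pair in run_iids]
    # No explicit pairs: fall back to every unordered pair of loaded IIDs
    return list(combinations(iids, 2))


all_pairs = resolve_pairs(["A", "B", "C"])
some_pairs = resolve_pairs(["A", "B", "C"], run_iids=[["A", "C"]])
print(all_pairs, some_pairs)
```

With k loaded individuals the fallback produces k(k-1)/2 pairs, so passing run_iids is the way to restrict a large run.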

ancIBD.run.get_sample_index(iids, sample)

Get the index of sample in iids, checking that it is really there.
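A lookup with a presence check could be sketched like this. find_sample_index is a hypothetical re-implementation for illustration; how the library function reports a missing sample is not documented here, so the ValueError is an assumption.

```python
def find_sample_index(iids, sample):
    """Return the position of sample in iids, failing loudly if it is absent."""
    matches = [i for i, iid in enumerate(iids) if iid == sample]
    if len(matches) == 0:
        # Assumed error type; the real function may behave differently
        raise ValueError(f"Sample {sample!r} not found in iids")
    return matches[0]


idx = find_sample_index(["A", "B", "C"], "B")
print(idx)
```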

ancIBD.run.hapBLOCK_times(folder_in='./data/hdf5/1240k_v43/ch', iids=[], run_iids=[], ch=2, folder_out='', output=False, prefix_out='', logfile=False, l_model='h5', e_model='haploid_gl', h_model='FiveStateScaled', t_model='standard', p_col='variants/AF_ALL', ibd_in=1, ibd_out=10, ibd_jump=400, min_cm=2, cutoff_post=0.99, max_gap=0.0075, processes=1)

Run IBD for a list of individuals and return runtimes. Same as hapBLOCK_chroms (see docstring there), but also returns runtimes. USED ONLY FOR BENCHMARKING.
Return df_ibd, t1 [start time], t2 [after loading], t3 [after calculating trans rates], t4 [before saving]
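For benchmarking, the four checkpoints can be turned into per-stage durations. The sketch below assumes the checkpoints are plain numbers in seconds; stage_durations and the stage labels are hypothetical names chosen for illustration.

```python
def stage_durations(t1, t2, t3, t4):
    """Convert checkpoint times into durations between consecutive checkpoints."""
    return {
        "loading": t2 - t1,      # start -> after loading
        "trans_rates": t3 - t2,  # after loading -> after transition rates
        "calling": t4 - t3,      # after transition rates -> before saving
    }


# Made-up checkpoint values, only to show the arithmetic
durations = stage_durations(0.0, 1.5, 2.0, 5.0)
print(durations)
```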

ancIBD.run.run_plot_pair(path_h5='/n/groups/reich/hringbauer/git/hapBLOCK/data/hdf5/1240k_v43/ch', iids=['', ''], ch=2, xlim=[], folder_out='', plot=False, path_fig='', output=False, exact=True, ibd_in=1, ibd_out=10, ibd_jump=400, min_cm=2, cutoff_post=0.99, max_gap=0.0075, min_error=0.001, l_model='hdf5', e_model='haploid_gl', h_model='FiveStateScaled', p_col='variants/AF_ALL', title='', c='gray', c_hw='maroon', state=0, return_post=False, **kwargs)

Run and plot IBD for pair of Individuals.
folder_out: Where to save the hapBLOCK output to
iids: list of two iids [List of Length 2]
path_fig: Where to save the IBD plot to [String]
p_col: The dataset to use in hdf5 for derived allele frequency (AF). If default use p=0.5. If empty string use in-sample AF.
return_post: Whether to return posterior [Boolean]
min_error: Caps min/max prob. of haplotype being derived to min_error/1-min_error when loading data
kwargs: Optional Keyword Arguments for Plotting (e.g. c_ibd)
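The min_error capping described above amounts to clamping each probability into the interval [min_error, 1 - min_error]. The following is a minimal sketch of that arithmetic; cap_probability is a hypothetical helper and the input values are invented.

```python
def cap_probability(p, min_error=0.001):
    """Clamp a probability into [min_error, 1 - min_error]."""
    return min(max(p, min_error), 1.0 - min_error)


# Invented probabilities of a haplotype being derived
probs = [0.0, 0.0005, 0.5, 0.9999, 1.0]
capped = [cap_probability(p) for p in probs]
print(capped)
```

Capping keeps probabilities of exactly 0 or 1 out of the data, so a single confidently wrong genotype cannot rule out a state on its own.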

ancIBD.run.run_plot_pair_IBD2(path_h5='/n/groups/reich/hringbauer/git/hapBLOCK/data/hdf5/1240k_v43/ch', iids=['', ''], ch=2, xlim=[], folder_out='', plot=False, path_fig='', output=False, exact=True, ibd_in=1, ibd_out=10, ibd_jump=400, min_cm1=8, cutoff_post=0.99, max_gap=0.0075, p_col='variants/AF_ALL', cutoff_post2=0.8, min_cm2_init=0.25, min_cm2_after_merge=4.0, title='', state=0, return_post=False)

Run and plot IBD for pair of Individuals.
folder_out: Where to save the hapBLOCK output to
iids: list of two iids [List of Length 2]
path_fig: Where to save the IBD plot to [String]
p_col: The dataset to use in hdf5 for derived allele frequency (AF). If default use p=0.5. If empty string use in-sample AF.
return_post: Whether to return posterior [Boolean]
min_error: Caps min/max prob. of haplotype being derived to min_error/1-min_error when loading data
min_cm1: minimum length of IBD1 to call and visualize
kwargs: Optional Keyword Arguments for Plotting (e.g. c_ibd)
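A minimum-length threshold such as min_cm1 can be pictured as a simple filter over called segments. The sketch below assumes segments are (start, end) tuples in cM; filter_by_length and the segment values are hypothetical, and the actual calling and merging steps of the library are not reproduced.

```python
def filter_by_length(segments, min_cm):
    """Keep (start_cm, end_cm) segments that are at least min_cm long."""
    return [(start, end) for start, end in segments if end - start >= min_cm]


# Invented segments with lengths 4, 12 and 8 cM
segments = [(10.0, 14.0), (20.0, 32.0), (40.0, 48.0)]
long_ibd1 = filter_by_length(segments, min_cm=8)
print(long_ibd1)
```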

ancIBD.run.hapBLOCK_chrom_mixedPloidy(folder_in='./data/hdf5/1240k_v43/ch', iids=[], ploidy=(2, 2), ch='X', output=False, logfile=False, p_col='variants/AF_ALL', ibd_in=1, ibd_out=10, ibd_jump=400, min_cm=2, cutoff_post=0.99, max_gap=0.0075, mask='')

Run IBD for one pair of samples, and return a dataframe of the called IBD, the posterior prob vector and r_vec. This function is intended for running ancIBD on X chromosome, where ploidy differs between males and females.
folder_in: hdf5 path up to chromosome.
iids: List of IIDs to load [k individuals]
ploidy: Either an integer or a list of integers. If it is an integer, it must be either 1 or 2 and we assume that all samples have the same ploidy. If it is a list, it must have the same length as iids and it specifies the ploidy of each sample.
run_iids: If given: list of IID pairs to run. If not, run all pairs
folder_out: Where to save the hapBLOCK output to
min_cm: Minimal block length to call and save [cM]
savepath: Where to save the IBD plot to.
Return df_ibd, posterior, map, tot_ll
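The two accepted forms of ploidy can be normalized into one per-sample list. This sketch follows the rules stated in the docstring; normalize_ploidy is a hypothetical helper, and raising ValueError on invalid input is an assumption.

```python
def normalize_ploidy(ploidy, iids):
    """Expand ploidy into a list with one entry per sample in iids."""
    if isinstance(ploidy, int):
        # A single integer applies to all samples and must be 1 or 2
        if ploidy not in (1, 2):
            raise ValueError("Integer ploidy must be 1 or 2")
        return [ploidy] * len(iids)
    ploidy = list(ploidy)
    # A list (or tuple) must give one ploidy per sample
    if len(ploidy) != len(iids):
        raise ValueError("ploidy must have the same length as iids")
    return ploidy


# Placeholder IIDs; on the X chromosome males would be 1, females 2
uniform = normalize_ploidy(2, ["A", "B", "C"])
mixed = normalize_ploidy((1, 2), ["MaleA", "FemaleB"])
print(uniform, mixed)
```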

ancIBD.run.hapBLOCK_chroms_mixedPloidy(folder_in='./data/hdf5/1240k_v43/ch', iids=[], ploidy=2, run_iids=[], ch=2, folder_out='', output=False, prefix_out='', logfile=False, p_col='variants/AF_ALL', ibd_in=1, ibd_out=10, ibd_jump=400, min_cm=2, cutoff_post=0.99, max_gap=0.0075, mask='')

Run IBD for a list of individuals and save their IBD csv into a single output folder. This function is intended for running ancIBD on X chromosome, where ploidy differs between males and females.
folder_in: hdf5 path up to chromosome.
iids: List of IIDs to load [k individuals]
ploidy: Either an integer or a list of integers. If it is an integer, it must be either 1 or 2 and we assume that all samples have the same ploidy. If it is a list, it must have the same length as iids and it specifies the ploidy of each sample.
run_iids: If given: list of IID pairs to run. If not, run all pairs
folder_out: Where to save the hapBLOCK output to
min_cm: Minimal block length to call and save [cM]
savepath: Where to save the IBD plot to.
Return df_ibd, posterior, map, tot_ll

ancIBD.run.run_plot_pair_X(folder_in='', iids=['', ''], ploidy=(2, 2), xlim=[], plot=False, gp_filter=0.99, path_fig='', output=False, exact=True, ibd_in=1, ibd_out=10, ibd_jump=400, min_cm=2, cutoff_post=0.99, max_gap=0.0075, p_col='variants/AF_ALL', title='', c='gray', c_hw='maroon', state=0, return_post=False, **kwargs)

Run and plot IBD for pair of Individuals.
folder_out: Where to save the hapBLOCK output to
iids: list of two iids [List of Length 2]
path_fig: Where to save the IBD plot to [String]
p_col: The dataset to use in hdf5 for derived allele frequency (AF). If default use p=0.5. If empty string use in-sample AF.
return_post: Whether to return posterior [Boolean]
min_error: Caps min/max prob. of haplotype being derived to min_error/1-min_error when loading data
kwargs: Optional Keyword Arguments for Plotting (e.g. c_ibd)