ancIBD.IO.ind_ibd

Module for post-processing a pairwise (pw.) IBD list into a single summary dataframe for each pair of individuals, for various IBD length classes.

Author: Harald Ringbauer, 2020

Module Contents

Functions

filter_ibd_df(df[, min_cm, snp_cm, output])

Post-process an IBD dataframe and keep only rows that pass the length and SNP-density filters.

roh_statistic_df(df[, min_cm, col_lengthM])

Return summary statistics of an ROH dataframe.

roh_statistics_df(df[, min_cms, col_lengthM])

Return summary statistics for the rows of an IBD dataframe.

create_ind_ibd_df([ibd_data, min_cms, snp_cm, min_cm, ...])

Create dataframe with summary statistics for each individual.

create_ind_ibd_df_IBD2([ibd_data, min_cms, snp_cm, ...])

Create dataframe with summary statistics for each individual, for ancIBD runs in IBD2 mode.

ind_all_ibd_df([path_ibd, col_lengthM, snp_cm, ...])

Create a dataframe with all IBD blocks for each individual pair.

ibd_lengths(df[, col_lengthM, string, sort, decimals, mpl])

Return a list of IBD lengths [in cM] in IBD dataframe df.

all_pairs_ibd(df_res, df_iid)

Return a new IBD dataframe with all possible IID pairs, set to 0 IBD if not in the IBD dataframe.

combine_all_chroms([chs, folder_base, path_save])

Combine all chromosomes into a single file.

ancIBD.IO.ind_ibd.filter_ibd_df(df, min_cm=4, snp_cm=60, output=True)

Post-process an IBD dataframe and keep only rows that pass the length and SNP-density filters.

min_cm: Minimum block length [in cM].
snp_cm: Minimum SNP density [SNPs per cM].
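The filter can be pictured with the minimal pandas sketch below. It is not the ancIBD implementation: the name filter_ibd_sketch is hypothetical, and it assumes that lengthM holds block lengths in Morgan, that length holds the number of SNPs in a block, and that both thresholds are inclusive.

```python
import pandas as pd

def filter_ibd_sketch(df: pd.DataFrame, min_cm: float = 4, snp_cm: float = 60) -> pd.DataFrame:
    """Keep IBD blocks that are long enough and dense enough in SNPs.

    Hypothetical sketch: assumes `lengthM` is in Morgan and `length` is a SNP count.
    """
    length_cm = df["lengthM"] * 100      # Morgan -> centiMorgan
    density = df["length"] / length_cm   # SNPs per cM of each block
    keep = (length_cm >= min_cm) & (density >= snp_cm)  # inclusive thresholds (assumed)
    return df[keep].reset_index(drop=True)

# Usage: three toy blocks; only the first passes both filters.
df = pd.DataFrame({
    "iid1": ["A", "A", "B"],
    "iid2": ["B", "C", "C"],
    "lengthM": [0.10, 0.02, 0.08],  # 10 cM, 2 cM, 8 cM
    "length": [1000, 500, 200],     # 100, 250, 25 SNPs per cM
})
print(filter_ibd_sketch(df))
```

The second block fails the length cutoff and the third fails the density cutoff, which is the kind of sparse, possibly spurious call a SNP-density filter is meant to remove.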

ancIBD.IO.ind_ibd.roh_statistic_df(df, min_cm=0, col_lengthM='lengthM')

Return summary statistics of an ROH dataframe.

ancIBD.IO.ind_ibd.roh_statistics_df(df, min_cms=[8, 12, 16, 20], col_lengthM='lengthM')

Return summary statistics for the rows of an IBD dataframe: lists of sum_roh, n_roh and max_roh, with one entry per length cutoff in min_cms.

min_cms: List of minimum IBD lengths [in cM].
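As a hedged illustration of the returned lists, the sketch below computes the total, count and maximum of block lengths above each cutoff. The function name is hypothetical, lengths in col_lengthM are assumed to be in Morgan, and the strict "longer than" comparison is an assumption.

```python
import pandas as pd

def ibd_stats_sketch(df, min_cms=(8, 12, 16, 20), col_lengthM="lengthM"):
    """Return lists of (sum, count, max) block lengths in cM, one entry per cutoff.

    Hypothetical sketch; assumes col_lengthM holds lengths in Morgan.
    """
    lengths_cm = df[col_lengthM] * 100
    sum_roh, n_roh, max_roh = [], [], []
    for m in min_cms:
        sel = lengths_cm[lengths_cm > m]  # blocks longer than this cutoff
        sum_roh.append(sel.sum())
        n_roh.append(len(sel))
        max_roh.append(sel.max() if len(sel) else 0)  # 0 when no block qualifies
    return sum_roh, n_roh, max_roh

df = pd.DataFrame({"lengthM": [0.09, 0.13, 0.25]})  # 9, 13, 25 cM
sums, ns, maxs = ibd_stats_sketch(df)
print(ns)  # [3, 2, 1, 1]
```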

ancIBD.IO.ind_ibd.create_ind_ibd_df(ibd_data='/n/groups/reich/hringbauer/git/yamnaya/output/ibd/v43/ch_all.tsv', min_cms=[8, 12, 16, 20], snp_cm=220, min_cm=6, sort_col=-1, savepath='', output=True)

Create a dataframe with summary statistics for each individual, and return this new dataframe in hapROH format [IBD in cM].

ibd_data: If a string, the path of the IBD file to load; otherwise an IBD dataframe.
savepath: If given, save the post-processed IBD dataframe there.
min_cms: IBD length cutoffs used in the analysis [in cM].
snp_cm: Minimum SNP density per cM of an IBD block.
sort_col: Which min_cms column to sort by. If <0, no sorting is done.
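A rough sketch of the per-pair aggregation, assuming blocks are keyed by iid1/iid2 columns and lengthM is in Morgan. The output column names (sum_IBD>8, n_IBD>8, ...) and reading sort_col as an index into min_cms are illustrative assumptions, not confirmed by this page; the SNP-density filter and saving are left out.

```python
import pandas as pd

def pair_summary_sketch(df, min_cms=(8, 12), sort_col=-1):
    """Summarize IBD per pair (iid1, iid2): total cM and block count above each cutoff.

    Hypothetical sketch; column names such as "sum_IBD>8" are illustrative.
    """
    d = df.assign(length_cm=df["lengthM"] * 100)  # Morgan -> cM
    out = d[["iid1", "iid2"]].drop_duplicates().reset_index(drop=True)
    for m in min_cms:
        long_blocks = d[d["length_cm"] > m]
        agg = (long_blocks.groupby(["iid1", "iid2"])["length_cm"]
               .agg(["sum", "count"])
               .rename(columns={"sum": f"sum_IBD>{m}", "count": f"n_IBD>{m}"})
               .reset_index())
        out = out.merge(agg, on=["iid1", "iid2"], how="left")
    out = out.fillna(0)  # pairs with no block above a cutoff get 0
    if sort_col >= 0:    # sort by the chosen cutoff's total IBD, largest first
        out = out.sort_values(f"sum_IBD>{min_cms[sort_col]}", ascending=False)
    return out.reset_index(drop=True)

df = pd.DataFrame({
    "iid1": ["A", "A", "A", "B"],
    "iid2": ["B", "B", "C", "C"],
    "lengthM": [0.10, 0.15, 0.09, 0.30],  # 10, 15, 9, 30 cM
})
res = pair_summary_sketch(df, min_cms=(8, 12), sort_col=1)
print(res)
```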

ancIBD.IO.ind_ibd.create_ind_ibd_df_IBD2(ibd_data='/n/groups/reich/hringbauer/git/yamnaya/output/ibd/v43/ch_all.tsv', min_cms=[8, 12, 16, 20], snp_cm=220, min_cm=6, sort_col=-1, savepath='', output=True)

Create a dataframe with summary statistics for each individual, and return this new dataframe in hapROH format [IBD in cM].

Important: use this only for output of ancIBD run in IBD2 mode.

ibd_data: If a string, the path of the IBD file to load; otherwise an IBD dataframe.
savepath: If given, save the post-processed IBD dataframe there.
min_cms: IBD length cutoffs used in the analysis [in cM]. This filter only applies to IBD1.
snp_cm: Minimum SNP density per cM of an IBD block. This filter only applies to IBD1.
sort_col: Which min_cms column to sort by. If <0, no sorting is done.

ancIBD.IO.ind_ibd.ind_all_ibd_df(path_ibd='/n/groups/reich/hringbauer/git/yamnaya/output/ibd/v43/ch_all.tsv', col_lengthM='lengthM', snp_cm=220, min_cm=5, output=True, sort=True, decimals=2, col_new='ibd', savepath='')

Create a dataframe with all IBD blocks for each individual pair, and return this new dataframe in hapROH format [IBD in cM].

path_ibd: Path of the IBD file to load.
snp_cm: Minimum SNP density per cM of an IBD block.
sort: If True, sort by longest IBD.
decimals: Number of decimals to round to.
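One way to collect all blocks per pair, sketched with a pandas groupby. The function name is hypothetical, lengthM is assumed to be in Morgan, and sort=True is read here as ordering pairs by their longest block; the SNP-density filter and saving are omitted.

```python
import pandas as pd

def all_ibd_per_pair_sketch(df, col_lengthM="lengthM", decimals=2, col_new="ibd", sort=True):
    """One row per pair (iid1, iid2) listing all of its IBD block lengths in cM.

    Hypothetical sketch; assumes col_lengthM holds lengths in Morgan.
    """
    d = df.assign(cm=(df[col_lengthM] * 100).round(decimals))
    d = d.sort_values("cm", ascending=False)  # longest block first within each pair
    out = (d.groupby(["iid1", "iid2"])["cm"]
             .apply(list)           # collect each pair's lengths into a list
             .rename(col_new)
             .reset_index())
    if sort:  # order pairs by their longest block
        longest = out[col_new].map(lambda xs: xs[0])
        out = out.loc[longest.sort_values(ascending=False).index]
    return out.reset_index(drop=True)

df = pd.DataFrame({
    "iid1": ["A", "A", "B"],
    "iid2": ["B", "B", "C"],
    "lengthM": [0.0812, 0.2503, 0.10],  # 8.12, 25.03, 10 cM
})
res = all_ibd_per_pair_sketch(df)
print(res)
```

Keeping the lengths as a list per row, rather than one row per block, makes it easy to scan each pair's full IBD spectrum at a glance.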

ancIBD.IO.ind_ibd.ibd_lengths(df, col_lengthM='lengthM', string=True, sort=True, decimals=2, mpl=100)

Return a list of IBD lengths [in cM] in IBD dataframe df.

string: If True, return a comma-separated string.
sort: Whether to sort the IBD list.
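A minimal sketch of the list/string conversion. It treats mpl as the Morgan-to-cM multiplier and sorts longest first; both are assumptions, since this page documents neither mpl nor the sort direction. The function name is hypothetical.

```python
import pandas as pd

def ibd_lengths_sketch(df, col_lengthM="lengthM", string=True, sort=True, decimals=2, mpl=100):
    """Return IBD lengths of df in cM, as a list or a comma-separated string.

    Hypothetical sketch; mpl is assumed to convert Morgan to cM.
    """
    lengths = [round(float(x) * mpl, decimals) for x in df[col_lengthM]]
    if sort:
        lengths.sort(reverse=True)  # longest first (assumed direction)
    if string:
        return ",".join(str(x) for x in lengths)
    return lengths

df = pd.DataFrame({"lengthM": [0.081234, 0.25]})
print(ibd_lengths_sketch(df))                # 25.0,8.12
print(ibd_lengths_sketch(df, string=False))  # [25.0, 8.12]
```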

ancIBD.IO.ind_ibd.all_pairs_ibd(df_res, df_iid)

Return a new IBD dataframe with all possible IID pairs; pairs not in the IBD dataframe are set to 0 IBD.

df_res: IBD dataframe (standard format).
df_iid: Where to take the IIDs from.
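A sketch of the all-pairs expansion using itertools.combinations and a left merge. It assumes (hypothetically) that df_iid has an iid column and that df_res has iid1/iid2 columns; rows are normalized so that (B, A) and (A, B) count as the same pair. The function name and the iid_col parameter are illustrative.

```python
from itertools import combinations

import pandas as pd

def all_pairs_sketch(df_res, df_iid, iid_col="iid"):
    """Return one row per possible pair of IIDs; pairs absent from df_res get 0.

    Hypothetical sketch: assumes df_iid has an `iid_col` column and df_res has
    `iid1`/`iid2` columns plus numeric summary columns.
    """
    iids = sorted(df_iid[iid_col].unique())
    pairs = pd.DataFrame(list(combinations(iids, 2)), columns=["iid1", "iid2"])
    # Normalize pair order in df_res so that (B, A) matches (A, B).
    res = df_res.copy()
    swap = res["iid1"] > res["iid2"]
    res.loc[swap, ["iid1", "iid2"]] = res.loc[swap, ["iid2", "iid1"]].values
    out = pairs.merge(res, on=["iid1", "iid2"], how="left")
    return out.fillna(0)  # missing pairs: 0 IBD

df_iid = pd.DataFrame({"iid": ["A", "B", "C"]})
df_res = pd.DataFrame({"iid1": ["B"], "iid2": ["A"], "sum_IBD>8": [42.0]})
out = all_pairs_sketch(df_res, df_iid)
print(out)
```

Filling absent pairs with zeros matters for downstream plots or statistics that must include unrelated pairs, not only those with detected IBD.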

ancIBD.IO.ind_ibd.combine_all_chroms(chs=[], folder_base='PATH/ch', path_save='PATH/ch_all.tsv')

Combine all chromosomes into a single file.

chs: Which chromosomes to process [list].
folder_base: Where to load from (the path up to and including "ch").
path_save: Where to save the combined file.
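A hedged sketch of the concatenation step. The per-chromosome path pattern f"{folder_base}{ch}.tsv", the suffix parameter and the function name are hypothetical, since this page only says folder_base is the path up to and including "ch".

```python
import os
import tempfile

import pandas as pd

def combine_chroms_sketch(chs, folder_base, path_save, suffix=".tsv"):
    """Concatenate per-chromosome tab-separated IBD tables into one file.

    Hypothetical sketch: assumes chromosome ch is stored at f"{folder_base}{ch}{suffix}".
    """
    dfs = [pd.read_csv(f"{folder_base}{ch}{suffix}", sep="\t") for ch in chs]
    df_all = pd.concat(dfs, ignore_index=True)  # stack rows, renumber the index
    df_all.to_csv(path_save, sep="\t", index=False)
    return df_all

# Usage with two toy chromosome files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "ch")
    for ch in (1, 2):
        pd.DataFrame({"iid1": ["A"], "iid2": ["B"], "ch": [ch]}).to_csv(
            f"{base}{ch}.tsv", sep="\t", index=False)
    combined = combine_chroms_sketch([1, 2], base, os.path.join(tmp, "ch_all.tsv"))
    saved_ok = os.path.exists(os.path.join(tmp, "ch_all.tsv"))
    print(len(combined))  # 2
```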