`ancIBD.IO.batch_run`

Various functions to prepare parameters for a batched run on a cluster: split individuals into batches, run all pairwise combinations of batches with ancIBD, and then collect the results. The functions here split the input into batches in a standardized way.

Author: Harald Ringbauer, 2023

Module Contents

Functions

`get_iids`([path_meta, min_snps])	Return list of iids to run
`get_batch_idcs`(i, batch_size)	Return the index of the batch and the within-batch index.
`get_batch_pair_idx`(i, batch_nr)	Return the indices of the two batches to run, using only triangular comparisons.
`create_savepath`([folder_base, ch, b1, b2, output])	Create the savepath in standardized output format
`clean_double`([run_idc, output])	Removes self comparisons as well as double comparisons.
`get_idx_batch`(b, batch_size)	Return the indices of all individuals in batch b.
`get_unique_iid_pairs`(iids, b1, b2, batch_size)	Get a list of unique IID pairs to run.
`get_run_lists_batch`([i, k, batch_size, output])	Get IID list and Run lists for batches of samples.
`save_ibd_df`(df_ibd, savepath[, create])	Saves IBD Dataframe
`get_run_params_from_i`(i[, metapath, batch_size, ...])	Return the run parameters for run i
`join_chromosomes`(base_path[, chs, file_out, output])	Join different Chromosomes together and save output.
`to_ind_df_batch`(b1, b2[, folder_out, chs, min_cms, ...])	Post-process a batch of individuals.
`to_ind_df_batches`([batches, folder_out, chs, min_cms, ...])	Runs multiple combinations of batches (wrapper of the single-batch function) and then combines the processed batches.
`to_ibd_df_batches`([batches, folder_out, chs, min_cms, ...])	Runs multiple combinations of batches (wrapper of the single-batch function) and then combines the processed batches.
`print_runid_missing`([b, folder_out, output])	Finds and prints indices of missing output (chXX.tsv) for batchwise runs.
`find_output_missing`([metapath, folder_out, ...])	Return a list of all run numbers that are missing.
`get_batch_nr`(n_iids[, batchsize, n_chr])	Get the number of jobs to submit to cluster.

ancIBD.IO.batch_run.get_iids(path_meta='/n/groups/reich/hringbauer/git/yamnaya/data/meta_v2.tsv', min_snps=600000): Return list of iids to run
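
A minimal sketch of the kind of filtering `get_iids` describes, assuming the meta file is a tab-separated table with one row per individual. The helper name `iids_to_run`, the column names `iid` and `n_cov_snp`, and the inclusive `>=` threshold are hypothetical assumptions, not taken from ancIBD.

```python
import io

import pandas as pd


def iids_to_run(path_meta, min_snps=600000, iid_col="iid", snp_col="n_cov_snp"):
    """Hypothetical helper: return IIDs with at least `min_snps` covered SNPs.

    The column names are assumptions; adjust them to the actual meta file.
    """
    df = pd.read_csv(path_meta, sep="\t")
    keep = df[snp_col] >= min_snps  # inclusive threshold (assumption)
    return df.loc[keep, iid_col].tolist()


# Toy meta table standing in for a real .tsv file on disk
meta = io.StringIO("iid\tn_cov_snp\nI1\t700000\nI2\t300000\nI3\t650000\n")
iids = iids_to_run(meta)
print(iids)  # ['I1', 'I3']
```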

ancIBD.IO.batch_run.get_batch_idcs(i, batch_size): Return the index of the batch and the within-batch index.
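
A sketch of the index arithmetic this likely involves, assuming individuals are numbered consecutively from 0 and batches are filled in order. `batch_idcs` is a hypothetical name, not the library function; the whole computation reduces to integer division with remainder.

```python
def batch_idcs(i: int, batch_size: int) -> tuple:
    """Hypothetical: split a 0-based individual index into
    (batch index, index within that batch)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return divmod(i, batch_size)


print(batch_idcs(850, 400))  # (2, 50)
```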

ancIBD.IO.batch_run.get_batch_pair_idx(i, batch_nr): Return the indices of the two batches to run, using only triangular comparisons.
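
"Triangular comparisons" means each unordered pair of batches is run once. A sketch of one way to map a flat run index onto such a pair, assuming the diagonal (a batch compared with itself) is included and pairs are enumerated row by row. `batch_pair_idx` is hypothetical, and the library's actual ordering may differ.

```python
def batch_pair_idx(i: int, batch_nr: int) -> tuple:
    """Hypothetical: map run index i to a batch pair (b1, b2) with b1 <= b2,
    walking the upper triangle (including the diagonal) row by row."""
    n_pairs = batch_nr * (batch_nr + 1) // 2
    if not 0 <= i < n_pairs:
        raise IndexError(f"i must be in [0, {n_pairs})")
    b1 = 0
    row_len = batch_nr  # row b1 holds pairs (b1, b1), ..., (b1, batch_nr - 1)
    while i >= row_len:
        i -= row_len
        b1 += 1
        row_len -= 1
    return b1, b1 + i


print([batch_pair_idx(i, 3) for i in range(6)])
# [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]
```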

ancIBD.IO.batch_run.create_savepath(folder_base='', ch=1, b1=1, b2=2, output=True): Create the savepath in standardized output format

ancIBD.IO.batch_run.clean_double(run_idc=[], output=True): Removes self comparisons as well as double comparisons. Input: list of pairwise IIDs to run. Returns the cleaned list.
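
A sketch of what removing self and double comparisons amounts to: drop pairs `(a, a)` and keep only the first occurrence of each unordered pair, so `(a, b)` and `(b, a)` count once. `clean_pairs` is a hypothetical stand-in, not the ancIBD implementation.

```python
def clean_pairs(run_idc):
    """Hypothetical: drop self comparisons and duplicate unordered pairs,
    preserving the order of first occurrence."""
    seen = set()
    cleaned = []
    for a, b in run_idc:
        if a == b:
            continue  # self comparison
        key = frozenset((a, b))
        if key in seen:
            continue  # double comparison, e.g. (b, a) after (a, b)
        seen.add(key)
        cleaned.append((a, b))
    return cleaned


pairs = [("I1", "I2"), ("I2", "I1"), ("I3", "I3"), ("I1", "I3")]
print(clean_pairs(pairs))  # [('I1', 'I2'), ('I1', 'I3')]
```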

ancIBD.IO.batch_run.get_idx_batch(b, batch_size): Return the indices of all individuals in batch b.
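
A sketch of the batch-to-indices mapping, assuming consecutive 0-based batches of equal size. The extra `n_total` argument, which truncates the last (partial) batch, is a hypothetical addition that the documented signature does not have.

```python
def idx_batch(b, batch_size, n_total=None):
    """Hypothetical: 0-based indices of the individuals in batch b.

    If n_total is given, the last batch is cut off at the number of
    available individuals.
    """
    start = b * batch_size
    stop = start + batch_size
    if n_total is not None:
        stop = min(stop, n_total)
    return list(range(start, stop))


print(idx_batch(1, 3))  # [3, 4, 5]
print(len(idx_batch(8, 400, n_total=3500)))  # 300
```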

ancIBD.IO.batch_run.get_unique_iid_pairs(iids, b1, b2, batch_size): Get a list of unique IID pairs to run. Returns the list of unique pairs and the list of all IIDs.
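
One plausible reading of this function, sketched under assumptions: when `b1 == b2`, compare all pairs within the batch; otherwise compare every individual of batch `b1` with every individual of batch `b2`. Combined with triangular batch scheduling, this covers each pair of individuals exactly once. `unique_iid_pairs` is hypothetical, and ancIBD may construct the pairs differently.

```python
from itertools import combinations, product


def unique_iid_pairs(iids, b1, b2, batch_size):
    """Hypothetical: IID pairs for batch pair (b1, b2) plus all IIDs involved."""
    batch1 = list(iids[b1 * batch_size:(b1 + 1) * batch_size])
    batch2 = list(iids[b2 * batch_size:(b2 + 1) * batch_size])
    if b1 == b2:
        pairs = list(combinations(batch1, 2))  # within-batch pairs
    else:
        pairs = list(product(batch1, batch2))  # cross-batch pairs
    all_iids = list(dict.fromkeys(batch1 + batch2))  # ordered, de-duplicated
    return pairs, all_iids


iids = ["A", "B", "C", "D"]
print(unique_iid_pairs(iids, 0, 1, 2))
# ([('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')], ['A', 'B', 'C', 'D'])
```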

ancIBD.IO.batch_run.get_run_lists_batch(i=67, k=3500, batch_size=400, output=True): Get the IID list and run lists for batches of samples. Returns batch indices. Parameters: i: run number; k: total number of individuals; batch_size: number of individuals in one batch.

ancIBD.IO.batch_run.save_ibd_df(df_ibd, savepath, create=True): Saves the IBD dataframe.
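
A sketch of saving a dataframe, assuming `create=True` means "create the output folder if it does not exist yet" and that outputs are tab-separated (the module refers to `.tsv` files). `save_df_tsv`, the folder name `b0_1`, and the column names in the toy dataframe are hypothetical.

```python
import os
import tempfile

import pandas as pd


def save_df_tsv(df, savepath, create=True):
    """Hypothetical: write df as a tab-separated file, optionally creating
    the parent folder first."""
    folder = os.path.dirname(savepath)
    if create and folder:
        os.makedirs(folder, exist_ok=True)
    df.to_csv(savepath, sep="\t", index=False)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "b0_1", "ch1.tsv")  # folder does not exist yet
    df = pd.DataFrame({"iid1": ["A"], "iid2": ["B"], "lengthM": [0.12]})
    save_df_tsv(df, path)
    df_back = pd.read_csv(path, sep="\t")
print(df_back)
```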

ancIBD.IO.batch_run.get_run_params_from_i(i, metapath='./data/iid_lists/iid_ibd_eurasia_v1.tsv', batch_size=400, min_snps=0, output=True, folder_out='/n/groups/reich/hringbauer/git/ibd_euro/output/ibd/v1/'): Return the run parameters for run i. Returns iids, run_iids, and the output folder. Parameters: min_snps: minimum number of SNPs covered (for potential filtering on the meta file).

ancIBD.IO.batch_run.join_chromosomes(base_path, chs=range(1, 23), file_out='ch_all.tsv', output=True): Join different chromosomes together and save the output. Returns the joined dataframe. file_out: if given, save the joined file under that name in the base_path folder.
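
A sketch of joining per-chromosome outputs with pandas, assuming each chromosome is stored as `ch{ch}.tsv` in `base_path` (the module mentions `chXX.tsv` outputs, but the exact naming is an assumption). `join_chs` and the toy columns are hypothetical; missing chromosomes are reported and skipped.

```python
import os
import tempfile

import pandas as pd


def join_chs(base_path, chs=range(1, 23), file_out="ch_all.tsv"):
    """Hypothetical: concatenate ch{ch}.tsv tables and optionally save them."""
    dfs = []
    for ch in chs:
        path = os.path.join(base_path, f"ch{ch}.tsv")
        if os.path.exists(path):
            dfs.append(pd.read_csv(path, sep="\t"))
        else:
            print(f"Missing: {path}")
    if not dfs:
        raise FileNotFoundError(f"No chromosome files found in {base_path}")
    df = pd.concat(dfs, ignore_index=True)
    if file_out:
        df.to_csv(os.path.join(base_path, file_out), sep="\t", index=False)
    return df


with tempfile.TemporaryDirectory() as tmp:
    for ch, n in [(1, 2), (2, 1)]:  # toy data: 2 rows on ch1, 1 row on ch2
        pd.DataFrame({"ch": [ch] * n, "lengthM": [0.1] * n}).to_csv(
            os.path.join(tmp, f"ch{ch}.tsv"), sep="\t", index=False)
    df_all = join_chs(tmp, chs=[1, 2])
    joined_saved = os.path.exists(os.path.join(tmp, "ch_all.tsv"))
print(len(df_all), joined_saved)  # 3 True
```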

ancIBD.IO.batch_run.to_ind_df_batch(b1, b2, folder_out='/n/groups/reich/hringbauer/git/ibd_euro/output/ibd/v1/', chs=range(1, 23), min_cms=[8, 12, 16, 20], snp_cm=220, min_cm=8, output=False): Post-process a batch of individuals. Returns the individual IBD dataframe.

ancIBD.IO.batch_run.to_ind_df_batches(batches=8, folder_out='/n/groups/reich/hringbauer/git/ibd_euro/output/ibd/v1/', chs=range(1, 23), min_cms=[8, 12, 16, 20], snp_cm=220, min_cm=8, output=False, savepath=''): Runs multiple combinations of batches (wrapper of the single-batch function) and then combines the processed batches. Post-processes IBD into an individual summary dataframe. Returns the merged IBD dataframe. batches: if int, create all possible combinations; otherwise an array of shape [n, 2] listing all batch pairs to run. savepath: if given, save the IBD dataframe there.
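
A sketch of how the `batches` argument could be normalized into an `[n, 2]` array of batch pairs, assuming "all possible combinations" means every unordered pair including self pairs (consistent with triangular comparisons). `batch_pairs` is hypothetical; under this assumption the default `batches=8` yields 8 * 9 / 2 = 36 pairs.

```python
from itertools import combinations_with_replacement

import numpy as np


def batch_pairs(batches):
    """Hypothetical: int -> all pairs (b1, b2) with b1 <= b2;
    otherwise validate an explicit [n, 2] array of pairs."""
    if isinstance(batches, (int, np.integer)):
        pairs = list(combinations_with_replacement(range(batches), 2))
        return np.array(pairs, dtype=int).reshape(-1, 2)
    arr = np.asarray(batches)
    if arr.ndim != 2 or arr.shape[1] != 2:
        raise ValueError("batches must be an int or an array of shape [n, 2]")
    return arr


print(batch_pairs(3).tolist())
# [[0, 0], [0, 1], [0, 2], [1, 1], [1, 2], [2, 2]]
print(batch_pairs(8).shape)  # (36, 2)
```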

ancIBD.IO.batch_run.to_ibd_df_batches(batches=8, folder_out='/n/groups/reich/hringbauer/git/ibd_euro/output/ibd/v1/', chs=range(1, 23), min_cms=[8, 12, 16, 20], snp_cm=220, min_cm=8, output=False, savepath=''): Runs multiple combinations of batches (wrapper of the single-batch function) and then combines the processed batches. Post-processes IBD into an individual summary dataframe. Returns the merged IBD dataframe. batches: if int, create all possible combinations; otherwise an array of shape [n, 2] listing all batch pairs to run. savepath: if given, save the IBD dataframe there.

ancIBD.IO.batch_run.print_runid_missing(b=1, folder_out='', output=False): Finds and prints indices of missing output (chXX.tsv) for batchwise runs. Returns a list of missing indices, which is ideal for rerunning batch scripts. Uses C indexing, as would be used in the submission script.
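
A sketch of the "find missing output" idea for a single output folder, assuming one run per chromosome, outputs named `ch{ch}.tsv`, and 0-based run indices `offset + (ch - 1)`. The helper `missing_run_ids` and its `offset` parameter are hypothetical; the real run numbering in ancIBD may combine batch and chromosome indices differently.

```python
import os
import tempfile


def missing_run_ids(folder, chs=range(1, 23), offset=0):
    """Hypothetical: return 0-based run indices whose ch{ch}.tsv is absent.

    Assumes one run per chromosome, numbered offset + (ch - 1).
    """
    return [
        offset + ch - 1
        for ch in chs
        if not os.path.exists(os.path.join(folder, f"ch{ch}.tsv"))
    ]


with tempfile.TemporaryDirectory() as tmp:
    for ch in (1, 3):  # pretend the run for chromosome 2 failed
        open(os.path.join(tmp, f"ch{ch}.tsv"), "w").close()
    missing = missing_run_ids(tmp, chs=range(1, 4))
print(missing)  # [1]
```

The returned indices could then be passed to a cluster array job to resubmit only the failed runs.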

ancIBD.IO.batch_run.find_output_missing(metapath='', folder_out='', batch_size=400, rge=[10, 20]): Return a list of all run numbers that are missing. Parameters: metapath: path to the .tsv of IIDs run for IBD screening [str]; folder_out: output folder [str]; batch_size: number of individuals run per batch [int].

ancIBD.IO.batch_run.get_batch_nr(n_iids, batchsize=400, n_chr=22): Get the number of jobs to submit to the cluster. Returns n [int].
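
A sketch of the job-count arithmetic, assuming one job per (batch pair, chromosome) with triangular batch pairs including self pairs. With B = ceil(n_iids / batchsize) batches, that gives B(B + 1)/2 * n_chr jobs. `n_jobs` is hypothetical and may not match the library's exact formula.

```python
def n_jobs(n_iids, batchsize=400, n_chr=22):
    """Hypothetical: number of cluster jobs for triangular batch pairs."""
    n_batches = -(-n_iids // batchsize)  # ceiling division without floats
    n_pairs = n_batches * (n_batches + 1) // 2
    return n_pairs * n_chr


print(n_jobs(3500))  # 9 batches -> 45 pairs -> 990 jobs
```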
