ancIBD.IO.batch_run
Various functions to prepare parameters for a batched run on a cluster. Splits up individuals into batches, runs all pairwise batches with ancIBD, and then collects the results. The functions here split up the input into batches in a standardized way. @ Author: Harald Ringbauer, 2023
Module Contents
Functions
| get_iids | Return list of iids to run. |
| get_batch_idcs | Return the index of the batch and the index within the batch. |
| get_batch_pair_idx | Return the index of the two batches to run, using only triangular comparisons. |
| create_savepath | Create the savepath in standardized output format. |
| clean_double | Removes self-comparisons as well as double comparisons. |
| get_idx_batch | Return all the indices of the individuals of a batch. |
| get_unique_iid_pairs | Get list of unique iid pairs to run. |
| get_run_lists_batch | Get IID list and run lists for batches of samples. |
| save_ibd_df | Saves IBD dataframe. |
| get_run_params_from_i | Return the run parameters for run i. |
| join_chromosomes | Join different chromosomes together and save the output. |
| to_ind_df_batch | Post-process a batch of individuals. |
| to_ind_df_batches | Runs multiple combinations of batches (wrapper of the single-batch function) and then combines the processed batches. |
| to_ibd_df_batches | Runs multiple combinations of batches (wrapper of the single-batch function) and then combines the processed batches. |
| print_runid_missing | Finds and prints indices of missing output (chXX.tsv) for batchwise runs. |
| find_output_missing | Return list of all run numbers that are missing. |
| get_batch_nr | Get the number of jobs to submit to the cluster. |
- ancIBD.IO.batch_run.get_iids(path_meta='/n/groups/reich/hringbauer/git/yamnaya/data/meta_v2.tsv', min_snps=600000)
Return list of iids to run
- ancIBD.IO.batch_run.get_batch_idcs(i, batch_size)
Return the index of the batch and the index within the batch.
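The following is a minimal sketch of how a flat index could be split into a batch index and a within-batch index, assuming zero-based indices and equally sized batches. The helper name `split_index` is hypothetical and is not part of ancIBD; the library's own implementation may differ.

```python
def split_index(i, batch_size):
    """Hypothetical helper: split a flat, zero-based index i into
    (batch index, index within that batch)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Integer division gives the batch, the remainder the position inside it.
    return divmod(i, batch_size)


print(split_index(805, 400))  # (2, 5)
```

With a batch size of 400, index 805 falls into the third batch (index 2) at position 5.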
- ancIBD.IO.batch_run.get_batch_pair_idx(i, batch_nr)
Return the index of the two batches to run, using only triangular comparisons.
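To illustrate what "triangular comparisons" can mean, the sketch below enumerates all batch pairs (b1, b2) with b1 <= b2 and picks the i-th one. It assumes that within-batch pairs such as (0, 0) are included and that pairs are ordered row by row; both are assumptions, and the ordering used by `get_batch_pair_idx` may be different. The name `batch_pair_from_index` is hypothetical.

```python
from itertools import combinations_with_replacement


def batch_pair_from_index(i, batch_nr):
    """Hypothetical helper: return the i-th pair (b1, b2) with b1 <= b2,
    so each unordered pair of batches is visited exactly once."""
    pairs = list(combinations_with_replacement(range(batch_nr), 2))
    if not 0 <= i < len(pairs):
        raise IndexError(f"run index {i} outside 0..{len(pairs) - 1}")
    return pairs[i]


# Three batches give 3 * 4 / 2 = 6 pairs:
# (0,0), (0,1), (0,2), (1,1), (1,2), (2,2)
print(batch_pair_from_index(4, 3))  # (1, 2)
```

Enumerating only the upper triangle avoids running both (b1, b2) and (b2, b1), which would compare the same individuals twice.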
- ancIBD.IO.batch_run.create_savepath(folder_base='', ch=1, b1=1, b2=2, output=True)
Create the savepath in standardized output format
- ancIBD.IO.batch_run.clean_double(run_idc=[], output=True)
Removes self-comparisons as well as double comparisons. Input: list of pairwise IIDs to run. Returns the cleaned list.
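A rough sketch of this kind of clean-up is shown below, assuming the input is a list of 2-tuples of IID strings and that (a, b) and (b, a) count as the same comparison. The function name `drop_self_and_duplicates` and the sample IIDs are hypothetical; this is not the library's own code.

```python
def drop_self_and_duplicates(pairs):
    """Hypothetical helper: drop pairs of an individual with itself and
    repeated pairs, treating (a, b) and (b, a) as the same comparison."""
    seen = set()
    cleaned = []
    for a, b in pairs:
        if a == b:
            continue  # self-comparison
        key = frozenset((a, b))  # order-independent key
        if key in seen:
            continue  # already scheduled, possibly in reverse order
        seen.add(key)
        cleaned.append((a, b))
    return cleaned


pairs = [("I1", "I2"), ("I2", "I1"), ("I3", "I3"), ("I1", "I3")]
print(drop_self_and_duplicates(pairs))  # [('I1', 'I2'), ('I1', 'I3')]
```

The first occurrence of each pair is kept, so the order of the input list is preserved.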
- ancIBD.IO.batch_run.get_idx_batch(b, batch_size)
Return all the indices of the individuals of batch b.
- ancIBD.IO.batch_run.get_unique_iid_pairs(iids, b1, b2, batch_size)
Get list of unique iid pairs to run. Returns the list of unique pairs and the list of all iids.
- ancIBD.IO.batch_run.get_run_lists_batch(i=67, k=3500, batch_size=400, output=True)
Get IID list and run lists for batches of samples. i: run number. k: total number of individuals. batch_size: number of individuals in one batch. Returns the batch indices.
- ancIBD.IO.batch_run.save_ibd_df(df_ibd, savepath, create=True)
Saves IBD Dataframe
- ancIBD.IO.batch_run.get_run_params_from_i(i, metapath='./data/iid_lists/iid_ibd_eurasia_v1.tsv', batch_size=400, min_snps=0, output=True, folder_out='/n/groups/reich/hringbauer/git/ibd_euro/output/ibd/v1/')
Return the run parameters for run i. min_snps: minimum number of SNPs covered (for potential filtering on the meta file). Returns iids, run_iids, and the output folder.
- ancIBD.IO.batch_run.join_chromosomes(base_path, chs=range(1, 23), file_out='ch_all.tsv', output=True)
Join different chromosomes together and save the output. Returns the joined dataframe. file_out: if given, save the joined file under that name in the base_path folder.
- ancIBD.IO.batch_run.to_ind_df_batch(b1, b2, folder_out='/n/groups/reich/hringbauer/git/ibd_euro/output/ibd/v1/', chs=range(1, 23), min_cms=[8, 12, 16, 20], snp_cm=220, min_cm=8, output=False)
Post-process a batch of individuals. Returns the individual IBD dataframe.
- ancIBD.IO.batch_run.to_ind_df_batches(batches=8, folder_out='/n/groups/reich/hringbauer/git/ibd_euro/output/ibd/v1/', chs=range(1, 23), min_cms=[8, 12, 16, 20], snp_cm=220, min_cm=8, output=False, savepath='')
Runs multiple combinations of batches (wrapper of the single-batch function) and then combines the processed batches. Post-processes IBD to an individual summary dataframe. Returns the merged IBD dataframe. batches: if int, create all possible combinations; otherwise it needs to be an array of shape [n, 2] of all pairs to run. savepath: if given, save the IBD dataframe there.
- ancIBD.IO.batch_run.to_ibd_df_batches(batches=8, folder_out='/n/groups/reich/hringbauer/git/ibd_euro/output/ibd/v1/', chs=range(1, 23), min_cms=[8, 12, 16, 20], snp_cm=220, min_cm=8, output=False, savepath='')
Runs multiple combinations of batches (wrapper of the single-batch function) and then combines the processed batches. Post-processes IBD to an individual summary dataframe. Returns the merged IBD dataframe. batches: if int, create all possible combinations; otherwise it needs to be an array of shape [n, 2] of all pairs to run. savepath: if given, save the IBD dataframe there.
- ancIBD.IO.batch_run.print_runid_missing(b=1, folder_out='', output=False)
Finds and prints indices of missing output (chXX.tsv) for batchwise runs. Returns the list of missing indices. Ideal for rerunning batch scripts. Uses C indexing (zero-based), as would be used in a submission script.
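The sketch below only illustrates the underlying idea of checking for missing per-chromosome output files. It assumes the files are named `ch1.tsv` to `ch22.tsv` and sit directly in one folder, which is a guess at the layout rather than the documented one, and it returns chromosome numbers rather than the run indices that `print_runid_missing` reports. The name `missing_chromosome_files` is hypothetical.

```python
import tempfile
from pathlib import Path


def missing_chromosome_files(folder, chs=range(1, 23)):
    """Hypothetical helper: list the chromosomes whose ch<N>.tsv file
    is absent from folder (assumed file naming)."""
    folder = Path(folder)
    return [ch for ch in chs if not (folder / f"ch{ch}.tsv").is_file()]


# Demonstration on a throwaway folder holding only ch1.tsv and ch2.tsv.
with tempfile.TemporaryDirectory() as tmp:
    for ch in (1, 2):
        (Path(tmp) / f"ch{ch}.tsv").touch()
    missing = missing_chromosome_files(tmp, chs=range(1, 4))

print(missing)  # [3]
```

A list like this can be fed back into a submission script so that only the failed jobs are rerun.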
- ancIBD.IO.batch_run.find_output_missing(metapath='', folder_out='', batch_size=400, rge=[10, 20])
Return list of all run numbers that are missing. metapath: path to the .tsv of IIDs run for IBD screening [str]. folder_out: output folder [str]. batch_size: number of individuals run per batch [int].
- ancIBD.IO.batch_run.get_batch_nr(n_iids, batchsize=400, n_chr=22)
Get the number of jobs to submit to the cluster. Returns n [int].
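As a back-of-the-envelope sketch, the job count could be estimated as below. This assumes one job per batch pair per chromosome, with batch pairs counted triangularly and within-batch pairs included. That counting rule is an assumption and is not confirmed by this page, so `get_batch_nr` may compute something different. The name `estimate_job_count` is hypothetical.

```python
import math


def estimate_job_count(n_iids, batchsize=400, n_chr=22):
    """Hypothetical estimate: (number of triangular batch pairs) x
    (number of chromosomes), under the assumptions stated above."""
    n_batches = math.ceil(n_iids / batchsize)
    # Unordered pairs of batches, including each batch paired with itself.
    n_pairs = n_batches * (n_batches + 1) // 2
    return n_pairs * n_chr


# 3500 individuals -> 9 batches -> 45 batch pairs -> 45 * 22 jobs
print(estimate_job_count(3500))  # 990
```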