ancIBD.IO.prepare_h5
Functions to prepare HDF5 files from imputed VCFs.
Author: Harald Ringbauer, 2021
Module Contents
Functions
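| Function | Description |
|---|---|
| save_1240kmarkers | Save all 1240K markers of an Eigenstrat .snp file. |
| save_1240_1000g_kmarkers | Save all 1240K and 1000G markers of an Eigenstrat .snp file. |
| bctools_filter_vcf | Same as filtering with PLINK, but using bcftools and filtering directly on marker positions. |
| bctools_filter_vcf_allvariants | Same as filtering with PLINK, but using bcftools and filtering directly on marker positions. |
| merge_vcfs | Merge a set of VCFs into one VCF. |
| vcf_to_1240K_hdf | Convert Ali's VCF to a 1240K HDF5 file. |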
- ancIBD.IO.prepare_h5.save_1240kmarkers(snp1240k_path='', marker_path='', ch=0)
Save all 1240K markers of an Eigenstrat .snp file to marker_path. ch: Chromosome to keep. If 0 (null), markers from all chromosomes are saved.
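As an illustration of what "saving the markers of a .snp file" involves, the sketch below reads Eigenstrat .snp rows (columns: SNP ID, chromosome, genetic position, physical position, and the two alleles) and emits one tab-separated CHROM/POS line per marker, optionally restricted to one chromosome. The helper name `eigenstrat_to_marker_lines` is hypothetical, and the CHROM/POS output layout is an assumption (it is the layout bcftools accepts as a targets file); the actual format written by `save_1240kmarkers` may differ.

```python
import io


def eigenstrat_to_marker_lines(snp_file, ch=0):
    """Hypothetical helper: turn Eigenstrat .snp rows into CHROM<TAB>POS lines.

    Eigenstrat .snp columns: SNP ID, chromosome, genetic position,
    physical position, then the two alleles.
    ch: keep only this chromosome; 0 keeps every chromosome.
    """
    marker_lines = []
    for line in snp_file:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated rows
        chrom, pos = fields[1], fields[3]
        if ch and chrom != str(ch):
            continue  # row belongs to a different chromosome
        marker_lines.append(f"{chrom}\t{pos}")
    return marker_lines


# Usage with made-up rows (IDs and positions are synthetic):
snp_file = io.StringIO(
    "snpA 3 0.010 100 A G\n"
    "snpB 3 0.020 200 C T\n"
    "snpC 4 0.500 100 G A\n"
)
for marker in eigenstrat_to_marker_lines(snp_file, ch=3):
    print(marker)
```

Comparing the chromosome as a string keeps non-numeric labels from raising, at the cost of requiring the same spelling ("3", not "chr3") as the .snp file.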
- ancIBD.IO.prepare_h5.save_1240_1000g_kmarkers(ch=3, snp_path='', marker_path='')
Save all 1240K and 1000G markers of an Eigenstrat .snp file to marker_path. Loads Ali's path file. snp_path: Where to find the SNPs plus their types.
- ancIBD.IO.prepare_h5.bctools_filter_vcf(in_vcf_path='', out_vcf_path='', marker_path='')
Same as filtering with PLINK, but using bcftools and filtering directly on marker positions. filter_iids: Whether to use the .csv with the individuals to extract. Checks whether out_vcf_path ends in .gz or .vcf and compresses the output in the former case.
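A minimal sketch of a position-based bcftools filter with the extension check described above, assuming the marker file is a tab-separated CHROM/POS targets file. The helper names are hypothetical, and this is not necessarily the exact command `bctools_filter_vcf` runs. `bcftools view -T` streams through the input without needing an index, and `-O z` / `-O v` select bgzip-compressed or plain VCF output.

```python
import subprocess


def build_bcftools_filter_cmd(in_vcf_path, out_vcf_path, marker_path):
    """Hypothetical helper: bcftools command keeping only sites listed in marker_path."""
    if out_vcf_path.endswith(".gz"):
        out_type = "z"  # bgzip-compressed VCF
    elif out_vcf_path.endswith(".vcf"):
        out_type = "v"  # uncompressed VCF
    else:
        raise ValueError(f"Expected a .vcf or .gz output path, got {out_vcf_path!r}")
    return [
        "bcftools", "view",
        "-T", marker_path,  # targets file: tab-separated CHROM and POS columns
        "-O", out_type,
        "-o", out_vcf_path,
        in_vcf_path,
    ]


def run_bcftools_filter(in_vcf_path, out_vcf_path, marker_path):
    """Run the command; needs bcftools on PATH and raises if it exits non-zero."""
    cmd = build_bcftools_filter_cmd(in_vcf_path, out_vcf_path, marker_path)
    subprocess.run(cmd, check=True)


# Usage: inspect the command without running it (file names are made up).
print(" ".join(build_bcftools_filter_cmd("chr3.bcf", "ch3.vcf.gz", "markers_ch3.tsv")))
```

Building the argument list separately from running it keeps the extension logic testable on machines without bcftools installed.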
- ancIBD.IO.prepare_h5.bctools_filter_vcf_allvariants(in_vcf_path='', out_vcf_path='', marker_path='')
Same as filtering with PLINK, but using bcftools and filtering directly on marker positions. filter_iids: Whether to use the .csv with the individuals to extract.
- ancIBD.IO.prepare_h5.merge_vcfs(in_vcf_paths=[], out_vcf_path='')
Merges a set of VCFs into one VCF. in_vcf_paths: List of VCFs to merge. out_vcf_path: Path of the output VCF.
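The description does not say which bcftools subcommand `merge_vcfs` uses, so the sketch below only illustrates the choice. `bcftools concat` joins files that share the same samples in the same order (for example, one file per chromosome), while `bcftools merge` combines files holding different samples and typically expects bgzip-compressed, indexed inputs. The helper name and the `same_samples` switch are hypothetical.

```python
def build_bcftools_combine_cmd(in_vcf_paths, out_vcf_path, same_samples=True):
    """Hypothetical helper: command combining several VCFs into one output file.

    same_samples=True  -> bcftools concat (e.g. per-chromosome files of one cohort)
    same_samples=False -> bcftools merge (files holding different samples)
    """
    if not in_vcf_paths:
        raise ValueError("in_vcf_paths must list at least one VCF")
    subcommand = "concat" if same_samples else "merge"
    out_type = "z" if out_vcf_path.endswith(".gz") else "v"  # compressed vs plain VCF
    return ["bcftools", subcommand, "-O", out_type, "-o", out_vcf_path, *in_vcf_paths]


# Usage: per-chromosome files of the same samples (file names are made up).
cmd = build_bcftools_combine_cmd(["ch1.vcf.gz", "ch2.vcf.gz"], "all.vcf.gz")
print(" ".join(cmd))
```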
- ancIBD.IO.prepare_h5.vcf_to_1240K_hdf(in_vcf_path='/n/groups/reich/ali/WholeGenomeImputation/imputed/v43.4/chr3.bcf', path_vcf='./data/vcf/1240k_v43/ch3.vcf.gz', path_h5='./data/hdf5/1240k_v43/ch3.h5', marker_path='./data/filters/ho_snps_bcftools_ch3.csv', map_path='/n/groups/reich/DAVID/V43/V43.5/v43.5.snp', af_path='', col_sample_af='AF_SAMPLE', chunk_length=10000, chunk_width=8, buffer_size=20000, ch=3)
Convert Ali's VCF to a 1240K HDF5 file.
- Parameters:
  - in_vcf_path (str) – Input VCF file (i.e., the output of GLIMPSE).
  - path_vcf (str) – Path of the filtered version of in_vcf_path that contains only the 1240K sites.
  - path_h5 (str) – Path of the output HDF5 file.
  - marker_path (str) – Path to the file listing the SNPs to keep. If marker_path is empty, no SNP filtering is done.
  - map_path (str) – Path to an Eigenstrat .snp file containing the genetic map. The map positions are merged into the HDF5 field “variants/MAP”. If map_path is empty, no genetic map is merged in.
  - af_path (str) – Path to a tab-separated table of allele frequencies. These are merged into the HDF5 field “variants/AF_ALL”. If no path is given, no allele frequencies are merged in.
  - col_sample_af (str) – Name of the HDF5 column for the allele frequency calculated from the sample. If left empty, this column is neither calculated nor added.
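Two of the steps above can be illustrated in plain Python: looking up a genetic map position for each variant by matching physical positions against the Eigenstrat .snp file (the source of “variants/MAP”), and computing a per-variant allele frequency from the samples (what col_sample_af names). This is a sketch under stated assumptions, not the function's implementation. The helper names are hypothetical, unmatched variants get NaN here (the real code may drop or fill them differently), and the frequency is assumed to be the alternative-allele fraction over called haplotypes, with -1 marking a missing call.

```python
def map_from_eigenstrat(variant_pos, snp_lines, ch):
    """Hypothetical helper: genetic position (column 3 of .snp) for each variant position."""
    pos_to_map = {}
    for line in snp_lines:
        fields = line.split()
        if len(fields) >= 4 and fields[1] == str(ch):
            pos_to_map[int(fields[3])] = float(fields[2])
    # Variants absent from the map get NaN in this sketch.
    return [pos_to_map.get(pos, float("nan")) for pos in variant_pos]


def sample_af(genotypes):
    """Hypothetical helper: alt-allele frequency per variant from 0/1 allele calls.

    genotypes: one list per variant, holding a (allele1, allele2) pair per sample;
    -1 marks a missing allele and is ignored.
    """
    afs = []
    for variant in genotypes:
        alleles = [a for pair in variant for a in pair if a >= 0]
        afs.append(sum(alleles) / len(alleles) if alleles else float("nan"))
    return afs


# Usage with synthetic data:
snp_rows = ["s1 3 0.010 100 A G", "s2 3 0.020 200 C T", "s3 4 0.500 100 G A"]
print(map_from_eigenstrat([100, 200, 300], snp_rows, ch=3))
print(sample_af([[(0, 1), (1, 1)], [(0, 0), (-1, -1)]]))
```

Keying the lookup on physical position (within one chromosome) assumes the VCF and the .snp file use the same reference build.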