Overview

The software package ancIBD detects identity-by-descent (IBD) segments in typical human aDNA data, implementing an algorithm described in this preprint. It takes imputed and phased genotype data as input. The default parameters of ancIBD are optimized for data imputed with the software GLIMPSE using the 1000 Genomes haplotype reference panel.

Scope

ancIBD can be applied to a substantial fraction of the aDNA record, but because it relies on imputation it is comparatively data-hungry. Our tests showed that ancIBD requires:

- at least 0.25x average coverage depth for whole-genome sequencing (WGS) data
- at least 1.0x depth on dense target SNPs (corresponding broadly to at least 600k SNPs covered for 1240k or TWIST captured aDNA data, two popular SNP captures in human aDNA)
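
The thresholds above can be turned into a quick pre-screening step before imputation. The following is a minimal sketch only: the function `meets_coverage_threshold` and its `data_type` labels are hypothetical and not part of the ancIBD package; the numeric cutoffs are the ones stated above.

```python
# Hypothetical helper for illustration; NOT part of the ancIBD package.
MIN_WGS_DEPTH = 0.25        # average coverage depth for WGS data
MIN_CAPTURE_DEPTH = 1.0     # depth on dense target SNPs (1240k or TWIST)
MIN_CAPTURE_SNPS = 600_000  # rough SNP-count equivalent for capture data


def meets_coverage_threshold(data_type, depth=None, n_snps_covered=None):
    """Return True if a sample reaches the coverage recommended above.

    data_type: "wgs" or "capture" (labels chosen for this sketch).
    depth: average coverage depth (genome-wide for WGS, on-target for capture).
    n_snps_covered: number of covered target SNPs (capture data only).
    """
    if data_type == "wgs":
        if depth is None:
            raise ValueError("WGS data needs an average depth")
        return depth >= MIN_WGS_DEPTH
    if data_type == "capture":
        if depth is not None:
            return depth >= MIN_CAPTURE_DEPTH
        if n_snps_covered is not None:
            return n_snps_covered >= MIN_CAPTURE_SNPS
        raise ValueError("capture data needs a depth or a covered-SNP count")
    raise ValueError(f"unknown data_type: {data_type!r}")
```

Samples that fail this check can still be imputed, but any IBD inferred for them should be treated with the caution described in this section.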

Close to that coverage limit, imputation starts to break down and false-positive IBD rates increase quickly. Inferred IBD segments for data below the limit must therefore be interpreted with extreme caution, as false-positive and error rates become substantial. Generally, the shorter the IBD segment and the lower the coverage, the less robust the IBD call. The minimum output IBD length is 8 centimorgans (cM), but note that IBD segments shorter than 12 cM are already enriched for false positives. Please always treat the ancIBD output with the necessary caution and not as a black box.
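
One simple way to act on the two length thresholds is to flag shorter segments in downstream analyses. The sketch below works on a plain list of segment lengths in cM; the function name `flag_short_segments` is hypothetical and does not reflect the actual ancIBD output format or API.

```python
# Illustrative sketch; the function name and input format are assumptions.
MIN_OUTPUT_CM = 8.0  # minimum IBD length reported, as stated above
CAUTION_CM = 12.0    # segments shorter than this are enriched for false positives


def flag_short_segments(lengths_cm):
    """Pair each segment length with a flag marking it as needing extra caution.

    Lengths below the minimum output length are dropped; lengths from
    8 cM up to (but not including) 12 cM are flagged True.
    """
    flagged = []
    for length in lengths_cm:
        if length < MIN_OUTPUT_CM:
            continue  # below the minimum output length
        flagged.append((length, length < CAUTION_CM))
    return flagged
```

For example, `flag_short_segments([9.5, 14.0])` pairs the 9.5 cM segment with `True` and the 14.0 cM segment with `False`.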

ancIBD relies on imputation with a haplotype reference panel. We observed that the present-day 1000 Genomes reference panel yields robust IBD calls for Homo sapiens genomes up to several tens of thousands of years old, provided they share the out-of-Africa bottleneck (i.e. genomes from Eurasia, Oceania, and the Americas). However, some Sub-Saharan ancestries can be problematic because they contain deeply diverged haplotypes that are not represented in the 1000 Genomes reference panel. Currently, ancIBD is not applicable to other humans such as Neanderthals or Denisovans, because no suitable haplotype reference panel exists for imputation.

Preprocessing the data

We recommend imputing ancient data with the software GLIMPSE, one sample at a time, following the instructions of `the official GLIMPSE tutorial <https://odelaneau.github.io/GLIMPSE/glimpse1/tutorial_b38.html>`_. The default parameters of ancIBD are optimized for data imputed with the modern 1000 Genomes reference panel at all SNPs in that panel and then downsampled to the so-called 1240k SNP set widely used in human ancient DNA. The imputed 1240k SNP VCF needs to contain the phased diploid genotypes (in the GT field) as well as the three genotype probabilities (in the GP field). This imputed VCF is then transformed into a so-called .hdf5 file, which is the input for the ancIBD functions that call and visualize IBD.
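
Before converting a VCF, it can help to confirm that its records carry the two required fields. The following is a minimal sketch that checks a single tab-separated VCF data line using only standard VCF conventions (the FORMAT column is the ninth column, and `|` separates phased alleles). The function `has_phased_gt_and_gp` is a hypothetical helper, not part of ancIBD or GLIMPSE.

```python
# Hypothetical helper for illustration; not part of ancIBD or GLIMPSE.
def has_phased_gt_and_gp(vcf_line):
    """Check one VCF data line for phased GT and three-valued GP in all samples."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return False  # no FORMAT column or no sample columns
    format_keys = fields[8].split(":")
    if "GT" not in format_keys or "GP" not in format_keys:
        return False
    gt_idx = format_keys.index("GT")
    gp_idx = format_keys.index("GP")
    for sample in fields[9:]:
        values = sample.split(":")
        if len(values) <= max(gt_idx, gp_idx):
            return False  # sample entry is missing GT or GP
        if "|" not in values[gt_idx]:
            return False  # '/' (or a single allele) means unphased
        if len(values[gp_idx].split(",")) != 3:
            return False  # expect one probability per diploid genotype
    return True
```

In practice one would apply this to every non-header line (lines not starting with `#`) of the imputed VCF, or to a sample of them, and stop at the first failure.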

Example Data

You can find the test aDNA data used to run the tutorials described throughout this documentation here. This Dropbox folder contains GLIMPSE-imputed .vcf files of a subset of samples from early Neolithic Britain that are part of a published extended pedigree (Fowler, Olalde et al. 2021).

Using `ancIBD` via Python functions

One can run ancIBD using Python functions that are imported from the package. We provide example Jupyter notebooks on how to:

Please find those example notebooks and data here. You can modify those functions and embed them into your Python scripts or Jupyter notebooks.

Using `ancIBD` via the command line (available since v0.5)

Alternatively, one can also run ancIBD from the command line. The commands are automatically added during the installation of the Python package. You can find a detailed walk-through in the section Running ancIBD via bash.

Citing

A pre-print that describes ancIBD and several applications is available here:

ancIBD - Screening for identity by descent segments in human ancient DNA

You can cite this article if you use ancIBD for your scientific work.

Contact

If you have bug reports, suggestions, or any general comments, please do not hesitate to reach out. We are happy to hear from you! Your suggestions will help us improve this software.

You can report bugs as an issue on the ancIBD GitHub page.

We are also happy to hear from you via email:

harald_ringbauer AT eva mpg de
yilei_huang AT eva mpg de

(Fill in AT with @ and other blanks with dots)

Harald Ringbauer, Yilei Huang, 2023