December 3, 2024

Inferring language dispersal patterns with velocity field estimation

Yang et al. (2024) introduce Language Velocity Field (LVF) estimation, a computational approach that models language dispersal patterns without relying on phylogenetic trees. The method incorporates horizontal contact such as borrowing and areal diffusion, addressing a limitation of traditional phylogeographic approaches, which focus solely on vertical divergence. LVF constructs velocity fields to represent the spatiotemporal dynamics of linguistic evolution and dispersal.

The study validates LVF's effectiveness and robustness through simulations and applies it to four major agricultural language families: Indo-European, Sino-Tibetan, Bantu, and Arawak. The inferred dispersal trajectories align with known migration patterns derived from genetic and archaeological evidence. For instance, the Indo-European dispersal is traced to the Fertile Crescent, supporting the Anatolia hypothesis, while Sino-Tibetan languages are linked to the upper Yellow River plains, corroborating the Northern origin hypothesis.

By accommodating both vertical and horizontal linguistic changes, LVF provides a more comprehensive representation of language evolution. The approach is particularly advantageous for analyzing language families where traditional tree-based models fail to account for complexities like convergence and contact-induced changes. LVF also demonstrates flexibility across diverse linguistic datasets and scenarios.

This study highlights LVF's potential as a powerful tool for reconstructing language dispersal patterns and fostering interdisciplinary research into human history, encompassing linguistics, genetics, and archaeology.

Yang et al. (2024) [Z25]

The freshwater reservoir effect in radiocarbon dating

Philippsen (2013) explores the freshwater reservoir effect (FRE) in radiocarbon dating, which can yield anomalously old ages for samples from freshwater systems. The study highlights that the FRE occurs due to dissolved ancient carbonates, commonly termed the "hardwater effect," and is less acknowledged than the marine reservoir effect despite being equally significant.

Analyzing samples from Northern Germany's Alster and Trave rivers, the study finds age discrepancies of up to 2,000 radiocarbon years within a single river system. Archaeological evidence from Mesolithic pottery suggests that the FRE significantly influenced the dating of early inland sites. In estuarine environments, such as Denmark's Limfjord, the FRE varied between 250 and 700 radiocarbon years over the period from 5400 BC to AD 700, further complicating chronological reconstructions.

Philippsen underscores that FRE introduces substantial variability depending on local geology, hydrology, and environmental conditions. The study recommends integrating FRE corrections into radiocarbon dating, especially for samples from freshwater or mixed aquatic sources, to improve the accuracy of archaeological and environmental timelines.
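The idea of an FRE correction can be illustrated with a minimal sketch. It assumes the simplest possible scheme: a known, site-specific reservoir offset is subtracted from the measured radiocarbon age and the two uncertainties are combined in quadrature. The function name, the sample age, and the uncertainty values are hypothetical; only the 250 and 700 year offsets come from the Limfjord range quoted above. This is not presented as Philippsen's procedure, and the corrected age would still need calibration.

```python
import math


def correct_reservoir_age(measured_age_bp, measured_sigma, offset_years, offset_sigma):
    """Subtract a reservoir offset from a radiocarbon age (14C years BP).

    Uncertainties are assumed independent and are added in quadrature.
    """
    corrected_age = measured_age_bp - offset_years
    corrected_sigma = math.sqrt(measured_sigma ** 2 + offset_sigma ** 2)
    return corrected_age, corrected_sigma


# Hypothetical sample of 6000 +/- 40 14C years BP, corrected with the two
# ends of the Limfjord offset range (the +/- 30 on the offset is assumed).
for offset in (250, 700):
    age, sigma = correct_reservoir_age(6000, 40, offset, 30)
    print(f"offset {offset}: {age} +/- {sigma:.0f} 14C years BP")
```

The spread between the two corrected ages shows why a single generic offset is risky: the choice of offset alone shifts the result by several centuries.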

Philippsen (2013) [P16]

Mapache: A flexible pipeline to map ancient DNA

Neuenschwander et al. (2023) present Mapache, a robust and scalable bioinformatics pipeline designed for mapping and analyzing ancient and modern DNA data. Implemented in the workflow manager Snakemake, Mapache addresses challenges such as low DNA quality, contamination, and storage inefficiency. Its flexible configuration allows researchers to map sequencing reads to one or multiple reference genomes while efficiently managing intermediate files to minimize storage usage.

Mapache outperforms other tools like PALEOMIX and nf-core/eager by requiring significantly less storage and runtime. For example, it processes datasets with 167 million reads in 3.7 GB of storage compared to 44 GB for nf-core/eager. Its modular design supports optional imputation of low-coverage genomes using GLIMPSE, enhancing its versatility for ancient genome studies.

The pipeline generates comprehensive outputs, including BAM files, mapping statistics, and detailed reports with tables and graphs. Researchers can adapt Mapache to diverse computational infrastructures and analytical requirements by customizing parameters. Additionally, its cluster compatibility and ability to handle high-throughput datasets ensure scalability for large-scale projects.

Mapache's design emphasizes reproducibility, scalability, and efficiency, making it a valuable tool for genomic research involving ancient DNA. It facilitates mapping processes while reducing computational costs, ensuring its utility in advancing studies on human and evolutionary genetics.

Neuenschwander et al. (2023) [N44]

Needs for a conceptual bridge between biological domestication and early food globalization

Liu and Jones (2024) call for a unified framework to connect the protracted processes of biological domestication with the multiregional, globally dispersed nature of early food globalization. They argue that domestication has often been oversimplified as a rapid, localized event, whereas evidence shows it occurred over millennia, intertwined with human cultural practices and environmental shifts. Reviewing archaeological and genetic studies, the authors highlight how domesticated plants and animals underwent evolutionary changes as they adapted to novel environments during their global dispersal. Key processes, such as gene flow between wild and domesticated populations and the development of human dependencies like irrigation and pest control, shaped domestication outcomes.

Examples from barley, wheat, and maize illustrate how crop movement prompted adaptations, such as changes in flowering times and grain size. The authors emphasize the role of culinary traditions in shaping domestication pathways, contrasting East Asian boiling and steaming cuisines with Western grinding and baking practices. These cultural contexts influenced phenotypic traits, selection pressures, and agricultural strategies.

The paper reframes domestication as a continuous, multiregional process, deeply tied to human agency and environmental interactions. Liu and Jones argue that early food globalization was not a secondary outcome but an integral part of the domestication process itself, with significant implications for understanding human history and addressing contemporary challenges in food security.

Liu and Jones (2024) [L115]

Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated

Elhaik et al. (2022) critically examine the use of Principal Component Analysis (PCA) in population genetics, highlighting biases and reproducibility issues. PCA, a tool for dimensionality reduction, is extensively used in population genetics for visualizing genetic data, inferring ancestry, and drawing conclusions about historical and biological relationships. However, the authors argue that PCA results can be heavily influenced by dataset composition, parameter choices, and a priori assumptions, often leading to contradictory or erroneous interpretations.

The study evaluates PCA's reliability using color-based and human genetic datasets, revealing that PCA outcomes can be artifacts of data selection and are susceptible to manipulation. Scenarios are presented where PCA produced conflicting results, undermining its validity as a tool for inferring genetic and historical relationships. Furthermore, the authors question the widespread reliance on PCA in over 30,000 studies, urging a reevaluation of findings based on this method.
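The sensitivity to dataset composition can be reproduced with a toy example in the spirit of the color-based datasets. The sketch below is an illustration under stated assumptions, not the authors' code: the four "populations" are noisy clouds around pure red, green, blue, and black in RGB space, and the sample sizes, noise level, and function names are all hypothetical. It compares the distance between the blue and black centroids on the first two principal components under even and uneven sampling.

```python
import numpy as np

# Hypothetical "populations": noisy clouds around four colors in RGB space.
CENTERS = {
    "red": (1.0, 0.0, 0.0),
    "green": (0.0, 1.0, 0.0),
    "blue": (0.0, 0.0, 1.0),
    "black": (0.0, 0.0, 0.0),
}


def simulate(counts, noise=0.05, seed=0):
    """Draw counts[name] noisy samples around each color center."""
    rng = np.random.default_rng(seed)
    points, labels = [], []
    for name, n in counts.items():
        points.append(np.asarray(CENTERS[name]) + rng.normal(0.0, noise, size=(n, 3)))
        labels += [name] * n
    return np.vstack(points), np.array(labels)


def pca_2d(data):
    """Project mean-centered data onto its first two principal components."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def blue_black_distance(counts):
    """Distance between the blue and black centroids in PC1-PC2 space."""
    data, labels = simulate(counts)
    coords = pca_2d(data)
    blue = coords[labels == "blue"].mean(axis=0)
    black = coords[labels == "black"].mean(axis=0)
    return float(np.linalg.norm(blue - black))


even = blue_black_distance({"red": 100, "green": 100, "blue": 100, "black": 100})
uneven = blue_black_distance({"red": 200, "green": 200, "blue": 5, "black": 200})
print(f"blue-black distance, even sampling:   {even:.2f}")
print(f"blue-black distance, uneven sampling: {uneven:.2f}")
```

With even sampling, blue and black are clearly separated on the plot. When blue is heavily undersampled, the first two components are dominated by the other three groups and blue collapses onto black, although the underlying colors have not changed.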

The paper proposes that PCA-based insights should be reconsidered in light of these limitations, emphasizing the need for alternative, less bias-prone analytical approaches. This critique challenges the foundational assumptions of population genetics and related fields that heavily depend on PCA-derived conclusions.

Elhaik et al. (2022) [E66]

Inferring the Geographic Mode of Speciation by Contrasting Autosomal and Sex-Linked Genetic Diversity

Chu et al. (2013) investigated geographic modes of speciation by comparing autosomal and sex-linked genetic diversity. Using an Approximate Bayesian Computation (ABC) framework, the authors contrasted allopatric speciation (no post-divergence gene flow) with an isolation-with-migration model (gene flow post-divergence). They applied this to two rosefinch species, Carpodacus vinaceus and Carpodacus formosanus.

Their findings strongly supported an allopatric speciation model, with divergence occurring approximately 0.5 million years ago. Despite the predominance of the allopatric model, sex-biased genetic patterns were detected, with the female effective population size estimated to be five times larger than the male one. This is consistent with higher variance in male reproductive success, possibly due to strong sexual selection. No evidence was found for earlier isolation at Z-linked loci compared to autosomal loci, suggesting a limited role for ecological speciation.

The study highlights the importance of integrating genomic data and innovative computational tools for understanding speciation processes, particularly in distinguishing geographic and ecological factors driving divergence.

Chu et al. (2013) [C148]

Modelling the demographic history of human North African genomes points to a recent soft split divergence between populations

Serradell et al. (2024) examined the demographic history of North African populations using innovative computational approaches. Employing Approximate Bayesian Computation with Deep Learning (ABC-DL) and a novel Genetic Programming for Population Genetics (GP4PG) algorithm, the study modeled the genetic diversity and demographic history of Amazigh and Arab populations. The analyses identified distinct origins for these groups, with Amazigh populations tracing back to the Epipaleolithic era, while Arab genetic ancestry was largely shaped by the Arabization process.

GP4PG proved more accurate than ABC-DL by incorporating population substructure and continuous gene flow, revealing that genetic diversity in North Africa arises primarily from gradual migration decay rather than discrete admixture events. The findings support a back-to-Africa origin for both groups, with Amazigh divergence from Eurasian populations occurring around 22,300 years ago, predating Arab divergence from Middle Eastern populations (~1,600 years ago). This comprehensive model highlights the importance of soft population splits and migration in shaping the genetic landscape of North Africa.

Serradell et al. (2024) [S227]

Neutral genomic regions refine models of recent rapid human population growth

Gazave et al. (2024) analyzed neutral genomic regions to provide clearer insights into recent rapid human population growth. Avoiding confounding factors like natural selection and population structure, the study sequenced genomic loci distant from coding regions in 500 individuals of European ancestry. Using a high-quality dataset of rare variants, they modeled recent demographic history through site frequency spectrum analyses. Their findings estimate a population growth rate of ~3.4% per generation over the last 3,000-4,000 years, resulting in a population size increase of two orders of magnitude. By addressing assumptions of ancient demography, they reconciled discrepancies among prior studies and revealed the importance of using neutral loci in demographic models.
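A quick arithmetic check shows how a growth rate of about 3.4% per generation compounds to roughly two orders of magnitude. The sketch assumes simple exponential growth and a generation time of 25 years; the generation time is an assumption made here for illustration and is not taken from the summary above.

```python
def growth_factor(rate_per_generation, years, generation_time=25.0):
    """Fold increase in population size under constant exponential growth.

    generation_time (years per generation) is an assumed value.
    """
    generations = years / generation_time
    return (1.0 + rate_per_generation) ** generations


for years in (3000, 4000):
    factor = growth_factor(0.034, years)
    print(f"{years} years: about {factor:.0f}-fold increase")
```

Under these assumptions the increase is roughly 55-fold over 3,000 years and roughly 210-fold over 4,000 years, which brackets the hundredfold growth described.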

Gazave et al. (2024) [G75]

The Effect of Recent Admixture on Inference of Ancient Human Population History

Lohmueller et al. (2010) investigate how recent admixture affects genetic inferences about ancient human population history. Focusing on African-American populations, which are about 80% African and 20% European in ancestry, they explore the implications for demographic studies using site frequency spectrum (SFS) and haplotype-based approaches. Simulations and analyses of SNP data reveal that SFS-based methods can estimate population growth parameters relatively accurately in admixed populations, because they are less sensitive to recent admixture. In contrast, haplotype-based methods are significantly influenced by admixture, often producing biased estimates. The authors analyze SNP data from Yoruba and African-American populations, finding growth parameter estimates to be similar when using the SFS but divergent with haplotype-based methods. These findings underscore the need to account for admixture explicitly in haplotype analyses to avoid erroneous conclusions. The study has important implications for interpreting genetic studies of admixed populations and their relevance to unadmixed ancestral groups.
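For readers unfamiliar with the SFS, the following minimal sketch shows how an unfolded spectrum is tabulated from a haplotype matrix. The matrix is a made-up toy example and the function name is hypothetical; it illustrates the summary statistic only, not the inference procedure used in the study.

```python
import numpy as np


def site_frequency_spectrum(haplotypes):
    """Unfolded SFS from a 0/1 matrix (rows: chromosomes, columns: SNP sites).

    Entry i-1 of the result is the number of sites at which the derived
    allele (coded 1) appears on exactly i chromosomes, for i = 1..n-1.
    """
    haplotypes = np.asarray(haplotypes)
    n_chromosomes = haplotypes.shape[0]
    derived_counts = haplotypes.sum(axis=0)
    counts = np.bincount(derived_counts, minlength=n_chromosomes + 1)
    return counts[1:n_chromosomes]  # drop monomorphic classes 0 and n


# Toy data: 4 chromosomes, 5 segregating sites.
toy_haplotypes = [
    [1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1],
]
print(site_frequency_spectrum(toy_haplotypes))
```

Because the SFS discards the arrangement of alleles along chromosomes, it carries no direct information about linkage or haplotype length, which is the signal that recent admixture disturbs most.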

Lohmueller et al. (2010) [L114]

November 14, 2024

Reproductive proportion of a human population

In studies on human populations, data on fertility, sex ratio, etc., are not always readily available. Therefore, a large number of workers have estimated the effective population size, Ne, as between N/4 and N/2 (usually as N/3) (Cavalli-Sforza and Bodmer, 1971; Crow and Morton, 1955; Eriksson, Fellman, Workman, and Lalouel, 1973a; Felsenstein, 1971; Harpending and Jenkins, 1974; Imaizumi, Morton, and Harris, 1970; Morton, 1969; Morton, Smith, Hill, Frackiewicz, Lew, and Yee, 1976; Pollock, Lalouel, and Morton, 1972; Roberts, 1975; Salzano, 1971; Smith, 1969). According to Cavalli-Sforza (1976), most contemporary populations contain "three generations": a prereproductive, a reproductive, and a postreproductive age group, each comprising roughly one-third of the population. Since Ne is intended to deal with the reproductive portion of the population, its value will be about 1/3 of N. Of course, age structure will greatly affect the validity of this approach, as Langaney, Gessain, and Robert (1974) demonstrate empirically. Further demonstration of this effect has been given by Cavalli-Sforza (1958), who calculated the proportion of the "genetically active" population of the Parma Valley to be 0.88. Estimates of Ne, expressed as a proportion of N, have in fact varied from 0.12 (Morton and Lalouel, 1973b) to 1.67 (Kimura and Crow, 1963). Nei (1970:695) comments that "the Ne/N ratio in human populations is never constant either spatially or temporally." Thus, estimation of Ne should be made with caution, since arbitrary designations may be seriously in error.
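The arithmetic behind the rule of thumb, and the way age structure can undermine it, can be sketched as follows. The function names, the census size, and the age-group counts are hypothetical values chosen for illustration; they are not taken from any of the cited studies.

```python
def ne_rule_of_thumb(census_size, divisor=3.0):
    """Approximate Ne as N / divisor (N/3 by default, N/4 to N/2 in the text)."""
    return census_size / divisor


def reproductive_proportion(age_counts):
    """Share of the census population that falls in the reproductive age group."""
    total = sum(age_counts.values())
    return age_counts["reproductive"] / total


# Hypothetical census of 900 people under the three common divisors.
for divisor in (4, 3, 2):
    print(f"N/{divisor}: Ne is about {ne_rule_of_thumb(900, divisor):.0f}")

# Hypothetical age structure that departs from equal thirds.
hypothetical_counts = {
    "prereproductive": 400,
    "reproductive": 350,
    "postreproductive": 250,
}
print(f"reproductive proportion: {reproductive_proportion(hypothetical_counts):.2f}")
```

Even a modest departure from equal thirds changes the proportion, which is one reason the text advises against applying N/3 uncritically.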

The Genetic Structure of Subdivided Human Populations, Jorde, Current Developments in Anthropological Genetics, 1980 [J35]