@shiy05
2023-04-26T23:25:24.000000Z
字数 15162
阅读 118
temp
Reproductive isolation is a fundamental step in speciation. While sex chromosomes have been linked to reproductive isolation in many model systems, including hominids, genetic studies of the contribution of sex chromosome loci to speciation for natural populations are relatively sparse. Natural hybrid zones can help identify genomic regions contributing to reproductive isolation, like hybrid incompatibility loci, since these regions exhibit reduced introgression between parental species. Here, we use a primate hybrid zone (Alouatta palliata × Alouatta pigra) to test for reduced introgression of X-linked SNPs compared to autosomal SNPs. To identify X-linked sequence in A. palliata, we used a sex-biased mapping approach with whole-genome re-sequencing data. We then used genomic cline analysis with reduced-representation sequence data for parental A. palliata and A. pigra individuals and hybrids (n \= 88) to identify regions with non-neutral introgression. We identified ~26 Mb of non-repetitive, putatively X-linked genomic sequence in A. palliata, most of which mapped collinearly to the marmoset and human X chromosomes. We found that X-linked SNPs had reduced introgression and an excess of ancestry from A. palliata as compared to autosomal SNPs. One outlier region with reduced introgression overlaps a previously described “desert” of archaic hominin ancestry on the human X chromosome. These results are consistent with a large role for the X chromosome in speciation across animal taxa and further, suggest shared features in the genomic basis of the evolution of reproductive isolation in primates.
Keywords: speciation, genomic clines, sex chromosomes, gene flow, primates
Recent advances in DNA sequencing have opened the opportunity to explore the genetics of speciation in non-model organisms, allowing for genome-wide identification of loci associated with reproductive isolation and their comparison among divergent lineages. Results from this research corroborate early observations (e.g., Coyne and Orr 1989) that sex chromosomes seem to be important drivers of reproductive isolation in many animal taxa (e.g., birds: Sætre et al. 2003; Carling and Brumfield 2008; Irwin 2018; Drosophila: Presgraves 2008; Mus: Good et al. 2008a, 2008b, 2010; Janoušek et al. 2012; fish: Kitano and Peichel 2012), including humans (Sankararaman et al. 2014, 2016). Some contemporary humans with ancestors from outside of Africa carry as much as ~5% archaic DNA as a result of ancient admixture with Neanderthals and Denisovans (Sankararaman et al. 2014, 2016). However, there are many regions of reduced ancestry from archaic hominins in modern human genomes (i.e., “deserts” of archaic ancestry), which are highly enriched on the X chromosome (Sankararaman et al. 2014, 2016). Further, Neanderthal Y chromosome sequence has not been discovered in contemporary human genomes (Mendez et al. 2016). These findings imply that X and Y chromosomal regions may have been involved in reproductive isolation during the time of hybridization, and thus, archaic ancestry was rapidly purged by selection acting on unfit hybrids, limiting introgression on sex chromosomes. In hybrid flies and mice, loci underlying hybrid male sterility disproportionately map to the X chromosome, consistent with a large role of the X chromosome in speciation (e.g., Presgraves 2008; Turner and Harr 2014).
Because the genomic signature of ancient genetic exchanges is all that is left of our archaic cousins, it is not possible to directly investigate the mechanisms that may have lead to restricted gene flow in certain genomic regions while allowing exchange in others. We address this here by identifying genomic regions that are consistent with the signature of reproductive isolation (i.e., “barrier loci”) between 2 contemporary primate species that form a natural hybrid zone and ask whether deserts of archaic ancestry on the human X chromosome are unique to hominids or shared with other primates. Natural hybrid zones offer unique opportunities to identify barrier loci because reproductive isolation between the parental taxa is still incomplete. In hybrid zones, novel combinations of alleles in hybrids are tested by selection (Dobzhanksy 1936; Muller 1942). Recombination shuffles composite genomes with many generations of backcrossing, and loci that underlie reproductive isolation are expected to have restricted introgression as a result of selection against unfit hybrids carrying incompatible alleles, while neutral and adaptive alleles are free to introgress between species (Barton and Hewitt 1985; Gompert and Buerkle 2011).
The howler monkey hybrid zone (Alouatta palliata × Alouatta pigra) in Mexico (Cortés-Ortiz et al. 2007) has recently developed into a model for comparison of natural populations to laboratory-derived and other traditional model systems (e.g., Mus, Homo) that have been the subject of intensive speciation genomics research. A genome assembly was recently developed for A. palliata, opening the possibility for detailed investigation of fine-scale variation contributing to reproductive isolation and other evolutionary processes. Further, the Alouatta system is the first contemporary primate system in which the genomic architecture of reproductive isolation has been investigated (Baiz et al. 2019; Cortés-Ortiz et al. 2019), thus due to its relatively close phylogenetic proximity, it serves as a model that may yield insight to processes responsible for generating similar patterns observed in human genomes.
Alouatta palliata and A. pigra diverged ~3 MA (Cortés-Ortiz et al. 2003), and the contact zone is likely the result of secondary contact after periods of isolation and expansion (Cortés-Ortiz et al. 2003; Ellsworth and Hoelzer 2006; Ford 2006). We previously analyzed introgression in this system with a limited number of loci and found differential introgression for autosomal markers (i.e., some had reduced introgression, others had neutral, or directional introgression), but markers on the X and Y chromosomes had restricted introgression (Cortés-Ortiz et al. 2019). This observation is consistent with a role for the sex chromosomes in reproductive isolation in this system.
Due to the limited number of markers analyzed in the previous study, we were unable to determine the extent of reduced introgression across the X chromosome. Here, we used comparative whole-genome re-sequencing to identify X chromosome sequence in the A. palliata genome assembly. We then used genomic cline analysis with population-level reduced-representation genome sequence data to explicitly test for reduced introgression of X-linked compared to autosomal SNPs and to quantify the pattern of introgression along the X chromosome. We compare our results to signatures of archaic introgression in the human genome to ask whether there is a shared component of the genomic architecture of speciation in primates.
Since the A. palliata genome assembly is a draft de novo assembly for which contigs have yet to be assigned to chromosomes, we first performed a mapping experiment using whole-genome re-sequencing data from 2 males and 2 females to identify X-linked contigs. To do this, we analyzed differences in contig-specific sequencing coverage between female and male A. palliata individuals with the assumption that X-linked contigs will show significantly greater coverage in females than males because, in XY species, females have 2 copies of the X chromosome and males have 1 copy (we refer to these as “female-biased” contigs). In comparison, autosomal contigs should have equal coverage in males and females (“unbiased” contigs). This method has been used in several study systems to confidently identify sex chromosome sequences (e.g., Chen et al. 2012; Carvalho and Clark 2013; Gamble et al. 2015; Vicoso and Bachtrog 2015; Bracewell et al. 2017; Mongue et al. 2017). Supplementary Figure S1 illustrates an overview of our strategy.
Between 2001 and 2012, we obtained blood samples from 4 wild A. palliata individuals (2 males and 2 females) sampled in Veracruz, Mexico (Supplementary Table S1) and stored them in lysis buffer at −80 °C. We extracted genomic DNA using the QIAGEN DNeasy tissue kit (Qiagen Inc., Valencia, CA) as described in Baiz et al. (2019). Sex was determined in the field by visual assessment and verified using genetic data by amplifying known genes on the sex chromosomes (following Cortés-Ortiz et al. 2019). Specifically, we amplified the Y-linked SRY locus to verify the presence of a PCR product for males and absence of a PCR product for females and genotyped an X-linked microsatellite locus to verify the hemizygous genotype for males (Supplementary Table S1).
Our libraries for whole-genome sequencing were generated and sequenced by the Advanced Genomics Core at the University of Michigan. For each sample, libraries were constructed using the Swift Biosciences PCR-Free DNA Library Kit with a target insert size of 350bp following the manufacturer’s protocol. Libraries were multiplexed and sequenced in a single lane using Illumina HiSeq 4000 to obtain 150bp paired-end reads.
We obtained between ~54 M–123 M reads per individual from the sequencer (Supplementary Table S1). We used Trim Galore! v0.4.2 (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) to trim adapters and low-quality bases (Q < 20) from raw reads and retained only read-pairs where each read was ≥100bp in length after trimming.
To avoid false identification of X-linked contigs due to differences in sequencing coverage among males and females, we subsampled our post-trimmed fastq files to standardize the number of reads across individuals using seqtk v1.2 (https://github.com/lh3/seqtk) before mapping by randomly sampling 50 million read pairs per individual. We used these subsampled files as input for sequence alignment.
Because short read data can potentially map to multiple genomic locations due to the expansion of repetitive sequence, we generated a repeat-masked version of the reference A. palliata genome assembly (PVKV00000000) in order to map reads to unique sequences. To do this, we used RepeatMasker v4.0.6 (Smit et al. 2013) on A. palliata contigs 1 kb and larger (N \= 96 654, 87% of the assembly sequence) using the “primates” RepeatMasker repeat library (i.e., a library of known primate repeats) to perform a low-sensitivity search (option –qq).
We then used bwa-MEM (Li 2013, https://arxiv.org/abs/1303.3997, last accessed 21 May 2019) to align the subsampled reads of our 4 A. palliata individuals to the repeat-masked A. palliata genome assembly. For each individual, we used SAMtools idxsats (Li et al. 2009) to count the number of reads mapped to each masked contig. We then used exact tests in edgeR (Robinson et al. 2010) to detect contigs for which there was a significant difference in read counts between the sexes, where X-linked contigs are expected to have an average male-to-female log2 fold-change (log2FC) of −1 (due to the presence of 2 X chromosomes in females and only 1 in males) and autosomal contigs are expected to have a log2FC of 0. For this analysis, we excluded contigs with low counts across samples as they provide limited power to detect significant differences between groups. Thus, we only retained contigs where 2 or more individuals had a count per million (CPM) >0.2, corresponding to ~15 reads (N \= 78,493 contigs).
We aligned A. palliata repeat-masked contigs for which we detected sex-differences in read count to the marmoset (CalJac3) and human (hg38) genome assemblies to identify orthologous regions. We used these results as a second line of evidence for X-linkage of female-biased contigs since X chromosome sequence is highly conserved across mammals. To do this, we downloaded the masked version of each assembly from the University of California Santa Cruz Genome Browser and used a custom script (available at https://github.com/baizm/Xchr_introgression) to remove scaffolds that have not yet been assigned to any chromosome (i.e., sequences with a header containing “chrUn”). We then used lastz v.1.04.00 (a program designed for efficient alignment of long genomic sequences, Harris 2007) to align the repeat-masked A. palliata contigs to each repeat-masked assembly, requiring at least 50% of the query to be included in the alignment block (--coverage = 50) and using a distance of 20bp between potential seeds (--step = 20). To assess the ability of our method to detect X-linked sequence, we also aligned a set of 2,288 unbiased, likely autosomal contigs for comparison (i.e., the same as the number of female-biased contigs), randomly chosen from the list of contigs that did not have a significant difference in read counts between the sexes.
For female-, male-, and unbiased A. palliata contigs, a larger proportion aligned to the marmoset genome (>55% for each contig type) than to the human genome (Supplementary Figure S2). For male-biased and unbiased A. palliata contigs, the proportion that aligned to autosomes (~94–99%) or sex chromosomes (~1–5%) was similar for the marmoset and human genome (Supplementary Figure S2). Further, the majority (74%) of the female-biased contigs that aligned to the human X chromosome also aligned to the marmoset X chromosome. This pattern is not surprising, given that the divergence time between Alouatta and Homo is greater than between Alouatta and Callithrix (Perelman et al. 2011). Thus, we considered female-biased contigs that aligned to the marmoset X chromosome to be X-linked for Alouatta in this study.
We used qPCR to test for the expected 2-fold higher amplification of putative X-linked regions in female individuals compared to male individuals. We randomly selected 5 putative X-linked contigs and 1 putative autosomal contig to serve as a normalizing sequence assuming 1:1 amplification between the sexes. The 5 putative X-linked contigs were selected from A. palliata female-biased contigs that mapped to the marmoset X chromosome. The putative autosomal contig was randomly selected from A. palliata unbiased contigs that mapped to a marmoset autosome. The putative X-linked contigs mapped to positions that spanned the length of the marmoset X chromosome (between X:6.6 and X:104 Mb) and the putative autosomal contig mapped to marmoset chromosome 1 (Supplementary Table S2). Details of our qPCR strategy are outlined in Supplementary Methods.
We performed reduced-representation sequencing on a geographically broad sample of wild individuals from the allopatric ranges of A. palliata and A. pigra and from the hybrid zone (see Baiz et al. 2019 for details) (Figure 1) and mapped sequence reads to the non-masked A. palliata assembly to generate genotype data for genomic cline analysis to test for reduced introgression of the X chromosome. Because the X chromosome is hemizygous in males and X-linked SNPs will appear to be homozygous, biasing genomic cline estimates, we only included sequence data from female individuals in genomic cline analyses to avoid bias in our X chromosome-autosome comparison of differential introgression. This included 88 female individuals, 48 of which were sampled from the hybrid zone in Tabasco, Mexico, 17 from the allopatric range of A. palliata and 23 from the allopatric range of A. pigra (Figure 1), which we included in our genomic cline analysis. All allopatric individuals included here have been previously shown to be non-admixed (Baiz et al. 2019). We used double digest restriction site-associated DNA sequencing (ddRADseq, Peterson et al. 2012) to generate reduced-representation genome sequence data for these individuals, as described in Baiz et al. (2019).
!An external file that holds a picture, illustration, etc.
Object name is esaa021_fig1.jpg](https://www.ncbi.nlm.nih.gov/core/lw/2.0/html/tileshop_pmc/tileshop_pmc_inline.html?title=Click%20on%20image%20to%20zoom&p=PMC3&id=7525826_esaa021_fig1.jpg)
Map of sampling sites used in this study. The distribution ranges of Alouatta palliata and Alouatta pigra are in light gray and dark gray, respectively (downloaded and modified from IUCN, http://www.iucnredlist.org). Circles represent sampled localities where only one species occurs (i.e., non-admixed populations) and triangles represent sampled localities from the hybrid zone, where admixed individuals occur.
We retained only biallelic SNPs with a minor allele frequency of at least 0.05, a minimum mean read depth of 12 across all individuals, and sites present in 14 or more individuals in either parental population. To reduce the effects of linkage among markers, and because previous analyses indicated that SNPs on the same Alouatta contig generally show consistent patterns of introgression (Baiz et al. 2019), we retained only 1 SNP at random per contig. This resulted in 10 353 SNPs used in the genomic cline analysis. The combined length of the Alouatta contigs containing SNPs used in our analysis represents ~39% of the A. palliata genome assembly. We considered X-linked SNPs to be those on female-biased contigs that mapped to the marmoset X chromosome (*N*SNPs \= 97) and autosomal SNPs to be those on contigs that had no significant difference in read counts between the sexes (*N*SNPs \= 10 256). The set of X-linked and autosomal SNPs represent approximately equal genotyping densities on female-biased (1.9 × 10–5 SNPs/Mb) and unbiased contigs (8.4 × 10–6 SNPs/Mb). All filtering steps were carried out using bcftools v.1.3.1, vcftools 0.1.14 (Danecek et al. 2011), and custom scripts (available at https://github.com/baizm/Xchr_introgression).