Variant detection was performed on 1,456 individuals, each sequenced to an average depth of more than 20X. love2 was used as the reference genome. Sequencing reads were aligned to it with BWA, and duplicates were removed with SAMtools. GATK 4.5 was then used to call variants in each sample, and the per-sample calls were merged to generate the VCF files.
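
The per-sample steps above can be sketched as a small Python helper that only assembles the command lines and runs nothing. This is an illustration under assumptions, not the exact commands used here: the file names (such as love2.fa), the thread count, the read-group fields, and the choice of HaplotypeCaller in GVCF mode are all hypothetical, since the text names only BWA, SAMtools and GATK 4.5.

```python
from typing import List


def build_pipeline(sample: str, ref: str, fq1: str, fq2: str,
                   threads: int = 4) -> List[str]:
    """Return shell command strings for one sample (nothing is executed).

    All file names and option values are illustrative assumptions.
    """
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    # GATK needs a read group with a sample name; the ID/SM values are assumed.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}"
    return [
        # Align paired-end reads, add the mate tags that markdup relies on
        # (fixmate -m), then sort by coordinate.
        f"bwa mem -t {threads} -R '{read_group}' {ref} {fq1} {fq2}"
        f" | samtools fixmate -m - -"
        f" | samtools sort -@ {threads} -o {sorted_bam} -",
        # Remove duplicate reads (-r removes them instead of only flagging).
        f"samtools markdup -r {sorted_bam} {dedup_bam}",
        f"samtools index {dedup_bam}",
        # Per-sample calling in GVCF mode, so that samples can be merged later.
        f"gatk HaplotypeCaller -R {ref} -I {dedup_bam}"
        f" -O {sample}.g.vcf.gz -ERC GVCF",
    ]


if __name__ == "__main__":
    for cmd in build_pipeline("S1", "love2.fa", "S1_1.fq.gz", "S1_2.fq.gz"):
        print(cmd)
```

In a workflow of this shape, the per-sample GVCF files would typically be combined and jointly genotyped with GATK before filtering. Whether that was the merging route used here is not stated.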

(1) SNP Filtering Conditions (SNP.vcf file download)

The SNP file was filtered using the following criteria, yielding 1,464,428 high-quality loci:

  • SNP loci with a quality value below 30 were removed;
  • Variant loci were required to have a minimum depth greater than 4;
  • The proportion of missing genotypes at a variant locus across the resequenced samples was required to be below 10%;
  • The minor allele frequency (MAF) of a variant locus was required to be greater than 0.05.
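
As a rough illustration of how these four thresholds combine, the sketch below checks a single locus in plain Python. It makes several assumptions that the text does not state: the depth threshold is applied to a site-level depth value, loci are treated as biallelic and diploid, and a missing genotype is represented as None. The function and parameter names are hypothetical.

```python
from typing import List, Optional, Tuple

# A diploid genotype as allele indices (0 = reference); None = missing call.
Genotype = Optional[Tuple[int, int]]


def passes_filters(qual: float, depth: int, genotypes: List[Genotype],
                   min_qual: float = 30.0, min_depth: int = 4,
                   max_missing: float = 0.10, min_maf: float = 0.05) -> bool:
    """Return True if a locus meets all four criteria listed above."""
    # Quality below 30 is removed; depth must be strictly greater than 4.
    if qual < min_qual or depth <= min_depth:
        return False
    if not genotypes:
        return False

    called = [g for g in genotypes if g is not None]
    # Missing-genotype proportion must be strictly below 10%.
    missing_rate = (len(genotypes) - len(called)) / len(genotypes)
    if missing_rate >= max_missing or not called:
        return False

    # Minor allele frequency over called alleles (biallelic assumption:
    # any non-reference allele counts as the alternate allele).
    alt_count = sum(1 for g in called for allele in g if allele != 0)
    alt_freq = alt_count / (2 * len(called))
    maf = min(alt_freq, 1.0 - alt_freq)
    return maf > min_maf


if __name__ == "__main__":
    example = [(0, 0)] * 8 + [(0, 1)] * 2
    print(passes_filters(qual=50.0, depth=10, genotypes=example))
```

The same thresholds are listed for the InDel file, so the same check would apply there under the same assumptions.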

(2) InDel Filtering Conditions (Indel.vcf file download)

The InDel file was filtered using the following criteria, yielding 141,985 high-quality loci:

  • InDel loci with a quality value below 30 were removed;
  • Variant loci were required to have a minimum depth greater than 4;
  • The proportion of missing genotypes at a variant locus across the resequenced samples was required to be below 10%;
  • The minor allele frequency (MAF) of a variant locus was required to be greater than 0.05.

(3) SV Detection

Structural variants (SVs) were detected using a graph-based pangenome approach. The main steps were:

  • MUMmer for genome-to-genome alignment and structural variant detection;
  • SyRI for structural variant integration;
  • vg for graph pangenome construction and population-level structural variant detection.

More than 8,000 structural variant loci were obtained.