INTRODUCTION

Fig. 1. A map of the South African range of the southern African population of Pacific sardine, Sardinops sagax.
The map shows sites at which sardines were caught for genome and transcriptome sequencing. Colors represent mean sea surface temperatures (SSTs). The coastline was divided into five temperature-defined geographical regions (temperate core range: W, west; SW, southwest; S, south; SE, southeast; sardine run: E, east). Cape Agulhas is the boundary between the Atlantic and Indian oceans. The broken line represents the edge of the continental shelf, beyond which the sardines rarely disperse. The black and white arrows represent the approximate path of the Agulhas Current, which transports tropical Indian Ocean water southward and confines sardines participating in the run (blue arrows) to a narrow coastal band of cooler water.
DISCUSSION

Fig. 4. Stock structure of Pacific sardine, S. sagax, in South African waters and sequence of events that results in a sardine run.
MATERIALS AND METHODS
Sampling design
Population structure in the South African population of Pacific sardine, S. sagax, was assessed using genomic and transcriptomic data from sardines collected throughout the species’ core range and from three sardine runs. Genomic data were generated from tissue samples obtained from 284 sardines collected at 40 locations throughout the species’ range (table S2). To confirm the genomic findings independently, transcriptome data were generated using RNA extracted from liver tissue of 20 individuals collected at nine locations (table S2). Sardines from the west coast to the southeast coast were collected as part of pelagic research surveys conducted in autumn/winter (May and June) and spring/summer (October and November) in 2014 and 2015. All of these were used to generate genomic data, with a subset of 14 individuals from seven sites also used to generate transcriptome data. East coast samples for genomic analyses were obtained from artisanal fishers during the 2015 and 2018 sardine runs. As all these samples failed the quality screening for RNA sequencing, we obtained six additional samples from the 2019 sardine run that were not used for genomic analyses. For each sardine run, samples originated from two different locations that were sampled at different times (table S2).
Genome assembly
Transcriptome assembly
Transcriptome data were generated from RNA extracted from sardine livers, which were cut into small pieces using a sterile scalpel blade and stored in RNAlater solution (Thermo Fisher Scientific). Although RNA sequence data are often used for gene expression analyses, we considered such analyses unsuitable here because the sardine livers could not be preserved under controlled conditions. For example, while some sardine livers from the pelagic surveys were preserved immediately, some of the sardine run fish had died at least an hour before preservation. Total RNA was extracted from each liver sample using a combination of mechanical homogenization with TRIzol and a QIAGEN RNeasy purification kit (QIAGEN, Hilden, Germany). cDNA libraries were then constructed from each extraction, indexed separately, and sequenced on an Illumina HiSeq 4000 platform (Illumina Inc., San Diego, USA) following the manufacturer’s instructions for 2 × 150 paired-end chemistry.
Identification of candidate and selectively neutral SNPs
Datasets of selectively neutral loci were created by relaxing the conditions for detecting candidate loci (BayeScan: PO = 1, FDR = 0.05; gINLAnd: logBF = 0.25, i.e., “hardly worth mentioning”) and removing the resulting larger set of candidate loci from the complete dataset. The final genomic dataset comprised 8296 loci, of which 63 and 198 were identified as candidate loci using gINLAnd and BayeScan, respectively. Eleven loci that were identified by both methods were used in subsequent analyses. The transcriptome dataset comprised 14,973 loci, 234 of which were identified as candidate loci. Under the relaxed conditions used to create the neutral genomic dataset, 85 and 495 loci were identified using gINLAnd and BayeScan, respectively. Of these, 26 were shared, giving a total of 554 unique loci that were removed from the complete dataset and leaving 7742 loci. A total of 1960 putative candidate loci were removed from the transcriptome dataset to create a dataset of neutral loci; because the remaining loci still exceeded the column limit of GenAlEx, the dataset was further reduced to 8191 loci (corresponding to 16,384 columns).
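The locus counts above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only and is not part of the original analysis pipeline. The constant and function names are hypothetical. The column layout (two allele columns per locus plus two leading identifier columns) is an assumption that happens to reproduce the 16,384-column figure given in the text.

```python
# Counts reported in the text for the genomic dataset (relaxed thresholds).
GENOMIC_TOTAL = 8296
RELAXED_GINLAND = 85
RELAXED_BAYESCAN = 495
RELAXED_SHARED = 26

# Counts reported in the text for the transcriptome dataset.
TRANSCRIPTOME_TOTAL = 14973
TRANSCRIPTOME_REMOVED = 1960


def unique_union(size_a, size_b, shared):
    """Size of the union of two sets, given their sizes and their overlap."""
    return size_a + size_b - shared


# Loci flagged by either method are removed once, not twice.
removed_genomic = unique_union(RELAXED_GINLAND, RELAXED_BAYESCAN, RELAXED_SHARED)
neutral_genomic = GENOMIC_TOTAL - removed_genomic

# Transcriptome loci left after removing putative candidates.
neutral_transcriptome = TRANSCRIPTOME_TOTAL - TRANSCRIPTOME_REMOVED

# Assumption: a spreadsheet limit of 16,384 columns, with two allele columns
# per locus and two leading columns (sample and population identifiers).
MAX_COLUMNS = 16384
ID_COLUMNS = 2
COLUMNS_PER_LOCUS = 2
max_loci = (MAX_COLUMNS - ID_COLUMNS) // COLUMNS_PER_LOCUS

print(removed_genomic, neutral_genomic, neutral_transcriptome, max_loci)
```

Under these assumptions, the remaining transcriptome loci exceed the number that fits within the column limit, which is consistent with the further reduction described in the text.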
Assessment of population structure
Acknowledgments