The two-dimensional nature of many pinned Lepidoptera specimens allows us to sidestep technical challenges that arise when imaging 3D objects, such as accounting for variable incident angles, and our multispectral imaging rig with its custom-designed platform can image both the dorsal and ventral sides of specimens (Figs. 1a, 2 and 3). The initial descriptive data can be used for general multispectral property exploration and museum specimen digitization (Fig. 1b); the processed data, built on the initial descriptive data, can be used to investigate multispectral colors. After fore- and hindwing reconstruction (Fig. 1c), our universally applicable analytical framework (Figs. 1d and 4a) objectively quantifies wing tails and accommodates different wing shapes and venation systems (Figs. 1e and 4b). The framework can also be applied to systematically survey multispectral color patterns (Figs. 1g, h and 5) while providing basic morphological measurements (Fig. 1f).
The imaging system design
Our high-throughput multispectral imaging system represents a compromise between the speed of traditional imaging and the need for objective spectral data. The system consists of a high-resolution SLR camera (Nikon D800) with its internal UV-IR filter removed to allow UV–visible–IR imaging, fitted with a 28–80 mm f/3.3–5.6 G Autofocus Nikkor Zoom Lens (Methods). The customized imaging platform was designed to accommodate either end of a pin, so mounted specimens can be positioned on the platform dorsal or ventral side up (Methods; Fig. 2). A reference bar containing black and white standard references and a scale bar is attached to the imaging platform by a hook-and-loop fastener in each round of imaging (Methods; Fig. 6a). The approximate cost, excluding computational resources, is ~$4500 (Supplementary Information).
For each set of specimens, a series of seven images (hereafter referred to as “drawer images”; Fig. 3a) is taken in raw format (*.NEF) over the course of two minutes. These seven drawer images correspond to the following spectral ranges (light settings are detailed in Methods): UV-only (λ = 360 nm; reflected light filtered through a Hoya U-340 UV-pass filter on the camera); combined UV reflectance and unfiltered visible fluorescence (called UVF hereafter), comprising both reflected UV and all UV-induced visible fluorescence; two near-IR bands (unfiltered reflected light from λ = 740 nm [NIR] and 940 nm [fNIR] LEDs); and three in the visible range (reflected broadband white LED light, λ = 400–700 nm: one unfiltered RGB image and two RGB images filtered by linear polarizers at orthogonal angles to detect polarization along each axis), which are later decomposed into red, green, and blue channels. Up to thirty-five pinned specimens can be imaged simultaneously, depending on their size, with wings facing either dorsally or ventrally.
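For bookkeeping, the seven bands per imaging round can be laid out as a small configuration table. The sketch below paraphrases the settings above in Python; the field names and the specific polarizer angles (0° and 90°, standing in for "orthogonal angles") are our own illustrative choices, not identifiers from the pipeline.

```python
# Illustrative summary of the seven drawer-image bands captured per round.
# Key and field names are hypothetical; settings paraphrase the text.
DRAWER_BANDS = {
    "UV":    {"light": "UV LED, 360 nm",          "filter": "Hoya U-340 UV-pass"},
    "UVF":   {"light": "UV LED, 360 nm",          "filter": None},  # reflected UV + visible fluorescence
    "NIR":   {"light": "LED, 740 nm",             "filter": None},
    "fNIR":  {"light": "LED, 940 nm",             "filter": None},
    "RGB":   {"light": "white LED, 400-700 nm",   "filter": None},
    "POL0":  {"light": "white LED, 400-700 nm",   "filter": "linear polarizer, 0 deg"},
    "POL90": {"light": "white LED, 400-700 nm",   "filter": "linear polarizer, 90 deg"},
}
```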
Drawer image processing
All raw (multispectral) drawer images are uploaded to a high-performance computing environment, where we have developed a pipeline to process them automatically; alternatively, a small number of images (<5) can be processed on a desktop computer with reasonable resources, at the cost of longer runtimes (Methods). To preserve the color gradient of the specimens, images are first converted into linearized 16-bit TIFF format30 by dcraw31 (an open-source program for handling raw image formats). These 16-bit TIFF images are then analyzed using MATLAB scripts. A set of seven drawer images is treated as one computing unit, and each group of specimens has corresponding dorsal and ventral units (Methods).
Each computing unit (Fig. 3a) is read into memory, and the standard black and white references are recognized on the white (regular RGB) image by their circular shapes (Fig. 3b). Rather than calculating the exact number of photons absorbed at each sensor element13,27, we employ the remote sensing technique32 of converting all pixel values into reflectance (albedo) units (between 0 and 1) according to the black and white reference standards (Methods; Fig. 3c). The scale on the drawer image is recognized automatically by local feature-matching to a reference image of the same scale, and the number of pixels per centimeter is derived (Methods; Fig. 3b).
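This black/white standardization is, in essence, a per-pixel linear rescaling. A minimal sketch, assuming the mean linearized pixel values measured on the two standards are already available (function and argument names are ours, not the pipeline's):

```python
def to_reflectance(pixel, black_ref, white_ref):
    """Rescale a linearized raw pixel value to reflectance (albedo) in [0, 1],
    using the mean pixel values measured on the black and white standards."""
    r = (pixel - black_ref) / (white_ref - black_ref)
    return min(max(r, 0.0), 1.0)  # clip values that fall outside the standards' range
```

Applied pixel-wise to every band, this yields images in comparable albedo units regardless of exposure.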
Specimen pinning variability and optical aberration of the camera lens system introduce measurement error during imaging. We estimate this error in length measurements to be less than 0.4% (i.e., 0.16 mm for a 4 cm butterfly; Methods; Fig. 7). Although the error is minute, we leave a clear 5 cm margin around the edges of the stage when imaging specimens, to avoid the relatively high aberration near the image boundaries (Fig. 7d).
Post-processing is applied to the UV, NIR (740 nm), fNIR (940 nm), and UVF bands to account for the differential sensor sensitivity to these wavelengths in the red, green, and blue channels (Methods; Fig. 3c); the RGB-white band does not require post-processing. An index of polarization is calculated as the absolute difference between the two orthogonally polarized RGB white images. This single measure of polarization can also indicate the occurrence of structure-induced colorations, suggesting whether additional studies should be carried out to investigate polarization at other viewing or incident light angles33.
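The polarization index thus reduces to an element-wise absolute difference of the two calibrated polarized images. A sketch, using nested lists to stand in for single-channel image arrays:

```python
def polarization_index(pol0, pol90):
    """Per-pixel index of polarization: the absolute difference between the
    two orthogonally polarized white-light images (both already converted to
    reflectance units)."""
    return [[abs(a - b) for a, b in zip(row0, row90)]
            for row0, row90 in zip(pol0, pol90)]
```

Pixels with a large index reflect light preferentially along one polarizer axis, flagging candidate structure-induced coloration.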
Extracting specimen images from drawer images
Our preliminary observations showed that Lepidoptera have the highest contrast with the background in the fNIR (940 nm) band, so we exploit this property to recognize and extract individual specimen images from drawer images (Methods; Fig. 3d). Each specimen’s multi-band images are aligned into a layered image stack (Fig. 8a, b) based on affine geometric transformations (translation, rotation, scale, and shear). This step is relatively time-consuming, and processing time scales roughly with specimen size. At this stage, the registered multi-band specimen image stack, our initial descriptive data, can either be archived as part of a specimen’s extended data or further transformed by our pipeline into higher-level processed data that yield shape, color, and pattern traits. For convenience, we include an additional binary mask layer with the information needed for background removal (Methods).
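The extraction idea can be sketched as thresholding the fNIR band and taking a bounding box around each foreground region. This is a simplified illustration (the threshold value is an arbitrary placeholder, and a real drawer would need connected-component labeling to separate multiple specimens):

```python
def fnir_mask(fnir, threshold=0.35):
    """Binary foreground mask from the fNIR (940 nm) band, in which
    specimens contrast most strongly with the background.
    `threshold` is an assumed placeholder value."""
    return [[px > threshold for px in row] for row in fnir]

def bounding_box(mask):
    """Smallest (row0, col0, row1, col1) box enclosing all foreground pixels."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    cols = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
    return (rows[0], cols[0], rows[-1], cols[-1])
```

The box cut from the fNIR band can then be reused to crop the co-registered images in every other band.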
The completed initial descriptive data contain registered multi-band images (UV, blue, green, red, NIR, fNIR, fluorescence [RGB], and polarization [RGB]), a background-removal mask, and the scale bar (Figs. 3a and 8a, b). Although further analysis is required to extract specific trait data from these datasets, they can be powerful visual aids in the discovery of novel wing scale types and structures. For example, the orange patches at the forewing tips of Hebomoia glaucippe (L.) show strong UV reflectance14,15 (Figs. 3c and 8c), but the orange patches on Chrysoritis pyramus (Pennington) do not, suggesting a difference in the underlying physical mechanism producing these colors. Similarly, the white background on Hebomoia glaucippe shows little UV reflectance14, but the white patches on Arhopala wildei Miskin (and many other species with white patches) show significant UV reflectance (Fig. 8c). With a suitable converter, these initial descriptive data can be used in software packages27,29 for analyses that take into account a range of animal visual systems. There is immense potential, using these initial descriptive data alone, to discover multispectral phenomena that have been hidden for centuries within museum collections around the world.
A series of more complex analytical pipelines were designed to further quantify multispectral reflectance and shape traits. Following a detailed segmentation of different body parts, custom “tail quantification” and “wing-grid coordinate” pipelines are applied to record information about tails, wing shape, and multi-band reflectance traits.
Body-part segmentation
Our initial descriptive data include an overall specimen outline; to segment this outline into different body parts, key landmarks are identified using conventional geometric approaches, including mathematically searching the topology of the specimen outline (labeled as crosses and circles in Fig. 9b). We include two segmentation methods. Basic segmentation (fully automated segmentation of specimen shapes by connecting landmarks with straight lines) can be used in the absence of data from the more time-intensive manual fore- and hindwing segmentation. The manually defined fore- and hindwing segmentation pipeline is semi-automated, with human input through a stand-alone software package adapted from a GitHub repository named “moth-graphcut”32, and the segmentations derived from it look more natural (Fig. 9c). Basic segmentation is highly efficient, requiring no human input, but less accurate (inspection and correction are discussed later). In contrast, manual fore- and hindwing segmentation accurately captures natural wing shapes and enables full-wing reconstruction, with a throughput of approximately 100 specimen images per hour. In both methods, further morphological information, such as body size, body length, thorax width, antennal length, antennal width, and antennal curviness, is automatically measured and collected along with the body-part segmentation (Methods; Fig. 1f).
Once specimens are segmented into body parts, the multispectral reflectance of each body part can be summarized (Fig. 9d). In addition to the analyses that can be done at the individual level with the initial descriptive data, more detailed comparisons can be made between the dorsal and ventral sides of different body parts. For example, analyzing the reflectance of 17 specimens from 7 families, we observe that the dorsal hindwing shows significantly higher UV reflectance than the ventral hindwing (Fig. 9e), possibly to assist in signaling, whereas the ventral side of the body and forewings shows higher fNIR reflectance than the dorsal side (Fig. 9e), possibly to assist in thermoregulation. However, additional processing is needed to produce coherent trait data within individual body parts that are comparable among more distantly related taxa.
Universally applicable wing coordinates
To compare multispectral wing traits across different wing shapes, we developed a generalizable pipeline consisting of four main components (Fig. 4): (1) complete wing shape reconstruction, (2) secondary landmark identification, (3) wing grid generation, and (4) hindwing tail summary. This system overcomes the particular difficulty of accounting for and quantifying diverse hindwing tails, and the processed data generated from this pipeline can also be directly applied in shape analyses.
In Lepidoptera and many other winged insects, a region of the hindwing often overlaps a portion of the forewing, complicating automated shape reconstruction. In our imaging paradigm, a specimen’s hindwing is overlapped by the forewing in the dorsal-side image, and the forewing is overlapped by the hindwing in the ventral-side image (Fig. 4a). In our algorithm, the manually defined fore-hindwing boundaries are used to reconstruct the missing hindwing edge on the dorsal side and the incomplete forewing edge on the ventral side of a specimen. After a complete wing is reconstructed, secondary landmarks are identified automatically (Figs. 1d and 4a). Tails on the hindwings are computationally separated from wing bodies before further processing (details about tail analyses can be found in Methods). A set of wing grids is then created according to the secondary landmarks of each wing (Figs. 1d and 4a). This grid system, which divides a specimen’s silhouette according to the centroid of a set of four corners, is robust to shape differences between species, even for distantly related Lepidoptera (e.g., Sphingidae and Lycaenidae; Fig. 4b). Furthermore, the majority of these grids remain stable even in the presence of moderate wing damage (IV & VIII in Fig. 4b). The default resolution of these matrices is 32 × 32, but it can be adjusted to accommodate specimens with larger wing areas.
The quantification of hindwing tails and wing shapes also relies on this gridded system (Methods; Figs. 1e and 4c) and can be applied across the Lepidoptera (Fig. 1e) without a priori identification of the presence or absence of tails. In contrast to other packages28, our wing-grid pipeline allows comparisons of diverse wing shapes, especially hindwings, with differing venation systems and tails (Fig. 4b). The evenly distributed gridded anchor points (e.g., 128 points in a 32 × 32 grid system) on the silhouette of a wing can be used as “landmarks” for shape comparison in other applications9,11,34 (Fig. 4b), and can also be used to summarize multispectral wing patterns.
Based on this wing-grid system, the average reflectance and variation of each grid cell can be calculated (Figs. 1g and 5a), and the results of a wing analysis can be stored in a 32 × 32 × N matrix (where N is the number of wavelength bands). The 32 × 32 resolution was determined by the smallest specimens we handled; for example, it is meaningless to summarize data for a wing spanning 50 × 50 pixels at a finer resolution (e.g., 64 × 64). This standard format facilitates further statistical analyses among a wide variety of lepidopteran groups with different wing shapes.
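The summary step can be sketched as binning a wing image into n × n cells and averaging within each cell; the simplification here (ours, not the pipeline's) is a regular rectangular grid over the image, ignoring the landmark-driven grid geometry:

```python
def grid_summary(band, n=32):
    """Mean reflectance within each cell of an n-by-n grid laid over one wing
    image (a 2-D list of reflectance values). Stacking one such n-by-n matrix
    per wavelength band yields the n x n x N result described in the text."""
    h, w = len(band), len(band[0])
    sums = [[0.0] * n for _ in range(n)]
    counts = [[0] * n for _ in range(n)]
    for i, row in enumerate(band):
        for j, value in enumerate(row):
            gi, gj = min(i * n // h, n - 1), min(j * n // w, n - 1)
            sums[gi][gj] += value
            counts[gi][gj] += 1
    return [[sums[i][j] / counts[i][j] if counts[i][j] else float("nan")
             for j in range(n)] for i in range(n)]
```

Because every wing is reduced to the same n × n × N shape, wings of very different sizes and outlines become directly comparable in downstream statistics.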
The results of wing-patterning analyses can be further projected onto an average wing shape of a group for more intuitive interpretation (Figs. 1h and 5b). For example, the mean reflectance in the RGB bands identifies generally brighter wing regions (Fig. 5b). High-UV-contrast regions appear to be important in UV intraspecific signaling35, and we find that such regions are more likely to occur on the dorsal side in Lycaenidae but on the ventral side in Papilionidae (Fig. 5c, d). We can also compare the variability in the location of these highly UV-variable regions for a given group of taxa, showing where they are highly conserved (low values) versus more labile (high values; Fig. 5e, f). Such conserved regions indicate that UV variation (which could be involved in signaling) in that wing region, whether present or not, is highly constrained and therefore stable across species. Although these examples were chosen to illustrate a wide variety of wing shapes rather than to target a specific scientific question, they already begin to provide biological insights for further study, demonstrating the utility of systematic studies of lepidopteran traits using this approach.
Inspection, manual correction, and visualization
Given the relatively large file sizes (~240 MB per image) and time-intensive post-processing pipelines, most of our protocols are designed to run in high-performance computing environments (i.e., clusters); however, inspecting and manually correcting images in such environments is inconvenient. We therefore designed the pipeline so that a small proportion of the dataset can be downloaded to a local machine for inspection and manual correction. In total, our pipeline has five points where inspection and manual correction are possible (Methods). For each inspection point, we developed corresponding scripts and user interfaces to manually correct the dataset on local machines with minimal resource requirements (low storage, memory, and CPU demands). Scripts for customized visualization were also developed for wing shape (including tails) and wing patterns (Methods, Supplementary Information, and Data availability).