A high-throughput multispectral imaging system for museum specimens

The two-dimensional nature of many pinned Lepidoptera specimens allows us to omit some technical considerations, such as the variable incident angles that must be carefully controlled when imaging 3D objects. Our multispectral imaging rig, with its custom-designed platform, can image both the dorsal and ventral sides of specimens (Figs. 1a, 2 and 3). The initial descriptive data can be used for general exploration of multispectral properties and for museum specimen digitization (Fig. 1b); the processed data, which build on the initial descriptive data, can be used to investigate multispectral colors. After fore- and hindwing reconstruction (Fig. 1c), the universally applicable analytical framework (Figs. 1d and 4a) can objectively quantify wing tails and accommodate different wing shapes and venation systems (Figs. 1e and 4b). The framework can also be applied to systematically survey multispectral color patterns (Figs. 1g, h and 5) while providing basic morphological measurements (Fig. 1f).

Fig. 1: Summary of high-throughput, multi-faceted imaging methods.

The workflow runs from top to bottom, and the main features are highlighted. a An exemplar image showing the high-throughput capacity of our imaging system. b Multispectral images (upper two rows) can be summarized as false-color images (lower row) by principal component analysis, where red corresponds to PC1, green to PC2, and blue to PC3. c Full wing shape can be virtually reconstructed using information from dorsal and ventral segmentation. d A universal coordinate system for each wing can be generated automatically based on four landmarks (labeled as red dots). e Summarized wing shapes of two groups of butterflies: Lycaenidae on the left-hand side and Papilionidae and Nymphalidae on the right-hand side. Tail probabilities (Tail Prob.), curviness (Curv.), and the standard error of tail curvature (S.E. of Tail Curv.) are color coded accordingly. f Body and antennal morphologies can be measured automatically during image processing. g The summarized reflectance of four exemplar spectral bands (RGB and UV) from three specimens is shown. h Variation in UV reflectance of a group of butterflies is summarized as ‘UV signal,’ which represents the average contrast of UV reflectance. Blue indicates low signal, and red indicates high signal.

Fig. 2: The design of the imaging platform accommodating both dorsal and ventral sides of specimens.

a The hidden pre-cut slots are illustrated in blue. White dashed lines, indicating the region captured in the image, are marked by black sewn lines on the imaging platform for user guidance. The scale bar and black/white standard references are in the lower left corner. Details of the multi-layer foam backing can be found in Fig. 6. b Examples showing how the dorsal and ventral sides of specimens are placed on the imaging platform.

Fig. 3: Image processing pipeline for each unit (a set of seven drawer images).

a A set of seven raw images in NEF format; b Automatic recognition of the scale bar and standard white and black references; c Reflectance calibration according to the standard white and black references; d Individual specimen images are extracted by red bounding boxes.

Fig. 4: Diagram illustrating the full wing reconstruction, universal wing grids, and tail quantification.

a Full wing shape can be virtually reconstructed according to dorsal and ventral segmentation; wing grids can then be generated after defining the tail regions (c). In the left panel of c, the red boundary represents the reconstructed rough shape of a hindwing based on the top five harmonics after projection into the frequency domain by elliptical Fourier analysis. b Universal wing grids can accommodate distorted or broken wings (e.g., IV & VIII). I. Smerinthus cerisyi (Sphingidae); II. Catocala connubialis (Erebidae); III. Evenus coronata (Lycaenidae); IV. Heliconius melpomene (Nymphalidae); V. Allancastria cerisyi (Papilionidae); VI. Atrophaneura hector (Papilionidae); VII. Kallima inachus (Nymphalidae); VIII. Corades medeba (Nymphalidae).

Fig. 5: Reflectance summary from exemplar bands according to wing grids.

a Summarized gridded reflectances in blue, green, and red bands can be b Projected onto the average wing shapes of a selected group of specimens. c, d UV signal (the average UV contrast among species), showing the wing regions most likely to be highly UV-variable. e, f Variation of the UV-variable regions between species, indicating regions of UV patterning that are highly conserved (blue) versus those that are more actively changing (red).

The imaging system design

Our high-throughput multispectral imaging system represents a compromise between the speed of traditional imaging and the need for objective spectral data. The system consists of a high-resolution digital SLR camera (Nikon D800) with its internal UV-IR filter removed to allow for UV–visible–IR imaging, fitted with a 28–80 mm f/3.3–5.6 G Autofocus Nikkor Zoom Lens (Methods). The customized imaging platform was designed to accommodate both ends of a pin, so mounted specimens can be positioned on the platform either dorsally or ventrally (Methods; Fig. 2). A reference bar containing black and white standard references and a scale bar is attached to the imaging platform by a hook-and-loop fastener in each round of imaging (Methods; Fig. 6a). The approximate cost, excluding computational resources, is ~$4500 (Supplementary Information).

Fig. 6: The design of the imaging platform.

a Reference bar and b Corresponding materials. c Imaging platform with the hidden pre-cut slots shown in blue and d Corresponding materials. (1) Black (spectralon black standard AS-001160-960, Labsphere) and white (spectralon white standard AS-001160-060, Labsphere) standard references; (2) Hook loop strips with adhesive (2.4″ × 2.4″); (3) Adhesive forensic evidence labels (2 cm measure; A-6243); (4) Adhesive tear resistant waterproof photo craft paper; (5) Neoprene sponge black foam pads with adhesive (12″ × 8″ × 1/4″); (6) Black cotton thread; (7) Canvas stretcher bar kit (16″ × 20″); (8) Neoprene sponge black rubber foam with adhesive (12″ × 8″ × 1/8″); (9) Neoprene foam anti vibration pads with adhesive (6″ × 6″ × 1/4″); (10) Polyethylene foam (18″ × 16″ × 1.5″). Detailed information can be found in Supplementary Information.

For each set of specimens, a series of seven images (hereafter referred to as “drawer images”; Fig. 3a) are taken in raw format (*NEF) over the course of two minutes. These seven drawer images correspond to the following spectral imaging ranges (with details about light settings included in Methods): UV-only (λ = 360 nm; reflected light filtered through a Hoya U-340 UV-pass filter on the camera); combined UV reflectance and unfiltered visible fluorescence (called UVF hereafter), comprising both reflected UV and all UV-induced visible fluorescence; two near-IR bands (unfiltered reflected light from λ = 740 nm [NIR] and 940 nm [fNIR] LEDs); and three in the visible range (reflected broadband white LED light, λ = 400–700 nm: one unfiltered RGB image and two RGB images filtered by linear polarizers at orthogonal angles to detect polarization along each axis), which are later decomposed into red, green, and blue channels. Up to thirty-five pinned specimens can be imaged simultaneously, depending on their sizes, with wing sides facing either dorsally or ventrally.
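The seven captures per drawer can be summarized as a simple configuration table. The sketch below (in Python, rather than the MATLAB used by our pipeline) encodes the band names, illumination, and filters described above; the field names and label strings are illustrative, not actual pipeline settings.

```python
# Illustrative summary of the seven captures per drawer. Band names follow
# the text; "illumination" and "filter" values are shorthand labels only.
DRAWER_BANDS = [
    {"name": "UV",        "illumination": "360 nm LED",          "filter": "Hoya U-340 UV-pass"},
    {"name": "UVF",       "illumination": "360 nm LED",          "filter": "none (UV + visible fluorescence)"},
    {"name": "NIR",       "illumination": "740 nm LED",          "filter": "none"},
    {"name": "fNIR",      "illumination": "940 nm LED",          "filter": "none"},
    {"name": "RGB",       "illumination": "white LED 400-700 nm", "filter": "none"},
    {"name": "RGB-pol0",  "illumination": "white LED 400-700 nm", "filter": "linear polarizer, 0 deg"},
    {"name": "RGB-pol90", "illumination": "white LED 400-700 nm", "filter": "linear polarizer, 90 deg"},
]
```

The three RGB captures are later split into red, green, and blue channels, so the seven exposures yield more than seven analysis layers per specimen.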

Drawer image processing

All raw (multispectral) drawer images are uploaded to a high-performance computing environment, where we have developed a pipeline to process images automatically. Alternatively, a small number of images (<5) can be processed on a desktop with reasonable resources, at the cost of a longer runtime (Methods). To preserve the color gradient of the specimens, images are first converted into linearized 16-bit TIFF format30 by dcraw31 (an open-source program for handling raw image formats). These 16-bit TIFF images are then analyzed using MATLAB scripts. A set of seven drawer images is considered one computing unit, and the same group of specimens in the dorsal unit has a corresponding ventral unit (Methods).
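The linearization step amounts to one dcraw invocation per NEF file. The helper below is a Python sketch (the exact settings used by our pipeline are described in Methods); it relies on dcraw's documented `-4` flag (16-bit linear output, gamma 1.0, no auto-brightening) and `-T` flag (write TIFF rather than PPM).

```python
from pathlib import Path

def dcraw_command(nef_path):
    """Build a dcraw command that converts a raw NEF file into a
    linearized 16-bit TIFF, preserving the sensor's linear response.

    -4 : 16-bit linear output (no gamma curve, no auto-brightening)
    -T : write TIFF instead of PPM
    """
    return ["dcraw", "-4", "-T", str(Path(nef_path))]

# Example (not executed here):
#   import subprocess
#   subprocess.run(dcraw_command("drawer_01.NEF"), check=True)
```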

Each computing unit (Fig. 3a) is read into memory, and the standard black and white references are recognized on the white (regular RGB) image by their circular shapes (Fig. 3b). Rather than calculating the exact number of photons absorbed at each sensor13,27, we employ the remote sensing technique32 of converting all pixel values into reflectance (albedo) units (between 0 and 1) according to the black and white reference standards (Methods; Fig. 3c). The scale bar on the drawer image is recognized automatically by local feature-matching to a reference image of the same scale, and the number of pixels per centimeter is derived (Methods; Fig. 3b).
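The reflectance conversion itself is a per-pixel linear rescaling between the two standards. A minimal NumPy sketch follows (our pipeline implements this step in MATLAB; here `black_ref` and `white_ref` stand for pixel samples taken from the recognized reference circles):

```python
import numpy as np

def to_reflectance(img, black_ref, white_ref):
    """Convert raw pixel values into reflectance (albedo) units in [0, 1].

    img       : 2-D array of linearized pixel values for one band
    black_ref : pixel sample from the black Spectralon standard
    white_ref : pixel sample from the white Spectralon standard

    Values outside [0, 1] (e.g., specular glints brighter than the white
    standard) are clipped.
    """
    black = float(np.mean(black_ref))
    white = float(np.mean(white_ref))
    refl = (img.astype(np.float64) - black) / (white - black)
    return np.clip(refl, 0.0, 1.0)
```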

Specimen pinning variability and optical aberration of the camera lens system introduce measurement error during the imaging process. We estimate this error in length measurements to be less than 0.4% (or 0.16 mm for a 4 cm butterfly; Methods; Fig. 7). Even though the error is minute, we leave a clear 5 cm margin around the edges of the stage when specimens are imaged, in order to avoid the relatively high aberration in the vicinity of the image boundaries (Fig. 7d).

Fig. 7: Empirically measured lens-induced measurement distortion.

a Illustration of scale standards placed at different heights on pins. b View of the image taken by the imaging system. c, d Frequency distribution c and spatial distribution d of size anomalies in absolute percentage relative to a 4 cm butterfly. e, f The corresponding raw values for c & d are provided. The dashed lines in c & e indicate the median and zero, respectively. The regions bounded by the red dashed lines in d & f are where we place our specimens for imaging.

Post-processing is applied to the UV, NIR (740 nm), fNIR (940 nm), and UVF bands to account for the differential sensor sensitivity to these wavelengths in the red, green, and blue channels (Methods; Fig. 3c); the RGB-white band requires no post-processing. An index of polarization is calculated as the absolute difference between the two orthogonally polarized RGB white images. This single measure of polarization can also provide an indication of the occurrence of structure-induced coloration, suggesting whether additional studies should be carried out to investigate polarization at other viewing or incident light angles33.
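The polarization index reduces to a per-pixel absolute difference of the two polarized captures. A NumPy sketch (again, the production code is MATLAB; inputs are assumed to be reflectance-calibrated images of identical shape):

```python
import numpy as np

def polarization_index(rgb_pol0, rgb_pol90):
    """Per-pixel polarization index: the absolute difference between the
    two orthogonally polarized RGB-white images. High values flag regions
    whose reflectance changes strongly with polarizer angle, a hint of
    structure-induced coloration.
    """
    a = np.asarray(rgb_pol0, dtype=np.float64)
    b = np.asarray(rgb_pol90, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("polarized images must have identical shapes")
    return np.abs(a - b)
```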

Extracting specimen images from drawer images

Our preliminary observations showed that Lepidoptera have the highest contrast with the background in the fNIR (940 nm) band, so we exploited this property to help recognize and extract individual specimen images from drawer images (Methods; Fig. 3d). Each specimen’s multi-band images are aligned into a layered image stack (Fig. 8a, b) based on affine geometric transformations, such as translation, rotation, scale, and shear. This step is relatively time-consuming, and processing time roughly scales with specimen size. At this stage, the registered multi-band specimen image stack, our initial descriptive data, can either be archived as part of a specimen’s extended data or further transformed by our pipeline into higher-level processed data that yield shape, color, and pattern traits. For convenience, we include an additional binary mask layer with the information needed for background removal (Methods).
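Specimen extraction can be sketched as thresholding the fNIR band and labeling connected bright regions. The simplified NumPy version below is illustrative only (our pipeline's actual segmentation is more involved, and the `threshold` and `min_area` parameters are hypothetical); it returns one bounding box per detected specimen:

```python
from collections import deque

import numpy as np

def specimen_boxes(fnir, threshold=0.5, min_area=25):
    """Extract rough specimen bounding boxes from an fNIR reflectance image.

    Specimens reflect strongly at 940 nm against the dark platform, so we
    threshold the image and label 4-connected foreground regions with a
    simple breadth-first search. Returns (row0, col0, row1, col1) boxes
    (half-open) for every region with at least `min_area` pixels.
    """
    mask = fnir > threshold
    visited = np.zeros_like(mask, dtype=bool)
    rows, cols = mask.shape
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r, c] and not visited[r, c]:
                # Flood-fill one connected component.
                queue = deque([(r, c)])
                visited[r, c] = True
                pixels = []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny, nx] and not visited[ny, nx]):
                            visited[ny, nx] = True
                            queue.append((ny, nx))
                if len(pixels) >= min_area:
                    ys = [p[0] for p in pixels]
                    xs = [p[1] for p in pixels]
                    boxes.append((min(ys), min(xs), max(ys) + 1, max(xs) + 1))
    return boxes
```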

Fig. 8: The data structure of the initial descriptive data, with preliminary multispectral insights.

a The multi-layer format (“cell format”, the technical term used in MATLAB) is applied to contain the descriptive data for a single side of a specimen. b The appearance of some exemplar layers for the South African lycaenid, Chrysoritis pyramus. The actual output layer order can be found in the Supplementary Information. c Variation in scale properties affecting UV reflectance can be found within a single specimen of Hebomoia glaucippe, compared here with C. pyramus on the left and A. wildei on the right.

The completed initial descriptive data contain registered multi-band images (including UV, blue, green, red, NIR, fNIR, fluorescence [RGB], and polarization [RGB]), a background removal mask, and the scale bar (Figs. 3a and 8a, b). Although further analysis is required to extract specific trait data from these datasets, they can be powerful visual aids in the discovery of novel wing scale types and structures. For example, the orange patches at the forewing tips of Hebomoia glaucippe (L.) show strong UV reflectance14,15 (Figs. 3c and 8c), but the orange patches of Chrysoritis pyramus (Pennington) do not, suggesting a difference in the underlying physical mechanism producing these colors. Similarly, the white background of Hebomoia glaucippe shows little UV reflectance14, but the white patches of Arhopala wildei Miskin (and many other species with white patches) show significant UV reflectance (Fig. 8c). With a suitable converter, these initial descriptive data can be used in software packages27,29 for analyses that take into account a range of animal visual systems. These initial descriptive data alone hold immense potential for the discovery of multispectral phenomena that have remained hidden in museum collections around the world for centuries.

A series of more complex analytical pipelines were designed to further quantify multispectral reflectance and shape traits. Following a detailed segmentation of different body parts, custom “tail quantification” and “wing-grid coordinate” pipelines are applied to record information about tails, wing shape, and multi-band reflectance traits.

Body-part segmentation

Our initial descriptive data include an overall specimen outline; to segment this outline into different body parts, key landmarks are identified based on conventional geometry, including but not limited to mathematically searching the topology of the specimen outline (labeled as crosses and circles in Fig. 9b). We include two segmentation methods. Basic segmentation (fully automated segmentation of specimen shapes by connecting landmarks with straight lines) can be used in the absence of data from the more time-intensive manual fore- and hindwing segmentation. The manually defined fore- and hindwing segmentation pipeline is semi-automated, with human input through a stand-alone software package adapted from the GitHub repository “moth-graphcut”32, and the segmentations derived from it are more natural-looking (Fig. 9c). Basic segmentation is highly efficient, requiring no human input, but less accurate (inspection and correction are discussed later). In contrast, manual fore- and hindwing segmentation provides an accurate natural wing shape and full-wing reconstruction, with a throughput of approximately 100 specimen images per hour. In both methods, further morphological information, such as body size, body length, thorax width, antennal length, antennal width, and antennal curviness, is also automatically measured and collected along with the body-part segmentation (Methods; Fig. 1f).

Fig. 9: Details of body part segmentation with preliminary observations.

a Background removal based on the fNIR image. b With primary landmarks (represented by crosses and circles) and vectors identifying the symmetry axes (represented by the solid red and dashed blue segments), the specimen mask can be automatically segmented c With or without a manually defined fore-hindwing division of the overlap region. The species shown in (a–c) is Oxylides faunus. d Statistical summaries (mean, variation, and patch size) of all bands for both sides (dorsal and ventral) of all body parts (four wings and the body) can be calculated. e Statistical results of exemplar bands based on 17 specimens from 7 different families. Center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; points, outliers. Statistically significant differences between dorsal and ventral sides (two-tailed t-test) are indicated by asterisks: *, p < 0.05; **, p < 0.01.

Once specimens are segmented into body parts, the multispectral reflectance of each body part can be summarized (Fig. 9d). In addition to the analyses that can be done at the individual level with the initial descriptive data, more detailed comparisons can be made between the dorsal and ventral sides of different body parts. For example, by analyzing the reflectance of 17 specimens from 7 different families, we can observe that the dorsal hindwing shows significantly higher UV reflectance than its ventral side (Fig. 9e), possibly to assist in signaling, whereas the ventral side of the body and forewings shows higher fNIR reflectance than the dorsal side (Fig. 9e), possibly to assist in thermoregulation. However, additional processing is needed to produce coherent trait data within individual body parts that are comparable among more distantly related taxa.

Universally applicable wing coordinates

To compare multispectral wing traits across different wing shapes, we developed a generalizable pipeline consisting of four main components (Fig. 4): (1) complete wing shape reconstruction, (2) secondary landmark identification, (3) wing grid generation, and (4) hindwing tail summary. This system overcomes the particular difficulty of accounting for and quantifying diverse hindwing tails, and the processed data generated from this pipeline can also be directly applied in shape analyses.

In Lepidoptera and many other winged insects, a region of the hindwing often overlaps a portion of the forewing, complicating automated shape reconstruction. In our imaging paradigm, a specimen’s hindwing is overlapped by the forewing in the dorsal-side image, and the forewing is overlapped by the hindwing in the ventral-side image (Fig. 4a). In our algorithm, the manually defined fore-hindwing boundaries are used to reconstruct the missing hindwing edge on the dorsal side and the incomplete forewing edge on the ventral side of a specimen. After a complete wing is reconstructed, secondary landmarks are identified automatically (Figs. 1d and 4a). Tails on the hindwings are computationally separated from the wing bodies before further processing (details about tail analyses can be found in Methods). A set of wing grids is then created according to the secondary landmarks of each wing (Figs. 1d and 4a). This grid system, which divides a specimen’s silhouette according to the centroid of a set of four corners, is robust to shape differences between species, even for distantly related Lepidoptera (e.g., Sphingidae and Lycaenidae; Fig. 4b). Furthermore, the majority of these grids remain stable even in the presence of moderate wing damage (IV & VIII in Fig. 4b). The default resolution of these matrices is 32 × 32, but it can be adjusted to accommodate specimens with larger wing areas.

The quantification of hindwing tails and wing shapes also relies on this gridded system (Methods; Figs. 1e and 4c) and can be applied across the Lepidoptera (Fig. 1e), without the need for a priori identification of the presence or absence of tails. In contrast to other packages28, our wing-grid pipeline allows comparisons of diverse wing shapes, especially hindwings, across differing venation systems and tails (Fig. 4b). The evenly distributed grid anchors on the silhouette of a wing (e.g., 128 points in a 32 × 32 grid system) can be used as “landmarks” for shape comparison in other applications9,11,34 (Fig. 4b). They can also be used to summarize multispectral wing patterns.

Based on this wing grid system, the average reflectance and variation of each grid cell can be calculated (Figs. 1g and 5a), and the results of a wing analysis can be stored in a 32 × 32 × N matrix (where N is the number of wavelength bands). The 32 × 32 resolution was determined by the size of the smallest specimens we handled; for example, it becomes meaningless to summarize data for a wing of 50 × 50 pixels at a finer resolution (e.g., 64 × 64). This standard format facilitates further statistical analyses among a wide variety of lepidopteran groups with different wing shapes.
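The gridded summary can be illustrated as follows. This NumPy sketch anchors an n × n grid to the wing mask's bounding box for simplicity, whereas the real pipeline anchors grids to wing landmarks; it returns the 32 × 32 × N matrix of per-cell mean reflectance described above:

```python
import numpy as np

def grid_summary(band_stack, wing_mask, n=32):
    """Summarize per-band reflectance on an n x n grid over one wing.

    band_stack : (H, W, N) array of reflectance layers for one specimen side
    wing_mask  : (H, W) boolean mask of the segmented wing
    Returns an (n, n, N) matrix of mean reflectance per grid cell, with NaN
    where a cell contains no wing pixels. Simplified stand-in: the grid is
    tied to the mask's bounding box rather than to wing landmarks.
    """
    ys, xs = np.nonzero(wing_mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    # Cell boundaries across the wing's bounding box.
    ry = np.linspace(y0, y1, n + 1).astype(int)
    rx = np.linspace(x0, x1, n + 1).astype(int)
    out = np.full((n, n, band_stack.shape[2]), np.nan)
    for i in range(n):
        for j in range(n):
            cell = np.zeros_like(wing_mask)
            cell[ry[i]:ry[i + 1], rx[j]:rx[j + 1]] = True
            sel = cell & wing_mask  # wing pixels inside this grid cell
            if sel.any():
                out[i, j, :] = band_stack[sel].mean(axis=0)
    return out
```

Stacking these matrices across specimens then allows cell-wise statistics (e.g., the UV contrast maps of Fig. 5) regardless of the original wing shapes.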

The results of wing-patterning analyses can be further projected onto the average wing shape of a group for more intuitive interpretation (Figs. 1h and 5b). For example, the mean reflectance identifies generally brighter wing regions in the RGB bands (Fig. 5b). High-UV-contrast regions appear to be important in intraspecific UV signaling35, and we find that such regions are more likely to be seen on the dorsal side in Lycaenidae, but on the ventral side in Papilionidae (Fig. 5c, d). We can also compare the variability in the location of these highly UV-variable regions for a given group of taxa to show where they are highly conserved (low values) versus where they are more labile (high values; Fig. 5e, f). Such conserved regions indicate that UV variation (which could be involved in signaling) in that wing region, whether present or not, is highly constrained and therefore stable across different species. Although these examples were chosen to illustrate a wide variety of wing shapes rather than to target a specific scientific question, they already begin to provide biological insights for further study, demonstrating the utility of carrying out systematic studies of lepidopteran traits using this approach.

Inspection, manual correction, and visualization

Given the relatively large file sizes (~240 MB per image) and time-intensive post-processing pipelines, most of our protocols are designed to be run in high-performance computing environments (i.e., clusters); however, inspecting and manually correcting images is inconvenient in such environments. We therefore designed the pipeline to enable a small proportion of the dataset to be downloaded to a local machine for inspection and manual correction. In total, our pipeline has five points where inspection and manual correction are possible (Methods). For each inspection point, we also developed corresponding scripts and user interfaces to manually correct the dataset on local machines with minimal resource requirements (low storage, memory, and CPU requirements). Scripts for customized visualization settings were also developed for wing shape (including tails) and wing patterns (Methods, Supplementary Information, and Data availability).
