Jenkins, Abigail; Baugh, Makinnon; Frandsen, Paul; White, Alexander; Dikow, Rebecca (Brigham Young University)
Faculty Advisor: Frandsen, Paul (Life Sciences, Plant and Wildlife Sciences)
Historically, plant specimens have been preserved, mounted on paper sheets, and stored in plant collections, or herbaria. Herbarium collections support a wide variety of research, including plant taxonomy, ecology, and evolutionary biology. Digitizing herbarium sheets is simple: each sheet is photographed at high resolution, and the corresponding metadata and attributes of the specimen are recorded. When the images are made freely available online, they become immediately accessible to the scientific community, facilitating remote analysis. In digital form, the images also become computable and can be used for purposes such as training deep learning models for classification or morphological analysis.
Although digitization itself is simple, herbarium sheets contain features that do not represent the plant, such as annotations, labels, museum stamps, color palettes, and rulers. Staining, record keeping, and natural degradation introduce further inconsistencies. Together, these features can contribute a substantial amount of noise when the images are used in downstream analyses of the pattern, shape, or color of the specimen. We have developed a pipeline that filters out this extraneous information using deep learning and image segmentation, in which the specimen material is partitioned from the background.
We present this pipeline for generating training data for image segmentation tasks, along with a novel dataset of high-resolution image masks that segment plant material from background noise. We used this dataset to train a neural network to segment plant material in herbarium sheets more generally. Our method is also applicable to other museum data sources where masking may be useful for quantitative analysis of patterns and shapes.
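As a rough illustration of what a segmentation mask does once it exists, the sketch below applies a binary mask to a sheet image so that only plant pixels remain. It is not the pipeline described above: the function name `apply_mask`, the toy 4x4 array, and the choice to fill the background with black are all assumptions made for the example, and in practice the mask would come from the trained network rather than being written by hand.

```python
import numpy as np


def apply_mask(image, mask, fill=0):
    """Keep pixels where `mask` is True and set all others to `fill`.

    `image` is an (H, W, 3) array and `mask` is an (H, W) array.
    Hypothetical helper for illustration only.
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("mask must match the image height and width")
    keep = mask.astype(bool)
    out = np.full_like(image, fill)   # start from a uniform background
    out[keep] = image[keep]           # copy over the specimen pixels only
    return out


# Toy 4x4 RGB "sheet": light paper with a 2x2 green "specimen" in the middle.
sheet = np.full((4, 4, 3), 200, dtype=np.uint8)
sheet[1:3, 1:3] = (30, 120, 40)

# Hand-written mask standing in for a network prediction (assumption).
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

masked = apply_mask(sheet, mask)
```

After masking, labels, stamps, rulers, and paper texture outside the mask no longer contribute to measurements of pattern, shape, or color taken from `masked`.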