Utah's Foremost Platform for Undergraduate Research Presentation
2025 Abstracts

Development of Beginners’ and Advanced Corpora of Written Materials for People with Communication Disorders Who Are Members of the Church of Jesus Christ of Latter-day Saints

Author(s): Cora Goulding, Joseph Howard
Mentor(s): Dallin Bailey
Institution: BYU

Vocabulary use varies between individuals based on many factors, which makes it challenging to decide which words to target in therapy for communication disorders. People have many parts of their lives that are important to them, and religious activity is often one of them. Written materials produced by The Church of Jesus Christ of Latter-day Saints include specialized vocabulary that is important for understanding their content. We present the development of two corpora of written materials produced by the Church as a tool for studying this vocabulary. The beginner (introductory) corpus consists of 161,977 tokens and is composed of materials geared toward adults who may be less familiar with the Church. The advanced corpus contains 43,765,115 tokens, with more than 123,000 unique words, and includes scriptures, study manuals, and other materials compiled for adult members of the Church. The corpus was constructed by manually scraping texts from the Church’s website and by gathering additional texts that had been scraped by the WordCruncher team. To clean the corpus and reduce noise in the analysis, a set list of types of potentially irrelevant material was removed from each text, including indices, descriptions of images embedded in the text, footnote numbers, and copyright information. The beginner and advanced corpora were then analyzed in AntConc for word frequency and compared with another corpus to generate a list of keywords. Future uses of the corpora include developing materials for AAC symbol sets based on these keywords and identifying therapy targets for adults with communication disorders who are members of the Church.
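
The frequency and keyword steps described above can be illustrated with a minimal Python sketch. This is not the authors' workflow, which used AntConc. The tokenizer, the function names, and the use of the log-likelihood statistic (a commonly used keyness measure) are all assumptions made for illustration; the abstract does not state which keyness measure, settings, or reference corpus were used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (a simplified, assumed tokenizer)."""
    return re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())


def log_likelihood(a, b, total_a, total_b):
    """Log-likelihood keyness for a word seen `a` times in the target
    corpus and `b` times in the reference corpus."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_a)
    if b > 0:
        score += b * math.log(b / expected_b)
    return 2 * score


def keywords(target_tokens, reference_tokens, top_n=10):
    """Rank words that are relatively more frequent in the target corpus."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    n_target = sum(target.values())
    n_reference = sum(reference.values())
    if n_target == 0 or n_reference == 0:
        return []
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        # Keep only "positive" keywords: overrepresented in the target.
        if a / n_target > b / n_reference:
            scored.append((word, log_likelihood(a, b, n_target, n_reference)))
    # Highest keyness first; ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Toy texts standing in for a target corpus and a reference corpus.
target_tokens = tokenize("The temple and the covenant and the temple")
reference_tokens = tokenize("The dog and the cat and the bird")

token_count = len(target_tokens)      # running words (tokens)
type_count = len(set(target_tokens))  # unique words (types)
top_keywords = keywords(target_tokens, reference_tokens)
```

In this toy example, words that occur at the same relative rate in both texts (such as "the" and "and") drop out, and only words overrepresented in the target text are ranked. The same token and type counts correspond to the corpus sizes and unique-word figures reported above.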