Utah's Foremost Platform for Undergraduate Research Presentation
2025 Abstracts

Reducing the Cost of Language Test Creation Through Artificial Intelligence

Author(s): Anna Johnson, Joanna Clark
Mentor(s): Matthew Wilcox
Institution: BYU

The cost of developing high-quality language test content has long been a challenge for educational institutions and testing organizations, largely because creating, evaluating, and updating test materials requires skilled human input. Artificial Intelligence (AI) offers a way to significantly reduce these costs while maintaining, or even improving, test quality. This research explores how AI can be employed in the design and production of language test content to make testing more cost-efficient. AI-powered systems, particularly those built on natural language processing (NLP) and machine learning (ML), can automate key steps in test content creation. These systems can generate a wide variety of language test items, such as reading comprehension passages, vocabulary exercises, and grammar questions, at scale. This research will also explore the effectiveness of AI in creating listening tasks, particularly how authentic AI-generated passages sound when they are not recorded by voice talent. AI-driven tools can generate thousands of test items in a fraction of the time a human team would need, cutting labor costs and development time. Additionally, AI can create adaptive tests that adjust question difficulty in real time, so that fewer test items are required to accurately assess a candidate's language proficiency, further reducing test length and cost. For AI to create language tests accurately as well as efficiently, it must meet the current standards of the American Council on the Teaching of Foreign Languages (ACTFL). ACTFL places learners in one of five major proficiency levels: Novice, Intermediate, Advanced, Superior, and Distinguished. Each level has its own set of criteria that a learner must meet.
AI must be able to produce passages whose length and complexity correspond to each of these five levels in order to accurately categorize a student's proficiency. The accuracy and fluency of test items must be consistent across all languages tested. Tests will be created in multiple languages, with a focus on Chinese and Portuguese, according to the ACTFL guidelines. These tests will be evaluated by experts to ensure their validity and accuracy. In conclusion, integrating AI into the language test creation process offers a promising path toward reducing the costs of test development. The expectation is that, given a clear prompt and set of criteria, AI will generate test items with greater speed and convenience than manual authoring. This approach aims to deliver scalable, high-quality content while minimizing the resource-intensive effort traditionally required for language testing.
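
The adaptive testing idea mentioned in the abstract, adjusting question difficulty in real time so that fewer items are needed, can be illustrated with a minimal sketch. This is not the authors' method: it is a simple one-up, one-down staircase over the five ACTFL levels, and the function names, the eight-item test length, and the scoring rule are all assumptions made for illustration. Operational adaptive tests often rely on item response theory instead.

```python
from collections import Counter
from typing import Callable

# The five major ACTFL proficiency levels named in the abstract, lowest to highest.
LEVELS = ["novice", "intermediate", "advanced", "superior", "distinguished"]


def run_staircase(answer_item: Callable[[str], bool],
                  num_items: int = 8,
                  start: str = "intermediate") -> str:
    """Administer num_items items with a one-up, one-down rule.

    answer_item(level) is a hypothetical callback that presents one item at
    the given level and returns True if the candidate answered correctly.
    The estimate is the level with the most correct answers (an assumed,
    deliberately simple scoring rule).
    """
    index = LEVELS.index(start)
    correct = Counter()
    for _ in range(num_items):
        level = LEVELS[index]
        if answer_item(level):
            correct[level] += 1
            # Correct answer: move up one level, capped at the top.
            index = min(index + 1, len(LEVELS) - 1)
        else:
            # Incorrect answer: move down one level, capped at the bottom.
            index = max(index - 1, 0)
    if not correct:
        # No correct answers at all: fall back to the lowest level.
        return LEVELS[0]
    return correct.most_common(1)[0][0]


def simulated_candidate(true_level: str) -> Callable[[str], bool]:
    """Hypothetical candidate who answers correctly at or below true_level."""
    ceiling = LEVELS.index(true_level)
    return lambda level: LEVELS.index(level) <= ceiling


if __name__ == "__main__":
    for true_level in LEVELS:
        estimate = run_staircase(simulated_candidate(true_level))
        print(true_level, "->", estimate)
```

Because the staircase quickly settles around the boundary between the highest level a candidate passes and the lowest level they fail, most items are spent near the candidate's actual level, which is the source of the reduction in test length. The simulated candidate here is deterministic; real responses are noisy, so a production system would need a more robust estimator and validation by human experts, as the abstract proposes.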