Utah's Foremost Platform for Undergraduate Research Presentation
2024 Abstracts

Using Pair Blocking to Better Identify Unique Matches

Authors: Erica Webb, Joseph Price
Mentors: Joseph Price
Institution: Brigham Young University

Blocking is a strategy used in machine learning to reduce the number of comparisons that need to be considered. In this paper, we develop a blocking strategy based on the characteristics of two people in the same household. We apply this pair blocking approach to linking US census records and show that it dramatically reduces the number of possible matches to consider and can directly identify millions of unique matches at the blocking step. We then apply our method to a few groups that have been harder to link using previous methods, including interracial couples and German-born Americans. Both are groups that have changed how they report aspects of their identity (race or birthplace) over time. Our approach allows us to dramatically increase the match rates for these couples across adjacent census years.
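To make the idea concrete, here is a minimal sketch of blocking on pairs of household members. It is an illustration under stated assumptions, not the authors' implementation: the record fields (`id`, `hh`, `first`, `age`), the blocking features (first initial and implied birth year), and the function names are all hypothetical. The toy data contains two men who are indistinguishable on their own, yet each is uniquely identified once paired with a household member.

```python
from collections import defaultdict
from itertools import combinations


def person_key(person, census_year):
    # Hypothetical blocking features: first initial and implied birth year.
    return (person["first"][0].upper(), census_year - person["age"])


def pair_blocks(records, census_year):
    """Map each household-pair key to the list of (id, id) pairs sharing it."""
    households = defaultdict(list)
    for rec in records:
        households[rec["hh"]].append(rec)

    blocks = defaultdict(list)
    for members in households.values():
        for a, b in combinations(members, 2):
            ka, kb = person_key(a, census_year), person_key(b, census_year)
            if ka == kb:
                # Skip pairs whose members cannot be told apart by the key.
                continue
            # Sort so the pair key does not depend on listing order.
            first, second = sorted([(ka, a["id"]), (kb, b["id"])])
            blocks[(first[0], second[0])].append((first[1], second[1]))
    return blocks


def unique_pair_matches(recs_a, year_a, recs_b, year_b):
    """Return pairs whose key occurs exactly once in each census."""
    blocks_a = pair_blocks(recs_a, year_a)
    blocks_b = pair_blocks(recs_b, year_b)
    matches = []
    for key, pairs_a in blocks_a.items():
        pairs_b = blocks_b.get(key, [])
        if len(pairs_a) == 1 and len(pairs_b) == 1:
            matches.append((pairs_a[0], pairs_b[0]))
    return matches


# Toy data (invented for illustration): two "Karl, born 1870" records.
census_1900 = [
    {"id": "a1", "hh": 1, "first": "Karl", "age": 30},
    {"id": "a2", "hh": 1, "first": "Mary", "age": 28},
    {"id": "a3", "hh": 2, "first": "Karl", "age": 30},
    {"id": "a4", "hh": 2, "first": "Anna", "age": 25},
]
census_1910 = [
    {"id": "b1", "hh": 7, "first": "Karl", "age": 40},
    {"id": "b2", "hh": 7, "first": "Mary", "age": 38},
    {"id": "b3", "hh": 8, "first": "Karl", "age": 40},
    {"id": "b4", "hh": 8, "first": "Anna", "age": 35},
]

matches = unique_pair_matches(census_1900, 1900, census_1910, 1910)
```

Blocking on "Karl, born 1870" alone would leave two candidates in each census, whereas the pair keys (Karl with Mary, Karl with Anna) each occur exactly once per census, so both couples are resolved at the blocking step. Note that the key in this sketch deliberately omits race and birthplace, the attributes the abstract describes as changing over time; real census data would also need tolerance for misreported ages and name variants, which the sketch ignores.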