Utah's Foremost Platform for Undergraduate Research Presentation
2024 Abstracts

Meta-Analysis Of 58 Human RNA-seq Datasets To Predict Mechanisms and Markers for Resistance in ER+ Breast Cancer Treated with Letrozole (an aromatase inhibitor)

Authors: Brett Pickett, Lincoln Sutherland, Jacob Lang
Mentors: Brett Pickett
Institution: Brigham Young University


Breast cancer is one of the most prevalent cancers and is the second leading cause of cancer death in women. Approximately 13% of women (1 in 8) will develop invasive breast cancer, and about 3% (1 in 39) will die from it. Three important classifications used when formulating a treatment plan for breast cancer are the presence or absence of the estrogen receptor (ER), the progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2). Treating estrogen receptor-positive (ER+) breast cancer with aromatase inhibitors, such as Letrozole, is the current standard treatment for postmenopausal women.

A prior study by Lee et al. identified PRR11 as the only gene among the 51 genes in chromosome arm 17q23 that was significantly overexpressed in resistant versus non-resistant cancers. The goal of the current study is to perform a secondary analysis of this valuable dataset to identify genes, signaling pathways, and biomarkers across the whole human transcriptome that are significantly associated with treatment resistance in ER+ patients.


We retrieved, preprocessed, and analyzed 58 ER+ breast cancer samples from patients who had been treated with Letrozole; these samples are publicly available in the NCBI Gene Expression Omnibus (GEO) database. The Automated Reproducible MOdular Workflow for Preprocessing and Differential Analysis of RNA-seq Data (ARMOR) was used to process the data downloaded from NCBI. This workflow trimmed low-quality sequence from the RNA-seq reads, then mapped and quantified the reads to generate a list of differentially expressed genes (DEGs). Gene Ontology enrichment was also performed with camera. Next, the genes were mapped to common gene identifiers and passed to the Signaling Pathway Impact Analysis (SPIA) algorithm to identify intracellular signaling pathways that were enriched in our DEGs. With that information, Pathway2Target was used to identify known drug targets within the identified pathways. Finally, a decision tree-based machine learning approach was used to predict the features (expressed genes) that most accurately classify responders versus non-responders to Letrozole.
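To make the final classification step more concrete, here is a minimal sketch of the core operation inside a decision tree: choosing the single gene and expression threshold that best separate two classes by Gini impurity. It is not the workflow used in this study, whose exact algorithm and software are not stated here. The gene names `GENE_A` and `GENE_B`, the expression values, and the labels are all hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(samples, labels):
    """Return (gene, threshold, weighted_gini) for the best single split.

    samples: list of dicts mapping gene name -> expression value
    labels:  list of 0/1 class labels (here 1 = non-responder, by assumption)
    """
    n = len(labels)
    best = None
    for gene in samples[0]:
        values = sorted({row[gene] for row in samples})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(samples, labels) if row[gene] <= threshold]
            right = [y for row, y in zip(samples, labels) if row[gene] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (gene, threshold, score)
    return best


# Hypothetical toy data: six samples, two made-up genes.
samples = [
    {"GENE_A": 1.0, "GENE_B": 5.0},
    {"GENE_A": 2.0, "GENE_B": 1.0},
    {"GENE_A": 3.0, "GENE_B": 7.0},
    {"GENE_A": 8.0, "GENE_B": 2.0},
    {"GENE_A": 9.0, "GENE_B": 6.0},
    {"GENE_A": 10.0, "GENE_B": 3.0},
]
labels = [0, 0, 0, 1, 1, 1]

gene, threshold, impurity = best_split(samples, labels)
print(gene, threshold, impurity)  # GENE_A 5.5 0.0
```

A full decision tree applies this search recursively to each resulting subset, and ensemble methods combine many such trees; ranking genes by how often and how effectively they are chosen for splits is one common way such models nominate candidate biomarkers.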


Our comparison of 36 responders versus 22 non-responders detected a total of 18,735 genes, of which 105 were statistically significant after false-discovery rate (FDR) correction (FDR-adjusted p-value < 0.05), including SOX11, S100A8/S100A8, and IGLV3-25. We then used the Signaling Pathway Impact Analysis (SPIA) algorithm to determine whether any known intracellular signaling pathways were significantly enriched in DEGs (Bonferroni-adjusted p-value < 0.05). This analysis identified 4 pathways that were statistically significant in non-responders to Letrozole treatment.
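The two multiple-testing corrections mentioned above behave differently, and a small sketch may help show how. The code below implements the Benjamini-Hochberg procedure, a common FDR method (assumed here, since the specific FDR method is not named), alongside the Bonferroni correction. The raw p-values are invented for illustration and are not results from this study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def bonferroni(p_values):
    """Return Bonferroni adjusted p-values (each multiplied by m, capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for five genes or pathways.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]

fdr = benjamini_hochberg(raw)   # approximately [0.005, 0.02, 0.05125, 0.05125, 0.2]
bonf = bonferroni(raw)          # approximately [0.005, 0.04, 0.195, 0.205, 1.0]

significant_fdr = [p < 0.05 for p in fdr]
significant_bonf = [p < 0.05 for p in bonf]
print(significant_fdr)
print(significant_bonf)
```

In this toy example both corrections keep the same two tests, but Bonferroni inflates the remaining p-values much more sharply, which is why it is considered the more conservative of the two.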

We then used the pathway results to predict 60 existing therapeutic targets that could be repurposed to treat the resistance phenotype. Notably, the predicted targets for the non-response phenotype included VEGFA, a current target for solid tumors, and ESR1, an estrogen receptor.

We next sought to determine whether we could predict transcriptional biomarkers that could help identify patients who do not respond to treatment. Using the read counts for all samples as input, we identified 278 transcriptional biomarkers. Together, these biomarkers yielded an area under the receiver operating characteristic curve (AUROC) of 0.972 (Figure 2), indicating an exceptional ability to classify Letrozole responders versus non-responders by transcriptional profile. Sensitivity for the full set of biomarkers was 100%, and specificity was 94%. We then used the top two biomarkers from this first analysis as input for a second analysis to determine the performance of a smaller subset. The second analysis determined that PRDX4 and E2F8 together yielded an AUROC of 0.823 and an overall accuracy of 88.2%.
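For readers less familiar with these performance metrics, the sketch below shows one way AUROC, sensitivity, specificity, and accuracy can be computed from classifier scores. The labels, scores, and the 0.5 decision threshold are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the data or settings behind the values reported above.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative (the Mann-Whitney formulation); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_metrics(labels, predictions):
    """Return (sensitivity, specificity, accuracy) for 0/1 labels and predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(labels)
    return sensitivity, specificity, accuracy


# Hypothetical example: 1 = non-responder (treated as the positive class), 0 = responder.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

# Assumed decision threshold of 0.5 on the classifier score.
predictions = [1 if s >= 0.5 else 0 for s in scores]

area = auroc(labels, scores)  # 11 of 12 positive/negative pairs are ranked correctly
sensitivity, specificity, accuracy = confusion_metrics(labels, predictions)
print(round(area, 3), round(sensitivity, 3), round(specificity, 3), round(accuracy, 3))
```

AUROC summarizes ranking quality across all possible thresholds, whereas sensitivity, specificity, and accuracy depend on the particular threshold chosen, so the two kinds of metric can diverge for the same classifier.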


Our results identify additional DEGs, pathways, targets and biomarkers for further exploration in the treatment and categorization of ER+ breast cancer.