Gregory Yauney
I'm a sixth-year CS PhD candidate at Cornell, advised by David Mimno. He is great.
I am interested in the impact of data curation in ML, rigorous evaluation of language models, the intersection of NLP with ML theory, and digital humanities. I also like photography!
CV  Google Scholar  dblp  GitHub  Instagram  Twitter
I'm on the postdoc and industry research job market! Please reach out if you think I'd be a good fit for your team.
Stronger Random Baselines for In-Context Learning
Gregory Yauney and David Mimno
COLM 2024
Paper  Code  Reproduction

A Pretrainer’s Guide to Training Data:
Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
Shayne Longpre, Gregory Yauney, Emily Reif, Katherine Lee, Adam Roberts,
Barret Zoph, Denny Zhou, Jason Wei, Kevin Robinson, David Mimno, and Daphne Ippolito
NAACL 2024
Outstanding Paper Award
Paper  Slides  Poster

The Afterlives of Shakespeare and Company in Online Social Readership
Maria Antoniak, David Mimno, Rosamond Thalken, Melanie Walsh, Matthew Wilkens, and Gregory Yauney
Journal of Cultural Analytics 2024
Paper  Code

Data Similarity is Not Enough to Explain Language Model Performance
Gregory Yauney, Emily Reif, and David Mimno
EMNLP 2023
Paper  Code  Poster  Reviews

Probing Heterogeneous Pretraining Datasets with Small Curated Datasets
Gregory Yauney, Emily Reif, and David Mimno
Data-Centric Machine Learning Research Workshop at ICML 2023
Paper  Poster

Comparing Text Representations: A Theory-Driven Approach
Gregory Yauney and David Mimno
EMNLP 2021
Paper  Code  Poster  Blog

Domain-Specific Lexical Grounding in Noisy Visual-Textual Documents
Gregory Yauney, Jack Hessel, and David Mimno
EMNLP 2020
Paper  Code  Talk

Network Analysis Finds Shifts in the History of Modern Architecture
Gregory Yauney and David Mimno
Poster at Digital Humanities 2020
Abstract  Poster  Code  Data

Combatting the Challenges of Local Privacy for Distributional Semantics with Compression
Alexandra Schofield, Gregory Yauney, and David Mimno
Privacy in Machine Learning Workshop at NeurIPS 2019
Paper  Poster

Computational Prediction of Elapsed Narrative Time
Gregory Yauney, Ted Underwood, and David Mimno
Workshop on Narrative Understanding at NAACL 2019
Paper  Poster