“Big Data and the Humanities” with grad Uni co-hosts James Johndrow (Statistics) and Allen Riddell (Literature).
James Johndrow is a PhD Candidate in the Department of Statistical Science. His research focuses on developing novel Bayesian methods for unordered categorical data. He is motivated by the challenge of modeling human behavior, and therefore takes a special interest in problems inspired by “micro-level” data (e.g., fMRI and neural electrophysiology data) as well as “macro-level” data (e.g., financial time series, web history, choice experiment, and click data) that are pertinent to understanding human decision-making.
Title: “Big data and the future of human knowledge”
Abstract: Big data is very much a part of the current zeitgeist, and has become a buzzword in academia, business, and government. James will discuss the origins of this phenomenon, how it has changed various fields, and the challenges inherent in handling very large datasets. He will describe several active areas of research in methods for analyzing big data, as well as the limitations of these methods, and will discuss early attempts to build industries and fields around analytics. He will conclude with some thoughts on how data and analytics will shape the future of human endeavors, with a special focus on the use of mathematical models to predict future events. He suggests that there are fundamental limits to the capabilities of such models, and discusses the importance of keeping these limitations in mind as we move toward an increasingly quantitative understanding of the world.
Allen Riddell is a PhD Candidate in the Program in Literature. His current work explores how intellectual and literary historians might use statistics and machine learning to study very large text collections (of books, academic journals, newspapers, etc.). His presentation is called:
Title: “How to Read 22,198 Journal Articles: Studying the History of German Studies with Topic Models”
Abstract: Academic journals record the development of German Studies in the United States over the 20th century. Reading through and documenting trends in tens of thousands of journal articles presents a challenge. In this presentation, Allen will consider alternative ways of reading articles published between 1928 and 2006 in Monatshefte, New German Critique, The German Quarterly, and German Studies Review. One approach, a probabilistic topic model, captures major trends, including the relative decline in articles about language pedagogy and the rise of literary history and criticism.