Computational Archival Science and “Big Data” Production Records

Computational Archival Science (CAS) is a broad transdisciplinary field that applies analytical approaches from computer science to the examination of archival records. A current thread of my research deals with the analysis and historical salience of software production records. What can be done to better organize them for historical inquiry? How can we take methodologies designed for commercial software analysis and adapt them to a critical and historical context?

My position on the rise of production data studies (in computer gaming in particular) was recently published in the new game history journal ROMChip. Entitled “Attending to Process and Data: A Research Alignment for Historical Game Production Archives,” the article is a field-building exercise in aligning methodological insights from computer science with science and technology studies (STS), archival science, and software production studies.

Recent work in this domain has included analysis of an 18-terabyte data set of production prototypes provided by the Entertainment Technology Center at Carnegie Mellon University. The data set contains 546 projects from 2001 to 2019, comprising 9.2 million files and around 5,000 file formats. I am currently producing a summary characterization of the data set, work that has so far resulted in two publications at the Computational Archival Science Workshop at IEEE’s Big Data conference. The long-term goal is to articulate software tools and methodological techniques that help locate information within the data set pertaining to historical software development work and archival requirements.

The ETC data set is the source for multiple projects in my newly formed Software History Futures and Technologies (SHFT) research group.