Situating Big Data

One of the defining questions for education over the next decade is how to shift education from a data-poor to a data-rich activity. Over the previous decade, we have seen a rise in shared national and state (Common Core) standards and frameworks that articulate what we, as a country, believe young people and adults should be able to think, know, and do in order to be scientifically literate, but we are only now beginning to see a concomitant rise in large-scale, data-rich strategies for assessing such knowledge, skills, and dispositions.

While inroads into so-called “big data” techniques (the capture, curation, storage, and analysis of massive, complex data sets spanning large numbers of individuals in aggregate) have been made, at least in relation to technologies for learning, they have not yet caught up to our more sophisticated and inclusive frameworks for science learning goals. Take, for example, the National Academy of Sciences’ (Bell & Lewenstein, 2009) “six strands of science learning” framework, which unifies goals across both formal and informal science learning environments to include not only content knowledge (strand 2) and inquiry practice (strand 3) but also interest (strand 1), epistemological disposition (strand 4), identity development within the domain (strand 6), and longer-term participation in the field (strand 5). Big data techniques applied to learning have made some progress in areas such as content knowledge and, to a lesser extent, inquiry practice (both areas in which more traditional techniques already fare well), but they have not yet made progress in the more challenging assessment areas that link interest, identity, participation, and epistemology, let alone put such constructs in conversation with one another to create a more coherent and convincing data ecology for making strong inferences about learning.

If we want to catalyze progress toward more expanded frameworks for STEM learning goals that include tricky variables such as identity and dispositions, then we must gain traction on their empirical measurement in ways that are commensurate with contemporary “data rich” corpora and techniques, and we must do so in ways that are theory-driven and comprehensive rather than simply a list of strands or themes.

The Situating Big Data project seeks to marry theories of situated cognition to the big data movement by connecting clickstream data, which are typically captured from technologies in isolation, to key forms of multimodal data available from their contexts of use. Contextual data include individual and group discourse (online and in-room), individual and curricular artifacts, classroom assessments, and school performance data (grades and test scores). We study learning technologies (games, especially) using diverse datasets, data types, and analyses: clickstream telemetry data, a shared online community forum, and multiple formal and informal learning environments. We aim to generate a more data-driven methodology for investigating situated cognition as well as new models for data-driven design. Situating big data in this way, we argue, can create radically new and better models for data-driven education more broadly.