An interdisciplinary project works to solve Big Data challenges
When Elise Young ’14 describes the work she is doing with the Digital Problem Solving Initiative, or DPSI, it almost sounds as if she is telling a joke. Three Harvard Law School students, several computer scientists, a physicist and a design student walk into a room.
But in fact, their mission is quite serious: finding methods for organizing streams of data from students in online classes, without violating the students’ privacy rights.
Young, along with David Gobaud ’15 and Lindsay Lin ’15, is working on “Developing Big Data Analysis Tools,” aka Big Data, one of five DPSI projects that are bringing together students, researchers, and staff from across Harvard University to focus on the challenges and opportunities posed by technology in educational settings. Housed in the Berkman Center for Internet & Society, the initiative is headed by Professor of Practice Urs Gasser LL.M. ’03, and evolved from conversations between Gasser and Dean Martha Minow about the future of education and the role of technology.
THE RIGHT STUFF
Law students and programmers work together on software that can track online students—but not too closely.
Illustrations by James Yang
edX sends information to Big Data in unorganized log files compiling weeks of user activity. Deciding which data can be used and how requires legal insight from the very beginning.
The engineers take over. Programmers take the raw data and reformat it, fixing bugs and linking disparate data points so that the final product will be useful for researchers.
edX takes the now-legible data, eliminates key identifiers and sends it back to Big Data. The team then tries to “re-identify” students from the stripped data. If no more than 5 percent of students can be unmasked, the program is a success.
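The pass/fail check in that last step can be sketched in a few lines of Python. This is only an illustration, not the team's actual software: the function names, the idea of comparing a list of guessed identities against the true ones, and the sample data are all assumptions made here. The only detail taken from the description above is the 5 percent ceiling.

```python
def reidentification_rate(true_ids, guessed_ids):
    """Return the fraction of records whose guessed identity is correct.

    true_ids and guessed_ids are parallel lists (hypothetical inputs);
    a guess of None means the attacker could not name anyone.
    """
    if len(true_ids) != len(guessed_ids):
        raise ValueError("both lists must describe the same records")
    if not true_ids:
        return 0.0
    hits = sum(
        1 for truth, guess in zip(true_ids, guessed_ids)
        if guess is not None and guess == truth
    )
    return hits / len(true_ids)


def passes_threshold(rate, threshold=0.05):
    """Success means no more than 5 percent of students were unmasked."""
    return rate <= threshold


# Twenty made-up students; the "attack" correctly names only the first one.
true_ids = [f"student{i}" for i in range(20)]
guesses = ["student0"] + [None] * 19
rate = reidentification_rate(true_ids, guesses)
print(rate, passes_threshold(rate))
```

With one correct guess out of 20, the rate sits exactly at the 5 percent ceiling and still counts as a pass; a second correct guess would push it over.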
The Big Data team, led by Harvard’s chief technology officer, Jim Waldo, has been working with edX—the Massive Open Online Course platform founded by Harvard and MIT—to build software that analyzes the massive amount of data gathered by MOOCs on student users.
When students participate in MOOCs, their every keystroke and click is tracked. That information is valuable to educators seeking to improve both educational content and the way they deliver it. But releasing this data could run up against the federal Family Educational Rights and Privacy Act, or FERPA, which protects the privacy of student education records.
Although eliminating obvious identifying factors such as Social Security numbers is easy, FERPA also requires obscuring information that, put together, could identify an individual. According to Young, this is tougher than it sounds, particularly because the combination of available information and enormous computing power means that even small bits of seemingly unrelated information can eradicate anonymity.
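A small Python sketch shows how harmless-looking fields can single people out once they are combined. It is an illustration only: the field names (`country`, `birth_year`, `first_post_day`) and the four records are invented here, and counting unique combinations is a common way to gauge re-identification risk, not necessarily the method the Big Data team uses.

```python
from collections import Counter


def unique_fraction(records, fields):
    """Return the share of records that are the only one with their
    particular combination of values for the given fields."""
    if not records:
        return 0.0
    keys = [tuple(record[f] for f in fields) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


# Invented records with no names or ID numbers attached.
records = [
    {"country": "US", "birth_year": 1990, "first_post_day": "Mon"},
    {"country": "US", "birth_year": 1990, "first_post_day": "Tue"},
    {"country": "US", "birth_year": 1985, "first_post_day": "Mon"},
    {"country": "US", "birth_year": 1985, "first_post_day": "Mon"},
]

print(unique_fraction(records, ["country"]))
print(unique_fraction(records, ["country", "birth_year"]))
print(unique_fraction(records, ["country", "birth_year", "first_post_day"]))
```

In this toy data set, no one stands out by country alone or by country and birth year together, but adding the third field makes half of the records one of a kind.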
Over the course of the academic year, the students and researchers have been working together to address these issues.
“This is what I imagine it would be like to work for a small company,” said Young, who learned about the Big Data project from her involvement with the Harvard Journal of Law & Technology. “What’s been really great is having all these programmers ask questions and having to figure out how to explain [legal and policy issues] to them in a way that makes sense, while getting rid of all the extraneous details.”
Young, with the help of Gobaud and Lin, wrote a memo for the general counsel of edX recommending ideal processes for removing identifying markers, processes often used by experts who need to anonymize data. The general counsel relied on the memo to make recommendations to edX’s 30 educational partners, and it was a key document at a data conference held in December at Stanford University.
“This has been the most fun thing I’ve done this year, but I [also] see this having the greatest impact on my career trajectory,” said Young. She is working on a paper that she hopes to publish in a Harvard journal arguing that FERPA, which comes into play when federal funding is involved, does not actually apply to institutions like edX that only tangentially receive federal funds and offer classes for free.
Although Young has led the students’ policy research, Lin and Gobaud bring a deeper interest in computer science to the team. They recently created an email service called Pluto that lets users “unsend” and edit emails that have already been sent.
Gobaud said he relished the opportunity to work on the Big Data team. “I get to go talk to people who are working in a cutting-edge area, where the Department of Education doesn’t really know what to do yet,” he said. “There’s a chance to really impact policy.”
All of the DPSI work will be completed by the end of the semester. But Gasser hopes the initiative will continue and that he will find new problems to tackle each year.
“The program is an incubator for future ideas,” Gasser said. “Linking the formal educational setting with the more informal mode of learning has been working extremely well.”