Protecting Anonymity

Researchers search for ways to safeguard the data of study participants.

As the social sciences move into the realm of big data, researchers need to rethink their data handling practices to ensure the privacy and security of study participants, says Dr Liam Magee, a senior research fellow at Western’s Institute for Culture and Society.

Magee and colleagues are working on a hybrid social computing project, investigating techniques that will allow social science researchers to analyse unprecedented levels of experimental data in a secure and ethical way.

The techniques being examined by the team, and their applications in the social sciences, are distinctive because the data can be encrypted and then analysed while still in its encrypted form, explains Magee.

Universities go to great lengths to protect the data they hold for research purposes. This can include anonymising datasets before storing and processing information. Despite these efforts, hackers can employ machine-learning algorithms to identify participants from stored information about age, background and education. “It’s not a theoretical threat,” says Magee. “This has happened.”

One possible solution, he says, is a database system that embeds the necessary security features from the start. The San-Shi computing system, designed by Japanese telecommunications company NTT, fits the bill. “It’s one of a relatively new breed of systems that is looking at ways analysis can be conducted while the data is still encrypted,” he explains.

Need to know

  • Universities need to safeguard sensitive information derived from research participants.    
  • The San-Shi system allows analysis to be conducted on encrypted data on an aggregate level.
  • This preserves data privacy and security.

San-Shi breaks each participant’s dataset apart, encrypting the chunks of information separately and storing them across multiple computer servers. Data about a person’s age, for instance, is split into separate fragments, none of which are meaningful in their own right. The platform can carry out an analysis on the whole dataset by combining information shared from each of the servers. But crucially, if any one server is hacked, only a meaningless part of any person’s dataset will be revealed. “The hacker won’t be able to make any sense of the partial information,” says Magee.
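The idea of splitting a value into fragments that are meaningless on their own, yet can still be combined for analysis, can be illustrated with additive secret sharing. The sketch below is a minimal illustration only: the article does not say which scheme San-Shi uses, so the modulus, the three-server layout and the function names (`split_into_shares`, `reconstruct`) are assumptions made for this example, and the ages are invented.

```python
import secrets

# Public modulus for the arithmetic; the value is an assumption for this sketch.
MODULUS = 2**61 - 1


def split_into_shares(value, n_servers=3):
    """Split an integer into n_servers shares that sum to value modulo MODULUS.

    All but the last share are uniformly random, so any single share on its
    own carries no information about the original value.
    """
    shares = [secrets.randbelow(MODULUS) for _ in range(n_servers - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine shares; this only works when every share is present."""
    return sum(shares) % MODULUS


# Invented ages for four hypothetical participants.
ages = [19, 23, 31, 42]

# One row per participant, one column per server.
shared = [split_into_shares(age) for age in ages]

# Each server adds up only the fragments it holds and never sees a real age.
server_totals = [sum(column) % MODULUS for column in zip(*shared)]

# Combining the per-server totals yields the aggregate, not any individual value.
total_age = reconstruct(server_totals)
mean_age = total_age / len(ages)
print(total_age, mean_age)
```

In this sketch, someone who compromises one server sees only a column of random-looking numbers. The sum and mean emerge only when the totals from all servers are combined.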

While this security technique is widely known in the information technology industry, it has only recently been able to handle large datasets, so it has not been widely used with social research datasets, says Magee. But its promise is significant: “At no point can the researcher gain access to an individual record in a system of this sort, it only ever provides data at an aggregate level,” he explains, “so the individual remains for all intents and purposes obscured by the system as a part of its design.”

Magee’s team at Western is collaborating with Dimension Data, an NTT subsidiary, to test how this system will work in social science research applications. To do this, Magee and his colleagues developed code to create synthetic or proxy datasets and to analyse them within San-Shi’s encrypted environment. To keep the experiments realistic, the datasets resemble what would be gathered in real-life health or education surveys.

In one experiment, Magee and the team posited a case where researchers wanted to link demographic information with student attitudes regarding course relevance to work-readiness. The team uploaded survey responses that included student identification numbers onto the San-Shi system, and separately added a copy of student enrolment records with demographic information, with both datasets encrypted. In this fictional scenario, they were then able to obtain useful statistical information based on the aggregate data without being able to identify sensitive information about specific students.
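The shape of such a linked, aggregate-only query can be sketched in a few lines. This is an illustration of the query logic only: it runs on plain, unencrypted Python data and does not represent San-Shi’s interface. The field names (`student_id`, `relevance_score`, `age_band`), the function `mean_score_by_group`, the `min_group_size` cut-off and all of the records are hypothetical.

```python
from collections import defaultdict


def mean_score_by_group(survey, enrolment, group_field, min_group_size=2):
    """Link survey rows to enrolment rows by student_id and return group means.

    Only one mean per group is returned, never an individual row. Groups with
    fewer than min_group_size members are dropped, because a mean over a single
    person would reveal that person's answer (the cut-off is an assumption).
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [sum of scores, count]
    for row in survey:
        record = enrolment.get(row["student_id"])
        if record is None:
            continue  # no matching enrolment record, so the row cannot be linked
        accumulator = totals[record[group_field]]
        accumulator[0] += row["relevance_score"]
        accumulator[1] += 1
    return {
        group: score_sum / count
        for group, (score_sum, count) in totals.items()
        if count >= min_group_size
    }


# Invented survey responses: perceived course relevance on a 1-5 scale.
survey = [
    {"student_id": 101, "relevance_score": 4},
    {"student_id": 102, "relevance_score": 2},
    {"student_id": 103, "relevance_score": 5},
    {"student_id": 104, "relevance_score": 3},
    {"student_id": 105, "relevance_score": 1},
]

# Invented enrolment records keyed by student identification number.
enrolment = {
    101: {"age_band": "18-24"},
    102: {"age_band": "25-34"},
    103: {"age_band": "18-24"},
    104: {"age_band": "25-34"},
    105: {"age_band": "35+"},
}

result = mean_score_by_group(survey, enrolment, "age_band")
print(result)
```

Here the researcher receives a mean for each age band with at least two students, while the single student in the "35+" band is left out of the output altogether.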

Over the next year, the team will process actual research data. Magee is optimistic about the opportunities. “Ultimately, we could imagine live data being fed into this system, being used for research purposes and providing researchers with the kind of results they need, without revealing records at the individual level.” 

Magee warns that such technical security safeguards are only part of a much wider reappraisal of data ethics and governance. The challenges posed by big data, AI and cloud computing require a continued focus on the controls and consent users can exercise over their data. 

Meet the Academic | Dr Liam Magee

Dr Liam Magee is a Senior Research Fellow at ICS. Liam’s principal research interests focus on the application of social methods and information technology to the areas of urban development and sustainability.

His doctoral dissertation, completed in 2010, examined the importance of cultural assumptions in the emerging world of interconnected knowledge systems, including systems such as the Semantic Web. His current work extends this research into the areas of urban development and sustainability.

He is presently investigating how online games, simulations and other information technologies can facilitate greater clarity and visibility of sustainability objectives among urban communities and stakeholder groups.

This research includes study of the underlying technological requirements for such tools (data structures, communication and visualisation), as well as the social research methods for evaluating those tools in practice.


© Abscent84/iStock/Getty © Patrick George/Ikon Images/Getty
Future-Makers is published for Western Sydney University by Nature Research Custom Media, part of Springer Nature.