### The Privacy Puzzle Improved

A team of Reed Statistics students won first place in a prestigious national competition for an innovative algorithm that helps researchers glean information from datasets without compromising individual privacy. That's right: a technique for ensuring your privacy while still allowing statistical analysis of the data!

Zeki Kazan ’20, Kaiyan Shi ’20, and Simon Couch ’21 (seen here) won the Undergraduate Statistics Research Project Competition for their project, “A Differentially Private Wilcoxon Signed-Rank Test,” which outlines a new algorithm for hypothesis testing that upholds the privacy of the underlying data. In fact, their technique is twice as powerful as the standard private method, meaning that it requires less than half as much data to achieve the same statistical power.

Simply put, the problem is that big databases hold immense promise for answering scientific questions, but many organizations won’t allow researchers access to them because of the risk of an inadvertent breach of privacy—even when obvious markers like name and address have been stripped away. In 2014, for example, the New York City Taxi and Limousine Commission released a giant database of taxi rides in response to a freedom-of-information request. The commission attempted to anonymize the data, but enterprising journalists were able to piece together various clues to identify rides taken by celebrities.

To understand the Reed project, you need to know that statisticians often compare two sets of data using a tool known as a hypothesis test. A test's statistical power is the probability that it detects a relationship between the two sets when one really exists, and power grows with the amount of data. So the less data a test needs to reach a given level of power, the more powerful it is.
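To make the link between power and sample size concrete, here is a minimal Python sketch. As a simple stand-in it uses a one-sided z-test on paired differences with a known standard deviation; this is an illustrative assumption, not the Wilcoxon test, and `power` and `smallest_n` are hypothetical helper names.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)

def power(n, effect_size, alpha=0.05):
    """Power of a one-sided z-test with n observations.

    effect_size is the true mean shift divided by the known standard deviation.
    """
    critical = STD_NORMAL.inv_cdf(1 - alpha)  # rejection threshold for the z statistic
    return 1 - STD_NORMAL.cdf(critical - effect_size * n ** 0.5)

def smallest_n(effect_size, target_power=0.8, alpha=0.05):
    """Smallest sample size whose power reaches target_power."""
    n = 1
    while power(n, effect_size, alpha) < target_power:
        n += 1
    return n

print(smallest_n(0.5))  # 25
```

A less powerful test (for example, one weakened by added privacy noise) needs a larger `n` to hit the same 80% target, and that extra data is the cost a more efficient private test tries to cut.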

Now to go deeper …

There are many different types of hypothesis tests. The Reed team focused on the Wilcoxon signed-rank test, which is commonly used with paired-sample data, where there is a natural association between the two sets (e.g., a patient's blood pressure before and after watching a horror movie). The test ranks the within-pair differences by size and checks whether the ranks of the positive and negative differences are balanced, to determine whether there is a statistically significant difference between the two sets.
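The core statistic is easy to compute. Below is a minimal Python sketch of the classic, non-private version: W+, the sum of the ranks of the positive differences, plus a large-sample normal approximation for a two-sided p-value. The blood-pressure readings are made up for illustration, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist

def signed_rank_statistic(before, after):
    """Return (W+, n): sum of ranks of positive differences, and pairs used."""
    # Drop zero differences, the classic Wilcoxon convention.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied |d| values and give each the average rank.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    return w_plus, len(diffs)

def normal_approx_p_value(w_plus, n):
    """Two-sided p-value via the normal approximation (no tie correction)."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))

before = [120, 118, 130, 125, 122, 128, 119, 121]
after = [128, 121, 131, 135, 120, 134, 126, 124]
w, n = signed_rank_statistic(before, after)
print(w, n, normal_approx_p_value(w, n))
```

With only eight pairs the normal approximation is rough; exact tables are the usual choice for samples this small.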

The team reworked the Wilcoxon test to ensure privacy and employed an innovative technique to reduce the amount of data it required. With these two seemingly simple tweaks, the enhanced algorithm turned out to be much more powerful. When tested, its statistical power came much closer to that of standard, non-private ("public-setting") tests: it achieved the same statistical power as the earlier private-setting method with only 40% of the data. That efficiency has real-world consequences: the Reed algorithm can be used on smaller datasets, whereas previous private methods required enormous quantities of data (source: here). No wonder this group of math geeks looks happy in the picture above!
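How can a test like this be made private at all? A standard building block of differential privacy is the Laplace mechanism: add random noise, scaled to how much one record can change the result, before releasing a statistic. The Python sketch below applies that generic idea to W+. It is an illustration under stated assumptions (the number of pairs n is public; ties are broken by position and zeros count as non-positive), not the Reed team's algorithm, and all function names are hypothetical. Changing one pair can move W+ by at most n, so the noise scale is n/ε, where a smaller privacy parameter ε means more noise and stronger privacy.

```python
import random

def w_plus(diffs):
    """Sum of ranks of |d| over positive differences.

    Simplification: ties in |d| are broken by position, zeros count as non-positive.
    """
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    return sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_signed_rank_test(diffs, epsilon, n_sims=10_000, seed=None):
    """Release a noisy W+ and a two-sided Monte Carlo p-value.

    Assumes n = len(diffs) is public. One pair moves W+ by at most n,
    so Laplace noise with scale n / epsilon makes the release epsilon-DP.
    """
    rng = random.Random(seed)
    n = len(diffs)
    scale = n / epsilon
    noisy_obs = w_plus(diffs) + laplace_noise(scale, rng)

    center = n * (n + 1) / 4  # mean of W+ under the null hypothesis
    extreme = 0
    for _ in range(n_sims):
        # Under the null, each rank 1..n is counted as positive with probability 1/2.
        null_w = sum(r for r in range(1, n + 1) if rng.random() < 0.5)
        null_noisy = null_w + laplace_noise(scale, rng)
        if abs(null_noisy - center) >= abs(noisy_obs - center):
            extreme += 1
    # The simulation uses only n and fresh randomness, so this p-value is
    # post-processing of the private release and costs no extra privacy.
    p_value = (extreme + 1) / (n_sims + 1)
    return noisy_obs, p_value

print(private_signed_rank_test([8, 3, 1, 10, -2, 6, 7, 4], epsilon=1.0, seed=0))
```

Rerunning with a smaller `epsilon` tends to push the p-value for the same data upward, because the extra noise makes a real effect harder to tell apart from chance; that lost power is why private tests have historically needed more data.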