Last year, Apple Inc. kicked off a massive experiment with new privacy technology aimed at solving an increasingly thorny problem: how to build products that understand users without snooping on their activities.
Its answer is differential privacy, a term virtually unknown outside of academic circles until a year ago. Today, other companies such as Microsoft Corp. and Uber Technologies Inc. are experimenting with the technology.
The problem differential privacy tries to tackle is that modern data-analysis tools are capable of finding links between large databases. Privacy experts worry these tools could be used to identify people in otherwise anonymous data sets.
Two years ago, researchers at the Massachusetts Institute of Technology discovered shoppers could be identified by linking social-media accounts to anonymous credit-card records and bits of secondary information, such as the location or timing of purchases.
"I don't think people are aware of how easy it is getting to de-anonymize data," said Ishaan Nerurkar, whose startup LeapYear Technologies Inc. sells software for leveraging machine learning while using differential privacy to keep user data anonymous.
Differentially private algorithms blur the data being analyzed by adding a measurable amount of statistical noise. One way to do this is to randomly replace a sensitive question (have you ever committed a violent crime?) with a question that has a statistically known response rate (were you born in February?).
Someone trying to find links in the data would never be sure which question a particular person was asked. That lets researchers analyze sensitive data such as medical records without being able to tie the data back to specific people.
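The question-swapping idea can be illustrated with a short simulation. This is a minimal sketch of a classic survey technique usually called randomized response, not the method Apple or any other company named here actually uses. The function names, the 50% chance of asking the sensitive question and the assumed share of February birthdays are all illustrative assumptions.

```python
import random

# Assumption: roughly 28.25 of every 365.25 birthdays fall in February.
FEBRUARY_RATE = 28.25 / 365.25


def randomized_answer(truth, born_in_february, p_sensitive, rng):
    """Answer the sensitive question with probability p_sensitive;
    otherwise answer the innocuous February question instead."""
    if rng.random() < p_sensitive:
        return truth
    return born_in_february


def estimate_true_rate(answers, p_sensitive, known_rate=FEBRUARY_RATE):
    """Recover the population-wide 'yes' rate for the sensitive question.

    observed = p_sensitive * true_rate + (1 - p_sensitive) * known_rate,
    so the equation is solved for true_rate.
    """
    observed = sum(answers) / len(answers)
    return (observed - (1 - p_sensitive) * known_rate) / p_sensitive


def simulate(n, true_rate, p_sensitive, seed=0):
    """Simulate n respondents and return the estimated true rate."""
    rng = random.Random(seed)
    answers = []
    for _ in range(n):
        truth = rng.random() < true_rate          # hypothetical sensitive fact
        february = rng.random() < FEBRUARY_RATE   # hypothetical birthday fact
        answers.append(randomized_answer(truth, february, p_sensitive, rng))
    return estimate_true_rate(answers, p_sensitive)


if __name__ == "__main__":
    # 200,000 simulated respondents, 10% of whom would truthfully say yes.
    print(simulate(n=200_000, true_rate=0.10, p_sensitive=0.5, seed=1))
```

No single "yes" in the collected answers reveals which question that person answered, yet the overall rate can still be estimated because the February rate is known in advance. The estimate gets noisier as fewer people are asked the sensitive question, which is the trade-off between privacy and precision.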
Differential privacy is key to Apple's artificial intelligence efforts, said Abhradeep Guha Thakurta, an assistant professor at the University of California, Santa Cruz. Mr. Thakurta worked on Apple's differential-privacy systems until January of this year.
Apple has faced criticism for not keeping pace with rivals such as Alphabet Inc.'s Google in developing AI technologies, which have made giant leaps in image and language recognition software that powers virtual assistants and self-driving cars.
While companies such as Google have access to the massive volumes of data required to improve artificial intelligence, Apple's privacy policies have been a hindrance, blamed by some for turning the company into a laggard in AI-driven products such as Siri.
"Apple has tried to stay away from collecting data from users until now, but to succeed in the AI era they have to collect information about the user," Mr. Thakurta said. Apple began rolling out the differential-privacy software in September, he said.
Users must elect to share analytics data with Apple before it is used.
Apple originally used differential privacy to understand how customers are using emojis or new slang expressions on the phone. It is now expanding its use of the technique to cover its collection and analysis of web-browsing and health-related data, Katie Skinner, an Apple software engineer, said at the company's annual developers conference in June.
The company is now receiving millions of pieces of information daily -- all protected via this technique -- from Macs, iPhones and iPads running the latest operating systems, she said.
"Apple believes that great features and privacy go hand in hand," an Apple spokesman said via email.
Google, one of differential privacy's earliest adopters, has used it to keep Chrome browser data anonymous. But while the technology is good for some types of analysis, it suffers where precision is required. For example, experts at Google say it doesn't work in so-called A/B tests, in which two versions of a webpage are tested on a small number of users to see which generates the best response.
"In some cases you simply can't answer the questions that developers want answers to," said Yonatan Zunger, a privacy engineer at Google. "We basically see differential privacy as a useful tool in the toolbox, but not a silver bullet."
Researchers are coming up with "surprisingly powerful" uses of differential privacy, but the technology is only about a decade old, said Benjamin Pierce, a computer science professor at the University of Pennsylvania. "We're really far from understanding what the limits are," he said.
Differential privacy has seen wider adoption since Apple first embraced it. Uber employees, for example, use it to improve services without being overexposed to user data, a spokeswoman said via email.
Microsoft is working with San Diego Gas & Electric Co. on a pilot project to make smart-meter data available to researchers and government agencies for analysis, while making sure "any data set cannot be tied back to our customers," said Chris Vera, head of customer privacy at the utility.
The U.S. Census Bureau confronted the problem of links between data sets a decade ago. By 2005, the bureau was worried large databases outside its control could be used to de-anonymize census participants, said John Abowd, chief scientist at the bureau. After meeting with some of the creators of differential privacy, the bureau became a proponent.
In 2008 the Census released its first product to use this technology -- a web-based data-mapping portal called OnTheMap -- and the bureau is now "making an intense effort to apply differential privacy to the publication of the 2020 census," Mr. Abowd said.
Write to Robert McMillan at Robert.Mcmillan@wsj.com
(END) Dow Jones Newswires
July 07, 2017 07:14 ET (11:14 GMT)