(Reuters Health) - For years, researchers have been studying medical conditions using huge swaths of patient data with identifying information removed to protect people’s privacy. But a new study suggests hackers may be able to match “de-identified” health information to patient identities.
In a test case described in JAMA Network Open, researchers used artificial intelligence to link health data with a medical record number. While the data in the test case was fairly innocuous - just the output of movement trackers like Fitbit - it suggests that de-identified data may not be so anonymous after all.
“The study shows that machine learning can successfully re-identify the de-identified physical activity data of a large percentage of individuals, and this indicates that our current practices for de-identifying physical activity data are insufficient for privacy,” said study coauthor Anil Aswani of the University of California, Berkeley. “More broadly it suggests that other types of health data that have been thought to be non-identifying could potentially be matched to individuals by using machine learning and other artificial intelligence technologies.”
Aswani and colleagues used one of the largest publicly available patient databases, the National Health and Nutrition Examination Survey, or NHANES. Included in the database were recordings from physical activity monitors, during both a training run and an actual study mode, for 4,720 adults and 2,427 children.
The researchers showed their computer the data from the training runs for each person and included six demographic characteristics: age, gender, educational level, annual household income, race/ethnicity, and country of birth. The training data for each person was given a made-up record number.
Then Aswani and his colleagues fed the computer the second set of activity data, including the six demographic factors. For 95 percent of the adults and 86 percent of the children, the computer successfully matched the second set to the correct record number from the first.
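The matching idea can be illustrated with a toy sketch. This is not the study's method: the researchers used machine learning on NHANES recordings, while the example below swaps in simple nearest-neighbor matching on made-up data. Every name in it (`reidentify`, `make_synthetic`, the two-field demographic tuple, the noise levels) is a hypothetical chosen for illustration.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length activity vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reidentify(labeled, anonymous):
    """Guess a record number for each anonymous record.

    labeled:   list of (record_number, demographics, activity)
    anonymous: list of (demographics, activity)
    Each guess is the record number of the closest labeled activity
    vector among records with identical demographics, or None if no
    labeled record shares those demographics.
    """
    guesses = []
    for demo, activity in anonymous:
        candidates = [
            (distance(activity, act), rec_id)
            for rec_id, d, act in labeled
            if d == demo
        ]
        guesses.append(min(candidates)[1] if candidates else None)
    return guesses


def make_synthetic(n_people, rng):
    """Build made-up data (an assumption, not NHANES): each person has a
    stable weekly activity habit that is observed twice with noise."""
    labeled, anonymous, truth = [], [], []
    for rec_id in range(n_people):
        demo = (rng.randrange(5), rng.randrange(2))  # e.g. age band, gender
        habit = [rng.uniform(2000, 12000) for _ in range(7)]  # daily steps
        first = [h + rng.gauss(0, 500) for h in habit]
        second = [h + rng.gauss(0, 500) for h in habit]
        labeled.append((rec_id, demo, first))
        anonymous.append((demo, second))
        truth.append(rec_id)
    return labeled, anonymous, truth


if __name__ == "__main__":
    rng = random.Random(0)
    labeled, anonymous, truth = make_synthetic(200, rng)
    guesses = reidentify(labeled, anonymous)
    correct = sum(g == t for g, t in zip(guesses, truth))
    print(f"re-identified {correct} of {len(truth)} synthetic people")
```

The demographic fields narrow the pool of candidates, and the activity pattern then picks one person out of that pool. That combination is why stripping names alone may not be enough to keep such records anonymous.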
What are the practical implications of that matchup?
Aswani offers a hypothetical situation. “Say your employer is giving a discount for participation in a wellness program and will be collecting demographic information as well as physical activity data,” he said. “At the same time, your health insurance company might have a program to try to get insureds to lose weight. They also collect demographic information and physical activity data, but remove identifying information.”
Theoretically, your employer could link the two data sets and “then they will accurately be able to link to the rest of your medical record,” Aswani said.
Another scenario, Aswani said, is that your smartphone is collecting your movement data as part of a health app. If your insurer also has movement data, the app maker might be able to link your name to your medical record and then sell the information to others.
Dr. Elliott Haut worries that studies like this one will spark fears in the public, which might call for cessation of research using de-identified data. That would be a mistake, said Haut, vice chair of quality, safety in the department of surgery at the Johns Hopkins School of Medicine and an associate professor of health policy and management at the Johns Hopkins University Bloomberg School of Public Health.
While Haut acknowledges the risk that patient data could be relinked to patient identities, he said the benefits of research with this kind of data far outweigh that risk and can change medical practices for the better.
For example, he said, as a trauma surgeon, he wondered if the common practice of spine immobilization - putting a neck collar on and buckling a patient to a back board - is helpful or harmful for gunshot victims. The goal is to prevent movement and thus possibly paralysis.
“We looked at the data and not only is this not beneficial, but it also could be harmful because the first responder takes five to 10 minutes doing this procedure instead of going directly to the hospital where we can start fixing them,” Haut said. “If you are critically injured, that five minutes makes a huge difference.”
SOURCE: bit.ly/2EDCm8k JAMA Network Open, online December 21, 2018.