MIT study: mobile phone location data puts anonymity at risk
Published: 2021-07-08
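Researchers have shown that anonymised mobile phone location data is not as harmless as many people assume: a few location points are enough to identify a phone user with remarkable ease.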
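Whenever a phone is switched on, its connection to the network means its position and movement can be plotted. This data is often passed anonymously to third parties to target advertisements and drive related services. A study reported in Scientific Reports says human mobility patterns are so predictable that a user can be identified from only four data points.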
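The growing ubiquity of smartphones and their apps has given the companies that operate and distribute those apps vast amounts of user data, which are sometimes released publicly as anonymised or aggregated data sets.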
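These data are of extraordinary value to advertisers and service providers, but they also matter to those who plan shopping centres or allocate emergency services, and to a new generation of social scientists.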
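Yet the spread and development of location services has outpaced any clear understanding of how location data affect users' privacy and anonymity. For example, sat-nav manufacturers have long used location data from mobile phones and from sat-navs themselves to improve traffic reports, by calculating how fast users are moving on a given stretch of road. The data used in such calculations are "anonymised": no actual phone numbers or personal details are associated with them.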
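But some cases show that nominally anonymous data can be linked back to individuals. A striking example is a batch of data deliberately released by AOL in 2006, covering 20 million anonymised web searches: with a little sleuthing in the data, the New York Times was able to determine the identity of "searcher 4417749".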
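A growing body of research shows that human movement patterns, however random and unpredictable they seem, are actually very limited in scope and can even act as a kind of personal fingerprint for who is doing the moving.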
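Researchers at the Massachusetts Institute of Technology (MIT) and the Catholic University of Louvain studied 15 months of anonymised mobile phone records for 1.5 million people.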
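Analysing the mobility traces of those phones, they found that just four locations and times were enough to identify a particular user. The study's lead author, Yves-Alexandre de Montjoye of MIT, told the BBC: "In the 1930s, it was shown that you need 12 points to uniquely identify and characterise a fingerprint. What we did here is the exact same thing but with mobility traces. The way we move and the behaviour is so unique that four points are enough to identify 95% of people. We think this data is more available than people think."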
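The uniqueness test described above can be illustrated with a minimal sketch. This is not the study's code: the toy traces, the tower names, and the function names `matching_users` and `estimate_unicity` are all hypothetical, and a trace is assumed to be a set of (place, hour) points. The idea is to draw a few points from one user's trace and count how many users in the data set are consistent with all of them.

```python
import random

def matching_users(traces, points):
    """Return the users whose trace contains every (place, hour) point given."""
    return [user for user, trace in traces.items() if points <= trace]

def estimate_unicity(traces, num_points=4, seed=0):
    """Fraction of users singled out by num_points random points of their own trace."""
    rng = random.Random(seed)
    unique = 0
    eligible = 0
    for user, trace in traces.items():
        if len(trace) < num_points:
            continue  # not enough points to sample from this user
        eligible += 1
        known = set(rng.sample(sorted(trace), num_points))
        if matching_users(traces, known) == [user]:
            unique += 1
    return unique / eligible if eligible else 0.0

# Hypothetical toy data: each point is (antenna, hour of day).
traces = {
    "user_a": {("tower_1", 8), ("tower_2", 9), ("tower_3", 13),
               ("tower_1", 19), ("tower_4", 22)},
    "user_b": {("tower_1", 8), ("tower_2", 9), ("tower_5", 13),
               ("tower_6", 19), ("tower_4", 22)},
    "user_c": {("tower_7", 7), ("tower_2", 9), ("tower_3", 13),
               ("tower_1", 19), ("tower_8", 23)},
}

# One shared point leaves every user as a candidate ...
print(matching_users(traces, {("tower_2", 9)}))
# ... while four points from a trace single out its owner in this toy set.
print(estimate_unicity(traces, num_points=4))
```

In this toy set any two users share at most three points, so four known points always isolate one person. Real data sets are far larger, but the counting logic is the same.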
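But the team says its purpose is to provide a mathematical link, a formula applicable to all mobility data, that quantifies the trade-off between anonymity and data utility, and it hopes the work will spark debate about the relative merits of this "Big Data" and individual privacy.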
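Sam Smith of Privacy International added: "Our mobile phones report location and contextual data to multiple organisations with varying privacy policies. Any benefits we receive from such services are far outweighed by the threat that these trends pose to our privacy, and although we are told that we have a choice about how much information we give over, in reality individuals have no choice whatsoever. Science and technology constantly make it harder to live in a world where privacy is protected by governments, respected by corporations and cherished by individuals; cultural norms always lag behind technological progress."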
Original text:
Scientists say it is remarkably easy to identify a mobile phone user from just a few pieces of location information.
Whenever a phone is switched on, its connection to the network means its position and movement can be plotted.
This data is given anonymously to third parties, both to drive services for the user and to target advertisements.
But a study in Scientific Reports warns that human mobility patterns are so predictable it is possible to identify a user from only four data points.
The growing ubiquity of mobile phones and smartphone applications has ushered in an era in which tremendous amounts of user data have become available to the companies that operate and distribute them - sometimes released publicly as "anonymised" or aggregated data sets.
These data are of extraordinary value to advertisers and service providers, but also for example to those who plan shopping centres, allocate emergency services, and a new generation of social scientists.
Yet the spread and development of "location services" has outpaced the development of a clear understanding of how location data impact users' privacy and anonymity.
For example, sat-nav manufacturers have long been using location data from both mobile phones and sat-navs themselves to improve traffic reporting, by calculating how fast users are moving on a given stretch of road.
The data used in such calculations are "anonymised" - no actual mobile numbers or personal details are associated with the data.
But there are some glaring examples of how nominally anonymous data can be linked back to individuals, the most striking of which occurred with a tranche of data deliberately released by AOL in 2006, outlining 20 million anonymised web searches.
The New York Times did a little sleuthing in the data and was able to determine the identity of "searcher 4417749".
Recent work has increasingly shown that humans' patterns of movement, however random and unpredictable they seem to be, are actually very limited in scope and can in fact act as a kind of fingerprint for who is doing the moving.
The new work details just how "low-resolution" these location data can be and still act as a unique identifier of individuals.
Researchers at the Massachusetts Institute of Technology (MIT) and the Catholic University of Louvain studied 15 months' worth of anonymised mobile phone records for 1.5 million individuals.
They found from the "mobility traces" - the evident paths of each mobile phone - that only four locations and times were enough to identify a particular user.
"In the 1930s, it was shown that you need 12 points to uniquely identify and characterise a fingerprint," said the study's lead author Yves-Alexandre de Montjoye of MIT.
"What we did here is the exact same thing but with mobility traces. The way we move and the behaviour is so unique that four points are enough to identify 95% of people," he told BBC News.
"We think this data is more available than people think. When you think about, for instance wi-fi or any application you start on your phone, we call up the same kind of mobility data.
"When you share information, you look around you and feel like there are lots of people around - in the shopping centre or a tourist place - so you feel this isn't sensitive information."
The team went on to quantify how "high-resolution" the data need to be - the precision to which a location is known - in order to more fully guarantee privacy.
Co-author Cesar Hidalgo said that the data follow a natural mathematical pattern that could be used as an analytical guide as more location services and high-resolution data become available.
"The idea here is that there is a natural trade-off between the resolution at which you are capturing this information and anonymity, and that this trade-off is just by virtue of resolution and the uniqueness of the pattern," he told BBC News.
"This is really fundamental in the sense that now we're operating at high resolution, the trade-off is how useful the data are and if the data can be anonymised at all. A traffic forecasting service wouldn't work if you had the data within a day; you need that within an hour, within minutes."
Dr Hidalgo notes that additional information would still be needed to connect a mobility trace to an individual, but that users freely give away some of that information through geo-located tweets, location "check-ins" with applications such as Foursquare and so on.
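The trade-off between resolution and anonymity can be made concrete with a small sketch. It is an illustration under stated assumptions, not the formula from the study: the traces are invented, the points are assumed to be (x, y, minute) on an arbitrary grid, and the helper names `coarsen` and `unicity` are hypothetical. Coarsening snaps points to larger spatial cells and longer time windows, and the score is the share of two-point combinations that match exactly one user.

```python
from itertools import combinations

def coarsen(trace, cell_size, time_window):
    """Snap each (x, y, minute) point to a coarser spatial grid and time bucket."""
    return {(x // cell_size, y // cell_size, t // time_window) for x, y, t in trace}

def unicity(traces, num_points):
    """Share of num_points-subsets of each user's trace that match that user alone."""
    unique = total = 0
    for user, trace in traces.items():
        for combo in combinations(sorted(trace), num_points):
            total += 1
            matches = [u for u, other in traces.items() if set(combo) <= other]
            if matches == [user]:
                unique += 1
    return unique / total if total else 0.0

# Hypothetical raw traces: (grid x, grid y, minute of day).
raw = {
    "user_a": {(0, 0, 480), (3, 1, 545), (7, 6, 790)},
    "user_b": {(1, 0, 495), (2, 1, 560), (6, 7, 805)},
    "user_c": {(9, 9, 480), (3, 8, 600), (0, 5, 900)},
}

results = {}
for cell_size, time_window in [(1, 15), (4, 60)]:
    coarse = {u: coarsen(t, cell_size, time_window) for u, t in raw.items()}
    results[(cell_size, time_window)] = unicity(coarse, num_points=2)
    print(cell_size, time_window, round(results[(cell_size, time_window)], 2))
```

On this toy data every pair of points is unique at the fine resolution, while at the coarse resolution user_a and user_b collapse onto the same cells and hours and the score falls to about one third. The coarser data protect those two users better, but they also say less about where and when anyone actually was.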
But the authors say their purpose is to provide a mathematical link - a formula applicable to all mobility data - that quantifies the anonymity/utility trade-off, and hope that the work sparks debate about the relative merits of this "Big Data" and individual privacy.
Sam Smith of Privacy International said: "Our mobile phones report location and contextual data to multiple organisations with varying privacy policies."
"Any benefits we receive from such services are far outweighed by the threat that these trends pose to our privacy, and although we are told that we have a choice about how much information we give over, in reality individuals have no choice whatsoever," he told BBC News.
Reposted from: BBC www.bbc.com
Original author: Jason Palmer
Original URL: https://www.bbc.com/news/science-environment-21923360