De-Anonymization: A background primer
Modern Western society relies heavily on technology and data for everything from day-to-day activities to highly complex financial transactions. Each of these activities requires some level of data transfer, whether it results from logging in to a social media website or from something as passive as making a phone call. Each activity ultimately passes some data to another party, and although a single piece of this data, e.g., a phone number, may not be useful on its own, it may provide a very detailed picture of who a person is when combined with other datasets. The ability to amass quantities of data so vast that innocuous and distinct pieces of data can be linked together with enough accuracy to de-anonymize individuals is alarming. For example, people have been trained to look for secure web browsing sessions, but they are not shown what companies may do with data once they have obtained it securely. The secure transmission of data is very different from an assurance that the data will be held securely and not re-transmitted, processed, hosted, or sold for any purpose other than the one for which it was initially obtained. De-anonymization not only puts personal privacy at risk, but also results in a breach of trust that can directly reduce the positive impacts of technological advances.
At its most basic level, data de-anonymization is the act of taking data from which personally identifiable information (PII) has been stripped and using certain data points or auxiliary information to link the anonymized data to real-life identities. For purposes of this explanation, anonymous or semi-anonymous data can include phone call records, web browsing histories, or geolocation data, whereas any data that can be conclusively linked to a specific identity with no further analysis or aggregation is not anonymized. Data de-anonymization is not only startling from a personal privacy point of view; it also carries real-world consequences that could include criminal activity, nefarious corporate interactions, or more catastrophic acts, like those perpetrated by countries such as Egypt, Russia, and China when a person speaks out against the government.
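To make the linking step concrete, the following is a minimal sketch of a linkage attack, assuming the anonymized release and the auxiliary dataset share a few quasi-identifying fields. All records, field names, and the function `link_records` are hypothetical and exist only for illustration; they are not drawn from any study cited here.

```python
from collections import defaultdict

# Hypothetical "anonymized" release: names removed, quasi-identifiers kept.
anonymized = [
    {"zip": "47401", "birth_year": 1948, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "47401", "birth_year": 1951, "sex": "M", "diagnosis": "asthma"},
    {"zip": "47408", "birth_year": 1948, "sex": "F", "diagnosis": "arthritis"},
]

# Hypothetical auxiliary data, e.g., a public directory that includes names.
auxiliary = [
    {"name": "Alice Example", "zip": "47401", "birth_year": 1948, "sex": "F"},
    {"name": "Bob Example", "zip": "47401", "birth_year": 1951, "sex": "M"},
    {"name": "Carol Example", "zip": "47408", "birth_year": 1948, "sex": "F"},
    {"name": "Dana Example", "zip": "47408", "birth_year": 1948, "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link_records(anon_rows, aux_rows, keys=QUASI_IDENTIFIERS):
    """Pair each anonymized row with the names whose quasi-identifiers match."""
    index = defaultdict(list)
    for row in aux_rows:
        index[tuple(row[k] for k in keys)].append(row["name"])
    return [(row, index.get(tuple(row[k] for k in keys), [])) for row in anon_rows]


results = link_records(anonymized, auxiliary)
for row, names in results:
    # Exactly one matching name means the record is tied to one identity.
    status = "re-identified" if len(names) == 1 else "ambiguous"
    print(row["diagnosis"], names, status)
```

In this toy data, the first two records each match exactly one name, while the third matches two and remains ambiguous until further auxiliary information is added.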
Many large companies provide anonymized data for purposes such as marketing, development, and research, but capabilities also exist to extract user information directly from websites and other subscribing systems. In either case, this privacy-sensitive information contains rich graph structural characteristics, meaning adversaries or even corporate entities can link anonymized data to individual identities if enough auxiliary information is available (Lee, Liu, Ji, Mittal & Lee, 2017). Furthermore, social networking sites currently provide attackers with the richest datasets available, simply because people do not consider the effects of sharing nearly every aspect of their lives on a corporate-owned website. As explained by Wondracek, Holz, Kirda, and Kruegel (2010), “information about the group memberships of a user is often sufficient to uniquely identify this user. When unique identification is not possible, then the attack might still significantly reduce the size of the set of candidates that the victim belongs,” indicating that an attempt to de-anonymize using only the data obtained from a single website can succeed well enough that auxiliary information may complete the de-anonymization.
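The narrowing effect described in the quotation can be sketched as a set intersection. This is an illustration only, not the procedure used by Wondracek et al. (2010); the group names, user handles, and the function `candidates_for` are hypothetical, and the sketch assumes the adversary already knows which groups the unknown user belongs to and can read each group's roster.

```python
# Hypothetical public group rosters: group name -> set of member handles.
group_members = {
    "gardening": {"u1", "u2", "u3", "u4"},
    "retired_teachers": {"u2", "u3", "u5"},
    "chess_club": {"u3", "u6"},
}


def candidates_for(observed_groups, rosters):
    """Intersect the rosters of every group the unknown user is known to be in."""
    sets = [rosters[group] for group in observed_groups]
    return set.intersection(*sets) if sets else set()


# Two known memberships shrink six users to a small candidate set.
narrowed = candidates_for(["gardening", "retired_teachers"], group_members)

# A third membership happens to leave a single candidate in this toy data.
unique = candidates_for(
    ["gardening", "retired_teachers", "chess_club"], group_members
)
print(sorted(narrowed), sorted(unique))
```

Even when the intersection contains more than one user, the remaining candidates are few enough that a small amount of auxiliary information may be sufficient to separate them.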
Since the topic of data de-anonymization has taken center stage, many techniques have been developed to help ensure the anonymity of data. One such example is the k-anonymity model, under which each released record must be indistinguishable from at least k-1 other records on its quasi-identifying attributes. The model provides a trivial level of anonymity under certain circumstances, but as Narayanan and Shmatikov (2009) point out in their groundbreaking research, “The fundamental problem with k-anonymity is that it is a syntactic property…Crucially, all of these defenses impose arbitrary restrictions on the information available to the adversary and make arbitrary assumptions about the properties of the social network.” In an effort to enhance privacy, people have also turned to technological means such as the Tor network.
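As a rough illustration of the k-anonymity property discussed above, the sketch below computes k for a small table as the size of the smallest group of rows that share the same quasi-identifier values. The table, its column names, and the function `k_anonymity` are hypothetical.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values()) if counts else 0


# Hypothetical generalized table: ages banded, ZIP codes truncated.
table = [
    {"age_band": "60-69", "zip_prefix": "474", "condition": "diabetes"},
    {"age_band": "60-69", "zip_prefix": "474", "condition": "asthma"},
    {"age_band": "70-79", "zip_prefix": "474", "condition": "arthritis"},
    {"age_band": "70-79", "zip_prefix": "474", "condition": "arthritis"},
]

k = k_anonymity(table, ["age_band", "zip_prefix"])
print(k)
```

The check looks only at the shape of the table. In this toy data, both rows in the 70-79 group carry the same condition, so learning that a person falls in that group reveals the sensitive value even though the table satisfies k = 2. This is one commonly noted way in which a purely syntactic guarantee can fall short.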
In simple terms, the Tor network is a very popular overlay network of volunteer-operated relays that carries encrypted traffic and attempts to anonymize user activity by routing that traffic along a circuit of Tor nodes. Traffic travels from node to node, and each node knows only the previous node and the next node, which is intended to ensure that no single node knows the entire path a specific transmission has taken. Unfortunately, much research has shown that technical limitations within the Tor network can result in reliable de-anonymization of specific transmissions and therefore of real-world identities (Yang, Gu, Ling, Yin, & Luo, 2017). Although de-anonymization of Tor traffic is highly malicious and is primarily of interest to government agencies with the technology to carry out such an attack, the mere threat of de-anonymizing highly encrypted traffic may have a chilling effect on journalism and geopolitical reporting. Individuals in police states have a very legitimate reason to protect their identities, and the international community benefits from transparency in reporting, but as de-anonymization techniques and technologies become more advanced and commonplace, the safety and security of individuals may be put at greater risk.
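The node-to-node behavior described above can be illustrated with a toy model of layered wrapping. This is a heavily simplified sketch and not Tor's actual protocol: base64 encoding stands in for per-relay encryption (it provides no secrecy), and the relay names, destination, and helper functions are hypothetical. The point is only to show that each relay, on removing its own layer, learns the next hop and nothing further along the path.

```python
import base64
import json


def wrap(payload, next_hop):
    """Stand-in for encrypting one layer to one relay (base64 is NOT encryption)."""
    layer = json.dumps({"next": next_hop, "inner": payload})
    return base64.b64encode(layer.encode()).decode()


def peel(blob):
    """What a single relay learns: the next hop and an opaque inner blob."""
    layer = json.loads(base64.b64decode(blob))
    return layer["next"], layer["inner"]


def build_onion(message, circuit, destination):
    """Wrap the message once per relay, innermost layer first."""
    hops = circuit[1:] + [destination]
    blob = message
    for next_hop in reversed(hops):
        blob = wrap(blob, next_hop)
    return blob


circuit = ["guard", "middle", "exit"]  # hypothetical relay names
blob = build_onion("hello", circuit, "example.org")

seen = []
for relay in circuit:
    # Each relay removes only its own layer and forwards the rest.
    next_hop, blob = peel(blob)
    seen.append((relay, next_hop))

print(seen, blob)
```

In the real network each layer would be encrypted with a key shared only with the corresponding relay, so a relay could not read the inner layers as this toy version would allow.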
The Elderly, Social Media, and Technology: A current literature review
To fully grasp the complex relationship between the elderly and technology, and where de-anonymization fits into this picture, background literature provides a foundation for understanding. Much research exists on the elderly and their technology use, their use of social media, and their privacy attitudes and concerns. This research is discussed first, before turning to the question of whether the elderly are more vulnerable to potential privacy harms than other age groups, and whether those vulnerabilities translate into vulnerabilities to de-anonymization harms.
Technology plays a major role in the lives of the elderly, although they use technology somewhat differently than younger generations. Many elderly are active online, despite common myths that they shun technology. According to Vosner, Bobek, Kokol, and Krecic (2016), “70% of the elderly use the Internet on a daily basis” (p. 230). Many elderly spend their time online writing and checking emails and using search engines (Maass, 2011). The importance of search engines is corroborated by Tao and Shuijing (2016), who state, “Search engines are the most important Internet functions for older people, they use them to find information which they are interested in, such as news and health advice” (p. 286). Seniors also frequently participate in online forums, which serve as a place where the elderly feel comfortable discussing a wide range of subjects, from private matters to public or casual conversations (Nimrod, 2010). Vosner et al. (2016) found that the elderly mostly use social networks for writing messages and participating in chat rooms. Additionally, while the elderly have never been the most active demographic on social media sites, their use of social media has been growing rapidly (Maass, 2011).
The elderly have many different motivations for spending time online, including “satisfaction with access to information, positive learning experiences, and the utility of some Internet activities such as online financial services and shopping” (Maass, 2011, p. 239). Additionally, many seniors gain a sense of connectedness from being online, even from activities as simple as exchanging emails with family members and friends (Maass, 2011). According to Jung, Walden, Johnson, and Sunday (2017), connectedness is a primary reason that many elderly begin using social networking sites.
Many elderly enjoy using Facebook because it lets them see what their family members are doing, allowing them to feel more present with their family (Jung et al., 2017). A study by Hage, Wortmann, Offenbeek, and Boonstra (2016) showed that social media increases connectivity with friends and family who are geographically far apart, although not with people who are geographically close. Gatto and Tak (2008) showed that when seniors connect with friends and family online, face-to-face time with loved ones is not affected, and the elderly reduce their time spent watching TV and listening to the radio to offset time spent online. Replacing time spent on more passive media with the more interactive medium of the Internet leads to greater connectedness for the elderly, as seniors end up spending more time overall engaging socially.
The elderly face barriers when it comes to technology use, however. Many elderly feel frustrated that computer skills take a long time to learn, and they are frustrated with the physical computer itself (Maass, 2011). The elderly often reject technology because they find it difficult to learn how to use (Chou, Lai, & Liu, 2013). Maass (2011) also showed that many elderly are unsure whether they can trust information that they find online, and this uncertainty is reflected in their cautious attitude towards online privacy.
Fifty-five percent of people aged 55 and older have concerns about confidentiality online, compared to 36% of people 25 years old (Willis, 2006). Chou et al. (2013) found that the elderly tend to believe that any online action involving money is likely to be hacked, and they are therefore wary of these online activities. Maass (2011) supported this conclusion, determining that the elderly have many privacy concerns with online shopping because of a fear of data being stolen and used in inappropriate ways. A study from AARP (2000) reached the same conclusion, finding that many elderly worry their information is vulnerable during online transactions and believe that personal data should not be shared without permission (as cited in Eastman & Iyer, 2004, p. 210).
These potential privacy violations lead many elderly to avoid online activities that carry any risk of identity theft or misuse of their data (Gatto & Tak, 2008). In Jung et al. (2017), many interviewed participants expressed concern about social media sites sharing their personal information, and so they declined to use social media. In one case, a participant stopped using Facebook after he began questioning its reach, viewing it as a nuisance when he continued to receive dozens of “spam” messages every single day (Jung et al., 2017, p. 1077).
The elderly believe that anonymity is the most vital aspect of online privacy, and the aspect over which they can have the most control. This belief may stem in part from the elderly being less aware of the privacy violations that can result from data collection by search engines, including de-anonymization of online data. As shown in Maass (2011), the elderly are often comfortable discussing and disclosing personal information online in chat rooms and on forums, as long as they can retain anonymity. Screen names are important for this reason, and having to provide a real, full name on Facebook may be one of the reasons that many elderly feel less comfortable disclosing personal information and posting personal stories and details on Facebook than on other forum websites. Privacy concerns regarding Facebook are a serious worry among the elderly, and are a primary reason many elderly are apprehensive about joining Facebook (Jung et al., 2017; Luders & Brandtzaeg, 2014). Many non-users of social media cite concerns about sharing personal information with social networking site providers, not having control over the personal information that online contacts can view, and the risks of being hacked and getting viruses (Luders & Brandtzaeg, 2014).
When the elderly are on Facebook, many do not want to post updates about their own lives for their family members to see; the elderly want to participate passively without revealing their own personal affairs online (Jung et al., 2017). Part of the reason for this one-sided attitude towards Facebook stems from privacy concerns, as one user in Jung et al. (2017) describes:
I don’t put stuff on there. I can’t tolerate other people looking at my things. I like privacy. I know there are privacy restrictions and … I don’t like chitchat. I don’t like to put silly little sayings on there like “Heading for a cup of coffee, wish me luck.” (p. 1075)
What is especially concerning, however, is the lack of awareness that many of the elderly have about potential privacy violations from online activities other than social media sites. One person stated that they had chosen to stay away from social media because they wanted to control the dissemination of their personal information online. However, this same person went on to say that they frequently surf the web, stream media, use email, shop online, and text every day (Luders & Brandtzaeg, 2014). And, as referenced above, many elderly view the ability to remain anonymous online as the most vital aspect of online privacy. These contradictory positions show that while many elderly are extremely cautious when it comes to online privacy and social media, they are unaware of other sources of data collection, such as search engines, email, and online shopping. Online shopping is a considerable concern, as the proportion of the elderly who shop online is increasing (Tao & Shuijing, 2016). Research has verified this point as well, showing that the elderly who go online often do not have enough knowledge to make educated decisions about their privacy (Chai, Bagchi-Sen, & Upadhyaya, 2008).
To summarize the literature so far, the elderly go online to participate in communities, preserve connections with friends and family, search for information, and engage in transactions such as online shopping. The elderly willingly participate in these activities because they trust that their personal information will be protected either by the online service providers or by the elderly themselves through the use of screen names to preserve anonymity. However, their lack of knowledge about data collection methods and their unawareness of the possibility of de-anonymization leave the elderly in a vulnerable position when it comes to potential privacy violations.
Chakraborty, Rao, and Uphadhyahy (2009) state that the elderly are more vulnerable than other age groups for two primary reasons: first, the elderly are more trusting of other people, having grown up in a world that was more honest than the world today, and second, the elderly are not as knowledgeable about online fraud as other age groups because they spend comparatively less time online (as cited in Maass, 2011, p. 241). Additionally, the elderly exhibit less cautious behavior online regarding their information privacy by being less likely to stop downloads of suspicious files, and by being more open to visiting websites that appear to pose potential security issues (Chai et al., 2008). Furthermore, the elderly receive far less attention regarding online privacy than other age groups, even though they continue to be one of the fastest growing populations of technology adopters (Chai et al., 2008).
The elderly are also less likely to use privacy enhancing technologies (PETs), for two reasons. First, PETs are frequently created without any consideration of the elderly, and so they are often not user friendly for them. Second, the elderly adopt new technologies relatively slowly, while new technologies are created at a high rate (Chai et al., 2008). Often, by the time the elderly have embraced a PET, it is worth far less for protecting privacy because new technologies have been invented that can bypass it.
Passwords are another factor that leads to elderly vulnerability (Ahmed et al., 2017). The elderly more commonly have difficulty creating and remembering different passwords than other age groups do. These difficulties translate into vulnerabilities as the elderly respond by using a single password for all of their accounts, keeping their passwords written down or stored somewhere that is not secure, or using commonly found information such as birthdates or maiden names.
Shuijing and Tao (2017) studied elderly Internet users in China and found evidence of this broader trend of the elderly using familiar numbers such as birthdates for passwords or PINs. Much of the concern around this phenomenon stems from the elderly not realizing that this information has frequently been made publicly available to anyone who searches for it. In some cases, the elderly made it public themselves without realizing it, perhaps through another website or by providing it to another entity that they wrongly assumed would keep it private (Shuijing & Tao, 2017).
A 2002 study by Mouallem examined the basis of online fraud against the elderly. The study described how the elderly have been known to be the most vulnerable age group for telemarketing fraud, for reasons including lack of mobility, feelings of loneliness, isolation, and depression, the greater likelihood of the elderly continuing to listen to the telemarketer rather than hanging up, and the often greater financial resources that the elderly may have (Mouallem, 2002). All these reasons can be applied to Internet fraud, which is why the elderly continue to be the most vulnerable age group to violations of their online privacy.
What does not currently exist in the literature is information on the threats of de-anonymization to the online privacy of the elderly. As shown through this extensive literature review on the background of the elderly, technology, and online privacy, concern for the elderly and harms from de-anonymization have a solid basis. Building off this foundation, the next section will examine the arguments for why the elderly especially are susceptible to de-anonymization harms.