Outside the United States
Request For Research Presentations For the PrivacyCon Conference
In the early age of the internet, users enjoyed a high level of anonymity. At the time, web pages were just hypertext documents, and almost no personalisation of the user experience was offered. The Web has since evolved into a worldwide distributed system that follows specific architectural paradigms. Today, an enormous quantity of user-generated data is shared and consumed by a network of applications and services that reason upon users' expressed preferences and their social and physical connections. On the one hand, advertising networks follow users' browsing habits while they surf the web, continuously collecting their traces and surfing patterns. On the other hand, social applications allow people every day to learn more about themselves, their friends and their surroundings. To use such services, users grant them a certain level of access to their private data. This data includes details about their identity, their whereabouts and, in some situations, even the company they work for. This access is obtained through third parties, most often larger social networks or identity providers such as Facebook or Google, which offer login technologies that allow the application to identify the user and receive precise information about them. Once the user grants access to their data, the application stores it and assumes control over how it is further shared. The ubiquitous streams of data that users create while they use different applications and surf the web can be seen as a network of interconnected data snippets. Information shared on the web can be linked together, so that it is possible to construct semantic connections between items of a user's activity data. An attacker could therefore try to link data from different sources of information to identify and target users both online and offline. Users become more frequently exposed to social engineering attacks that can now leverage facts gathered online about their personal offline lives.
We model the user's activity as a series of events belonging to a certain identity. Each event is a document containing different information. We formally define this as a hypermedia document, i.e. an object possibly containing graphics, audio, video, plain text and hyperlinks. We call the hyperlinks selectors, and we use them to build the connections between the user's different identities or events. Each identity is a profile that the user has created on a service or platform. This can be an application account or a social network account, such as their LinkedIn or Facebook unique IDs. An event is an action performed by the user, such as visiting a website or creating a post on a blog. We aggregate keywords each time the user creates a new event by visiting a different URL. These keywords constitute the user's profile of interests (Figure 1). We propose a tractable model of the user profile as a probability mass function (PMF), which expresses how much each keyword contributes to the profile according to how many times the user has indirectly expressed a preference toward the corresponding category. We consider that the user expresses a preference when, for example, they visit a webpage categorised with certain keywords. This model follows the intuitive assumption that a particular category is weighted according to the number of times it has been counted in the user profile. We present our findings in three different situations, supported by published papers. The first case study analyses privacy-enhancing techniques in recommendation systems. The second case study analyses privacy threats and attacks in location-based social applications. The third case study analyses the profiling of users' browsing habits. We also consider the importance of providing users and organisations with simple visualisation tools that show users their online footprint and allow them to take action to mask their interest profile or simply block certain networks.
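The profile model described above, in which keywords are aggregated per visited URL and each category is weighted by how often it was counted, can be illustrated with a minimal Python sketch. This is an assumption-laden illustration rather than the authors' implementation: the function names `build_profile` and `profile_pmf`, the representation of an event as a `(url, keywords)` pair, and the example URLs and keywords are all hypothetical. How a page is assigned its keywords is not specified here and is taken as given.

```python
from collections import Counter


def build_profile(events):
    """Aggregate keyword counts over a user's events.

    `events` is assumed (hypothetically) to be an iterable of
    (url, keywords) pairs, where `keywords` lists the categories
    attached to the visited page.
    """
    counts = Counter()
    for _url, keywords in events:
        counts.update(keywords)
    return counts


def profile_pmf(counts):
    """Normalise keyword counts into a probability mass function.

    Each category's weight is its count divided by the total number
    of counted keywords, so the weights sum to 1.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {keyword: count / total for keyword, count in counts.items()}


# Hypothetical browsing events, for illustration only.
events = [
    ("https://example.org/a", ["sports", "news"]),
    ("https://example.org/b", ["sports"]),
    ("https://example.org/c", ["travel"]),
]

counts = build_profile(events)
pmf = profile_pmf(counts)
print(pmf)
```

In this toy example the category "sports" is counted twice out of four keywords, so it receives half of the probability mass, while "news" and "travel" receive a quarter each.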