SESSION: Session 1: Network Privacy
Classification of Encrypted IoT Traffic despite Padding and Shaping
It is well known that when IoT traffic is unencrypted it is possible to identify the active devices based on their TCP/IP headers, and when traffic is encrypted, packet sizes and timings can still be used to do so. To defend against such fingerprinting, traffic padding and shaping were introduced. In this paper we show that even with these mitigations, the privacy of IoT consumers can still be violated. The main tool we use in our analysis is the full distribution of packet sizes---as opposed to commonly used statistics such as mean and variance. We evaluate the performance of a local adversary, such as a snooping neighbor or a criminal, against 8 different padding methods. We show that our classifiers achieve perfect (100% accuracy) classification using the full packet-size distribution for low-overhead methods, whereas prior works that rely on statistical metadata achieved lower rates even when no padding and shaping were used. We also achieve an excellent classification rate even against high-overhead methods. We further show how an external adversary, such as a malicious ISP or a government intelligence agency, who only sees the padded and shaped traffic as it goes through a VPN, can accurately identify the subset of active devices with recall and precision of at least 96%. Finally, we propose a new padding method we call Dynamic STP (DSTP) that incurs significantly less per-packet overhead than the other padding methods we tested and guarantees more privacy to IoT consumers.
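To make the core idea concrete, here is a minimal Python sketch of classifying devices from the full packet-size distribution: each flow is turned into a normalized size histogram and fed to an off-the-shelf classifier. The MTU-sized binning, the random forest, and the input format are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: full packet-size histogram as a classification feature.
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestClassifier

MTU = 1500  # assumed Ethernet-sized packets; one histogram bin per packet size

def size_histogram(packet_sizes):
    """Turn one flow's packet sizes into a normalized size-frequency vector."""
    hist = np.zeros(MTU + 1)
    for size, count in Counter(packet_sizes).items():
        hist[min(size, MTU)] = count
    return hist / max(hist.sum(), 1)

def train_device_classifier(flows):
    """flows: list of (packet_size_list, device_label) pairs, e.g. parsed from pcaps."""
    X = np.stack([size_histogram(sizes) for sizes, _ in flows])
    y = [label for _, label in flows]
    return RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
```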
Splitting Hairs and Network Traces: Improved Attacks Against Traffic Splitting as a Website Fingerprinting Defense
The widespread use of encryption and anonymization technologies---e.g., HTTPS, VPNs, Tor, and iCloud Private Relay---makes network attackers likely to resort to traffic analysis to learn of client activity. For web traffic, such analysis of encrypted traffic is referred to as Website Fingerprinting (WF). WF attacks have improved greatly, in large part thanks to advances in Deep Learning (DL). In 2019, a new category of defenses was proposed: traffic splitting, where traffic from the client is split over two or more network paths, with the assumption that some paths are unobservable by the attacker.
In this paper, we examine three recently proposed defenses based on traffic splitting: HyWF, CoMPS, and TrafficSliver BWR5. We analyze real-world and simulated datasets for all three defenses to better understand their splitting strategies and effectiveness as defenses. Using our improved DL attack Maturesc on real-world datasets, we improve the classification accuracy with respect to the state of the art from 49.2% to 66.7% for HyWF, the F1 score from 32.9% to 72.4% for CoMPS, and the accuracy from 8.07% to 53.8% for TrafficSliver BWR5. We find that a majority of wrongly classified traces contain fewer than a couple hundred packets/cells: e.g., in every dataset, 25% of traces contain fewer than 155 packets. What cannot be observed cannot be classified. Our results show that the proposed traffic splitting defenses on average provide less protection against WF attacks than simply selecting one path at random and sending all traffic over it.
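As a rough illustration of the kind of DL attack involved (not Maturesc itself), the sketch below classifies a trace from its packet-direction sequence with a small 1D CNN; the fixed trace length, layer sizes, and PyTorch framing are assumptions for illustration only.

```python
# Illustrative sketch of a website-fingerprinting classifier over packet directions.
import torch
import torch.nn as nn

TRACE_LEN = 5000  # assumed fixed input length; short traces are zero-padded

class TinyWFNet(nn.Module):
    """A deliberately small 1D CNN over packet-direction sequences (+1/-1)."""
    def __init__(self, num_sites):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=8, stride=4), nn.ReLU(),
            nn.AdaptiveMaxPool1d(8),
        )
        self.classifier = nn.Linear(64 * 8, num_sites)

    def forward(self, x):  # x: (batch, 1, TRACE_LEN)
        return self.classifier(self.features(x).flatten(1))

def to_tensor(directions):
    """Pad/truncate a list of +1/-1 directions; very short traces carry little signal."""
    seq = (list(directions) + [0] * TRACE_LEN)[:TRACE_LEN]
    return torch.tensor(seq, dtype=torch.float32).view(1, 1, TRACE_LEN)

logits = TinyWFNet(num_sites=100)(to_tensor([1, -1, -1, 1]))
```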
Padding-only Defenses Add Delay in Tor
Website fingerprinting is an attack that uses size and timing characteristics of encrypted downloads to identify targeted websites. Since this can defeat the privacy goals of anonymity networks such as Tor, many algorithms to defend against this attack in Tor have been proposed in the literature. These algorithms typically combine the injection of dummy "padding" packets with the delaying of actual packets to disrupt timing patterns. For usability reasons, Tor is intended to provide low latency; as such, many authors focus on padding-only defenses in the belief that they are "zero-delay." We demonstrate through Shadow simulations that, by increasing queue lengths, padding-only defenses add delay when deployed network-wide, so they should not be considered "zero-delay." We further argue that future defenses should also be evaluated using network-wide deployment simulations.
Sauteed Onions: Transparent Associations from Domain Names to Onion Addresses
Onion addresses offer valuable features such as lookup and routing security, self-authenticated connections, and censorship resistance. Therefore, many websites are also available as onionsites in Tor. However, the way registered domains and onion addresses are associated is a weak link. We introduce sauteed onions, transparent associations from domain names to onion addresses. Our approach relies on TLS certificates to establish onion associations. It is much like today's onion location, which relies on Certificate Authorities (CAs) due to its HTTPS requirement, but has the added benefit of becoming public for everyone to see in Certificate Transparency (CT) logs. We propose and prototype two uses of sauteed onions: certificate-based onion location and search engines that use CT logs as the underlying database. The achieved goals are consistency of available onion associations, which mitigates attacks where users are partitioned depending on which onion addresses they are given; forward censorship resistance after a TLS site has been configured once; and improved third-party discovery of onion associations, which requires less trust while easily scaling to all onionsites that opt in.
SESSION: Session 2: Privacy Preserving Protocols
Fisher Information as a Utility Metric for Frequency Estimation under Local Differential Privacy
Local Differential Privacy (LDP) is the de facto standard technique to ensure privacy for users whose data is collected by a data aggregator they do not necessarily trust. This necessarily involves a tradeoff between user privacy and aggregator utility, and an important question is how to optimize utility (under a given metric) for a given privacy level. Unfortunately, existing utility metrics are either hard to optimize for, or they only indirectly relate to an aggregator's goal, leading to theoretically optimal protocols that are unsuitable in practice. In this paper, we introduce a new utility metric for the setting where the aggregator tries to estimate the true data's distribution over a finite set. The new metric is based on Fisher information, which expresses the aggregator's information gain through the protocol. We show that this metric relates to other utility metrics, such as estimator accuracy and mutual information, as well as to the LDP parameter ε. Furthermore, we show that under this metric we can approximate the optimal protocols as ε → 0 and ε → ∞, and we show how the optimal protocol can be found for a fixed ε, although the latter is computationally infeasible for large input spaces.
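For readers unfamiliar with the metric, one standard way to write the Fisher information for frequency estimation under an LDP channel Q is sketched below; this is a generic textbook formulation (ignoring the simplex constraint on θ), not necessarily the exact definition used in the paper.

```latex
% Hedged sketch: Fisher information of the observed output distribution with
% respect to the true frequencies \theta, for an LDP randomizer Q(y \mid x).
\[
  p(y;\theta) = \sum_{x} Q(y \mid x)\,\theta_x, \qquad
  \mathcal{I}(\theta)_{ij}
    = \sum_{y} \frac{\partial_{\theta_i} p(y;\theta)\;\partial_{\theta_j} p(y;\theta)}{p(y;\theta)}
    = \sum_{y} \frac{Q(y \mid i)\,Q(y \mid j)}{p(y;\theta)}.
\]
```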
PRSONA: Private Reputation Supporting Ongoing Network Avatars
As an increasing amount of social activity moves online, online communities have become important outlets for their members to interact and communicate with one another. At times, these communities may identify situations where providing their members specific privacy guarantees would promote new opportunities for healthy social interaction and assure members that their participation can be conducted safely. On the other hand, communities also face the threat of bad actors, who may wish to disrupt their activities or bring harm to members. Reputation can help mitigate the threat of such bad actors, and there is a wide body of work on privacy-preserving reputation systems. However, previous work has overlooked the needs of small, tight-knit communities, failing to provide important privacy guarantees or address shortcomings with common implementations of reputation. This work features a novel design for a privacy-preserving reputation system which provides these privacy guarantees and implements a more appropriate reputation function for this setting. Further, this work implements and benchmarks said system to determine its viability in real-world deployment. This novel construction addresses shortcomings of previous approaches and provides new opportunities to its target audience.
Data Protection Law and Multi-Party Computation: Applications to Information Exchange between Law Enforcement Agencies
Pushes for increased power of Law Enforcement (LE) for data retention and centralized storage result in legal challenges with data protection law and courts, and possible violations of the right to privacy. This is motivated by a desire for better cooperation and exchange between LE Agencies (LEAs), which is difficult due to data protection regulations, has been identified as a main factor in major public security failures, and is a frequent point of criticism of LE. Secure Multi-Party Computation (MPC) is often seen as a technological means to solve privacy conflicts where actors want to exchange and analyze data that needs to be protected due to data protection laws. In this interdisciplinary work, we investigate the problem of private information exchange between LEAs from both a legal and a technical angle. We give a legal analysis of secret-sharing-based MPC techniques in general and, as a particular application scenario, consider the case of matching LE databases for lawful information exchange between LEAs. We propose a system for lawful information exchange between LEAs using MPC and private set intersection and show its feasibility by giving a legal analysis for data protection and a technical analysis for workload complexity. Towards practicality, we present insights from qualitative feedback gathered within exchanges with a major European LEA.
Secure Maximum Weight Matching Approximation on General Graphs
Privacy-preserving protocols for matchings on general graphs can be used for applications such as online dating, bartering, or kidney donor exchange. In addition, they can act as a building block for more complex protocols. While privacy-preserving protocols for matchings on bipartite graphs are a well-researched topic, the case of general graphs has received significantly less attention so far. We address this gap by providing the first privacy-preserving protocol for maximum weight matching on general graphs. To maximize the scalability of our approach, we compute a 1/2-approximation instead of an exact solution. For N nodes, our protocol requires O(N log N) rounds, O(N^3) communication, and runs in only 12.5 minutes for N=400.
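For context, the classic plaintext 1/2-approximation for maximum weight matching is the greedy algorithm sketched below; the paper's contribution is evaluating such an approximation privately under MPC, which this sketch does not attempt to reproduce.

```python
# Plaintext greedy 1/2-approximation for maximum weight matching (illustration only).
def greedy_matching(edges):
    """edges: iterable of (weight, u, v). Returns a list of matched (u, v) pairs."""
    matched, matching = set(), []
    for weight, u, v in sorted(edges, reverse=True):  # heaviest edge first
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

# Path a-b-c-d: greedy picks ("b", "c") with weight 4, while the optimum
# {("a", "b"), ("c", "d")} has weight 6, which is within the 1/2-approximation bound.
print(greedy_matching([(3, "a", "b"), (4, "b", "c"), (3, "c", "d")]))
```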
SESSION: Session 3: Privacy Policies and Preferences
Is Your Policy Compliant?: A Deep Learning-based Empirical Study of Privacy Policies' Compliance with GDPR
Since the General Data Protection Regulation (GDPR) came into force in May 2018, companies have worked on their data practices to comply with its requirements. In particular, since the privacy policy is the essential communication channel for users to understand and control their privacy when using companies' services, many companies updated their privacy policies after GDPR was enforced. However, most privacy policies are verbose, full of jargon, and vaguely describe companies' data practices and users' rights. In addition, our study shows that more than 32% of end users find it difficult to understand the privacy policies explaining GDPR requirements. Therefore, it is challenging for end users and law enforcement authorities to manually check whether companies' privacy policies comply with the requirements enforced by GDPR. In this paper, we create a privacy policy dataset of 1,080 websites annotated by experts with 18 GDPR requirements and develop a Convolutional Neural Network (CNN) based model that can classify the privacy policies into GDPR requirements with an accuracy of 89.2%. We apply our model to automatically measure GDPR compliance in the privacy policies of the 9,761 most visited websites. Our results show that, even four years after GDPR went into effect, 68% of websites still fail to comply with at least one requirement of GDPR.
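As a rough illustration of the classification task (with a simple TF-IDF linear baseline standing in for the paper's CNN, and an assumed segment-level, multi-label framing), one might prototype it as follows.

```python
# Illustrative multi-label baseline for mapping policy text to GDPR requirements.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

def train_policy_classifier(segments, requirement_labels):
    """segments: list of policy text snippets; requirement_labels: list of label sets."""
    binarizer = MultiLabelBinarizer()
    y = binarizer.fit_transform(requirement_labels)
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=2),
        OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    )
    model.fit(segments, y)
    return model, binarizer
```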
Darwin's Theory of Censorship: Analysing the Evolution of Censored Topics with Dynamic Topic Models
We present a statistical analysis of changes in the Internet censorship policy of the government of India from 2016 to 2020. Using longitudinal observations of censorship collected by the ICLab censorship measurement project, together with historical records of web page contents collected by the Internet Archive, we find that machine classification techniques can detect censors' reactions to events without prior knowledge of what those events are. However, gaps in ICLab's observations can cause the classifier to fail to detect censored topics, and gaps in the Internet Archive's records can cause it to misidentify them.
A Study of Users' Privacy Preferences for Data Sharing on Symptoms-Tracking/Health App
Symptoms-tracking applications allow crowdsensing of health- and location-related data from individuals to track the spread and outbreaks of infectious diseases. During the COVID-19 pandemic, for the first time in history, these apps were widely adopted across the world to combat the pandemic. However, due to the sensitive nature of the data collected by these apps, serious privacy concerns were raised and the apps were critiqued for their insufficient privacy safeguards. The Covid Nearby project was launched to develop a privacy-focused symptoms-tracking app and to understand the privacy preferences of users in health emergencies.
In this work, we draw on insights from the Covid Nearby users' data and present an analysis of the significantly varying trends in users' privacy preferences with respect to demographics, attitude towards information sharing, and health concerns, e.g., after possible exposure to COVID-19. These results and insights can inform health informatics researchers and policy designers in developing more socially acceptable health apps in the future.
SESSION: Session 4: Machine Learning and Privacy
UnSplit: Data-Oblivious Model Inversion, Model Stealing, and Label Inference Attacks against Split Learning
Training deep neural networks often forces users to work in a distributed or outsourced setting, accompanied by privacy concerns. Split learning aims to address this concern by distributing the model among a client and a server. The scheme supposedly provides privacy, since the server cannot see the clients' models and inputs. We show that this is not true via two novel attacks. (1) We show that an honest-but-curious split learning server, equipped only with the knowledge of the client neural network architecture, can recover the input samples and obtain a functionally similar model to the client model, without being detected. (2) We show that if the client keeps hidden only the output layer of the model to "protect" the private labels, the honest-but-curious server can infer the labels with perfect accuracy. We test our attacks using various benchmark datasets and against proposed privacy-enhancing extensions to split learning. Our results show that plaintext split learning can pose serious risks, ranging from data (input) privacy to intellectual property (model parameters), and provide no more than a false sense of security.
SplitGuard: Detecting and Mitigating Training-Hijacking Attacks in Split Learning
Distributed deep learning frameworks such as split learning provide great benefits with regard to the computational cost of training deep neural networks and the privacy-aware utilization of the collective data of a group of data holders. Split learning, in particular, achieves this goal by dividing a neural network between a client and a server so that the client computes the initial set of layers and the server computes the rest. However, this method introduces a unique attack vector for a malicious server attempting to steal the client's private data: the server can direct the client model towards learning any task of its choice, e.g., towards outputting easily invertible values. With a concrete example already proposed (Pasquini et al., CCS '21), such training-hijacking attacks present a significant risk for the data privacy of split learning clients. In this paper, we propose SplitGuard, a method by which a split learning client can detect whether it is being targeted by a training-hijacking attack or not. We experimentally evaluate our method's effectiveness, compare it with potential alternatives, and discuss in detail various points related to its use. We conclude that SplitGuard can effectively detect training-hijacking attacks while minimizing the amount of information recovered by the adversaries.
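The split-learning setting studied in this paper and the previous one can be summarized in a few lines: the client holds the first layers, the server holds the rest, and only intermediate activations (and their gradients) cross the boundary. The layer sizes below are arbitrary assumptions for illustration, not either paper's configuration.

```python
# Minimal sketch of vanilla split learning (setting only, no attack or defense).
import torch
import torch.nn as nn

client_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU())
server_model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10))

def split_forward(x):
    smashed = client_model(x)      # only these activations leave the client
    return server_model(smashed)   # the server never sees the raw input x

logits = split_forward(torch.randn(32, 1, 28, 28))
```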
Adversarial Detection of Censorship Measurements
The arms race between Internet freedom technologists and censoring regimes has catalyzed the deployment of more sophisticated censoring techniques and directed significant research emphasis toward the development of automated tools for censorship measurement and evasion. We highlight Geneva as one of the recent advances in this area. By training a genetic algorithm such as Geneva inside a censored region, we can automatically find novel packet-manipulation-based censorship evasion strategies. In this paper, we explore the resilience of Geneva in the face of censors that actively detect and react to Geneva's measurements. Specifically, we develop machine learning (ML)-based classifiers and leverage a popular hypothesis-testing algorithm that can be deployed at the censor to detect Geneva clients within two to seven flows, i.e., far before Geneva finds any working evasion strategy. We further use public packet-capture traces to show that Geneva flows can be easily distinguished from normal flows and other malicious flows (e.g., network forensics, malware). Finally, we discuss some potential research directions to mitigate Geneva's detection.
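As one plausible shape of such a flow-level detector (the paper's exact test and features are not reproduced here), a sequential probability ratio test over per-flow anomaly indicators could flag a client after only a few flows; the probabilities and thresholds below are illustrative assumptions.

```python
# Hedged sketch: sequential probability ratio test (SPRT) over per-flow anomaly flags.
import math

def sprt(flow_flags, p_normal=0.1, p_geneva=0.7, alpha=0.01, beta=0.01):
    """flow_flags: 1 if a flow looks manipulated, else 0. Returns a verdict string."""
    upper, lower = math.log((1 - beta) / alpha), math.log(beta / (1 - alpha))
    llr = 0.0
    for i, anomalous in enumerate(flow_flags, start=1):
        p1 = p_geneva if anomalous else 1 - p_geneva
        p0 = p_normal if anomalous else 1 - p_normal
        llr += math.log(p1 / p0)
        if llr >= upper:
            return f"flag as Geneva client after {i} flows"
        if llr <= lower:
            return f"accept as normal after {i} flows"
    return "undecided"

print(sprt([1, 1, 1, 0, 1]))
```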
SESSION: Session 5: Privacy in Mobile Systems
Fingerprinting and Personal Information Leakage from Touchscreen Interactions
The study aims to understand and quantify the privacy threat landscape of touch-based biometrics. Touch interactions from mobile devices are ubiquitous and do not require additional permissions to collect. Two privacy threats were examined: user tracking and personal information leakage. First, we designed a practical fingerprinting simulation experiment and executed it on a large publicly available touch interactions dataset. We found that touch-based strokes can be used to fingerprint users with high accuracy, and performance can be further increased by adding only a single extra feature. The system can distinguish between new and returning users with up to 75% accuracy and match a new session to the user it originated from with up to 74% accuracy. In the second part of the study, we investigated the possibility of predicting personal information attributes from touch interaction behavior. The attributes we investigated were age, gender, dominant hand, country of origin, height, and weight. We found that our model can predict the age group and gender of users with up to 66% and 62% accuracy, respectively. Finally, we discuss countermeasures and limitations, and provide suggestions for future work in the field.
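To give a sense of what touch-based strokes look like as classifier input, the sketch below derives a few simple per-stroke features from raw (time, x, y) samples; the feature set is an assumption for illustration, not the study's.

```python
# Illustrative per-stroke features from raw touch samples.
import math

def stroke_features(samples):
    """samples: list of (timestamp_s, x_px, y_px) tuples for one swipe."""
    (t0, x0, y0), (t1, x1, y1) = samples[0], samples[-1]
    duration = t1 - t0
    path_len = sum(
        math.dist(samples[i][1:], samples[i + 1][1:]) for i in range(len(samples) - 1)
    )
    displacement = math.dist((x0, y0), (x1, y1))
    return {
        "duration_s": duration,
        "path_length_px": path_len,
        "displacement_px": displacement,
        "mean_velocity_px_s": path_len / duration if duration else 0.0,
        "straightness": displacement / path_len if path_len else 0.0,
    }

print(stroke_features([(0.00, 100, 900), (0.05, 140, 760), (0.10, 160, 620)]))
```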
Privacy and Security Evaluation of Mobile Payment Applications Through User-Generated Reviews
Mobile payment applications are crucial to ensuring seamless day-to-day digital transactions. However, users' perceived privacy- and security-related concerns are continually rising. Users express such thoughts, complaints, and suggestions through app reviews. To this end, we collected 1,886,352 reviews from the top 50 mobile payment applications. Furthermore, we conducted a mixed-methods in-depth evaluation of the privacy- and security-related reviews, resulting in a total of 163,210 reviews. Finally, we applied sentiment analysis and performed a mixed-methods analysis of the resulting 52,749 negative reviews. Such large-scale evaluation through user reviews informs developers about the user perception of digital threats and app behaviors. Our analysis highlights that users share concerns about sharing sensitive information with the application, the confidentiality of their data, and the permissions requested by the apps. Users have shown significant concerns regarding the usability of these applications (48.47%), getting locked out of their accounts (38.73%), and being unable to perform successful digital transactions (31.52%). We conclude by providing actionable recommendations to address such user concerns and aid the development of secure and privacy-preserving mobile payment applications.
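As an illustration of the sentiment-filtering step (using VADER from NLTK as an assumed stand-in for whatever tooling the authors used), negative reviews could be isolated as follows.

```python
# Illustrative sentiment filter: keep reviews that VADER scores as negative.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)

def negative_reviews(reviews, threshold=-0.05):
    """Return the reviews whose VADER compound score falls at or below the threshold."""
    sia = SentimentIntensityAnalyzer()
    return [r for r in reviews if sia.polarity_scores(r)["compound"] <= threshold]

print(negative_reviews(["Great app, fast transfers!",
                        "It locked me out and support never replied."]))
```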
Casing the Vault: Security Analysis of Vault Applications
Vault applications are a class of mobile apps used to store and hide users' sensitive files (e.g., photos, documents, and even another app) on the phone. In this paper, we perform an empirical analysis of popular vault apps under the scenarios of unjust search and filtration of civilians by authorities (e.g., during civil unrest). By limiting the technical capability of adversaries, we explore the feasibility of inferring the presence of vault apps and uncovering the hidden files without employing sophisticated forensics analysis. Our analysis of 20 popular vault apps shows that most of them do not adequately implement/configure their disguises, which can reveal their existence without technical analysis. In addition, adversaries with rudimentary-level knowledge of the Android system can already uncover the files stored in most of the vault apps. Our results indicate the need for more secure designs for vault apps.
SESSION: Session 6: Web Privacy
Tracking the Evolution of Cookie-based Tracking on Facebook
We analyze in depth and longitudinally how Facebook's cookie-based tracking behavior and its communication about tracking have evolved from 2015 to 2022. More stringent (enforcement of) regulation appears to have been effective at causing a reduction in identifier cookies for non-users and a more prominent cookie banner. However, several technical measures to reduce Facebook's tracking potential are not implemented, communication through the cookie banner and cookie policies remains incomplete and may be deceptive, and opt-out mechanisms seem to have no effect.
All Eyes On Me: Inside Third Party Trackers' Exfiltration of PHI from Healthcare Providers' Online Systems
In the United States, sensitive health information is protected under the Health Insurance Portability and Accountability Act (HIPAA). This act limits the disclosure of Protected Health Information (PHI) without the patient's consent or knowledge. However, as medical care becomes web-integrated, many providers have chosen to use third-party web trackers for measurement and marketing purposes. This presents a security concern: third-party JavaScript requested by an online healthcare system can read the website's contents, and ensuring PHI is not unintentionally or maliciously leaked becomes difficult. In this paper, we investigate health information breaches in online medical records, focusing on 459 online patient portals and 4 telehealth websites. We find 14% of patient portals include Google Analytics, which reveals (at a minimum) the fact that the user visited the health provider website, while 5 portals and 4 telehealth websites contained JavaScript-based services disclosing PHI, including medications and lab results, to third parties. The most significant PHI breaches were on behalf of Google and Facebook trackers. In the latter case, an estimated 4.5 million site visitors per month were potentially exposed to leaks of personal information (names, phone numbers) and medical information (test results, medications). We notified healthcare providers of the PHI breaches and found only 15.7% took action to correct leaks. Healthcare operators lacked the technical expertise to identify PHI breaches caused by third-party trackers. After notifying Epic, a healthcare portal vendor, of the PHI leaks, we received a prompt response and observed extensive mitigation across providers, suggesting vendor notification is an effective intervention against PHI disclosures.
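A very coarse first-pass check in this spirit (not the authors' methodology) is to list the third-party script hosts a page loads and flag a few well-known tracker domains; the domain list below is a small assumed sample.

```python
# Illustrative check: which third-party script hosts appear on a page?
import re
from urllib.parse import urlparse
from urllib.request import urlopen

KNOWN_TRACKERS = {"google-analytics.com", "googletagmanager.com", "facebook.net"}

def third_party_scripts(page_url):
    """Map each external script host on the page to whether it is a known tracker."""
    html = urlopen(page_url).read().decode("utf-8", errors="replace")
    first_party = urlparse(page_url).hostname
    hosts = set()
    for src in re.findall(r'<script[^>]+src=["\']([^"\']+)', html, flags=re.I):
        host = urlparse(src).hostname
        if host and host != first_party:
            hosts.add(host)
    return {h: any(h == t or h.endswith("." + t) for t in KNOWN_TRACKERS) for h in hosts}
```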
Your Consent Is Worth 75 Euros A Year - Measurement and Lawfulness of Cookie Paywalls
Most websites offer their content for free, though this gratuity often comes with a counterpart: personal data is collected to finance these websites, mostly through tracking and thus targeted advertising. Cookie walls and paywalls, used to obtain consent, have recently generated interest from EU Data Protection Authorities (DPAs) and seem to have grown in popularity; however, they have been overlooked by scholars. We present in this paper 1) the results of an exploratory study conducted on 2,800 Central European websites to measure the presence and practices of cookie paywalls, and 2) a framing of their lawfulness amidst the variety of legal decisions and guidelines.