If someone wanted to track a letter without opening it, they could follow the postman’s route to learn more about its sender and recipient. In a similar way, attackers can infer online activity by observing how data packets move across networks, even when the content itself is encrypted. To guard against such web fingerprinting, today’s internet architecture carries various privacy protocols.
For example, a data transfer protocol called Quick UDP Internet Connections (QUIC) speeds up connections between user devices and servers and encrypts their communication, hiding the content of data exchanges from anyone observing the network in between. It supports both the Domain Name System (DNS), which converts website names into machine-readable addresses, and HTTP/3, which facilitates packet exchange and the loading of webpage content.
Initial tests suggested such encryption could protect privacy, but many of these tests either ran on older, pre-QUIC protocols or did not include HTTP/3 traffic. This prompted researchers at the A*STAR Institute for Infocomm Research (A*STAR I2R) to ask whether the data exchange patterns visible to a passive observer still carry enough distinctive characteristics to identify websites, as was possible with older protocols.
Led by A*STAR I2R Senior Principal Scientist Dinil Mon Divakaran and Senior Scientist Levente Csikor, the team collaborated with researchers from the National University of Singapore to simulate web fingerprinting attacks powered by artificial intelligence (AI). They developed an AI-based transformer model that analysed encrypted internet traffic, training it on 500 QUIC-enabled websites while using traces from over 74,000 additional websites to evaluate its performance in realistic browsing scenarios.
Their experiments revealed that when traffic was filtered down to encrypted DNS alone, the first 200 packets were enough to capture the complete DNS exchange in nearly all website visits. These packets already carried the unique characteristics of DNS requests that help reveal which website was visited. By analysing solely these DNS packets, the transformer model could correctly identify 70 percent of monitored websites at 90 percent precision.
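As a rough illustration of what "the first 200 packets" might look like as model input, the sketch below cuts a captured trace to a fixed length. The signed-size encoding (positive for outgoing packets, negative for incoming ones) is a convention often seen in website-fingerprinting work and is an assumption here. The article does not describe the team's actual input format, and `encode_trace` is a hypothetical helper.

```python
from typing import List, Tuple

MAX_PACKETS = 200  # cutoff reported in the article


def encode_trace(
    packets: List[Tuple[int, int]], max_packets: int = MAX_PACKETS
) -> List[int]:
    """Turn (direction, size) pairs into a fixed-length list of signed sizes.

    direction is +1 for outgoing and -1 for incoming packets (assumed encoding).
    Traces longer than max_packets are truncated; shorter ones are zero-padded.
    """
    signed = [direction * size for direction, size in packets[:max_packets]]
    return signed + [0] * (max_packets - len(signed))


# A toy three-packet trace: query out, response in, query out.
toy_trace = [(1, 120), (-1, 340), (1, 95)]
features = encode_trace(toy_trace)
```

Note that only packet sizes and directions appear in this representation. The encrypted payload is never read, which is what lets a passive observer build such features.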
“Modern websites trigger a characteristic sequence of DNS lookups, ranging from the domain itself to analytics and ads. Together, the patterns form a very distinctive web ‘signature’ that still broadcasts its identity, as our experiments have shown,” said Csikor.
When DNS-over-QUIC traffic was combined with HTTP/3 web traffic, the model’s performance improved further to approximately 80 percent recall at 90 percent precision. By contrast, an older deep learning approach achieved less than 10 percent recall at the same precision, highlighting how transformers’ ability to weigh relationships across the entire packet sequence lets them spot patterns that previous models missed.
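For readers unfamiliar with the metric, "recall at 90 percent precision" can be read as follows: the attacker only trusts predictions above a confidence threshold, and the threshold is chosen so that at least 90 percent of trusted predictions are correct. Recall is then the share of monitored-site visits still identified. The sketch below computes this for a simplified binary case. The function name, scores and labels are illustrative assumptions, not the study's evaluation code or data.

```python
from typing import Sequence


def recall_at_precision(
    scores: Sequence[float], labels: Sequence[int], target_precision: float
) -> float:
    """Highest recall reachable at any threshold whose precision >= target.

    scores: model confidence per visit; labels: 1 if the visit truly is a
    correctly identifiable monitored site, 0 otherwise.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_positives = sum(labels)
    if total_positives == 0:
        return 0.0

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best_recall = 0.0
    true_positives = 0
    for k, (score, label) in enumerate(ranked, start=1):
        true_positives += label
        # Only cut between distinct scores, so tied items are accepted together.
        if k < len(ranked) and ranked[k][0] == score:
            continue
        precision = true_positives / k
        if precision >= target_precision:
            best_recall = max(best_recall, true_positives / total_positives)
    return best_recall


# Toy example: five visits, three of which are true monitored-site hits.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_labels = [1, 1, 0, 1, 0]
toy_recall = recall_at_precision(toy_scores, toy_labels, 0.9)
```

In this toy case only the two most confident predictions can be accepted without precision dropping below 90 percent, so two of the three true hits are recovered.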
The findings further show that traditional defences like packet padding, which adds extra data to disguise traffic patterns, are ineffective against modern AI-based attacks trained on the latest QUIC protocols.
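A minimal sketch can show why size padding on its own leaves much of a trace intact. It assumes a simple scheme that rounds every packet up to a multiple of a fixed block size. The block size and helper names are hypothetical, and the defences actually tested in the study may differ.

```python
from typing import List, Tuple

Packet = Tuple[int, int]  # (direction, size); +1 outgoing, -1 incoming (assumed)


def pad_size(size: int, block: int = 128) -> int:
    """Round a packet size up to the next multiple of block."""
    return -(-size // block) * block  # ceiling division, then scale back up


def pad_trace(trace: List[Packet], block: int = 128) -> List[Packet]:
    """Pad every packet in a trace, leaving order and direction untouched."""
    return [(direction, pad_size(size, block)) for direction, size in trace]


toy_trace = [(1, 120), (-1, 340), (-1, 1200), (1, 95)]
padded = pad_trace(toy_trace)

# Exact sizes are hidden, but the packet count and the sequence of
# directions are identical before and after padding.
same_length = len(padded) == len(toy_trace)
same_directions = [d for d, _ in padded] == [d for d, _ in toy_trace]
```

Under this assumed scheme, the number of packets and the order in which they flow in each direction survive padding unchanged, so a model that relies on relationships across the sequence still has material to work with.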
“We need defences that scramble the relationships between packets so the AI models can’t tell which one matters, much like adding white noise to a conversation to confuse a speech recogniser,” said Divakaran.
“Ultimately, users must become more privacy-conscious by using privacy-enhancing tools like Tor or VPNs to make any eavesdropping attacks more difficult to execute,” he added.
While websites’ fingerprinting signatures remain difficult to mask, the team has released their dataset and tools publicly to enable the broader community to build on their work and develop stronger defences against emerging privacy threats.
The A*STAR-affiliated researchers contributing to this research are from the A*STAR Institute for Infocomm Research (A*STAR I2R).