Getting Pwned. PHOTO: Troy Hunt.

Why Companies Are Hoarding Your Personal Data

Have I Been Pwned? founder Troy Hunt explains

David Braue

Melbourne, Australia – Apr. 13, 2021

Troy Hunt is pretty sure cybercriminals haven’t yet breached Have I Been Pwned?, the searchable database of stolen email credentials that has grown from a 2013 passion project into a worldwide service with over 11.1 billion credentials aggregated from previous, publicly available breaches.

The fact that all of its data is freely available elsewhere means there is little to be gained from targeting the site apart from the bragging rights of being the one to pwn Have I Been Pwned?, Hunt — a security researcher and Microsoft regional director and MVP based in Brisbane, Australia — told Cybercrime Magazine.

“There’s really nothing of value,” he said. “The worst that can happen is that the existing email addresses, against existing data breaches that are already in circulation, would be accessible to someone.”

The site stores no personal information apart from the email addresses of those on the site’s mailing list — whose loss would be “absolute worst-case scenario” but would represent just a fraction of the personal information leaked in many other breaches.

This paucity of information is by design, Hunt explains, calling out the practices of many websites that still collect all kinds of personal information that is irrelevant to their functioning.

One site for cat lovers, for example, collects members’ date of birth as part of the registration process. Some observers have suggested that this is to ensure compliance with the United States’ COPPA rules, which require sites to ascertain whether an online user is 13 years old or older, but Hunt says the same purpose could have been achieved with a simple Boolean record.

“I do a lot of training for organizations around security and privacy,” he explains, “and one of the really simple, fundamental tenets is that you can’t lose what you don’t have. It is ridiculous that this site requests date of birth, because they are now sitting on a piece of data that’s often used as knowledge-based authentication as well.”
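As a rough illustration of the "simple Boolean record" idea, the sketch below checks an age threshold once at registration and keeps only the yes/no answer. The `Member` record, the `register` function and the field names are hypothetical and are not taken from any site discussed here.

```python
from dataclasses import dataclass
from datetime import date

MINIMUM_AGE = 13  # the COPPA age threshold mentioned above


def is_at_least(birth_date: date, years: int, today: date) -> bool:
    """Return True if someone born on birth_date has turned `years` by `today`."""
    # Subtract a year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    age = today.year - birth_date.year - (0 if had_birthday else 1)
    return age >= years


@dataclass(frozen=True)
class Member:
    # Hypothetical stored record: no date of birth, only the Boolean.
    email: str
    is_13_or_older: bool


def register(email: str, birth_date: date, today: date) -> Member:
    # birth_date is used for a single check and is not kept on the record.
    return Member(email=email, is_13_or_older=is_at_least(birth_date, MINIMUM_AGE, today))
```

If a record like this leaked, it would reveal that a member passed the age check, but not a birth date that could be reused for knowledge-based authentication elsewhere.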

The implications of such issues emerge in unexpected ways: when cleaning up the more than 509m records published in the recent Facebook data breach, for example, Hunt found that “I didn’t have a paradigm for phone numbers before — so there was a bit of last-minute coding going on during the week, but no big deal.”

He had considered adding phone numbers in the past, but data-quality and -consistency issues made it a largely futile exercise until the latest breach, whose “(mostly) well-formatted files… were all normalized into a nice consistent format with a country code.”
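To give a sense of why consistent formatting matters for searching, here is a minimal sketch of normalizing phone numbers into a single form with a country code. It is not Have I Been Pwned?'s code: the `normalize_phone` function is hypothetical, and it assumes national numbers carry at most one leading trunk "0" (as Australian numbers do), which is not true everywhere.

```python
import re


def normalize_phone(raw: str, default_country_code: str) -> str:
    """Return the number as '+' followed by country code and digits only."""
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, brackets, etc.
    if not digits:
        raise ValueError("no digits found in phone number")
    if raw.strip().startswith("+"):
        # Already in international form: keep the country code as given.
        return "+" + digits
    # Assumption: a national number with one optional leading trunk "0".
    if digits.startswith("0"):
        digits = digits[1:]
    return "+" + default_country_code + digits
```

With every record reduced to one representation, the same number written as "0412 345 678" or "+61 412 345 678" maps to a single searchable key.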



“This data set completely turned all my reasons for not doing this on its head,” Hunt said.

Adding phone number-based searching to the site, which helped many people find breached records that didn’t include an email address, proved easy to accommodate technically, thanks to a serverless cloud-based design that also helped the site handle “near-unprecedented traffic” after the Facebook breach was announced.

“I am not at all against the premise of Facebook,” Hunt pointed out, “and I use it every single day myself. But we’ve got to be a little more sensible about understanding what it is and how much we want to share, and what are the potential consequences of sharing information?”

“Maybe the problem here is that there’s just not awareness of expectations that align to the reality of a large social media platform.”

Adjusting expectations

Around half of the email addresses entered into Have I Been Pwned? turn out to have been compromised, Hunt said, meaning that the credentials of many millions of users still have not been compromised — yet.

But as companies and users walk the line between data collection and privacy, all indications are that current habits will keep providing rich pickings for cybercriminals who plunder personal data from websites large and small.

“Organizations view other people’s data as an asset,” he said, “and they really rarely view it as a liability. And I’d really like organizations to think more about the liability that they have by holding this data and the responsibility that’s been inferred onto them.”

That means considering questions like “How are we going to use this data?”, “How long do we need to keep it for?”, and “If a user hasn’t posted in, say, three years, do we really still need to have that data?”

“If you never had data that was more than three years old for non-active users,” Hunt said, “you would massively reduce your footprint of risk.”

“And that, to me, just seems like a very, very simple decision to make. But because organizations believe data is money and data is control and power — and we want to hold onto as much as we can – we end up in a situation like this.”
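The three-year rule Hunt describes could be sketched roughly as follows. This is an illustration only: the `split_by_retention` function, the `last_active` field and the 365-day approximation of a year are all assumptions, not anything a particular organization is known to use.

```python
from datetime import datetime, timedelta

# Assumed retention window: three years, approximated as 3 * 365 days.
RETENTION = timedelta(days=3 * 365)


def split_by_retention(users, now):
    """Split user records into (keep, purge) by last activity.

    `users` is a list of dicts with hypothetical 'email' and
    'last_active' (datetime) keys.
    """
    cutoff = now - RETENTION
    keep = [u for u in users if u["last_active"] >= cutoff]
    purge = [u for u in users if u["last_active"] < cutoff]
    return keep, purge
```

Run on a schedule, a job like this would hand the `purge` list to whatever deletion process the organization uses, so that records for long-inactive users are no longer there to be breached.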

As the breaches continue and history repeats itself over and over again, Have I Been Pwned? will continue building out its database.

Hunt has also been talking with a number of commercial partners, who are increasingly looking for ways to leverage credential-aggregation sites for seamless checking of compromised credentials from their own threat-intelligence and other security tools.

“There’s a handful of companies that reach out and want to do good things,” Hunt said, “and there’s always a bit of discussion after that. There have been some interesting ideas pop up in the wake of this, and I hope to be able to talk about some of those a bit more, soon.”

Ultimately, however users tap its insights, Have I Been Pwned? is both a worldwide security resource and a cautionary tale for users who continue to show remarkable complacency in protecting their online information.

“This is a shared responsibility,” Hunt said. “Everyone gets to make their own decisions about how they want to manage their security hygiene — like if you use the same password or not.”

“We all make that decision in terms of the way we present ourselves, so you need to make the choice. You’ve got to work on the assumption that bad people are going to do bad things, and that we’ve got to try and do whatever we can to stop the bad people. But you’ve also got to take responsibility.”

– David Braue is an award-winning technology writer based in Melbourne, Australia.
