I’ve investigated hundreds of data breaches over the years (there are 514 of them in Have I Been Pwned as I write this), and for the most part, the situation with Gab is just another day on the internet. But Gab is also different, having grown dramatically in recent months as an alternative to mainstream incumbent platforms such as Twitter and Facebook and drawing a crowd primarily focused on right wing American politics.
A couple of days ago, I posted a thread about their alleged breach. I want to go back through that thread here, explain the thinking further and then provide some commentary on the actual data that was exposed. It all began here:
So, the @getongab data breach situation: Let’s start the bizarreness with their CEO’s ridiculous statement tweeted yesterday: https://t.co/NyKmEPI0I8
— Troy Hunt (@troyhunt) March 2, 2021
Much of the problem with objectively discussing this breach is that it’s impossible to escape the transphobic slurs and religious rhetoric being dished out from the guy at the top. I don’t care which god (or demon) you’ve picked, nor what gender you were born with (or if you decided to change it at some time), nor do I care whose politics you like and whose you don’t, I only care about the data. More specifically, I care about the data that’s been exposed in the breach, especially when that data may include my own (I’m very serious).
This came a couple of days after their post about an “alleged data breach” which is full of pretty bizarre statements: https://t.co/qmSIkdKg4l
— Troy Hunt (@troyhunt) March 2, 2021
It’s pretty standard practice for an organisation to post a public statement following a breach or even, as the opening sentence of that page suggests, an “alleged” breach. Most organisations begin with “we take the security of your data seriously”, layer on lawyer speak, talk about credit cards not being exposed and then promise to provide further updates as they come to hand. Gab’s approach… differs:
For example, because they couldn’t find any public discussion about the breach they assumed that @WIRED reporters were “essentially assisting the hacker in his efforts to smear our business”. There are *always* discussions held in private about a breach before it’s made public.
— Troy Hunt (@troyhunt) March 2, 2021
Because Gab “searched high and low for chatter on the breach on the Internet and found nothing”, they’ve drawn the conclusion that reporters are maliciously working with hackers. I’ve had dozens of occasions where I’ve known about a breach, there’s been no public discussion on it, and I’ve worked with reporters to help get to the bottom of what’s happened. This is normal. It’s so normal that the last time I did this was earlier this week with Lawrence Abrams from Bleeping Computer on the Dutch Ticketcounter breach.
“It is standard practice for passwords to be hashed. If the alleged breach has taken place as described, your passwords have not been revealed.” This is misleading and ignores the simplicity of hash cracking. If your password is “maga2020!” (or similar), it has been revealed.
— Troy Hunt (@troyhunt) March 2, 2021
If you’re not familiar with hashing, how it’s not the same as encryption and how it can still leave passwords vulnerable, read this primer from September first. As it relates to passwords being revealed, you can’t “unhash” a hash in the same way as you can decrypt an encrypted piece of text; however, you can always guess passwords, hash them with the same algorithm (and salt if present) and see if the output matches what was stored. For example, when I wrote about the Dropbox hack in 2016, I was able to verify my own record simply by hashing the password I had stored in 1Password and comparing the output to the one in the breach. It matched, therefore verifying the legitimacy of the breach. The following year I showed how even though CloudPets had chosen the very robust bcrypt algorithm for password storage, I was still able to crack a bunch of their passwords courtesy of their extremely weak password rules.
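To make the "guess, hash, compare" idea concrete, here is a minimal sketch in Python. It is illustrative only: salted SHA-256 is used as a stand-in because it ships with the standard library (it is not what Gab, Dropbox or CloudPets used), and the salt, the stored hash and the list of guesses are all made up for the example.

```python
import hashlib
import hmac


def hash_password(password: str, salt: bytes) -> str:
    # Stand-in algorithm (assumption): salted SHA-256, chosen only because it
    # is in the standard library. A real site would use something like bcrypt.
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


def matches(candidate: str, salt: bytes, stored_hash: str) -> bool:
    # Nothing is "unhashed" here: we hash the guess the same way the site did
    # and check whether the output is identical to the stored value.
    return hmac.compare_digest(hash_password(candidate, salt), stored_hash)


# Hypothetical breached record: a salt and the hash stored beside it.
salt = b"example-salt"
stored_hash = hash_password("maga2020!", salt)

# Hypothetical guesses an attacker might try, in order.
guesses = ["password", "letmein123", "maga2020!"]
cracked = next((g for g in guesses if matches(g, salt, stored_hash)), None)
print(cracked)
```

The same comparison works in the other direction too: if you already know your own password, hashing it and matching the stored value is a quick way to confirm a record is genuine.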
“It is entirely possible for a user of the site to be unidentifiable based on the information they provide at login.” You login with your email address. This (almost always) identifies you, it’s literally how people communicate with *you*! pic.twitter.com/86klDc37nF
— Troy Hunt (@troyhunt) March 2, 2021
I do actually agree with the quoted sentence insofar as someone could create an email address completely disassociated from them, register for Gab and then log in with that account. But that almost never happens because Gab is used by normal humans just wanting to interact with other normal humans and it’s not a platform where people are likely to take extra precautions to conceal their true identity. When faced with a registration form that requests an email address, the vast majority of people will simply provide the same email address they use everywhere else, hence my “almost always” comment.
“In our subscriber records we do not collect health or financial information; we do not collect dates of birth; we do not collect [blah blah].” When you’ve just had your neo-Nazi hate speech associated to your email address leaked, DoB is the least of your worries!
— Troy Hunt (@troyhunt) March 2, 2021
This isn’t an unusual response to a data breach; many companies try to downplay the significance in order to reduce the perceived impact of it. I wrote about this in 2015, specifically as it relates to organisations focusing on the security of credit cards which are one of the most easily replaceable and low-impact classes of data to have exposed. All of the classes of data Gab mentions pale in comparison to the impact of having extremist messaging exposed in connection to a personally identifiable data attribute such as someone’s email address. And regardless of your political persuasion, it’s clear that a platform designed to have a bare minimum of controls on content (although they do define content standards) is going to attract and retain more extreme views; that’s part of the attraction for many people.
“Every major tech company – from Facebook to Twitter – has been the target of multiple and continued data breaches.” AFAIK, neither of these companies have ever had their entire DB dumped in the style @getongab appears to have, nor would that be an excuse if they had.
— Troy Hunt (@troyhunt) March 2, 2021
This is also fairly common to see in a post-breach announcement, either in generic terms (“as you know, data breaches are very common”) or in Gab’s case, directly pointing the finger at competing services. The comment is intended to normalise the data breach and downplay its significance, the exact opposite of what we want to encourage in this industry. A few years ago I wrote about how to construct a breach disclosure notice and paid particular attention to how well the Red Cross Blood Service handled theirs. It’s little things like apologising; rather than downplaying the incident and directing attention elsewhere, we need to see organisations standing up, copping it on the chin and acknowledging their faults.
Then there’s the @WIRED piece from @a_greenberg, a top-notch journo I’ve got a lot of respect for based on previous pieces he’s written and many discussions I’ve had with him personally: https://t.co/JHp0nMNE0a
— Troy Hunt (@troyhunt) March 2, 2021
The WIRED piece is well worth a read and sheds more light on the events leading up to the breach. I’ve always found Andy Greenberg to be not just a very switched on infosec journalist, but also a genuinely nice guy I’ve enjoyed speaking with in the past. I can’t imagine Andy being anything but professional in his interactions with Gab and it was only whilst writing this very paragraph that I saw a tweet which might explain why he was treated with such disdain – he may have picked the wrong religion:
As per my policy of not communicating with non-Christian and/or communist journos, I will not be replying to this non-story.
It’s not a real email address, therefore it is not checked. It’s just a placeholder email I used when creating the account almost five years ago. pic.twitter.com/BK1bCldFNe
— Gab.com (@getongab) March 3, 2021
As much as I didn’t want this post to touch on religion, it’s hard to ignore a comment like that which literally excludes the vast majority of the earth’s population (and I’m guessing a fair chunk of Christians would be appalled by this statement as well).
DDoSecrets has a @getongab page saying: “Due to these concerns, along with presence of passwords and other PII, this dataset is currently only being offered to journalists and researchers.” I’d love to get this into @haveibeenpwned, if anyone knows anyone there, ping them for me.
— Troy Hunt (@troyhunt) March 2, 2021
Following this tweet, I did indeed get in touch with someone and obtain a copy of the data. But before I delve into that, there’s just one more tweet in that thread I want to embed:
Weak, pathetic, and emasculated men like you are why the West is failing.
— Gab.com (@getongab) March 2, 2021
I’m amused by this, more than anything. For the most part I thought my analysis was pretty objective and Gab (whose account seems to simply be the mouthpiece of their CEO, Andrew Torba) hasn’t really made it clear which bit they disagreed with, so let’s soldier on and objectively look at the data just like with any other data breach.
In a 2.99GB file called accounts.sql, there are just over 4M rows of data largely consisting of user records. Because I myself have a Gab account which I created when I started making commentary on them and Parler in Jan, naturally the first thing I did was to pull out my own record:
Looking into the (alleged) @getongab data breach, many records don’t have an email address or a password hash (mine has the former, but not the latter). But for verification, don’t those dates and times look… similar. Coincidence? Or real breach? (Aus time in @1Password) pic.twitter.com/13ihm27lsV
— Troy Hunt (@troyhunt) March 3, 2021
Per the tweet, there’s no hash against my record so I can’t verify the password matched the long random one I created in 1Password, but it’s obviously pretty clear the data is legit based on the alignment of the dates. In total, the file has 43,015 unique email addresses (including mine), which is far fewer than the total row count. Why? At a guess it would come down to how the data was dumped. There are actually bcrypt password hashes against many records, but they also only represent a subset of the total with 7,097 of them in all. Having access to these hashes gives us an opportunity to debunk Gab’s earlier claim that “your passwords have not been revealed”, an exercise that’s made particularly easy due to their password criteria which can be seen on the registration page.
Requiring 8 characters isn’t unusual (it’s possibly even on the high side), but that’s the only criterion. What that means is that it’s easy to take a list of the most common passwords of 8 characters or more, pass them into hashcat and bingo, “your password has been revealed”.
Yes, apparently Gab will let you have a password that is literally “password”.
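The attack described above can be sketched in a few lines of Python. This is not the hashcat run itself, just an illustration under stated assumptions: PBKDF2 from the standard library stands in for bcrypt (with a deliberately low iteration count so the example runs quickly), and the usernames, passwords and wordlist are hypothetical.

```python
import hashlib
import os

MIN_LENGTH = 8  # the only password rule described above


def slow_hash(password: str, salt: bytes) -> bytes:
    # Stand-in for bcrypt (assumption): PBKDF2-HMAC-SHA256 from the standard
    # library, with a low iteration count purely to keep the demo fast.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 1_000)


def eligible(wordlist):
    # Discard guesses the registration form would have rejected anyway.
    return [word for word in wordlist if len(word) >= MIN_LENGTH]


def crack(records, wordlist):
    # records maps a username to (salt, stored_hash); returns cracked passwords.
    candidates = eligible(wordlist)
    found = {}
    for user, (salt, stored) in records.items():
        for guess in candidates:
            if slow_hash(guess, salt) == stored:
                found[user] = guess
                break
    return found


# Hypothetical list of very common passwords.
common = ["123456", "password", "qwerty", "12345678", "dragon"]

# Hypothetical user records, hashed the way a site might store them.
records = {}
for user, pw in [("alice", "password"), ("bob", "12345678"),
                 ("carol", "x9$Lq2!vTz#4mNp7")]:
    salt = os.urandom(16)
    records[user] = (salt, slow_hash(pw, salt))

found = crack(records, common)
print(found)
```

The slow algorithm only raises the cost per guess. When the only rule is a minimum length, the most common 8+ character passwords fall after a handful of attempts, while the long random one survives.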
Andy mentioned the presence of a chatlog.txt file in his story and the data is pretty limited here at only 9.53MB in size. The content ranges from an extensive amount of religious scripture to very intimate messaging between 2 members to someone sharing a radio show which they close with “We hope you enjoy the show and share it with white families”. To be clear, these are intended to be private messages and not something Gab should be responsible for moderating (for obvious privacy reasons), but they do give an insight into the interests of their members. It also speaks to my earlier point about this breach being significant as it ties identities to their messaging. Some of the private messages are, by most standards, recalcitrant, and they sit alongside the Gab username which then exists in the accounts.sql file which then points to their online profile and may also include their email address. Plus, there are multiple messages in which people have shared their personal phone number, often to take the conversation onto WhatsApp. You can immediately see the risk to individuals.
The groups.sql file Andy also mentioned is much more benign. It’s 31.8MB worth of Gab group information spread over nearly 32k lines. I suspect there’s little risk posed by the exposure of the data other than that it simplifies the exercise of analysing the nature of the groups people have created. One thing that seeing this file helped me understand is that as much as Gab has gained notoriety for housing certain types of content, there’s a heap of run of the mill stuff that’d barely raise an eyebrow. For example, there’s the German Shepherds group, the brewing group or even the Dads of Gab group which is all about “A group by fathers, for fathers. Topics should be about how to properly parent your sons…”, ok, good, this is sounding good “…and how to police your wives”. Aw crap. I honestly tried to focus on the positive but it’s very hard to go far without running into content which, well, let’s just say “doesn’t sit well with most people”. Micah Lee from The Intercept did a quick analysis on the largest groups:
Of the top 20 biggest groups (sorted by most members) on Gab:
5 are devoted to Trump/election misinformation
5 are devoted to QAnon pic.twitter.com/L0sLHJtEAh
— Micah (@micahflee) March 2, 2021
And then there’s the big file – statuses.sql with 62.4GB of data in it. This appears to be precisely what the file name suggests – statuses posted to Gab. For example, the first row appears as follows:
105295113355799222 3146 {"id": "105295113355799222", "url": "https://gab.com/mwill/posts/105295113355799222", "card": null, "poll": null, "tags": [], "group": null, "quote": null, "emojis": [], "reblog": null, "content": "@TImW381 There's a reason I blocked you.", "language": "en", "mentions": [{"id": "979864", "url": "https://gab.com/TImW381", "acct": "TImW381", "username": "TImW381"}], "pinnable": false, "has_quote": false, "reblogged": false, "sensitive": false, "created_at": "2020-11-29T18:52:04.042Z", "expires_at": null, "favourited": false, "revised_at": null, "visibility": "public", "quote_of_id": null, "rich_content": "", "spoiler_text": "", "reblogs_count": 0, "replies_count": 0, "in_reply_to_id": "105294344326089419", "plain_markdown": null, "favourites_count": 0, "media_attachments": [], "pinnable_by_group": false, "bookmark_collection_id": null, "in_reply_to_account_id": "979864"} 2020-11-29 13:52:04.042 \N
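If you wanted to pull a row like this apart programmatically, a rough sketch in Python follows. The assumptions are mine: the column delimiter in the real file isn't known here, so the code simply locates the JSON object by its outer braces, and the sample row is an abbreviated version of the one above with most fields removed. The key names `leading`, `status` and `trailing` are labels invented for the example.

```python
import json


def parse_status_row(row: str) -> dict:
    # Locate the JSON object by its outermost braces rather than relying on a
    # particular column delimiter (the real delimiter is an unknown here).
    start = row.index("{")
    end = row.rindex("}") + 1
    return {
        "leading": row[:start].split(),    # columns before the JSON
        "status": json.loads(row[start:end]),
        "trailing": row[end:].split(),     # columns after the JSON
    }


# Abbreviated copy of the first row shown above (most JSON fields omitted).
sample = (
    '105295113355799222 3146 {"id": "105295113355799222", '
    '"content": "@TImW381 There\'s a reason I blocked you.", '
    '"visibility": "public", "in_reply_to_account_id": "979864"} '
    '2020-11-29 13:52:04.042 \\N'
)

parsed = parse_status_row(sample)
print(parsed["leading"], parsed["status"]["visibility"], parsed["trailing"])
```

Slicing on the outer braces keeps the sketch independent of how the dump was exported, at the cost of assuming each row holds exactly one JSON object.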