Apparently Twitter has not a singular problem, but six of them, if Tadhg Kelly is to be believed. One of the six problems is harassment, or, if you’re of Kelly’s particular ideological ilk, unorthodox thought. In making his case, he namedrops the now infamous Randi Harper, serial Twitter abuser and “developer” of the ggautoblocker. Kelly’s implication is ggautoblocker is a tool for preventing harassment.
We, of course, know better.
I talked about this before, but ggautoblocker is not an anti-harassment tool. What it is designed to do is issue a passive-aggressive threat to everyone active on Twitter that, should they choose to follow people who commit crimethink, their Twitter handle could be given to professional organizations and prospective employers with the implication the Twitter user in question is a harasser.
The threat of guilt-by-association blacklisting is overt, per both the ggautoblocker GitHub page and the ggautoblocker page of the “developer” herself. That might be part of the reason that the false positive rate of ggautoblocker block list members as “harassers” is 99.35% based on data from WAM in November 2014. It could also be why ggautoblocker couldn’t prevent Tauriq Moosa from getting “harassed” off of Twitter, either.
Long story short, ggautoblocker fails all its stated requirements as an anti-harassment tool. But what does it do, exactly? For that, we’re going to dig into the code.
To start I’m going to ignore all the unprofessional names of variables, arrays, and so on, and focus on methodology. Using insulting names in variables and arrays gets you fired from software development organizations in the real world, so it should get talked about, but this commentary is not the place to do it. Further, any time someone defending ggautoblocker claims it wasn’t intended to be a blacklist, remind them the name of sourcelist.txt was blacklist.txt. Then, watch them squirm, hedge, and weasel around that fact.
The methodology of ggautoblocker is to take a base list of names contained in the text file sourcelist.txt consisting of the following:
- Nero (Breitbart Commentator Milo Yiannopoulos)
- PlayDangerously (Author Mike Cernovich)
- Roguestargamez (Indie Game Developer Slade Villena)
- TheRalphRetort (The Ralph Retort owner Ethan Ralph)
Each of the names in this list is sent through a subroutine called get_followers, which does exactly what it says on the tin. So all the followers for the names in sourcelist.txt are assembled. If a person shows up as a follower of at least 2 of the names in sourcelist.txt, that person’s name is added to one of two files, block_names.txt or shared_names.txt.
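To make the guilt-by-association logic concrete, here is a minimal Python sketch of the "follows at least 2 source accounts" rule described above. This is not ggautoblocker's actual code; the function name shared_followers, the threshold parameter, and the sample data are all assumptions made up for illustration.

```python
from collections import Counter


def shared_followers(followers_by_source, threshold=2):
    """Return the accounts that follow at least `threshold` source accounts."""
    counts = Counter()
    for followers in followers_by_source.values():
        # Use a set so each follower is counted once per source account.
        counts.update(set(followers))
    return {name for name, seen in counts.items() if seen >= threshold}


# Hypothetical data standing in for the output of a follower lookup.
sample = {
    "source_a": ["alice", "bob", "carol"],
    "source_b": ["bob", "carol", "dave"],
    "source_c": ["carol", "erin"],
}

flagged = shared_followers(sample)
print(sorted(flagged))
```

Notice that nothing in this sketch ever looks at what anyone tweeted. The only input is who follows whom.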
Notice ggautoblocker’s colossal design flaw. It does not block users based on the content of their tweets; rather, it only blocks based on association. The corollary to this is if you’re someone of conservative or libertarian political leanings who is routinely harassed by the army of perpetually offended authoritarian progressives on Twitter, this tool has a 100% chance of failing as written.
Defenders of ggautoblocker are going to say the user can always change the list of names in the sourcelist.txt file and run the tool to create their own block list. Strictly speaking, that’s true; however, association shouldn’t be the basis by which Twitter users block people or attempt to prevent their own harassment. Further, the configuration files alienate everyone not of the authoritarian progressive slant. The message sent by ggautoblocker and its “developer” is simple: “If you don’t believe in anti-progress authoritarian progressive ideologies, you do not deserve relief from online harassment.”
So let’s fix it.
The Anatomy of a Tweet
Here’s an excerpt of a tweet from my Twitter timeline from a few days ago:
I got this excerpt by directing a browser to my timeline, and doing a “Save page as,” then selecting “Webpage, HTML Only” as the file type. Anyone who wants to read their Twitter feed in native HTML format can then open the file in a text editor, like Notepad.
I picked this particular excerpt for a couple of reasons. First, this excerpt contains all the data we need for assembling a proper block list of Twitter users based exclusively on the content of their tweets. Second, I chose this excerpt specifically so we can talk about performance a little later on. Twitter timelines in their native format make for some huge files (my small timeline snapshot was 686KB, and I’m a bit of a Twitter hermit), so we’ll want to extract the content sections of the tweets in a user’s timeline to a separate text file to increase performance.
Here’s how our block list tool is going to work. First, the script takes a snapshot text file of a user’s Twitter timeline as input and extracts the parts of the feed similar to the excerpt above. Then, the script compares the entries in the “p” tag against a list of user-provided words and phrases that the user deems “harassing.” The words and phrases can be anything.
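The extraction step could look something like the Python sketch below, which uses only the standard library's html.parser module. It rests on two assumptions about the saved page that you should check against your own snapshot: that each tweet sits inside an element carrying a data-screen-name attribute, and that the tweet text lives in a “p” tag whose class list includes tweet-text. The class name TweetExtractor and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class TweetExtractor(HTMLParser):
    """Collect (screen_name, text) pairs from a saved timeline page."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tweets = []           # finished (screen_name, text) pairs
        self._screen_name = None   # most recent data-screen-name seen
        self._in_text = False      # True while inside a tweet-text <p>
        self._chunks = []          # text fragments of the current tweet

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Assumption: the tweet's wrapper element names its author.
        if "data-screen-name" in attrs:
            self._screen_name = attrs["data-screen-name"]
        # Assumption: tweet text is in a <p> with "tweet-text" in its class.
        if tag == "p" and "tweet-text" in (attrs.get("class") or "").split():
            self._in_text = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_text:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_text:
            self._in_text = False
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._chunks).split())
            self.tweets.append((self._screen_name, text))


# Made-up markup standing in for a saved timeline snapshot.
sample_html = """
<div class="tweet" data-screen-name="example_user">
  <p class="TweetTextSize tweet-text">You are a <b>terrible</b> person</p>
</div>
<div class="tweet" data-screen-name="other_user">
  <p class="tweet-text">Nice weather &amp; good coffee today</p>
</div>
"""

parser = TweetExtractor()
parser.feed(sample_html)
parser.close()
for screen_name, text in parser.tweets:
    print(screen_name, "|", text)
```

Writing parser.tweets out to a small text file, one tweet per line, would give the script the slimmed-down input it needs without dragging the whole saved page around.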
For any positive check in the “p” tag, the script gets the screen name of the person who sent the tweet. Once the script gets through all the “p” tags extracted from the original Twitter feed text file, the script provides a list of potential blocked Twitter screen names to the user for approval. The script gives the user the opportunity to remove flagged users from the potential block list or approve the block list wholesale.
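Here is one way the matching and approval steps might be sketched in Python. It assumes the tweets have already been pulled out of the saved page as (screen name, text) pairs. The function names flag_screen_names and review, the keep_unblocked parameter, and the sample tweets are all hypothetical, and the matching is a plain case-insensitive substring check rather than anything clever.

```python
def flag_screen_names(tweets, phrases):
    """Return screen names whose tweet text contains any listed phrase."""
    lowered = [phrase.lower() for phrase in phrases if phrase.strip()]
    flagged = []
    for screen_name, text in tweets:
        haystack = text.lower()
        # Flag each sender once, in the order they first matched.
        if any(p in haystack for p in lowered) and screen_name not in flagged:
            flagged.append(screen_name)
    return flagged


def review(flagged, keep_unblocked):
    """Drop the names the user chose to spare; the rest are approved."""
    spared = {name.lower() for name in keep_unblocked}
    return [name for name in flagged if name.lower() not in spared]


# Made-up tweets standing in for the extracted "p" tag contents.
tweets = [
    ("example_user", "You are a terrible person"),
    ("other_user", "Nice weather and good coffee today"),
    ("third_user", "What a TERRIBLE take"),
]

candidates = flag_screen_names(tweets, ["terrible"])
approved = review(candidates, keep_unblocked=["third_user"])
print(candidates)  # ['example_user', 'third_user']
print(approved)    # ['example_user']
```

Substring matching is crude: a phrase like “ass” would also flag “class,” which is exactly why the user gets a chance to review the list before any block is executed.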
Finally, the script operates on the list of screen names and executes the blocks, cleans itself up and exits.
Voila! We’ve designed a script to create Twitter blocks exclusively on the content of what was tweeted, instead of creating a blacklist based on association with a user’s ideological enemies.
Obviously, this is not the only solution, nor the best solution. It’s a solution, based a fair bit off of the way ggautoblocker was designed originally—in the software biz, that’s called “reuse”—with an improved methodology for creating the blocklist based on content.
Tadhg Was Right, Kinda
The fact that Twitter’s customers have to design third party scripts to do mass filtering and/or blocking highlights a major hole in functionality of the service. On that, Kelly and I are in total agreement. That said, if Twitter’s customers are going to have to develop their own tools to plug the functionality holes, then the tools that are created ought to actually do what their developers say they do. Ggautoblocker is not that tool, no matter how much name dropping writers like Tadhg Kelly do.
Raptors: Time to weigh in. If you were designing a tool to prevent harassment on Twitter, how would you do it?