
The difficulty of automatically detecting harassment.

Content warning: coarse and misogynistic language.

Due to the nature of this post, I will use examples of personal attacks, so I’ve hidden it below a “Read more” break to prevent the casual observer from finding it on the front page.

Forewarned is forearmed.

It sounds like an easy task.

Why can’t we automatically detect and remove harassment online?

We can detect negative comments

If you want to play along at home, I’m using Microsoft’s Cognitive Services text analysis for this. Other services exist, but this is the one I know.

We can use a technique known as “sentiment analysis” to determine whether comments are positive or negative. These scores range from 0% (extremely negative) to 100% (extremely positive), so the phrase “Fuck off you stupid bitch” gets an unsurprisingly low score of 0.004946023225784301 (i.e. less than 1% positive), whereas “Well done, that was an amazing speech you gave” gets a score of 0.946958065032959.
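If you do want to play along, here is a minimal sketch of how you might fetch those scores. It assumes the legacy Text Analytics v2.1 REST endpoint (the version that returns a single 0–1 score per document); the region and subscription key are placeholders you would swap for your own, and newer versions of the service report separate positive/neutral/negative scores instead.

```python
import requests

# Assumption: the legacy Text Analytics v2.1 sentiment endpoint, which returns
# a single 0-1 score per document. Replace the region and key with your own.
ENDPOINT = "https://westeurope.api.cognitive.microsoft.com/text/analytics/v2.1/sentiment"
SUBSCRIPTION_KEY = "<your-subscription-key>"

def sentiment(text: str) -> float:
    """Return a score from 0.0 (extremely negative) to 1.0 (extremely positive)."""
    body = {"documents": [{"id": "1", "language": "en", "text": text}]}
    headers = {"Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY}
    response = requests.post(ENDPOINT, json=body, headers=headers)
    response.raise_for_status()
    return response.json()["documents"][0]["score"]

print(sentiment("Well done, that was an amazing speech you gave"))  # roughly 0.95
print(sentiment("Fuck off you stupid bitch"))                       # roughly 0.005
```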

It may appear that we can use sentiment analysis to determine if something is harassment or not.

However, this only tells us whether the language used is positive or negative. For example, the phrase “Universal credit is a stupid idea” scores 0.10983416438102722 (11%). It isn’t a nice phrase, but the target isn’t a person or a group of people; it’s a concept, and being critical of a concept is in no way the same as attacking a person.

It is also possible to programmatically look at the text and determine whether it’s about a person or an idea. Assuming you can do that, you are left with a two-by-two matrix of options:

Harassment matrix

                 Positive post    Negative post
About a person         1               -1
About an idea          0                0

Where -1 is an abusive message, 1 is a supportive one, and 0 is a message you don’t care about.
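To make that concrete, here is a minimal sketch of the bucketing step, assuming you already have a sentiment score from the service above and some separate check (named-entity recognition, say) that decides whether the target is a person. The `is_about_person` flag and the 0.5 cut-off are my assumptions, not anything the service provides.

```python
def classify(sentiment_score: float, is_about_person: bool) -> int:
    """Bucket a message into the harassment matrix.

    Returns -1 for an abusive message, 1 for a supportive one,
    and 0 for messages about ideas that we don't care about.
    """
    if not is_about_person:
        return 0
    # Assumption: 0.5 as the positive/negative cut-off; in practice you
    # would tune this threshold against labelled data.
    return 1 if sentiment_score >= 0.5 else -1
```

For example, `classify(0.11, is_about_person=False)` returns 0, so the universal credit comment above is simply ignored.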

So it may appear we have our silver bullet, a way of spotting harassment online, but sadly…

Tone policing

Let’s take the following “discussion”.

Person A: “No women should be in politics, they should just stay at home and wash up”

Person B: “Fuck off, you misogynist”

Who is doing the harassment here?

Well, according to the sentiment of the text…

“No women should be in politics, they should just stay at home and wash up” scores 0.74786901473999023, i.e. 75% positive, whereas

“Fuck off, you misogynist” scores 0.023883551359176636, i.e. 3% positive.

Therefore, clearly we should ban person B!
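To make the failure concrete, feeding that exchange through the naive classifier sketched earlier (both messages clearly target a person, so only the sentiment score decides) flags the victim rather than the harasser:

```python
# Both messages are about a person, so only the sentiment score decides.
print(classify(0.748, is_about_person=True))  # Person A, the misogynist:  1 ("supportive")
print(classify(0.024, is_about_person=True))  # Person B, the victim:     -1 ("abusive")
```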

As anyone who has been on the receiving end of online harassment knows, harassers have a “lovely” way of making their comments sound like debates and making their victims appear like unhinged loonies. This is especially true when messages are taken out of context, as they would be in most content-filtering algorithms.

This is not to say that content filtering does not have a place; it certainly does. But the issue is a societal one rather than an IT one. Within the world of computer security there is a phrase, “there is no patch for human stupidity”. It means that while you can release updates for software (patch it), you can’t release a patch to stop people doing something “stupid” (like writing their password on a Post-it note and sticking it to their monitor). The best you can do is make it so difficult for hackers to break in that they go somewhere else.

I think the same applies to harassment online: you can’t force people to stop being hateful, but you can protect the victims from the effects of that hate and make it harder for the haters to hurt their victims.

But a software solution will only go so far and could easily end up hurting the very people you’re trying to help.