WELL WELL, would ya look at that, another machine-learning algorithm from Google that contains racial bias; who'd have thunk it.
Research conducted by the University of Washington, Carnegie Mellon University, and the Allen Institute for Artificial Intelligence concluded that Google's Perspective tool, a product of its Jigsaw division used to detect abusive speech on social media, tags slang used by black Americans as toxic rather than recognising certain phrases as part of their vernacular.
The clever folks applied the Perspective tool to a pair of databases that are widely used in hate speech detection, having it assign a "toxicity score" to social media posts tagged as abusive and offensive to gauge just how nasty a certain tweet is, for example.
Surprise, surprise, it was found that the tool seems to spit out a higher toxicity score for black American speech patterns.
The problem here appears to be two-fold. Firstly, there's a baked-in bias in the databases used to train such smart algorithms, as words and phrases that are deemed offensive are tagged as such without taking into consideration who said them and why.
For example, the word 'queer' can be offensive if used by a heterosexual person as a derogatory term, yet used by a homosexual person it can be a positive term; the same can apply to the N-word, which is reprehensibly offensive when used as a derogatory term by a white person but more a form of slang when used by a black person.
The researchers then tried the tool against their own annotations for tweets and found the same bias with Perspective. It was only when they gave annotators some priming knowledge of whether the tweet was written by a black person or used language common in black English vernacular that the rate of bias fell.
The results of the research shouldn't be seen as a means to brand the Perspective tool or the annotators of the databases it's trained upon as racist. Rather, it shows that extra effort is needed to ensure bias doesn't subconsciously seep into such systems and data sets.
"We find strong evidence that extra attention should be paid to the confounding effects of dialect so as to avoid unintended racial biases in hate speech detection," the researchers said.
That might be easier said than done, as detecting unconscious bias is a tricky thing. But having more diverse data sets and people to assess them, as well as developers from diverse backgrounds, could go some way to squashing bias out of such systems. µ