NSA's Machine Learning Algorithm May Be Killing Innocent People

Published: February 16, 2016 7:40 PM

Machine learning algorithms are incredibly useful. Generating search engine results and assisting in scientific research are among their numerous applications. Despite their utility, the existing algorithms may not be good enough to decide who is a terrorist deserving of death and who is an innocent civilian. That's the point made by Patrick Ball, a data scientist and executive director of the Human Rights Data Analysis Group, when he talked with Ars Technica.

Last year, documents published by The Intercept revealed an NSA machine learning program known as SKYNET. SKYNET spies on the mobile phone networks in Pakistan and observes social media, travel patterns and other aspects of an individual's life to determine whether that person is a terrorist. Out of 192 million people in the country, the documents suggest that 55 million were analyzed by SKYNET. The documents are from 2011 and 2012, but references in them suggest the program goes as far back as 2007, meaning the algorithms were at least under development by that time, if not in active use. While it is impossible to know for sure whether anyone has actually been killed by a drone strike because this algorithm flagged them as a terrorist, labeling individuals as terrorists based on this model is, at the very least, cause for concern.

Ball has heavily criticized SKYNET, telling Ars Technica that it is "completely bullshit." Ball draws particular attention to the way the NSA trains and evaluates SKYNET. The NSA uses a group of 100,000 randomly selected individuals and 7 known terrorists. Six of the terrorists are used to train the system, and it is judged successful if it can pick out the seventh from among the random individuals. This method is deficient both because so few terrorists are used as training data and because those terrorists are not randomly selected. The result is a risk not only of falsely identifying innocents as terrorists, but also of overlooking genuine terrorists who differ statistically from the training set.
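
To make the scale of that evaluation concrete, here is a minimal Python sketch that uses only the figures quoted above (100,000 random individuals, 7 known terrorists, 6 of them used for training). The function names are hypothetical and chosen for illustration; nothing here reproduces the NSA's actual code. It shows how small the positive class is, and that a test with a single held-out positive can only ever report a detection rate of 0% or 100%.

```python
from fractions import Fraction

# Figures quoted in the article
RANDOM_INDIVIDUALS = 100_000
KNOWN_TERRORISTS = 7
TRAINING_POSITIVES = 6
HELD_OUT_POSITIVES = KNOWN_TERRORISTS - TRAINING_POSITIVES


def positive_share(positives: int, negatives: int) -> Fraction:
    """Fraction of the whole data set that belongs to the positive class."""
    return Fraction(positives, positives + negatives)


def possible_detection_rates(held_out: int) -> list[Fraction]:
    """Every detection rate a test with `held_out` positives can report."""
    return [Fraction(found, held_out) for found in range(held_out + 1)]


share = positive_share(KNOWN_TERRORISTS, RANDOM_INDIVIDUALS)
rates = possible_detection_rates(HELD_OUT_POSITIVES)

print(f"Positive share of the data: {float(share):.4%}")
print("Detection rates the test can report:", [f"{float(r):.0%}" for r in rates])
```

With only one positive case held out, the evaluation cannot distinguish a system that finds half of all terrorists from one that finds nearly all of them, which is one way to read Ball's objection to the small sample.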

The NSA itself calculates the false positive rate at 0.18%. In another application, a failure rate that low would be outstanding, but in this case it's still a cause for concern. Being labeled a terrorist could potentially result in being targeted for a drone strike. Even with a very low failure rate, this system could still result in innocent lives being lost, not only among those who are targeted but also as collateral damage from the strikes. Ball stated that he would leave it to lawyers to determine if the usage of SKYNET is a war crime, but he did say:

"It's bad science, that's for damn sure, because classification is inherently probabilistic. If you're going to condemn someone to death, usually we have a 'beyond a reasonable doubt' standard, which is not at all the case when you're talking about people with 'probable terrorist' scores anywhere near the threshold. And that's assuming that the classifier works in the first place, which I doubt because there simply aren't enough positive cases of known terrorists for the random forest to get a good model of them."

Ball isn't the only one raising concerns about SKYNET. Security expert Bruce Schneier also spoke to Ars, and he had serious concerns about the program. "Government uses of big data are inherently different from corporate uses," he stated. "The accuracy requirements mean that the same technology doesn't work. If Google makes a mistake, people see an ad for a car they don't want to buy. If the government makes a mistake, they kill innocents."

Edit: This article originally suggested that, with a population of 55 million and a false positive rate of 0.18%, thousands would be falsely identified as terrorists. That was a mistake made by comparing the false positive rate against the entire population when it should be compared to the number of positive results. Since we do not know the actual number of people positively identified as terrorists, the actual impact of the false positive rate is also unknown. However, even a small number of people being falsely identified as terrorists by this system is a cause for concern. Thanks to commenter Timothy Riggs for pointing this out.
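
The correction above turns on which denominator the 0.18% applies to, so a short Python sketch of the arithmetic may help. In the usual textbook definition, a false positive rate is the share of actual negatives that get flagged; the edit note instead reads the figure as a share of the positive results. This article does not say which definition the NSA's number uses, so the sketch simply computes both. The function names are hypothetical, the flagged count of 10,000 is a made-up placeholder (the real number is unknown), and treating nearly all 55 million analyzed people as actual negatives is an assumption.

```python
from fractions import Fraction

# Figures quoted in the article
FALSE_POSITIVE_RATE = Fraction(18, 10_000)  # 0.18%
ANALYZED_POPULATION = 55_000_000


def false_positives_as_share_of_negatives(rate: Fraction, negatives: int) -> Fraction:
    """Textbook reading: rate = false positives / actual negatives."""
    return rate * negatives


def false_positives_as_share_of_flagged(rate: Fraction, flagged: int) -> Fraction:
    """The edit note's reading: rate = false positives / positive results."""
    return rate * flagged


# Assumption: almost everyone analyzed is an actual negative.
textbook_reading = false_positives_as_share_of_negatives(
    FALSE_POSITIVE_RATE, ANALYZED_POPULATION
)

# Hypothetical placeholder; the real number of flagged people is not public.
HYPOTHETICAL_FLAGGED = 10_000
edit_note_reading = false_positives_as_share_of_flagged(
    FALSE_POSITIVE_RATE, HYPOTHETICAL_FLAGGED
)

print(f"Share of negatives: about {int(textbook_reading):,} people falsely flagged")
print(f"Share of flagged:   about {int(edit_note_reading):,} people falsely flagged")
```

The two readings differ by orders of magnitude, which is why the denominator matters so much here. Under either one, the result is an expected count of people wrongly labeled, not a verified figure.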

Is the use of SKYNET to identify terrorists a cause for concern? Leave your comments below.
