Computers have been able to match or surpass humans at many tasks we consider intelligent for quite some time (e.g. chess). However, for much of the time since Deep Blue (the so-called "AI Winter"), even something as supposedly simple as object recognition remained elusive.
Now, in the age of large datasets and GPUs, deep networks, once deemed unusable, have finally allowed us to achieve human-level performance at this task (actually superhuman, because humans are not that great at distinguishing dog breeds).
This has led to a lot of interesting discussion of how human vision and deep-net vision are related. It's pretty impressive how, when it comes to explaining neural data, deep nets outperform even the most principled models of human vision.
Does this mean the networks are doing the same thing humans are doing? No. The only thing this means is that these networks are the current best model of vision we have. That’s an important distinction that people are starting to discuss.
In honor of Friday the 13th, I wanted to touch on this short (somewhat cobbled-together) paper on spooooooky behavior in neural networks. Specifically, the paper explores how ghostly noise in images, noise that is imperceptible to humans, can fool a network into misclassifying an image.
Take a look at these example images below:
The left is the original image and the right is the spookified image; the center is the adversarial ghost (the difference between the two images).
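To make the idea concrete, here is a minimal sketch of how such an adversarial ghost can be constructed. This uses the fast gradient sign method, a well-known recipe that is *not* the exact procedure in the paper above; the tiny logistic classifier, the seed, and all the numbers are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def predict(w, x):
    """Logistic classifier: probability that input x belongs to class 1."""
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

def adversarial(w, x, y, eps):
    """Nudge each pixel of x by +/- eps in the direction that increases the loss.

    For logistic loss, d(loss)/dx = (p - y) * w, so the sign of the input
    gradient is all we need. eps bounds the per-pixel change, which is why
    the perturbation can stay imperceptibly small.
    """
    p = predict(w, x)
    grad_x = (p - y) * w          # gradient of the loss w.r.t. the input
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=100)           # stand-in for a trained model's weights
x = 0.5 * w / np.linalg.norm(w)    # an input the model classifies confidently as 1

print(predict(w, x))               # high: confident, correct prediction
x_adv = adversarial(w, x, y=1.0, eps=0.1)
print(predict(w, x_adv))           # low: the tiny perturbation flips the answer
print(np.max(np.abs(x_adv - x)))   # per-pixel change never exceeds eps
```

The key point the sketch illustrates: in high dimensions, many tiny per-pixel nudges of at most `eps` each can add up to a large change in the model's pre-activation, so the "ghost" can be invisible to a human yet decisive for the classifier.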
If you’re interested in seeing more samples of this, there’s been a contest on this topic as well.
What do we conclude from this? These networks may perform as well as humans do, but humans are far more robust. Perhaps robustness is the next phase for computer vision?