Learning like you - so what? Aren't computers already doing that?
Humans and other mammals instinctively separate sounds in a crowded environment. You can focus on a single voice or piece of music, or pick out your name in a busy restaurant, through a phenomenon known as the cocktail party effect.
Neural networks have come a long way toward replicating this ability in machine form. Trained on thousands of hours of real and artificially generated audio, and often informed by directional sensors, these systems can filter out background noise such as the roar of a crowded bar or sounds arriving from a different direction than the speaker. These advances are improving the listening experience for the hard of hearing, and for anyone simply overwhelmed by the auditory assault of the modern world.
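To make that filtering idea concrete, here is a minimal sketch of the mask-based approach many such systems use: a small network predicts a per-bin gain over a spectrogram, suppressing time-frequency regions dominated by noise. The model, layer sizes, and shapes here are illustrative assumptions, not the architecture of any particular product, and the network would need training on paired noisy and clean recordings before its output is useful.

```python
import torch
import torch.nn as nn

class MaskEstimator(nn.Module):
    """Illustrative mask predictor: maps a log-magnitude spectrogram
    to a [0, 1] gain per time-frequency bin."""
    def __init__(self, n_freq_bins=257, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_freq_bins, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_freq_bins)

    def forward(self, log_mag):  # (batch, time, freq)
        h, _ = self.rnn(log_mag)
        return torch.sigmoid(self.out(h))

def enhance(waveform, model, n_fft=512, hop=128):
    """Suppress noise by masking the short-time Fourier transform."""
    window = torch.hann_window(n_fft)
    spec = torch.stft(waveform, n_fft, hop_length=hop, window=window,
                      return_complex=True)            # (freq, time)
    mag = spec.abs().transpose(0, 1).unsqueeze(0)     # (1, time, freq)
    mask = model(torch.log1p(mag))                    # predicted per-bin gain
    masked = spec * mask.squeeze(0).transpose(0, 1)   # keep speech-dominant bins
    return torch.istft(masked, n_fft, hop_length=hop, window=window,
                       length=waveform.shape[-1])

# Usage sketch: an untrained model on a stand-in for one second of 16 kHz audio.
model = MaskEstimator()
noisy = torch.randn(16000)
clean_estimate = enhance(noisy, model)
```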
But what if we could build a computer that hears like we do? One that recognizes speakers by the tone and quality of their voice, even a stranger's voice that quickly becomes familiar, and can zero in on and transmit only that voice. One that can, as you do naturally, shift between speakers and filter sound even when someone is being talked over.
Enter Density Networks.
While the output of a traditional neural network may look similar to that of a Density Network, the two systems are fundamentally different. This article articulates the key differences in how a Density Network learns and understands the world compared to its neural network counterparts.