A bark or a grunt? A look at what makes animal calls sound different
or technically,
Time as a supervisor: temporal regularity and auditory object learning
[See Original Abstract on Pubmed]
Authors of the study: Ronald W. DiTullio, Chetan Parthiban, Eugenio Piasini, Pratik Chaudhari, Vijay Balasubramanian, Yale E. Cohen
As you walk through the parking lot, it probably doesn’t take much effort to recognize the laughter of your friends behind you, the gentle hum of a car engine rolling down the neighboring aisle, or the clatter of someone’s shopping cart wheels. Each of these sounds is what neuroscientists call an auditory object. While we can often recognize auditory objects effortlessly, determining exactly how we do so is difficult. It might be obvious to us that a particular sound is a car engine or a laugh, but we would be hard-pressed to name exactly which features of the sound distinguish one from the other. That’s why recent NGG graduate Dr. Ronald DiTullio and a team of researchers at the University of Pennsylvania set out to figure out what makes auditory objects sound different.
Identifying the features that define auditory objects has at least two important applications in neuroscience and beyond. First, it might give experimenters clues as to how the brain learns to differentiate sounds. For example, once researchers have identified features that separate human voices from the sound of kitchen appliances, they can look at brain activity and ask whether the brain learns the same features. Second, being able to measure and quantify useful features of sounds allows scientists to build artificial systems that can distinguish them as well. For example, your Google Home or Amazon Alexa system might benefit from looking for features that will help it learn to distinguish your voice from a tea kettle boiling in the background.
DiTullio and his team aimed to find the features that differentiate auditory objects by studying the different types of vocalizations that rhesus macaque monkeys make. They chose macaque vocalizations because these calls have structure, just like human speech, and can be grouped into a small number of types. This makes analyzing the differences between macaque vocalizations easier than looking at something as complicated as spoken language. “There is a general pattern in nature,” DiTullio explained. “We know that macaque vocalizations should share similar structure with human vocalizations.” Because sounds in nature share similar features, the team hopes that what they learn about the differences between macaque vocalizations will apply to other types of sounds.
The research team started with a key observation: the sounds that make up auditory objects change slowly and smoothly over time. This property is called temporal regularity. We can understand temporal regularity by thinking about playing notes on the piano. When you play and hold a note, the note fades away slowly and the components of the sound stay largely the same over time. This example has high temporal regularity. However, if you play and release a note at random, you quickly switch between silence and the note, and the components of the sound change a lot from moment to moment. This example has low temporal regularity. DiTullio and his team think that differences like this are one way we can tell sounds apart. Temporal regularity is a known feature of many auditory objects, including macaque vocalizations. “The underlying idea is to find a pattern we think exists in natural sounds and that the brain could get useful information from,” said DiTullio. “Our main motivation was to ask, ‘is temporal regularity that pattern?’”
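To make the piano example concrete, here is a minimal sketch in Python. It is an illustration only and is not taken from the study: the function name slowness, the loudness curves, and the choice of measure (the average squared step-to-step change of a normalized signal) are all assumptions made for this example. A lower score means the signal changes more slowly and smoothly, so it has higher temporal regularity.

```python
import numpy as np

def slowness(x):
    """Average squared step-to-step change of a z-scored signal.

    Hypothetical helper for illustration: lower values mean the signal
    changes more slowly, i.e. it has higher temporal regularity.
    """
    z = (x - x.mean()) / x.std()
    return float(np.mean(np.diff(z) ** 2))

rng = np.random.default_rng(0)
t = np.arange(2000)

# Loudness of a held note: it fades away slowly and smoothly.
held = np.exp(-t / 500.0)

# Loudness of a note pressed and released at random at every time step.
gated = rng.integers(0, 2, size=t.size).astype(float)

print("held note :", slowness(held))
print("gated note:", slowness(gated))
```

Under these assumptions, the held note gets a score close to zero while the randomly gated note gets a much larger one, which matches the intuition in the piano example.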
To test their idea, the team first demonstrated that temporal regularities can be used to distinguish auditory objects. To do this, they took recordings of four different types of macaque vocalizations (coo, harmonic arch, shrill bark, and grunt) and three different types of noise and used statistical methods to quantify different features of each audio clip. One of these statistical methods, called slow feature analysis, looked for temporal regularities in the audio clips, whereas the other methods relied on different types of features. The group found that the temporal regularities identified by slow feature analysis did a better job of distinguishing auditory objects than the features identified by the other statistical methods. This showed that temporal regularities are in fact an important feature of macaque vocalizations that can be used to differentiate them.
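For readers curious about what slow feature analysis does, the sketch below shows the textbook linear version in Python with NumPy. It is not the team's actual pipeline, and the function name linear_sfa and the toy signals are assumptions made for this example. The idea is to mix a slowly changing signal with a rapidly changing one, then look for the combination of the mixed recordings that changes as slowly as possible over time.

```python
import numpy as np

def linear_sfa(X, n_components=1):
    """Textbook linear slow feature analysis (illustrative sketch).

    X has shape (time, features). Returns the n_components projections
    of X that vary most slowly over time, each with unit variance.
    """
    Xc = X - X.mean(axis=0)

    # Step 1: whiten the data so every direction has unit variance.
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Z = Xc @ (evecs / np.sqrt(evals))

    # Step 2: measure how fast each direction changes from step to step.
    dZ = np.diff(Z, axis=0)
    _, dvecs = np.linalg.eigh(np.cov(dZ, rowvar=False))

    # eigh sorts eigenvalues in ascending order, so the first columns
    # are the directions with the smallest change over time.
    return Z @ dvecs[:, :n_components]

# Toy data: one slow source and one fast source, mixed together.
t = np.linspace(0, 2 * np.pi, 2000)
slow = np.sin(t)
fast = np.sin(40 * t)
X = np.column_stack([slow + fast, slow - 0.5 * fast])

y = linear_sfa(X, n_components=1)
r = abs(np.corrcoef(y[:, 0], slow)[0, 1])
print("correlation with the slow source:", r)
```

In this toy setting, the slowest feature lines up almost perfectly with the hidden slow source, up to sign and scale. Real vocalizations are far more complex than two sine waves, so this only conveys the principle behind the method.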
The team next applied their idea to another challenge for our auditory systems: recognizing auditory objects in the presence of noise. For example, most people will have no problem distinguishing the statement “I have plans” from “I have plants” in a quiet room, but that becomes much more difficult during a conversation at a crowded cocktail party. To test whether temporal regularities might help solve this problem, the team applied the same statistical methods discussed previously to audio clips of the same types of macaque vocalizations, but with noisy backgrounds. They found that the temporal regularities identified by slow feature analysis did a better job of distinguishing macaque vocalizations against noisy backgrounds than any of the other features. This suggests that temporal regularities are one way the brain might solve the problem of identifying sounds in the noisy environments we encounter in everyday life.
Taken together, these experiments identify an important feature of auditory objects that is useful for distinguishing them: temporal regularity. This opens the door to lots of exciting future work. One open question is whether the brain uses the same features when learning the differences between auditory objects. “We hope that more people will look for these kinds of signals in the brain,” said DiTullio. “People know that temporal regularities are an important part of audition, but they don’t necessarily look for how... the brain might learn these [features].” Another interesting direction for future work would be to investigate different types of sounds, like spoken language. DiTullio’s team has already done follow-up work applying these ideas to bird and human vocalizations. It seems like this is only the start of what temporal regularities may be able to teach us about what makes the sounds we encounter every day sound so different.