A dolphin handler gives the signal for “together” with his hands, followed by “close”. The two trained dolphins disappear underwater, exchange sounds and then emerge, turn on their backs and lift their tails. They have developed a trick of their own and performed it in tandem, just as requested. “It doesn’t prove that there is language,” says Aza Raskin. “But it certainly makes sense that if they had access to a rich, symbolic way of communicating, it would make this task much easier.”
Raskin is the co-founder and president of the Earth Species Project (ESP), a non-profit group in California with a bold ambition: to decode non-human communication using a form of artificial intelligence (AI) called machine learning, and to make all of that knowledge publicly available, thereby deepening our connection with other living species and helping to protect them. A 1970 album of whale songs galvanized the movement that led to the banning of commercial whaling. What might a Google Translate for the animal kingdom bring about?
The organization, founded in 2017 with the help of major donors such as LinkedIn co-founder Reid Hoffman, published its first scientific paper last December. The goal is to unlock non-human communication within our lifetimes. “The end we’re working towards is: can we decode animal communication, detect non-human language,” says Raskin. “Along the way, and just as important, we are developing technology that supports biologists and conservation now.”
Understanding animal vocalizations has long been the subject of human fascination and study. Different primates give alarm calls that vary depending on the predator; dolphins address each other with signature whistles; and some songbirds can take elements of their calls and rearrange them to communicate different messages. But most experts stop short of calling it a language, as no animal communication meets all the criteria.
Until recently, decoding has mostly relied on painstaking observation. But interest has grown in using machine learning to handle the vast amounts of data that can now be collected by modern animal-borne sensors. “People are starting to use it,” says Elodie Briefer, an associate professor at the University of Copenhagen who studies vocal communication in mammals and birds. “But we don’t quite understand yet how much we can do.”
Briefer has developed an algorithm that analyzes pig grunts to tell if the animal is experiencing a positive or negative emotion. Another, called DeepSqueak, judges whether rodents are in a stressed state based on their ultrasonic calls. A further initiative – Project CETI (which stands for Cetacean Translation Initiative) – plans to use machine learning to translate the communication of sperm whales.
Still, ESP says its approach is different, because it’s not focused on decoding the communications of one species, but all of them. While Raskin acknowledges that there will be a higher likelihood of rich, symbolic communication between social animals—such as primates, whales, and dolphins—the goal is to develop tools that can be used across the animal kingdom. “We’re species agnostic,” says Raskin. “The tools we’re developing … can work across the whole of biology, from worms to whales.”
* * *
The “motivating intuition” for ESP, Raskin says, is work that has shown that machine learning can be used to translate between different, sometimes distant human languages—without the need for any prior knowledge.
This process starts with the development of an algorithm to represent words in a geometric space. In this multidimensional representation, the distance and direction between points (words) describe how they relate to each other in meaning (their semantic relationship). For example, the relationship between “king” and “man” has the same distance and direction as the relationship between “queen” and “woman”. (The mapping is not done by knowing what the words mean, but by looking, for example, at how often words occur near each other.)
It was later noticed that these “shapes” are similar across different languages. And then, in 2017, two groups of researchers working independently found a technique that made it possible to achieve translation by aligning the shapes. To get from English to Urdu, align their shapes and find the point in the Urdu shape that is closest to the word’s point in the English one. “You can translate most words decently well,” says Raskin.
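To make the geometry concrete, here is a minimal sketch in Python (using NumPy) of the two ideas above: vector arithmetic on a tiny, invented embedding table, and an orthogonal Procrustes alignment of one embedding “shape” onto a rotated copy of itself. The vectors and the alignment recipe are illustrative only; they are not ESP’s code, nor the exact method of the 2017 papers.

```python
import numpy as np

# Toy word embeddings (entirely made up for illustration; real systems
# learn hundreds of dimensions from co-occurrence statistics).
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.2, 0.8, 0.1]),
    "woman": np.array([0.2, 0.1, 0.8]),
}

def closest(vec, table):
    # Nearest neighbour by cosine similarity.
    sims = {w: vec @ v / (np.linalg.norm(vec) * np.linalg.norm(v))
            for w, v in table.items()}
    return max(sims, key=sims.get)

# "king" - "man" + "woman" lands nearest to "queen".
print(closest(emb["king"] - emb["man"] + emb["woman"], emb))

# Unsupervised translation sketch: if a second language's embedding cloud
# has the same shape but a different orientation, an orthogonal Procrustes
# step (a single SVD) recovers the mapping between the two spaces.
rng = np.random.default_rng(0)
X = np.stack(list(emb.values()))                 # "English" points
R_true, _ = np.linalg.qr(rng.standard_normal((3, 3)))
Y = X @ R_true                                   # "Urdu" points: same shape, re-oriented

U, _, Vt = np.linalg.svd(X.T @ Y)
R_est = U @ Vt                                   # best orthogonal map aligning X onto Y
print(np.allclose(X @ R_est, Y, atol=1e-8))      # True: the shapes line up
```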
ESP’s ambition is to create such representations of animal communication – working both with individual species and with many species at once – and then explore questions such as whether there is overlap with the universal human shape. We don’t know how animals experience the world, says Raskin, but there are emotions, such as sadness and joy, that some seem to share with us and may well communicate about with others of their species. “I don’t know which will be more incredible – the parts where the shapes overlap and we can directly communicate or translate, or the parts where we can’t.”
He adds that animals do not only communicate vocally. Bees, for example, let others know a flower’s location via a “waggle dance”. There will also be a need to translate across different means of communication.
The goal is “like going to the moon,” Raskin admits, but the idea isn’t to get there all at once either. Rather, ESP’s roadmap involves solving a series of smaller problems that are necessary for the bigger picture to be realized. This should see the development of general tools that can help researchers trying to use AI to unlock the secrets of the species they study.
For example, ESP recently published a paper (and shared its code) on the so-called “cocktail party problem” in animal communication, where it is difficult to discern which individual in a group of animals of the same species is vocalizing in a noisy social environment.
“To our knowledge, no one has done this end-to-end de-filtering [of animal sound] before,” says Raskin. The AI-based model developed by ESP, which was tested on dolphin signature whistles, macaque coo calls and bat vocalizations, performed best when the calls came from individuals the model had been trained on; however, with larger data sets, it was also able to disentangle mixtures of calls from animals outside the training cohort.
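ESP’s model is a neural, end-to-end separator trained on real animal recordings; the details are in its paper. As a much simpler illustration of what source separation means, the hypothetical sketch below mixes two synthetic “calls” and pulls them apart again with classical independent component analysis – a deliberately more basic technique than the one ESP uses.

```python
import numpy as np
from sklearn.decomposition import FastICA

# A simple stand-in for the "cocktail party" setting: two synthetic calls
# recorded on two sensors as unknown mixtures. (Classical ICA is used here
# only to illustrate the idea of source separation, not ESP's method.)
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 8000)
call_a = np.sin(2 * np.pi * 440 * t)               # hypothetical whistle
call_b = np.sign(np.sin(2 * np.pi * 97 * t))       # hypothetical pulsed call
sources = np.c_[call_a, call_b] + 0.02 * rng.standard_normal((t.size, 2))

mixing = np.array([[0.7, 0.3],
                   [0.4, 0.6]])                    # unknown in practice
recordings = sources @ mixing.T                    # what the sensors "hear"

ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(recordings)          # estimated separated calls

# Each recovered component should correlate strongly with one true call.
corr = np.corrcoef(np.c_[sources, recovered].T)[:2, 2:]
print(np.round(np.abs(corr), 2))
```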
Another project involves using AI to generate novel animal calls, with humpback whales as a test species. The novel calls – made by splitting vocalizations into micro-phonemes (distinct units of sound lasting hundredths of a second) and using a language model to “speak” something whale-like – can then be played back to the animals to see how they respond. If the AI can identify what makes a random change versus a semantically meaningful one, it brings us closer to meaningful communication, Raskin explains. “It’s letting the AI speak the language, even if we don’t know what it means yet.”
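As a rough illustration of the “speak something whale-like” idea, the sketch below builds a tiny bigram model over invented unit labels and samples a new sequence from it. The unit names and songs are made up, and a bigram table stands in for the far more capable language model such a project would actually use.

```python
import random
from collections import defaultdict

# Hypothetical sequences of discrete sound units ("micro-phonemes"), each
# label standing in for a short audio segment. In a real pipeline these
# would come from unsupervised segmentation of whale recordings.
song_corpus = [
    ["u1", "u3", "u2", "u3", "u4"],
    ["u1", "u2", "u3", "u4", "u4"],
    ["u3", "u2", "u3", "u1", "u4"],
]

# Count bigram transitions between units.
transitions = defaultdict(list)
for song in song_corpus:
    for a, b in zip(song, song[1:]):
        transitions[a].append(b)

def generate(start="u1", length=6, seed=0):
    """Sample a new, corpus-like sequence of units to play back."""
    random.seed(seed)
    seq = [start]
    while len(seq) < length and transitions[seq[-1]]:
        seq.append(random.choice(transitions[seq[-1]]))
    return seq

print(generate())   # a novel but corpus-like sequence of units
```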
A further project aims to develop an algorithm that determines how many call types a species has at its command, using self-supervised machine learning, which requires no labeling of data by human experts in order to learn patterns. In an early test case, it will mine audio recordings made by a team led by Christian Rutz, a professor of biology at the University of St Andrews, to produce an overview of the vocal repertoire of the Hawaiian crow – a species that Rutz discovered has the ability to make and use tools for foraging, and which is thought to have a significantly more complex set of vocalizations than other crow species.
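The project’s actual method is self-supervised learning on raw audio; as a hypothetical stand-in for the “how many call types?” question, the sketch below clusters made-up call features without any labels and picks the cluster count that scores best.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical acoustic features (e.g. duration, pitch statistics) for a
# set of recorded calls; no human labels are used anywhere below.
rng = np.random.default_rng(1)
calls = np.vstack([
    rng.normal(loc=c, scale=0.3, size=(60, 2))
    for c in [(0, 0), (3, 0), (0, 3), (3, 3)]      # 4 "true" call types
])

# Estimate the repertoire size by scoring different numbers of clusters.
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(calls)
    scores[k] = silhouette_score(calls, labels)

print(max(scores, key=scores.get))   # expected to recover 4 call types
```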
Rutz is particularly enthusiastic about the project’s conservation value. The Hawaiian crow is critically endangered and exists only in captivity, where it is being bred for reintroduction into the wild. The hope is that, by taking recordings made at different times, it will be possible to track whether the species’ call repertoire has been eroded in captivity – specific alarm calls may have been lost, for example – which could have consequences for reintroduction; any such loss might then be addressed with intervention. “It could provide a step change in our ability to help these birds come back from the brink,” says Rutz, adding that manually detecting and classifying the calls would be laborious and error-prone.
Meanwhile, another project seeks to automatically work out the functional meanings of vocalizations. It is being pursued with the laboratory of Ari Friedlaender, professor of marine science at the University of California, Santa Cruz. The laboratory studies how wild marine mammals, which are difficult to observe directly, behave underwater, and it runs one of the world’s largest tagging programs. Small electronic “biologging” devices attached to the animals capture their location, type of movement and even what they see (the devices can include video cameras). The laboratory also has data from sound recorders placed strategically in the ocean.
ESP aims first to apply self-supervised machine learning to the tag data to automatically gauge what an animal is doing (for example, whether it is feeding, resting, traveling or socializing), and then to add the sound data to see whether functional meaning can be assigned to the calls associated with that behavior. (Playback experiments can then be used to validate any findings, along with calls that have been decoded previously.) This technique will initially be applied to humpback whale data – the lab has tagged several animals in the same group, so it is possible to see how signals are given and received. Friedlaender says he was “hitting the ceiling” in terms of what currently available tools can tease out of the data. “Our hope is that the work ESP can do will provide new insights,” he says.
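A hypothetical sketch of that final cross-referencing step: once behavior states (from the tags) and call types (from the audio) have each been inferred, simply counting how often they co-occur can flag candidate associations to test with playbacks. All labels below are invented for illustration.

```python
from collections import Counter

# Hypothetical, already-inferred labels: one behavior state per time window
# (from the biologging tags) and the call type detected in that window
# (from the acoustic data). None means no call was detected.
windows = [
    ("feeding",     "call_A"), ("feeding",     "call_A"), ("feeding", "call_B"),
    ("traveling",   "call_C"), ("traveling",   "call_C"), ("resting", None),
    ("socializing", "call_B"), ("socializing", "call_B"), ("feeding", "call_A"),
]

# Count how often each call type co-occurs with each behavior; a strong
# skew hints at a functional association worth testing with playbacks.
counts = Counter((behavior, call) for behavior, call in windows if call)
for (behavior, call), n in sorted(counts.items()):
    print(f"{behavior:>12}  {call}: {n}")
```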
* * *
But not everyone is equally keen on the power of AI to achieve such big goals. Robert Seyfarth is a professor emeritus of psychology at the University of Pennsylvania who has studied social behavior and vocal communication in primates in their natural habitat for more than 40 years. While he believes machine learning can be useful for some problems, such as identifying an animal’s vocal repertoire, there are other areas, including the discovery of the meaning and function of vocalizations, where he is skeptical that it will add much.
The problem, he explains, is that while many animals may have sophisticated, complex societies, they have a much smaller repertoire of sounds than humans. The result is that the exact same sound can be used to mean different things in different contexts, and it is only by studying the context – who the calling individual is, how they are related to others, where they fall in the hierarchy, who they have interacted with – that meaning can hope to be established. “I just think these AI methods are insufficient,” says Seyfarth. “You have to go out and see the animals.”
There is also doubt about the concept itself – that the shape of animal communication will overlap in a meaningful way with human communication. Applying computer-based analysis to human language, with which we are so intimately familiar, is one thing, says Seyfarth. But it may be “quite different” to do so for other species. “It’s an exciting idea, but it’s a big stretch,” says Kevin Coffey, a neuroscientist at the University of Washington who helped create the DeepSqueak algorithm.
Raskin acknowledges that AI alone may not be enough to unlock communication with other species. But he points to research that has shown that many species communicate in ways “more complex than humans have ever imagined”. The stumbling blocks have been our ability to collect sufficient data and analyze it on a large scale, and our own limited perception. “These are the tools that allow us to take off the human glasses and understand entire communication systems,” he says.