The fascinating tech behind Merlin, the bird ID app

In June, the Merlin app launched Sound ID, a way for people to instantly identify the birds around them by allowing their phone to listen to the calls and songs happening nearby. Now people can use Merlin to ID birds through sound, photographs, or even physical description, technology that’s delighted the app’s tens of thousands of users. But eBird users may not know it was their amateur photographs and sound recordings (about a million of them) that made this innovation possible.

March 03 2022 Jillian Anthony


On a walk through the streets of Austin, Texas, I opened the Merlin Bird ID app. I clicked Sound ID, record, then held my phone above my head, pointed toward a tree packed with dozens of loud, whistling birds. The app showed me a spectrogram (a horizontal visualization of the sounds I was hearing that looks like a seismograph readout) and quickly identified the rambunctious bunch as European Starlings, an invasive species that now inhabits most of North America.

I tried out the app on travels to New York City and Pasadena, California, identifying House Sparrows, Purple Martins, and Red-bellied Woodpeckers. As an amateur birder, I’ve just begun learning how to identify species by their specific calls. I was delighted that such a simple, easy-to-use tool could help me learn the songs of the birds around me. And I’m far from alone: Merlin users uploaded about 2.5 million recordings with the app in August alone, says Merlin Project Coordinator Drew Weber. (Cornell does not currently keep the photos or audio uploaded through Merlin.)

Screenshot of Bird ID Wizard via Merlin

“Last April, May and June were crazy,” Weber says. “I’d say normally we were seeing like 20- to 30- to 40-percent growth year-over-year, each month, but in the pandemic it was more like 100 percent.”

Merlin comes from the Cornell Lab of Ornithology and is built off data gathered over decades from eBird, Cornell’s online database of birding observations, and stored in Cornell’s Macaulay Library. The app has come a long way since its beginnings in 2014; it launched with the ability to identify about 250 birds, and added the photo ID feature in 2015.

“We’re just over 8,000 of the almost 11,000 species now,” says Weber. “It’s basically Africa and Southeast Asia that we’re really missing at this point. So hopefully we’ll get to that in the next year or two.”

When Sound ID launched in June, it was able to identify 458 of the most popular bird species in the U.S. and Canada. (More birds can be identified through the Photo ID tool, or by providing your location, the date, and the physical description of the bird you saw there.) A silver lining of Covid (at least for the birding community) proved to be that the experts who were typically busy giving birding tours suddenly had the time to help Merlin write out definitive descriptions for most of the birds in the world. But what takes the most time to gather, and what makes Merlin’s identification features tick, is the millions of photographs and audio files needed to train an artificial intelligence recognition system.

“[The Macaulay Library’s] collection has grown to about 30 million photos,” Weber says. “And we’re pulling all of those in to train the photo ID tool. Ideally it has photos in the archive that are every single angle—good quality, terrible quality, ones that are backlit, the whole spectrum of what you would expect to see out in the field. So it’s always an interesting messaging to say, Yeah, we do want your crappy photos, they actually do help. A standard machine learning technique is to distort the images a whole lot, which helps fill in some of those gaps where we might not have all the angles you would expect.”
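For readers curious what "distorting the images" can look like in practice, here is a minimal Python sketch of the general idea, usually called data augmentation. It is not Merlin's actual pipeline: the function name `augment`, the grayscale list-of-rows image format, and every numeric range below are assumptions chosen only for illustration.

```python
import random

def augment(image, rng):
    """Return a randomly distorted copy of a grayscale image.

    `image` is a list of rows, each a list of floats in [0, 1].
    All ranges below are illustrative assumptions, not Merlin's settings.
    """
    height, width = len(image), len(image[0])
    # Mirror left-to-right half the time (a bird facing the other way).
    flip = rng.random() < 0.5
    # Random brightness scale, imitating backlit or overexposed shots.
    gain = rng.uniform(0.6, 1.4)
    # Random crop keeping 80% of each side, imitating off-center framing.
    crop_h, crop_w = height * 4 // 5, width * 4 // 5
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)

    out = []
    for row in image[top:top + crop_h]:
        pixels = row[left:left + crop_w]
        if flip:
            pixels = pixels[::-1]
        # Add a little noise, then clamp back into the valid range.
        out.append([min(1.0, max(0.0, p * gain + rng.gauss(0.0, 0.02)))
                    for p in pixels])
    return out

rng = random.Random(0)
# Stand-in for a real photo: 100 rows by 120 columns of random pixels.
photo = [[rng.random() for _ in range(120)] for _ in range(100)]
variants = [augment(photo, rng) for _ in range(4)]
```

Each call produces a slightly different version of the same photo, so a single upload can stand in for several viewing conditions during training.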

Most of the sound recordings used to build the app’s tech came through the eBird checklist system, meaning the majority came from amateur, non-scientific birders, Weber says. Once the library hit a million audio recordings in spring 2020, the Cornell team felt they had enough data to build a Sound ID feature. But the biggest challenge was getting the app to live-ID the correct birds, even if multiple birds were singing at the same time.

“We pulled in a bunch of sound ID experts, and they would go through each of these files [in which multiple birds were singing at once], and they would draw a box around where the target species was singing, and then all the other background species as well,” says Weber. “So if there’s a Northern Cardinal singing five times, but in the background there’s a Gray Catbird, it boxed the Gray Catbird as well.”
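One way to picture those expert-drawn boxes is as small records marking when, and at what pitch, each species is audible in a recording. The Python sketch below is a hypothetical illustration, not Cornell's annotation format: the `Box` fields, the `species_per_window` helper, and all times and frequencies are invented for the example. It shows how overlapping boxes could be turned into a list of the species present in each one-second slice of a clip.

```python
import math
from dataclasses import dataclass

@dataclass
class Box:
    """One annotated region of a spectrogram (hypothetical format)."""
    species: str
    start_s: float   # when the sound begins, in seconds
    end_s: float     # when the sound ends, in seconds
    low_hz: float    # lower edge of the box, in hertz
    high_hz: float   # upper edge of the box, in hertz

def species_per_window(boxes, clip_length_s, window_s=1.0):
    """List the species whose boxes overlap each fixed-length time window."""
    labels = []
    for i in range(math.ceil(clip_length_s / window_s)):
        start = i * window_s
        end = min(start + window_s, clip_length_s)
        # A box overlaps the window if it starts before the window ends
        # and ends after the window starts.
        present = {b.species for b in boxes
                   if b.start_s < end and b.end_s > start}
        labels.append(sorted(present))
    return labels

# Invented example: a cardinal sings twice while a catbird calls behind it.
boxes = [
    Box("Northern Cardinal", 0.5, 1.8, 2000.0, 5000.0),
    Box("Gray Catbird", 1.2, 2.6, 1500.0, 6000.0),
    Box("Northern Cardinal", 3.1, 3.9, 2000.0, 5000.0),
]
labels = species_per_window(boxes, clip_length_s=4.0)
```

Labeling the background species too means a slice containing both birds can carry both names, which is the kind of information a model would need to handle several birds singing at once.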

Merlin isn’t always right (the app has identified a Red-shouldered Hawk or Swainson’s Hawk instead of a Blue Jay for me several times), but in my experience, it’s usually highly accurate. More importantly for my birding experience and enjoyment, its Sound IDs often set me off on a hunt to try to see the nearby species I’ve now identified by melody.

Many birders have expressed similar satisfaction to Weber and his team. But one unexpected response to the app was the enthusiastic reaction of the hearing-impaired and deaf community, who enjoy using the spectrogram to experience birding visually.

“As soon as [the app launched], we were getting daily emails from people saying, ‘Oh, this changed my life. I lost my ability to hear these birds 30 years ago, and now it feels like I can experience that,’” Weber says. “Or folks that have never been able to. Now they can’t actually hear it, but they can tell that it’s there. And then they’re able to go hunt it down and see it. It’s something we want to dive into more, the accessibility side.”

Still, Weber finds the most satisfaction in helping people finally put a name to birds in their area that they had long tracked and cherished but had never been able to identify.

“Someone had Chimney Swift flying over their house,” Weber says, “and for some reason there was just never the right way for them to figure out what a Chimney Swift was. But they had Sound ID, and the [birds] flew over and made a little chittering noise, and it showed them [what they were], and all that information is right there. Stories like that, I think are really cool—people who just didn’t have the right tool before to make that connection.”

Photo Credit: Kayla Farmer (via Unsplash)