About — BirdNET+ Taxonomy

What is this?

This is the species metadata database behind BirdNET, the sound identification system developed at the Cornell Lab of Ornithology and Chemnitz University of Technology. It provides structured information about every species that BirdNET currently recognizes or may learn to identify in the future: common and scientific names, descriptions, images, and external identifiers, all accessible through a browsable website and a REST API.

Which species are included?

The database covers birds along with other animal groups that produce identifiable sounds: mammals, insects, amphibians, and reptiles.

For birds, we use AviList as our taxonomic authority. AviList is a global bird species checklist that reconciles and unifies major world bird lists. A bird must appear on the current AviList edition to be included in our database. Species found on iNaturalist but absent from AviList are excluded, and species listed on AviList but not yet on iNaturalist are still included so that coverage stays as complete as possible.

For all other groups, we include species that have at least one sound recording documented on iNaturalist. Some groups, like insects and amphibians, require a slightly higher minimum number of recordings to filter out very sparse or questionable entries. Species flagged as extinct on iNaturalist are excluded from all groups.
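The inclusion rules above can be sketched roughly as follows. This is an illustration only, not the project's actual code: the `Candidate` fields, the group labels, and especially the threshold values are assumptions, since the text says only that some groups need a "slightly higher minimum".

```python
from dataclasses import dataclass

# Hypothetical thresholds; the real minimums are not stated in this document.
MIN_RECORDINGS = {"insects": 3, "amphibians": 3}
DEFAULT_MIN_RECORDINGS = 1  # "at least one sound recording"


@dataclass
class Candidate:
    """Illustrative record; field names are assumptions."""
    name: str
    group: str                 # e.g. "birds", "mammals", "insects"
    on_avilist: bool = False   # only meaningful for birds
    inat_sound_count: int = 0  # sound recordings documented on iNaturalist
    extinct_on_inat: bool = False


def is_included(c: Candidate) -> bool:
    # Extinct species are excluded from every group.
    if c.extinct_on_inat:
        return False
    # Birds: AviList membership decides, regardless of iNaturalist coverage.
    if c.group == "birds":
        return c.on_avilist
    # Other groups: require a minimum number of iNaturalist sound recordings.
    minimum = MIN_RECORDINGS.get(c.group, DEFAULT_MIN_RECORDINGS)
    return c.inat_sound_count >= minimum
```

Under these assumptions, a bird on AviList with no iNaturalist recordings is still included, while a mammal with no recordings is not.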

Where do the data come from?

No single source covers everything we need, so we pull from several databases and combine the results. iNaturalist provides the backbone: taxon identifiers, observation counts, common names in dozens of languages, and default taxon photos. eBird contributes species pages, Macaulay Library images, and common names in over sixty regional language variants. Wikidata supplies external identifiers (GBIF, NCBI, Avibase, BirdLife) and Wikimedia Commons images with licensing information. Wikipedia provides species descriptions in about twenty languages, and we use Claude to translate or shorten descriptions for an additional set of target languages where Wikipedia coverage is thin.

Each species is also cross-referenced against Macaulay Library, Xeno-Canto, and observation.org so that users can quickly find sound recordings and observations for any species in the database.

How are images chosen?

We try several sources in order and use the first image that meets our licensing requirements. The preference order is: the default taxon photo from iNaturalist (if it carries a Creative Commons license), then the species image from eBird and Macaulay Library, then Wikimedia Commons via Wikidata, and finally a Creative-Commons-licensed observation photo from iNaturalist as a last resort. For a small number of species we apply manual overrides when the automatically selected image is misleading or of poor quality.
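A minimal sketch of this first-match selection is shown below. The source keys, the dictionary layout, and the license check are assumptions made for illustration. In particular, the sketch requires a Creative Commons license only for the two iNaturalist sources, as stated above, and treats any image present from the other two sources as acceptable, which may be looser than the real requirements.

```python
from typing import Dict, Optional

# Preference order as described in the text; the key names are hypothetical.
SOURCE_ORDER = [
    "inat_default",       # default taxon photo from iNaturalist
    "ebird_macaulay",     # species image from eBird / Macaulay Library
    "wikimedia_commons",  # Wikimedia Commons via Wikidata
    "inat_observation",   # observation photo from iNaturalist (last resort)
]

# Sources for which the text explicitly requires a Creative Commons license.
CC_REQUIRED = {"inat_default", "inat_observation"}


def is_creative_commons(license_code: Optional[str]) -> bool:
    # Assumes license codes such as "cc-by" or "cc0"; the real format may differ.
    return bool(license_code) and license_code.lower().startswith("cc")


def pick_image(candidates: Dict[str, dict], override: Optional[dict] = None) -> Optional[dict]:
    """Return the first acceptable image, or None if no source qualifies."""
    # A manual override, when present, wins over automatic selection.
    if override is not None:
        return override
    for source in SOURCE_ORDER:
        image = candidates.get(source)
        if image is None:
            continue
        if source in CC_REQUIRED and not is_creative_commons(image.get("license")):
            continue
        return image
    return None
```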

Once selected, every image is downloaded, smart-cropped using object detection to keep the animal centered, and saved in two sizes as WebP files: a small thumbnail for the browse grid and a medium version for the species detail page.
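The cropping step might look something like the sketch below, which computes a crop window only. It assumes an object detector has already returned a bounding box for the animal and that the crop has a fixed aspect ratio. The function name and parameters are hypothetical, and detection, resizing, and WebP encoding are left out.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def centered_crop(img_w: int, img_h: int, box: Box, aspect: float = 1.0) -> Box:
    """Largest crop of the given width/height ratio, centered on the
    detection box as far as the image borders allow."""
    # Largest window with the requested aspect ratio that fits in the image.
    crop_w = int(min(img_w, img_h * aspect))
    crop_h = int(crop_w / aspect)

    # Center of the detected animal.
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2

    # Center the window on the animal, then clamp it inside the image.
    left = max(0, min(int(round(cx - crop_w / 2)), img_w - crop_w))
    top = max(0, min(int(round(cy - crop_h / 2)), img_h - crop_h))
    return (left, top, left + crop_w, top + crop_h)
```

When the animal sits near an edge, the clamp keeps the crop inside the image, so the animal ends up as close to the center as the borders permit rather than exactly centered.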

How do names and descriptions work?

Common names come from four sources that complement each other. For birds, the primary English name is taken from AviList, our taxonomic authority, ensuring consistency with the accepted species list. eBird provides additional regional name variants in other languages (for example, different Spanish names used in Argentina, Chile, and Mexico). iNaturalist adds community-contributed names in languages eBird does not cover. Wikidata labels fill remaining gaps. We merge all of these, giving preference to AviList for English bird names and eBird for other locales, and store them per locale so the API can return the right name for any supported language.
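One way to picture the merge is as layered dictionaries keyed by locale, applied from lowest to highest priority. The sketch below is an assumption about how that precedence could be implemented, not the actual pipeline. The function name, the locale codes, and the ranking of iNaturalist above Wikidata (inferred from "fill remaining gaps") are all illustrative.

```python
from typing import Dict, Optional


def merge_names(
    avilist_english: Optional[str],
    ebird: Dict[str, str],
    inat: Dict[str, str],
    wikidata: Dict[str, str],
) -> Dict[str, str]:
    """Merge per-locale common names; later layers overwrite earlier ones."""
    merged: Dict[str, str] = {}
    # Lowest priority first: Wikidata, then iNaturalist, then eBird.
    for source in (wikidata, inat, ebird):
        merged.update({locale: name for locale, name in source.items() if name})
    # For birds, the AviList name takes precedence for English.
    if avilist_english:
        merged["en"] = avilist_english
    return merged
```

For non-bird species, `avilist_english` would simply be `None`, and the English name would come from whichever remaining source has one.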

Descriptions follow a similar pattern. We start with Wikipedia extracts, then overlay higher-quality translations produced by Claude for our core set of languages. If neither Wikipedia nor Claude provides usable text for a given locale, we fall back to the eBird species description in English.
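The fallback chain for descriptions can be sketched as a short lookup. The function and parameter names are hypothetical, and the sketch assumes each source has already been loaded into a per-locale dictionary. It returns the language of the chosen text so a caller can tell when the English fallback was used.

```python
from typing import Dict, Optional, Tuple


def pick_description(
    locale: str,
    wikipedia: Dict[str, str],
    claude: Dict[str, str],
    ebird_english: Optional[str],
) -> Tuple[Optional[str], str]:
    """Return (text, language_of_text) for the requested locale."""
    # A Claude translation, where present, overlays the Wikipedia extract.
    text = claude.get(locale) or wikipedia.get(locale)
    if text:
        return text, locale
    # Neither source has usable text: fall back to eBird's English description.
    return ebird_english, "en"
```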

What is a BirdNET ID?

Every species in the database receives a stable BirdNET identifier, formatted as a five-digit number with a "BN" prefix, for example BN00042. These identifiers are meant to remain constant across releases so that downstream systems can reference a species without worrying about taxonomic name changes. The mapping is stored persistently and new species simply receive the next available number.
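As an illustration of the format and of stable assignment, consider the sketch below. It makes several assumptions that the text does not confirm: that numbering starts at 1, that "next available" means one past the highest number already assigned (so numbers are never reused), and that each species has some stable internal key, here called `species_key`.

```python
import re
from typing import Dict, Iterable

BN_PATTERN = re.compile(r"BN(\d{5})")


def format_bn_id(n: int) -> str:
    """Format an integer as a BirdNET ID, e.g. 42 -> 'BN00042'."""
    if not 0 <= n <= 99999:
        raise ValueError("BirdNET IDs have exactly five digits")
    return f"BN{n:05d}"


def assign_ids(existing: Dict[str, str], species_keys: Iterable[str]) -> Dict[str, str]:
    """Return a copy of the mapping with IDs added for unseen species.

    Existing entries are never changed, which is what keeps IDs stable.
    """
    mapping = dict(existing)
    highest = max(
        (int(BN_PATTERN.fullmatch(v).group(1)) for v in mapping.values()),
        default=0,
    )
    next_n = highest + 1
    for species_key in species_keys:
        if species_key not in mapping:
            mapping[species_key] = format_bn_id(next_n)
            next_n += 1
    return mapping
```

Because the mapping is keyed by something other than the scientific name, a taxonomic rename would leave the identifier untouched in this scheme.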

Licensing and reuse

The metadata itself (names, descriptions, identifiers) is available under the project's open-source license. Images, however, carry their own individual licenses: most are under one of several Creative Commons licenses, and Macaulay Library images are copyrighted by their photographers and used under Cornell's terms. The license for each image is recorded in the dataset so you can filter or attribute accordingly.
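Filtering by license could be as simple as the sketch below. The field name `image_license` and the license codes are assumptions; check the downloaded dataset for the actual column name and values before relying on this.

```python
from typing import Dict, Iterable, List


def images_with_license(records: Iterable[Dict], allowed: Iterable[str]) -> List[Dict]:
    """Keep only records whose image license is in the allowed set.

    Matching is exact (case-insensitive), so allowing "cc-by" does not
    silently admit "cc-by-nc".
    """
    allowed_norm = {code.lower() for code in allowed}
    return [
        record
        for record in records
        if (record.get("image_license") or "").lower() in allowed_norm
    ]
```

Exact matching is a deliberate choice here: prefix matching would be shorter to write but would mix licenses with different reuse conditions.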

The full dataset is available for download in CSV, JSON, and ZIP formats from the Download page, and programmatic access is provided through the REST API. Source code and documentation live on GitHub.