Birdsong Syllable Detection Using Convolutional Neural Network

Open Access
Mu, Xiaofan
Area of Honors:
Electrical Engineering
Bachelor of Science
Document Type:
Thesis Supervisors:
  • Dezhe Jin, Thesis Supervisor
  • Jeffrey Louis Schiano, Honors Advisor
Keywords:
  • Birdsong
  • Convolutional Neural Network
  • Machine Learning
The goal of this research is to develop a program that detects birdsong syllables in song recordings using a convolutional neural network (CNN) and a dataset of pre-classified syllables and noise. More specifically, we use the CNN to perform image classification on spectrogram images computed from the audio. The CNN approach to detecting birdsong syllables has the advantages of efficiency, accuracy, and autonomy. The training dataset is prepared by extracting features from spectrograms of syllables and noise. Through the training process, the CNN is expected to learn to distinguish syllable features from noise features. Both the network structure and the dataset affect the performance of the network, so a trial-and-error method is used to find the network and dataset that give the best results. Once the best-performing network is found, it will be trained again and then used to detect syllables by pattern matching within continuous spectrograms converted from birdsong audio recordings. If the system assigns a segment of the continuous spectrogram its highest probability to syllable features, it labels that segment as a syllable. The detection results are shown as vertical lines overlaid on the continuous spectrograms, with syllables lying in the regions bounded by the lines. Evaluation will be done by a human, who inspects the accuracy of the syllable boundaries and checks for false positives and false negatives.
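The detection step described above (classify segments of a continuous spectrogram, then mark syllable regions with boundary lines) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the window `width` and `step`, the 0.5 `threshold` (which matches "highest probability" only for a two-class syllable/noise output), and the rule for merging overlapping windows are all assumptions made for illustration. The trained CNN is replaced by a hypothetical placeholder scorer, `mean_energy_score`, so the example runs on its own.

```python
from typing import Callable, List, Sequence, Tuple

Patch = Sequence[Sequence[float]]  # time frames, each a list of frequency bins


def sliding_windows(n_frames: int, width: int, step: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of windows over a spectrogram (end exclusive)."""
    return [(s, s + width) for s in range(0, n_frames - width + 1, step)]


def detect_syllables(
    spectrogram: Patch,
    classify: Callable[[Patch], float],  # returns an assumed syllable probability
    width: int,
    step: int,
    threshold: float = 0.5,  # assumed decision rule for a two-class output
) -> List[Tuple[int, int]]:
    """Merge consecutive windows labeled as syllable into (onset, offset) frame pairs."""
    regions: List[Tuple[int, int]] = []
    for start, end in sliding_windows(len(spectrogram), width, step):
        if classify(spectrogram[start:end]) < threshold:
            continue  # window labeled as noise
        if regions and start <= regions[-1][1]:
            # Window overlaps or touches the previous region: extend it.
            regions[-1] = (regions[-1][0], end)
        else:
            regions.append((start, end))
    return regions


def mean_energy_score(patch: Patch) -> float:
    """Hypothetical stand-in for the trained CNN: 1.0 for loud patches, else 0.0."""
    total = sum(sum(frame) for frame in patch)
    count = sum(len(frame) for frame in patch)
    return 1.0 if total / count > 0.5 else 0.0


# Toy spectrogram: 12 frames x 2 bins, with "syllables" at frames 2-5 and 8-9.
toy = (
    [[0.0, 0.0]] * 2
    + [[1.0, 1.0]] * 4
    + [[0.0, 0.0]] * 2
    + [[1.0, 1.0]] * 2
    + [[0.0, 0.0]] * 2
)
boundaries = detect_syllables(toy, mean_energy_score, width=2, step=1)
print(boundaries)  # [(2, 6), (8, 10)]
```

Each returned pair corresponds to the two vertical lines that bound one detected syllable. Multiplying the frame indices by the spectrogram's hop duration would convert them to times in the recording.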