Introduction to song features

We now take a closer look at the acoustic features and the measures we derive from them. The first step of the analysis is to reduce the sound spectrogram to a few simple features. The entire SAP2011 analysis is based on those features; that is, the features replace the sonogram.
Why use features? Many previous attempts to automate the analysis of sound similarity used sound-spectrographic cross-correlation to measure the similarity between syllables: the correlation between the spectrograms of two notes was examined by sliding one note on top of the other and choosing the best match (the correlation peak). However, measures based on the full spectrogram suffer from a fundamental weakness: the high dimensionality of the basic features. Cross-correlation between songs can be useful if the song is first partitioned into its notes and if the notes compared are simple. But even in this case, a mismatch in a single feature can reduce the correlation to baseline level. For example, a moderate difference between the fundamental frequencies of two complex sounds that are otherwise very similar would prevent us from overlapping their spectrogram images (a vertical translation will not help, since the harmonics won’t match).
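The weakness described above can be illustrated with a minimal sketch. It is not the SAP2011 implementation or any published cross-correlation method; it assumes synthetic "notes" made of equal-amplitude harmonics, a plain Hann-windowed FFT spectrogram, and a Pearson correlation computed at each time lag. All function names and parameter values (harmonic_stack, peak_cross_correlation, the 600 Hz and 700 Hz fundamentals) are hypothetical choices made for illustration only.

```python
import numpy as np

def harmonic_stack(f0, duration=0.1, sr=22050, n_harmonics=8):
    """Synthetic 'note' (assumption): equal-amplitude harmonics of f0 in Hz."""
    t = np.arange(int(duration * sr)) / sr
    return sum(np.sin(2 * np.pi * f0 * k * t) for k in range(1, n_harmonics + 1))

def spectrogram(signal, n_fft=512, hop=128):
    """Magnitude spectrogram, shape (n_freq_bins, n_frames)."""
    window = np.hanning(n_fft)
    starts = range(0, len(signal) - n_fft + 1, hop)
    frames = np.array([signal[s:s + n_fft] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def peak_cross_correlation(spec_a, spec_b, min_overlap=4):
    """Slide spec_b over spec_a in time and return the best Pearson correlation."""
    n_a, n_b = spec_a.shape[1], spec_b.shape[1]
    best = -1.0
    for lag in range(-(n_b - min_overlap), n_a - min_overlap + 1):
        # Frame ranges of the two spectrograms that overlap at this lag.
        a_lo, a_hi = max(0, lag), min(n_a, lag + n_b)
        b_lo, b_hi = a_lo - lag, a_hi - lag
        a = spec_a[:, a_lo:a_hi].ravel()
        b = spec_b[:, b_lo:b_hi].ravel()
        best = max(best, np.corrcoef(a, b)[0, 1])
    return best

note = spectrogram(harmonic_stack(600.0))
shifted = spectrogram(harmonic_stack(700.0))  # same structure, higher fundamental

print("identical notes:      ", peak_cross_correlation(note, note))
print("different fundamental:", peak_cross_correlation(note, shifted))
```

The two synthetic notes have the same harmonic structure and differ only in fundamental frequency, yet sliding in time cannot bring their harmonics into register, so the correlation peak for the second pair comes out far lower than for the first.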

As mentioned above, the cross-correlation approach requires, as a first step, that a song be partitioned into its component notes or syllables. This, in itself, can be a problem. Partitioning a song into syllables or notes is relatively straightforward in a species such as the canary, in which syllables are always preceded and followed by a silent interval. It is more difficult in the zebra finch, whose song includes many changes in frequency modulation and in which diverse sounds often follow each other without intervening silent intervals. Thus, the problems of partitioning sounds into their component notes and then dealing with the complex acoustic structure of these notes compound each other. The analytic approach of Sound Analysis addresses both of these difficulties. It does so by reducing complex sounds to an array of simple features and by implementing an algorithm that does not require a song to be partitioned into its component notes.
In recent years, several alternative approaches to sound analysis have been published and made available, including improved cross-correlation methods and other means of extracting features. Other approaches are based on compression by principal component analysis and other objective methods for dimensionality reduction. If there is one advantage to using Sound Analysis Pro, it is the use of simple, well-understood features. When you see that two syllables differ in pitch, Wiener entropy, or frequency modulation, it is easy to imagine how the vocal quality varies, so you are more likely to develop a good intuition, and perhaps some understanding, of the difference between the two sounds.
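To make the idea of a simple, interpretable feature concrete, here is a rough sketch of one of the features named above, Wiener entropy, computed for a single analysis frame. It uses the standard definition (the logarithm of the ratio of the geometric mean to the arithmetic mean of the power spectrum) with a plain Hann-windowed FFT. This is an assumption made for brevity: it does not reproduce the spectral estimation or frequency range that SAP2011 itself uses, and the function name, frame length, and test signals are hypothetical.

```python
import numpy as np

def wiener_entropy(frame, eps=1e-12):
    """Log of (geometric mean / arithmetic mean) of the frame's power spectrum.

    The value is at most 0: it is near 0 for a flat, noise-like spectrum
    and strongly negative when the energy sits in a few frequency bins.
    """
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2 + eps  # eps avoids log(0)
    geometric_mean = np.exp(np.mean(np.log(power)))
    arithmetic_mean = np.mean(power)
    return np.log(geometric_mean / arithmetic_mean)

sr, n = 22050, 1024                    # sample rate (Hz) and frame length (assumed)
t = np.arange(n) / sr
tone = np.sin(2 * np.pi * 2000 * t)    # pure tone: highly ordered spectrum
noise = np.random.default_rng(0).standard_normal(n)  # white noise: flat spectrum

print("pure tone:  ", wiener_entropy(tone))
print("white noise:", wiener_entropy(noise))
```

A single number per frame is enough to separate the tonal sound from the noisy one, which is the kind of directly interpretable difference the paragraph above describes.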