The similarity score
Scoring similarity between songs at the scale of a single window is hopeless, just as comparing pictures one pixel at a time would be. The solution is to compare intervals consisting of several windows. If such intervals are sufficiently long, they contain enough information to identify a unique song segment; yet if they are too long, similarities that are real at a smaller interval size may be rejected, reducing the power of the analysis. We found empirically that comparisons using 50-70 ms intervals, centered on each 10-ms time window, were satisfactory. Perhaps not surprisingly, the duration of these song intervals is on the order of magnitude of a typical song note. Our final score of similarity combines the two scales: the ‘large scale’ (usually 50-70 ms) is used to reduce ambiguity with a measure we call %similarity, while the ‘small scale’ (usually 5-10 ms) is used to obtain a fine-grained quantification of similarity, which we call accuracy.
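The interval-scale comparison can be sketched as follows. This is a minimal illustration, not SAP2011's actual metric: the per-window feature arrays, the 7-window (70 ms) interval width, and the Euclidean distance are all assumptions made for the example.

```python
import numpy as np

def interval_distance(feats_a, feats_b, i, j, half_width=3):
    """Compare intervals of several 10-ms windows centered on windows i and j.

    feats_a, feats_b: (n_windows, n_features) arrays of per-window acoustic
    features (hypothetical). half_width=3 gives a 7-window, i.e. ~70-ms,
    interval. Returns the mean feature distance across the interval.
    """
    a = feats_a[i - half_width : i + half_width + 1]
    b = feats_b[j - half_width : j + half_width + 1]
    # mean Euclidean distance between corresponding windows of the interval
    return float(np.mean(np.linalg.norm(a - b, axis=1)))
```

Comparing whole intervals rather than single windows is what gives each comparison enough information to identify a unique song segment.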
For each pair of time windows labelled ‘similar’ in the two songs being compared, SAP2011 calculates the probability that the goodness of the match would have occurred by chance, as described above. We are left, then, with a series of P values; the lower the P, the higher the similarity. For convenience we transform these P values to 1-P; therefore, a 99% similarity between a pair of windows means that the probability that the goodness of the match would have occurred by chance is less than 1%. Note that 99% similarity does not mean that the features of the two songs being compared are 99% similar to each other. In practice, however, because of how our thresholds were set, songs or sections of songs that score 99% similarity do tend to be very similar. The SAP2011 procedure requires a unique relation between a time window in the model and a time window in the pupil's song, yet our technique allows more than one window in the pupil's song to meet the similarity threshold. The probability of finding one or more pairs of sounds that meet this threshold increases with the number of comparisons made, so, in some species at least, the duration of the pupil's song will influence the outcome. When a window in a tutor's song is similar to more than one window in the pupil's song, the problem is which single pair of windows to retain. Two types of observations helped us make this final selection: the first is the magnitude of similarity; the second is the length of the section that met the similarity criterion.
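The P-to-similarity transformation described above is simply the complement of the chance probability, expressed as a percentage. A small sketch (the function name is ours, not SAP2011's):

```python
def window_similarity(p_value):
    """Convert the probability that a match's goodness occurred by chance
    into a % similarity score: similarity = 100 * (1 - P).

    A window pair whose match would occur by chance with P < 0.01
    therefore scores above 99% similarity.
    """
    return 100.0 * (1.0 - p_value)
```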
Windows with scores that meet the similarity threshold are often contiguous and characterize discrete ‘sections’ of the song. In cases of good imitation, sections of similarity are interrupted only by silent intervals, where similarity is undefined. Depending on the species, a long section of sequentially similar windows (i.e. serial sounds that are similar in the two songs compared) is very unlikely to occur by chance, and thus the sequential similarity we observed in zebra finches was likely the result of imitation. Taken together, the longer a section of similarity and the higher the overall similarity score of its windows, the lower the likelihood of it having occurred by chance. Therefore, the overall similarity that a section captures takes precedence over the local similarity between time windows.
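Grouping threshold-passing windows into contiguous sections can be sketched as follows; the boolean-flag representation of per-window similarity is an assumption for illustration.

```python
def similarity_sections(similar):
    """Group consecutive 'similar' 10-ms windows into sections.

    similar: sequence of booleans, one flag per time window.
    Returns a list of (start, end_exclusive) window-index pairs,
    one per contiguous run of similar windows.
    """
    sections, start = [], None
    for k, flag in enumerate(similar):
        if flag and start is None:
            start = k          # a new section begins
        elif not flag and start is not None:
            sections.append((start, k))  # the current section ends
            start = None
    if start is not None:
        sections.append((start, len(similar)))
    return sections
```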
To calculate how much similarity each section captured, SAP2011 used the following procedure. Consider, for example, a tutor's song containing 1000 ms of sound (i.e. excluding silent intervals) that has a 100-ms similarity section with the song of its pupil, and an average similarity score between the windows of that section of 80%. The overall similarity that this section captures is therefore (100/1000) × 80% = 8%.
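The worked example above amounts to weighting each section's mean window score by the fraction of the tutor's sound it spans; a minimal sketch (function and parameter names are ours):

```python
def section_captured_similarity(section_ms, song_ms, mean_window_score):
    """Overall similarity (in %) captured by one section of similarity.

    section_ms: duration of the similarity section, in ms.
    song_ms: total duration of sound in the tutor's song (silences excluded).
    mean_window_score: average % similarity of the section's window pairs.
    """
    return (section_ms / song_ms) * mean_window_score
```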
This procedure is repeated for all sections of similarity. We then discard parts of sections whose projections overlap, either on the tutor's or on the pupil's song. Starting from the section with the highest overall similarity score (the product of similarity × duration), we accept its similarity score as final and remove the overlapping parts of other sections. This decision is based on the overall similarity of each section, not on the relative similarity of their overlapping parts. We repeat the process down the scoring hierarchy until all redundancy has been removed. The remainder is retained for our final score of %similarity.
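The descent down the scoring hierarchy can be sketched as a greedy selection. This is a simplified illustration under two assumptions: sections live on a single timeline (SAP2011 checks projections on both the tutor's and the pupil's song), and whole overlapping sections are dropped rather than having only their overlapping parts trimmed away.

```python
def prune_overlaps(sections):
    """Greedy pruning of redundant similarity sections.

    sections: list of (start, end, overall_score) tuples, where
    overall_score is the product of mean similarity and duration.
    Highest-scoring sections are accepted first; any lower-ranked
    section overlapping an accepted one is discarded (simplification:
    the real procedure removes only the overlapping parts).
    """
    kept = []
    for s in sorted(sections, key=lambda x: -x[2]):
        # keep s only if it is disjoint from every accepted section
        if all(s[1] <= k[0] or s[0] >= k[1] for k in kept):
            kept.append(s)
    return kept
```

The key design point is that acceptance is decided by each section's overall score, not by the local similarity inside the overlapping region, mirroring the rule stated above.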