We will start by creating a very small Syllable Table. Open SAP2 'explore & score' and click 'New table'. Call the table table1. Now set the amplitude (yellow slider) threshold to 42Db. Next, click 'open sound' and open the sound 'bird102_long.wav' (you may download this file from http://). Select and then outline the entire sound (click to the left of the first syllable and then move the scroll bar below all the way to the right before clicking at the end of the last syllable). You should now see 45 records (syllables) in the table.
The SELECT keyword
Once you understand how to use SELECT keyword, you can already achieve a lot. Open WB (mySQL workbench) and navigate to sap database, and then click the SQL button or open Matlab (after installing the mySQL interface), open sap as shown above and then type
WB: select duration, mean_pitch from table1; and click !
Matlab: [duration pitch]=mysql('select duration, mean_pitch from table1');
If you use WB select the entire output table, click Control-C (to copy the fields to the clipboard) and then open Excel and type Control-V to paste. Click data->Text to column and choose delaminated and then 'next' and in 'other' type '|' And then click finish. You can now create a X-Y plot of duration versus pitch.
Although the sample size is really small you can already see that some of the data are clustered. Let's query to obtain data that belong to one cluster, for example we can ask for data where duration is between 100-150ms and pitch is between 600-1100Hz:
WB: select duration, mean_pitch from table1 where duration>100 and duration<150 and mean_pitch>600 and mean_pitch<1100;
Matlab: [c1_duration, c1_pitch]=mysql('select duration, mean_pitch from table1 where duration>100 and duration<150 and mean_pitch>600 and mean_pitch<1100;');
Advance usage of the SELECT keyword
Table1 includes only three song bouts produced by a bird named b109. In our database you will find () a table called b109clustered, which contains the entire song development of this bird. In this table you may not want to run the query shown above, because this query would return hundreds of thousands of records.
In this case, you should always using the keyword limit, e.g., (Matlab):
[c1_duration, c1_pitch]=mysql('select duration, mean_pitch from b109clustered where duration>100 and duration<150 and mean_pitch>600 and mean_pitch<1100 limit 5000;');
will execute promptly and will return the first 5000 syllables that match your criteria. Because this table is already “clustered”, namely, the syllable types are already identified (see chapter xx) you could have simplified this query since this cluster is already identified as cluster number 4, so this query can turn to
select duration, mean_pitch from b109clustered where cluster=4 limit 5000;
Now, this query will return the first 5000 syllables of type 4, but what if you want not the first 5000 syllables by a random sample of 5000 syllables? That's easy:
select duration, mean_pitch from b109clustered where cluster=4 order by rand() limit 5000;
What if you only want to count how many syllables of each count exists in the table?
select count(*) from b109clustered where cluster=4;
will return 173581, so you can now repeat this query for clusters 0 (not clustered), 1 (introductory notes) and 3, 4 (song syllable types of this bird) and tell that in this bird the frequency of types over development is:
We can now see if those types are present in similar proportion throughout development. To do so, we will limit our queries to different dates.
select count(*) from b109clustered where month=8 and day<19 and cluster=4;
will return 0, but
select count(*) from b109clustered where month=8 and day=20 and cluster=4;
will return 858, telling us that (using our clustering method) this cluster was not identified before August 19.
It should be now quite easy for you to realize how to select data by any combination of criteria. Using a simple for loop, you can pump the results of these queries into Matlab functions. When properly used, the MySQL server is a very potent device, capable of amazing performance returning complicated queries in a remarkable speed.
SELECT and CREATE tables
It is sometimes useful to create new tables from subsets (queries) of an existing table (or even by consolidating data from several tables). We will cover this subject briefly here:
The simplest approach is demonstrated in this query:
create table b109cluster4 select * from b109clustered where cluster=4 limit 9999999;
creates a new table, called b109cluster4, that includes only records of syllable type 4. Note that if you do not use the limit keyword, mySQL will create a table using some default limit (usually 1000 records) so we advise that you always use limit.
Now, a slightly more sophisticated case is when you need to create a new table from certain fields, combining data from a few different tables. For example, say that you have a raw feature table that is a bit faulty (my_faulty_table) and you suspects that some of the records in that table has Wiener entropy values which are positive (Wiener entropy must be negative!). Now detecting those records is easy:
SELECT entropy FROM my_faulty_table where entropy>0;
Of course, you can now create a new table excluding the faulty records:
create table my_right_table SELECT * FROM my_faulty_table where entropy<0 limit 999999;
However, you also want to identify the wave file that those faulty data belongs to, so as to figure out what happened. The problem is that the name of the wave file is not included in the raw features table. In that table you only have the file_index field. This field, however, is pointing at the appropriate wave file name in another linked table, called file_table. This data structure saves much space since we have thousands of records from each file in the raw features table. So we now need to join information across those related table, and mySQL makes it very easy. All you need to do is identify the table where the field comes from by the syntax table_name.field_name.
For example, the query
SELECT entropy, file_table.file_name FROM my_faulty_table, file_table where entropy>0 and my_faulty_table.file_index=file_table.file_index limit 99999;
returns the positive entropy values side by side with the file names that relate to those values.
The simple examples above demonstrated how a new table that include only some of the table records (row) and fields (columns) can be generated from an already existing table using the Select command combined with other commands such as create table.
Altering, cleaning and merging tables
Mistakes happens, and tables need to be altered or updated occasionally. You can alter data manually in the MCC, but as your tables become larger, it becomes impractical to correct values by hand. The simplest way of altering a table is to apply a change to an entire field, e.g.,
Update my_syllables set cluster=0;
is going to erase all your cluster values, setting them to zero. So, if you actually like most of your clusters, but not cluster 4, you could type:
Update my_syllables set cluster=0 where cluster=4;
In short, you can alter subset of your data using the exact same method for selecting subset of your data.
Finally, it is quit easy to merge two tables into a single, large table:
Insert into my_first_table select * from my_second_table;