You are here: Home Manual Chapter 6: Data Managment MySQL in a nutshell

MySQL in a nutshell

Why should you learn some Standard Query Language (SQL)

Standard Query Language (SQL) will allow you to retrieve and manipulate SAP2 data easily, and very quickly. MySQL is one of the most standard implementation of SQL and it is (practically) free. Since MySQL is used heavily in SAP2, it is not a bad idea to know just enough SQL to be able to manipulate the tables and export data efficiently to other applications. This is particularly important because the SAP2 tables often contain large quantity (even millions) of records. There are very few applications that can efficiently manage large amount of automatically generated data. Therefore, you should look at SQL as an efficient data pump between the tables and other applications. For example, Matlab does not have any (respectable) database functionality, and it stores all variables in memory. It is therefore inefficient to import an entire table to Matlab, but you can use SQL to pump data epochs into Matlab, calculate something and move to the next epoch. Furthermore, those data epochs can be very flexible in content. For example, obtaining epochs containing the pitch of the first 100 syllables produced at around 9AM is a very simple command:

select pitch from my_table where hour=9 limit 100;

as you can see, it is very simple and intuitive to write SQL queries.

SQL is used not only to retrieve but also to process data from your tables, but here we will only teach you how to select data from a table, how to copy data into a new table, how to change table values, and how to merge tables together.
How to practice this tutorial in MySQL Workbench (WB):

You can download the workbench here. Once installed open the workbench and double click 'local instance mysql' as shown below.
mysql1
-
Then type a query:
-
mysql2
-
and you should see the results (or error message) below:
mysql3
-
To use Matlab: download the mySQL Matlab connector here. Then open Matlab and type:
>> 
mysql('open', 'localhost', 'root', 'sap2011');
Connecting to  host=localhost  user=root  password=sap2011
Uptime: 745762  Threads: 1  Questions: 17  Slow queries: 0  Opens: 42  Flush tables: 1  Open tables: 0  Queries per second avg:
>>
mysql('use sap');
Current database is "sap"
and then type a query , e.g.,
>> 
x=mysql('select bird_ID from sap.settings');

We will start by creating a very small Syllable Table. Open SAP2 'explore & score' and click 'New table'. Call the table table1. Now set the amplitude (yellow slider) threshold to 42Db.  Next, click 'open sound' and open the sound 'bird102_long.wav' (you may download this file from http://). Select graphic and then outline the entire sound (click to the left of the first syllable and then move the scroll bar below all the way to the right before clicking at the end of the last syllable). You should now see 45 records (syllables) in the table.


The SELECT keyword

Once you understand how to use SELECT keyword, you can already achieve a lot. Open WB (mySQL workbench) and navigate to sap database, and then click the SQL button or open Matlab (after installing the mySQL interface), open sap as shown above and then type
WB: select duration, mean_pitch from table1; and click !
Matlab: [duration pitch]=mysql('select duration, mean_pitch from table1');

If you use WB select the entire output table, click Control-C (to copy the fields to the clipboard) and then open Excel and type Control-V to paste. Click data->Text to column and choose delaminated and then 'next' and in 'other' type '|' And then click finish. You can now create a X-Y plot of duration versus pitch.

Although the sample size is really small you can already see that some of the data are clustered.  Let's query to obtain data that belong to one cluster, for example we can ask for data where duration is between 100-150ms and pitch is between 600-1100Hz:

WB: select duration, mean_pitch from table1 where duration>100 and duration<150 and mean_pitch>600 and mean_pitch<1100;
Matlab: [c1_duration, c1_pitch]=mysql('select duration, mean_pitch from table1 where duration>100 and duration<150 and mean_pitch>600 and mean_pitch<1100;');


Advance usage of the SELECT keyword

Table1 includes only three song bouts produced by a bird named b109. In our database you will find () a table called b109clustered, which contains the entire song development of this bird. In this table you may not want to run the query shown above, because this query would return hundreds of thousands of records.
In this case, you should always using the keyword limit,  e.g., (Matlab):

[c1_duration, c1_pitch]=mysql('select duration, mean_pitch from b109clustered where duration>100 and duration<150 and mean_pitch>600 and mean_pitch<1100 limit 5000;');

will execute promptly and will return the first 5000 syllables that match your criteria. Because this table is already “clustered”, namely, the syllable types are already identified (see chapter xx) you could have simplified this query since this cluster is already identified as cluster number 4, so this query can turn to

select duration, mean_pitch from b109clustered where cluster=4 limit 5000;

Now, this query will return the first 5000 syllables of type 4, but what if you want not the first 5000 syllables by a random sample of 5000 syllables? That's easy:

select duration, mean_pitch from b109clustered where cluster=4 order by rand() limit 5000;

What if you only want to count how many syllables of each count exists in the table?
select count(*) from b109clustered where cluster=4;

will return 173581, so you can now repeat this query for clusters 0 (not clustered), 1 (introductory notes) and  3, 4 (song syllable types of this bird)  and tell that in this bird the frequency of types over development is:

0: 279354
1: 337884
3: 198997
4: 173581

We can now see if those types are present in similar proportion throughout development. To do so, we will limit our queries to different dates.

select count(*) from b109clustered where month=8 and day<19 and cluster=4;

will return 0, but

select count(*) from b109clustered where month=8 and day=20 and cluster=4;

will return 858, telling us that (using our clustering method) this cluster was not identified before August 19.

It should be now quite easy for you to realize how to select data by any combination of criteria. Using a simple for loop, you can pump the results of these queries into Matlab functions. When properly used, the MySQL server is a very potent device, capable of amazing performance returning complicated queries in a remarkable speed.

 

 

SELECT and CREATE tables

It is sometimes useful to create new tables from subsets (queries) of an existing table (or even by consolidating data from several tables). We will cover this subject briefly here:

The simplest approach is demonstrated in this query:

create table b109cluster4 select * from  b109clustered where cluster=4 limit 9999999;

creates a new table, called b109cluster4, that includes only records of syllable type 4. Note that if you do not use the limit keyword, mySQL will create a table using some default limit (usually 1000 records) so we advise that you always use limit.

Now, a slightly more sophisticated case is when you need to create a new table from certain fields, combining data from a few different tables. For example, say that you have a raw feature table that is a bit faulty (my_faulty_table) and you suspects that some of the records in that table has Wiener entropy values which are positive (Wiener entropy must be negative!). Now detecting those records is easy:

SELECT entropy FROM my_faulty_table where entropy>0;

Of course, you can now create a new table excluding the faulty records:

create table my_right_table  SELECT * FROM my_faulty_table where entropy<0 limit 999999;

However, you also want to identify the wave file that those faulty data belongs to, so as to figure out what happened.  The problem is that the name of the wave file is not included in the raw features table. In that table you only have the file_index field. This field, however, is pointing at the appropriate wave file name in another linked table, called file_table. This data structure saves much space since we have thousands of records from each file in the raw features table. So we now need to join information across those related table, and mySQL makes it very easy. All you need to do is identify the table where the field comes from by the syntax table_name.field_name.

For example, the query

SELECT entropy, file_table.file_name FROM my_faulty_table, file_table where entropy>0 and my_faulty_table.file_index=file_table.file_index limit 99999;

returns the positive entropy values side by side with the file names that relate to those values.

The simple examples above demonstrated how a new table that include only some of the table records (row) and fields (columns) can be generated from an already existing table using the Select command combined with other commands such as create table.

 

 

Altering, cleaning and merging tables

Mistakes happens, and tables need to be altered or updated occasionally. You can alter data manually in the MCC, but as your tables become larger, it becomes impractical to correct values by hand. The simplest way of altering a table is to apply a change to an entire field, e.g.,

Update my_syllables set cluster=0;

is going to erase all your cluster values, setting them to zero. So, if you actually like most of your clusters, but not cluster 4, you could type:

Update my_syllables set cluster=0 where cluster=4;

In short, you can alter subset of your data using the exact same method for selecting subset of your data.

Eliminating duplicates:

Finally, it is quit easy to merge two tables into a single, large table:

Insert into my_first_table select * from my_second_table;