Vectors, Matrices, and Arrays: Statistical Analysis
Though most MATLAB functions are vectorized, we should be careful with functions that combine two or more array elements at once, because their result depends on the dimension along which they operate. This situation occurs frequently when computing statistics on data.
Here's a sample data array generated from the random number generator:
DataA = rand(3,5)
DataA =
    0.1829    0.0287    0.9787    0.4711    0.0424
    0.2399    0.4899    0.7127    0.0596    0.0714
    0.8865    0.1679    0.5005    0.6820    0.5216
Comparison between Elements
By default, for arrays of two or more dimensions, functions such as max, min, and sum operate along the first non-singleton dimension. For a matrix, this means they work down each column. For example:
max(DataA)
ans = 0.8865 0.4899 0.9787 0.6820 0.5216
min(DataA)
ans = 0.1829 0.0287 0.5005 0.0596 0.0424
These return the maximum and minimum values of each column in array DataA. Another example is
sum(DataA)
ans = 1.3094 0.6865 2.1918 1.2127 0.6355
This calculates the sum of the elements of each column in array DataA.
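To see which dimension is collapsed by default, it can help to try these functions on a small matrix whose results are easy to verify by hand. The sketch below uses a made-up 2-by-3 matrix M, which is not part of the original data set:

```matlab
% M is a hypothetical example matrix, chosen so results can be checked by hand
M = [1 5 3;
     4 2 6];

colMax = max(M);   % largest value in each column  -> [4 5 6]
colMin = min(M);   % smallest value in each column -> [1 2 3]
colSum = sum(M);   % sum of each column            -> [5 7 9]

% Each result is a 1-by-3 row vector: one value per column of M
disp(size(colSum))
```

Dimension 1 (the rows) is the one that disappears, leaving one value per column.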
Specifying Subscripts
If the default dimension is not the one along which you want the function to operate, you can add an optional second argument that specifies which dimension to collapse:
sum(DataA,2)
ans =
    1.7038
    1.5736
    2.7585
This sums along the 2nd dimension (across the columns), giving one sum per row.
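As a rough illustration of the dimension argument, the following sketch compares both choices on a made-up 2-by-3 matrix M (a hypothetical example, not the DataA array above):

```matlab
% M is a hypothetical example matrix
M = [1 5 3;
     4 2 6];

sumDown   = sum(M, 1);  % collapse dimension 1: one sum per column -> [5 7 9]
sumAcross = sum(M, 2);  % collapse dimension 2: one sum per row    -> [9; 12]

% The collapsed dimension has length 1 in the result
disp(size(sumDown))     % 1-by-3 row vector
disp(size(sumAcross))   % 2-by-1 column vector
```

The argument names the dimension that is removed, not the one that survives.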
Note that max and min can also compare two arrays element by element, so their second argument is reserved for a second array. Therefore, to specify which dimension we want to analyze, we need to pass an empty array [] as the second argument and indicate the dimension we want in the third argument:
max(DataA,[], 2)
ans =
    0.9787
    0.7127
    0.8865
(Exercise: what will happen if you type max(DataA,2) ?)
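The three-argument form can be sketched on a small matrix as follows. M is a made-up example, and the empty brackets are only a placeholder for the unused second array:

```matlab
% M is a hypothetical example matrix
M = [1 5 3;
     4 2 6];

rowMax = max(M, [], 2);  % largest value in each row  -> [5; 6]
rowMin = min(M, [], 2);  % smallest value in each row -> [1; 2]
colMax = max(M, [], 1);  % same result as max(M)      -> [4 5 6]
```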
Colon Operator Again!
With data arrays, we often want a function to apply to every element in the array rather than to each column or row separately. This can be achieved with the colon operator, which reshapes the array into a single column vector. For example, if you want to sum all the numbers in DataA irrespective of their position in the array, do
sum(DataA(:))
ans = 6.0359
which gives you the sum of every single element in array DataA. Similarly, you can use
max(DataA(:))
ans = 0.9787
min(DataA(:))
ans = 0.0287
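To make the effect of the colon operator more concrete, the sketch below flattens a made-up 2-by-3 matrix M (hypothetical, not the DataA array) and then reduces it:

```matlab
% M is a hypothetical example matrix
M = [1 5 3;
     4 2 6];

v = M(:);              % 6-by-1 column vector, read column by column: [1;4;5;2;3;6]

total    = sum(M(:));  % sum of all six elements -> 21
largest  = max(M(:));  % overall maximum         -> 6
smallest = min(M(:));  % overall minimum         -> 1
```

Because M(:) is always a single column, the default "first non-singleton dimension" rule covers every element, whatever the original shape of M.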
Simple Statistics
MATLAB provides several built-in functions for data statistics. For example, the average value of all the elements in array DataA can be calculated by using the function mean:
mean(DataA(:))
ans = 0.4024
For the standard deviation of all the elements of array DataA, use std:
std(DataA(:))
ans = 0.3166
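One detail worth checking by hand is that std normalizes by N-1 by default (the sample standard deviation). Passing 1 as the second argument normalizes by N instead. The sketch below uses a made-up matrix M whose statistics are easy to compute manually:

```matlab
% M is a hypothetical example matrix: 8 values with sum 40
M = [2 4 4 4;
     5 5 7 9];

avg     = mean(M(:));    % 40/8 = 5
sSample = std(M(:));     % normalizes by N-1: sqrt(32/7), about 2.1381
sPop    = std(M(:), 1);  % normalizes by N:   sqrt(32/8) = 2
```

Here 32 is the sum of squared deviations from the mean, so the two results differ only in the divisor.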
Exercise
The goal of this exercise is to make you feel grateful for MATLAB's built-in functions...
Your professor gives you an unknown data set named BlackBox. You don't know the values in the data set, and you don't even know its size. How could you calculate the average value of the data set without using the function mean?
(Hint: Use sum and size!)