Vectors, Matrices, and Arrays: Statistical Analysis

Although most MATLAB functions are vectorized, take care with functions that combine two or more array elements at a time, such as max, min, and sum. This situation arises frequently when computing statistics on data.

Here's a sample data array generated from the random number generator:

DataA = rand(3,5)
DataA =

    0.1829    0.0287    0.9787    0.4711    0.0424
    0.2399    0.4899    0.7127    0.0596    0.0714
    0.8865    0.1679    0.5005    0.6820    0.5216

Comparison between Elements

By default, when applied to arrays of two or more dimensions, functions such as max, min, and sum operate along the first non-singleton dimension (the first dimension whose size is not 1). For example:

max(DataA)
ans =

    0.8865    0.4899    0.9787    0.6820    0.5216

min(DataA)
ans =

    0.1829    0.0287    0.5005    0.0596    0.0424

These return the maximum and minimum values of each column of DataA: the functions work along dimension 1 (down the rows), leaving one value per column. Another example is

sum(DataA)
ans =

    1.3094    0.6865    2.1918    1.2127    0.6355

This calculates the sum of the elements of each column of DataA.
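
To see what "first non-singleton dimension" means in practice, here is a minimal sketch using a small hand-made matrix. The names A, r, colSums, and rowTotal are illustrative and do not come from the example above. For a matrix, sum works down each column; for a row vector, whose first dimension has size 1, it works along the row instead.

```matlab
% Illustrative data (hypothetical, chosen so the sums are easy to check)
A = [1 2 3;
     4 5 6];          % 2-by-3 matrix

colSums = sum(A);     % dimension 1 has size 2, so sum works down each column
                      % colSums is [5 7 9]

r = [1 2 3];          % 1-by-3 row vector: dimension 1 is singleton (size 1)
rowTotal = sum(r);    % so sum works along dimension 2 instead
                      % rowTotal is 6
```

This is why the same call can return a row of column sums for a matrix but a single number for a row vector.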

Specifying the Dimension

If the default dimension is not the one you want to operate along, add an optional second argument that specifies which dimension to collapse:

sum(DataA,2)
ans =

    1.7038
    1.5736
    2.7585

This sums along the 2nd dimension (across the columns), giving the sum of each row.
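
The effect of the dimension argument is easiest to check on a small matrix. The sketch below uses illustrative names (A, s1, s2) that are not part of the example above, and compares collapsing dimension 1 with collapsing dimension 2.

```matlab
% Illustrative data (hypothetical)
A = [1 2 3;
     4 5 6];      % 2-by-3 matrix

s1 = sum(A, 1);   % collapse dimension 1: 1-by-3 result, one sum per column -> [5 7 9]
s2 = sum(A, 2);   % collapse dimension 2: 2-by-1 result, one sum per row    -> [6; 15]
```

The dimension you name is the one that shrinks to size 1 in the result.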

Note that max and min can also compare two arrays element by element, so their second argument is reserved for the second array. To specify the dimension to analyze, pass an empty array [] as the second argument and give the dimension as the third argument:

max(DataA,[], 2)
ans =

    0.9787
    0.7127
    0.8865
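
As a quick check of the three-argument form, the following sketch applies max and min along each dimension of a small matrix. The matrix and the variable names (A, colMax, rowMax, rowMin) are illustrative assumptions, not part of the example above.

```matlab
% Illustrative data (hypothetical)
A = [1 5 3;
     4 2 6];              % 2-by-3 matrix

colMax = max(A, [], 1);   % largest value in each column -> [4 5 6]
rowMax = max(A, [], 2);   % largest value in each row    -> [5; 6]
rowMin = min(A, [], 2);   % smallest value in each row   -> [1; 2]
```

For a matrix, max(A, [], 1) gives the same result as the default max(A).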

(Exercise: what will happen if you type max(DataA,2) ?)

Colon Operator Again!

Often we want a function to apply to every element of a data array rather than to each column or row separately. This can be achieved with the colon operator. For example, to sum all the numbers in DataA irrespective of their position in the array, use

sum(DataA(:))
ans =

    6.0359

which gives the sum of every element in DataA. Similarly, you can use

max(DataA(:))
ans =

    0.9787

min(DataA(:))
ans =

    0.0287
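
It may help to see what A(:) actually produces before passing it to a function. In this sketch (with illustrative names A, v, total, largest, and smallest), the colon operator stacks the columns of the matrix into a single column vector, so any column-wise function then sees every element at once.

```matlab
% Illustrative data (hypothetical)
A = [1 5 3;
     4 2 6];              % 2-by-3 matrix

v = A(:);                 % 6-by-1 column vector, read column by column:
                          % [1; 4; 5; 2; 3; 6]

total    = sum(v);        % sum of all six elements -> 21
largest  = max(A(:));     % largest element overall  -> 6
smallest = min(A(:));     % smallest element overall -> 1
```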

Simple Statistics

MATLAB provides several built-in functions for basic statistics. For example, the average of all elements in DataA can be calculated with the function mean:

mean(DataA(:))
ans =

    0.4024

For the standard deviation of all elements of DataA, use std:

std(DataA(:))
ans =

    0.3166
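
To make clear what std computes, the sketch below compares it with the textbook formula on a tiny data set. The data and the names (A, x, m, s, sManual) are illustrative assumptions. By default std returns the sample standard deviation, which divides by N-1 rather than N.

```matlab
% Illustrative data (hypothetical)
A = [2 4;
     4 6];
x = A(:);                       % all four elements as one column

m = mean(x);                    % average of all elements -> 4
s = std(x);                     % sample standard deviation (normalized by N-1)

% Same quantity written out from its definition:
% square root of the summed squared deviations divided by (N-1)
sManual = sqrt(sum((x - m).^2) / (numel(x) - 1));   % sqrt(8/3), about 1.633
```

The two values s and sManual agree up to floating-point rounding.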

Exercise

The goal of this exercise is to make you feel grateful for MATLAB's built-in functions...

Your professor gives you an unknown data set named BlackBox. You don't know what values it contains, and you don't even know its size. How could you calculate the average value of the data set without using the function mean?

(Hint: Use sum and size!)