Lab 0: Data Generation
This first (optional) lab focuses on getting started with MATLAB/Octave and working with data for ML. The goal is to build basic familiarity with MATLAB syntax through some preliminary data generation, processing and visualization.
MATLAB/Octave resources
The labs are designed for MATLAB/Octave. Below you can find a number of resources to get you started.
- MATLAB getting started tutorial for an introduction to the environment, syntax and conventions.
- MATLAB has very thorough documentation, both online and built in. In the command window, type help functionName (to check usage) or doc functionName (to pull up the documentation).
- Built-in tutorials: enter demo in the command window.
- Comprehensive MATLAB reference and introduction: (pdf)
- MIT Open CourseWare: Introduction to MATLAB
- Stanford/Coursera Octave Tutorial (video)
- Writing Fast MATLAB Code (pdf): Profiling, JIT, vectorization, etc.
- Stack Overflow: MATLAB tutorial for programmers
Getting Started
- Get the code file and add its directory to the MATLAB path (or set it as the current/working directory).
- Use the editor to write/save and run/debug longer scripts and functions.
- Use the command window to try/test commands, view variables and see the use of functions.
- Use plot (for 1D), imshow, imagesc (for 2D matrices), scatter and scatter3 to visualize variables of different types.
- Work your way through the examples below, following the instructions.
1. Optional - MATLAB Warm-up
- Create a column vector v = [1; 2; 3] and a row vector u = [1,2,3]
  - What happens with the command v'? What is the corresponding algebraic/matrix operation?
  - Create z = [5;4;3] and try basic numerical operations of addition and subtraction with v.
  - What happens with u + z?
- Create the matrices A = [1 2 3; 4 5 6; 7 8 9] and B = A'
  - What kind of matrix is C = A + B?
  - Explore what happens with A(:,1), A(1,:), A(2:3,:) and A(:).
- Use the product operator *
  - What happens with 2*u, u*2, 2*v?
  - What happens with u*v and v*u, and why? What about A*v, u*A and A*u?
  - Use the size and/or length functions to find the dimensions of vectors and matrices.
- Use the element-wise operators .* and ./, e.g., u.*z and z./u
  - What happens with v.*z and v./z?
  - Why aren't A*A and A.*A the same?
- Use the functions zeros, ones, rand, randn
  - Create 3 x 5 matrices of all zeros, of all ones, of random numbers uniformly distributed between 2 and 3, and of random numbers drawn from a Gaussian with variance 2.
- Use the functions eye and diag
  - Create a 3 x 3 identity matrix and a matrix whose diagonal is the vector v.
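If you want something to compare your answers against, the sketch below is one possible run-through of the warm-up. It uses only the vectors and matrices defined in the exercises; the names on the left-hand side (vt, inner, outer, and so on) are made up for illustration. Run it line by line and inspect each result in the command window rather than treating it as the only correct solution.

```matlab
% Warm-up sketch; variable names on the left are illustrative only.
v = [1; 2; 3];          % 3-by-1 column vector
u = [1, 2, 3];          % 1-by-3 row vector
z = [5; 4; 3];          % 3-by-1 column vector

vt = v';                % transpose: 3-by-1 becomes 1-by-3
s  = v + z;             % element-wise sum of two columns
d  = v - z;             % element-wise difference
% u + z mixes a row and a column: recent MATLAB releases and Octave expand
% it to a 3-by-3 matrix, older MATLAB releases raise a dimension error.

A = [1 2 3; 4 5 6; 7 8 9];
B = A';
C = A + B;              % a matrix plus its transpose
isSym = isequal(C, C'); % check whether C equals its own transpose

inner = u * v;          % (1x3)*(3x1) -> scalar
outer = v * u;          % (3x1)*(1x3) -> 3-by-3 matrix
Av    = A * v;          % (3x3)*(3x1) -> 3-by-1
uA    = u * A;          % (1x3)*(3x3) -> 1-by-3
% A * u fails: the inner dimensions of (3x3)*(1x3) do not agree.

ew = v .* z;            % element-wise product of two columns
% A*A is the matrix product, A.*A multiplies entry by entry.
sameProduct = isequal(A*A, A.*A);

Z = zeros(3, 5);
O = ones(3, 5);
U = 2 + rand(3, 5);          % uniform between 2 and 3
G = sqrt(2) * randn(3, 5);   % zero-mean Gaussian, variance 2 (std sqrt(2))

I3 = eye(3);            % 3-by-3 identity
Dv = diag(v);           % 3-by-3 matrix with v on the diagonal
```

Calling size on any of these variables, for example size(outer), is a quick way to confirm the dimensions noted in the comments.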
2. Core - Data generation
The function MixGauss(means, sigmas, n) generates datasets where the distribution of each class is an isotropic Gaussian with a given mean and variance, according to the values in the matrices/vectors means and sigmas. Study the function code or type help MixGauss in the MATLAB command window. The function scatter can be used to plot points in 2D.
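The source of MixGauss is not reproduced here, so the following is only a rough mental model of what such a function might do, not the lab's implementation. It assumes that means holds one column per class, that sigmas(k) is the standard deviation of class k (the actual file may treat it as a variance; check help MixGauss), that n points are drawn per class, and that labels run from 1 to the number of classes.

```matlab
% Hypothetical stand-in for MixGauss, NOT the lab's implementation.
% Assumptions: one column of means per class, sigmas(k) is the standard
% deviation of class k, n points are drawn per class, labels are 1..p.
means  = [[0; 0], [1; 1]];   % d-by-p, here d = 2 and p = 2
sigmas = [0.5, 0.25];
n      = 1000;

[d, p] = size(means);
X = zeros(n * p, d);         % one row per point
C = zeros(n * p, 1);         % class label of each row
for k = 1:p
    rows = (k - 1) * n + (1:n);
    % Isotropic Gaussian: the same spread sigmas(k) along every axis.
    X(rows, :) = sigmas(k) * randn(n, d) + repmat(means(:, k)', n, 1);
    C(rows)    = k;
end
```

Under these assumptions X has one point per row, which matches the X(:,1), X(:,2) indexing used with scatter in this lab.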
- Generate and visualize a simple dataset:
[X, C] = MixGauss([[0;0], [1;1]], [0.5, 0.25], 1000);
figure; scatter(X(:,1), X(:,2), 25, C);
- Generate more complex datasets:
  - 4-class dataset: the classes must live in the 2D space and be centered on the corners of the unit square (0,0), (0,1), (1,1), (1,0), all with variance 0.2.
  - 2-class dataset: manipulate the data to obtain a 2-class problem where data on opposite corners share the same class. Hint: if you generated the data following the suggested center order, you can use the function mod to quickly obtain two labels, e.g. Y = mod(C, 2).
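To see why the mod hint works, here is a minimal sketch of the 4-class and 2-class datasets that does not call MixGauss. It assumes the class labels are 1 to 4 in the order of the centers listed above and treats 0.2 as a variance (so the standard deviation is sqrt(0.2)); the number of points per class is an arbitrary choice.

```matlab
% Sketch without MixGauss; assumes labels 1..4 follow the order of the centers.
centers = [0 0; 0 1; 1 1; 1 0];   % corners of the unit square, one per row
n       = 500;                    % points per class (illustrative choice)
sigma   = sqrt(0.2);              % treats 0.2 as a variance

C = kron((1:4)', ones(n, 1));     % labels: n ones, then n twos, and so on
X = centers(C, :) + sigma * randn(4 * n, 2);

% Classes 1 and 3 sit on one diagonal, classes 2 and 4 on the other, so the
% parity of the label merges opposite corners into a single class.
Y = mod(C, 2);                    % 1 for classes 1 and 3, 0 for classes 2 and 4
```

To inspect the result, figure; scatter(X(:,1), X(:,2), 25, Y); colors the points by the new two-class label.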
3. Optional - Extra practice
- Generate datasets with larger variances, higher input dimensionality, etc.
- Add noise to the data by flipping the labels of random points.
- For a dataset, compute the distances among all input points (use vectorization in your code, avoid using a for loop). How does the mean distance change with the number of dimensions?
- Generate regression data: Consider a regression model defined by a linear function with coefficients w and Gaussian noise of level (SNR) delta.
  - Create a MATLAB function with input the number of points n, the number of dimensions D, the D-dimensional vector w and the scalar delta, and output an (n x D) matrix X and an (n x 1) vector Y.
  - Plot the underlying (linear) function and the noisy output on the same figure.
  - Test/visualize the 1-D and 2-D cases, but make the function generic to account for higher dimensional data.
- Generate regression data using a 1-D model with a non-linear function.
- Generate a dataset (either for regression or for classification) where most of the input variables are "noise", i.e., they are unrelated to the output.
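Two of the exercises above can be sketched in a few lines. The first part computes all pairwise Euclidean distances without a for loop; the second shows what the body of the regression data generator could look like. The sizes, the coefficient vector and the noise level are made up, and delta is assumed to be the standard deviation of additive Gaussian noise, which is only one reading of "noise of level (SNR) delta".

```matlab
% --- Pairwise Euclidean distances without a for loop ---
% Uses ||a - b||^2 = ||a||^2 + ||b||^2 - 2*a'*b for all pairs at once.
n = 200; D = 5;                       % illustrative sizes
X  = randn(n, D);                     % one point per row
sq = sum(X.^2, 2);                    % n-by-1 squared norms
D2 = repmat(sq, 1, n) + repmat(sq', n, 1) - 2 * (X * X');
D2 = max(D2, 0);                      % clip tiny negatives from round-off
dist = sqrt(D2);                      % n-by-n distance matrix
meanDist = sum(dist(:)) / (n * (n - 1));   % average over pairs i ~= j

% --- Linear regression data ---
% Assumption: delta is the standard deviation of additive Gaussian noise.
w      = [2; -1; 0.5; 0; 3];          % D-by-1 coefficients (made up)
delta  = 0.1;
Yclean = X * w;                       % noiseless linear function, n-by-1
Y      = Yclean + delta * randn(n, 1);   % noisy outputs, n-by-1
```

Re-running the first part for several values of D is one way to explore how meanDist depends on the number of dimensions. For the regression part, wrapping the last four lines in a function that takes n, D, w and delta gives the interface the exercise asks for.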