This course introduces modern methods for multivariate data, while also building some of their theoretical foundations. The lectures are divided into five blocks: I. Foundations of Multivariate Analysis, II. Multivariate Statistical Inference, III. Dimensionality Reduction Techniques, IV. Covariance Matrix Modelling and Estimation, V. Methods for Tensors.
More details can be found in the syllabus and on Piazza.
Prof | Piotr Zwiernik |
---|---|
Email | piotr.zwiernik@utoronto.ca |
Office hours | Tuesday, 15:30-17:00 (UY 9033) |
Teaching assistants: Shupeng Chen (shupeng.chen@mail.utoronto.ca), Dayi Li (dayi.li@mail.utoronto.ca), Miaoshiqi Liu (miaoshiqi.liu@mail.utoronto.ca), Luis Sierra Muntané (luis.sierra@mail.utoronto.ca), Rongqian Zhang (rongqian.zhang@mail.utoronto.ca)
Section | Room | Lecture time |
---|---|---|
STA 437 LEC0101 & STA 2105 LEC0101 | BR 200 | W 9-11 (lecture), F 9-10 (tutorials) |
STA 437 LEC5101 & STA 2105 LEC5101 | SF 1105 | W 13-15 (lecture), F 13-14 (tutorials) |
Lecture notes (the file will be expanded and updated as the course progresses, so don’t print the whole document)
The lecture notes cover all the material presented in class. Some of the textbooks I used:
(20%) midterm 1, (20%) midterm 2, (20%) final project, (40%) final exam
The midterms are short (1 hour) and focus on simple conceptual/theory questions.
Week | Lectures | Notes | Tutorials | Lecture date | Timeline |
---|---|---|---|---|---|
1 | Introduction, some linear algebra, matrix decompositions. Random vectors, covariance matrices. | slides1 notes1 | RZ: tut1 | 8 Jan | syllabus |
2 | Sample statistics. Multivariate normal distribution: definition, basic properties. | notes2 | ML: tut2 | 15 Jan | |
3 | MVN: Conditional distribution, conditional independence. | notes3 | LSM: tut3 code | 22 Jan | |
4 | Estimation for MVN models. Gaussian Processes: basic definitions and examples | notes4 | DL: tut4 | 29 Jan | |
5 | Non-Gaussian distributions: elliptical distributions, copulas | slides5 notes5 | midterm1 | 5 Feb | |
6 | Non-Gaussian distributions: Copulas (cont’d), Gaussian mixtures | slides6 | ML: tut6 code | 12 Feb | |
Reading week (no class/tutorial) | - | - | - | Final project out | |
7 | Principal Component Analysis: definition, basic examples, Scree plot | slides7 notes7 | SCh: tut7 | 26 Feb | |
8 | Principal Component Analysis: Affine Subspace Approximation. Computations, Covariance matrix estimation | slides8 notes8 | RZ: tut8 code code pdf | 5 Mar | |
9 | Multidimensional Scaling. Laplacian eigenmap and UMAP | slides9 notes9 | midterm2 | 12 Mar | |
10 | Canonical Correlation Analysis (CCA). Factor Analysis (FA) | slidesFA notes10 | DL: tut10 | 19 Mar | |
11 | Conditional independence. Graphical models | slidesCIndep | SCh: tut11 | 26 Mar | |
12 | Gaussian Graphical models. Ising model | rec1 rec2 | LSM | 2 Apr | |
Submissions: Groups of size 1-2. You have two datasets to choose from. Submit a PDF file with a carefully described data analysis and the code used. Deadline: April 1st.
Expectations and grading: This is an open-ended project whose aim is to make you apply some of the multivariate methods to a real dataset. Although there is no single right question here, we look for a quality analysis that uses the range of methods discussed in class. To help you focus, we give a list of possible questions that could be addressed. There is no need to answer them, though: get creative and follow your curiosity. If the provided dataset is too big, feel free to work with a smaller portion. The only real goal here is to learn the methods.
Note: Be to the point. Avoid AI-generated long and meaningless descriptions. You should be ready to answer questions about your work (methods used and conclusions, not implementation details). We prepared the data in R but feel free to prepare your analysis using Python or Julia.
This dataset contains brain activity recordings from 47 individuals who participated in a study at Yale University. The data come from functional MRI (fMRI) scans, which measure brain activity over time. Each subject has a matrix (196 × 110) representing their brain activity.
DX_GROUP → Diagnosis (1 = Autism, 2 = Control).

AGE_AT_SCAN → Age at the time of the scan.

SEX → Gender (1 = Male, 2 = Female).

Guiding Questions for Analysis
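As one possible starting point for the fMRI data described above, the sketch below turns each subject's matrix into a vector of pairwise correlations and runs PCA across subjects. It uses synthetic data of the stated dimensions, not the course dataset. It assumes (this is not stated in the description) that the 196 rows are time points and the 110 columns are brain regions; the names `scans`, `dx_group`, and `connectivity_features` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 47 subjects, each a 196 x 110 matrix.
# ASSUMPTION: rows are time points, columns are brain regions.
n_subjects, n_time, n_regions = 47, 196, 110
scans = rng.standard_normal((n_subjects, n_time, n_regions))

# Hypothetical labels using the DX_GROUP coding (1 = Autism, 2 = Control).
dx_group = rng.integers(1, 3, size=n_subjects)


def connectivity_features(scan):
    """Return the strictly upper-triangular entries of the
    region-by-region sample correlation matrix of one scan."""
    corr = np.corrcoef(scan, rowvar=False)   # shape (110, 110)
    iu = np.triu_indices_from(corr, k=1)     # indices above the diagonal
    return corr[iu]


# One feature vector per subject: 110 * 109 / 2 = 5995 correlations.
features = np.stack([connectivity_features(s) for s in scans])

# PCA across subjects via the SVD of the column-centered feature matrix.
centered = features - features.mean(axis=0)
U, svals, Vt = np.linalg.svd(centered, full_matrices=False)
explained = svals**2 / np.sum(svals**2)      # proportion of variance per PC
scores = U[:, :2] * svals[:2]                # first two PC scores per subject

# Compare the mean PC scores of the two diagnosis groups.
for g in (1, 2):
    print(g, scores[dx_group == g].mean(axis=0))
```

With the real data, `scans` and `dx_group` would be replaced by the provided matrices and the DX_GROUP column. Since there are far more features than subjects, any group difference seen in the scores should be read with caution.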
Imagine a company takes out a big loan. The lender worries: What if the company can’t pay it back? To manage this risk, financial markets offer Credit Default Swaps (CDS)—a type of insurance for loans.
This dataset includes CDS spreads for over 600 companies across 10 different time periods (tenors). Since spreads vary over time and across companies, analyzing them can reveal how financial markets assess risk under different conditions.
Guiding Questions for Analysis
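A first look at the CDS data could be a PCA of the spreads across tenors, as sketched below. It uses synthetic data, not the course dataset. The layout (600 companies in rows, 10 tenors in columns, positive spreads), the log transform, and the level-plus-slope model used to simulate the data are all assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in: 600 companies x 10 tenors.
# ASSUMPTION: rows are companies, columns are tenors, entries are positive spreads.
n_companies, n_tenors = 600, 10
level = rng.normal(0.0, 1.0, size=(n_companies, 1))    # company-wide risk level
slope = rng.normal(0.0, 0.3, size=(n_companies, 1))    # term-structure slope
tenor_grid = np.linspace(-1.0, 1.0, n_tenors)
noise = rng.normal(0.0, 0.1, size=(n_companies, n_tenors))
spreads = np.exp(4.0 + level + slope * tenor_grid + noise)

# Work on the log scale and standardize each tenor (column).
X = np.log(spreads)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via the eigendecomposition of the sample correlation matrix.
R = (Z.T @ Z) / (n_companies - 1)
eigvals, eigvecs = np.linalg.eigh(R)        # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]           # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Proportion of variance explained: the numbers behind a scree plot.
explained = eigvals / eigvals.sum()
print(np.round(explained[:3], 3))
print(np.round(eigvecs[:, 0], 2))           # loadings of the first component
```

In this simulated example the first component loads with the same sign on every tenor, which corresponds to an overall level of credit risk. Whether the real spreads show a similar structure is something the analysis would have to check.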
This appendix provides guidelines on how to structure your final project report. While the project remains open-ended, following this structure will help ensure a clear and well-organized submission.
### 3. Methodology
Describe the multivariate methods used and justify their relevance to your research question.
### 4. Results
This is the list of exercises that should be relevant when preparing for the final. I cleaned up the exercises, so the numbers below refer to the newest version of the notes. The list is incomplete and covers only Chapters 1-6 for now. The rest is coming soon.
Chapter 1: 1-5,7-10,12,16,19,20,27,28
Chapter 2: 2-5,7,9,10,16,18-20
Chapter 3: 2,4,15,17,19,21,23,25-29,38
Chapter 4: 1-11
Chapter 5: 2,12-14
Chapter 6: 1,3-9