# Modern Applied Linear Algebra

## Intro

Here we learn:

- How to factor gigantic matrices into simpler pieces
- How to break up gigantic vectors into simpler subspaces and bases
- The Singular Value Decomposition/Principal Component Analysis
- Matrix calculus
- Precise constructions of all these concepts and some of their proofs

Once you learn vector spaces and linear transformations, a lot of programming problems become simple linear algebra solutions.

### Materials

The most efficient path to accomplish this I could find is:

- MIT's modernized 18.06
- 3Blue1Brown visualizations when needed
- Terence Tao's lecture notes and problem sets to fill in some gaps

I'm also doing a totally optional playlist, Wildberger's *Linear Algebraic Geometry*, which you can skip. You'll see later that all these abstract vector concepts are really just parallel lines existing in some abstract subspace, and thinking this way will help when you go on to more advanced linear algebra. He sets everything up as 'this is the special case, how can we make it the general case' in almost every lecture. You then get linear algebra from 3 different perspectives: applied everyday linear algebra using software, Terence Tao's complete construction of all objects, and Wildberger's more abstract linear algebra that complements the other two.

If you are using a laptop/desktop or have Termux installed on your phone you can clone the repo for a local backup to do offline:

    git clone https://github.com/mitmath/1806.git

#### Optional

I'll mix some of these in but you don't have to do them.

- Trefethen & Bau's *Numerical Linear Algebra* book (the SVD lectures)
- The transformations, autodiff, SVD/PCA lectures from MIT's *Intro to Computational Thinking*
- Titu Andreescu's *Essential Linear Algebra with Applications* book, which focuses on linear maps between vector spaces as prep to go on to more advanced linear algebra. It's set up like an olympiad problem practice book.
- Brown University's CS053 course *Coding the Matrix* has lectures open to the public; most of the assignments are done in Python.

### Software

If you want, you can just view static notebooks on GitHub; here is an example. The Julia notebooks are really only used as a calculator and for demonstrations. Most of the homework asks conceptual questions to think about, so you don't have to program anything.

To make them interactive, use either Nextjournal or Binder (experimental) and paste in the .ipynb file path from the mitmath/1806 GitHub repo, or install locally. Local is the best option because it avoids the problems of slow free container services trying to install requirements.

Almost any language will have linear algebra libraries, so feel free to translate all the notebook code to your language of choice. If you're on a phone, use Google Colab, import SymPy, PyPlot and NumPy, and do it all in Python in your browser. Much of the Julia notebook code actually imports PyPlot and other Python libraries anyway.

## Vectors

The MIT class immediately jumps into matrices because, being MIT, it can assume that high school background; we have to learn it from scratch instead.

### Vector visualization

Watch what a vector is (10m length) covering addition and scalars.

#### Linear Algebraic Geometry 1

This playlist is **optional** but highly recommended. In this video you learn linear combinations and how to change basis vectors to suit different perspectives. If you have the time, these will be the best linear algebra lectures you'll ever watch.

- Introduction - Wild Linear Algebra A 1 (43m)

He constructs the affine plane from scratch. For the vector (8/3, 10/3), go full screen to see exactly where the points are. From B you move right 2 lines and then 2/3 more, so \(2 + \frac{2}{3}\), which is 8/3. From there you move up 3 lines and then 1/3 more, so \(3 + \frac{1}{3}\), which is 10/3. The reason for using the affine plane is that this is what a vector space actually looks like: you don't have to impose anything on it artificially, like the origin (0,0) in the Cartesian plane or any kind of x, y or z axis. The change of basis explanation here is superior to anything I've seen before. If you forgot high school algebra, the end of this lecture (@~30m) will teach you how to solve two linear equations in 2 variables by subtraction and substitution.

@34:01 he shows the general 1-dimensional case, then the general 2-dimensional case where you end up with the determinant, something we will learn soon. Essentially this is the determinant from scratch, and you may want to come back to this lecture later when we learn it to see how it's constructed. If you get stuck on the exercises, for instance because you don't know what a determinant of 0 means in 3D space, there are solutions, and of course we will learn all this shortly.
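The elimination method from the lecture can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not Wildberger's notation: the function name `solve_2x2` and the sample numbers are made up for the example. It solves a·x + b·y = e and c·x + d·y = f exactly, and shows the quantity a·d - b·c (the 2-dimensional determinant) appearing as the common denominator.

```python
from fractions import Fraction

def solve_2x2(a, b, c, d, e, f):
    """Solve a*x + b*y = e and c*x + d*y = f with exact fractions.

    Returns (x, y), or None when a*d - b*c is zero, in which case
    the two equations have no unique solution.
    """
    det = a * d - b * c
    if det == 0:
        return None
    # Multiply eq1 by d and eq2 by b, then subtract to eliminate y:
    #   (a*d - b*c) * x = e*d - b*f
    x = Fraction(e * d - b * f, det)
    # Multiply eq2 by a and eq1 by c, then subtract to eliminate x:
    #   (a*d - b*c) * y = a*f - e*c
    y = Fraction(a * f - e * c, det)
    return x, y

# Hypothetical example: coordinates of the vector (8, 10) in the
# basis (3, 0), (0, 3), i.e. solve 3x + 0y = 8 and 0x + 3y = 10.
print(solve_2x2(3, 0, 0, 3, 8, 10))  # (Fraction(8, 3), Fraction(10, 3))

# Parallel basis vectors give determinant 0, so no unique answer.
print(solve_2x2(1, 2, 2, 4, 1, 1))   # None
```

Using `Fraction` instead of floats keeps results like 8/3 exact, which matches the rational-number style of the lectures.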

In this related lecture he shows how ancient people made a ruler using just parallel line geometry, which leads to what he claims is modeling the rational continuum.

### Tao Vector spaces

Read pages 1-19 of Terence Tao's notes. There's a great intro precisely explaining what linear means, why it's useful to approximate non-linear things with linear transformations, the idea that a scalar can be a vector, not just a single number, and that things like trace and eigenvectors are statistics associated with a matrix. Vector definitions begin on page 6.

If you've never seen set theory, watch this or read Tao's short crash course here; there's no need to do any proofs.

Page 9 has an interesting comment that mathematicians only say what vectors do, not what they are, which allows the idea of vectors to be abstracted to many different subjects. Note that subtraction is never defined in the axioms. Instead, v - w is shorthand for v + (-w), which relies on axiom II (associative addition) and axiom IV (additive inverse). This is also v + (-1)w.

Page 10: the vector space R^{n} is defined as the space of all n-tuples of scalars (numbers). A tuple is an ordered data structure. This means you only have to research properties of tuples to understand a precise construction of vectors and the operations you can do on them, which are the same as elementwise operations on tuple data structures. The n in n-tuple is its length, so a 3-tuple is (x, y, z), sometimes called a triple.
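To make the tuple idea concrete, here is a minimal sketch using plain Python tuples. The helper names `add`, `scale` and `sub` are my own choices, not anything from Tao's notes. Note that `sub` is not a new operation: it is built from addition and scaling by -1, exactly like the v + (-1)w shorthand.

```python
def add(v, w):
    """Add two vectors of the same length elementwise."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")
    return tuple(a + b for a, b in zip(v, w))

def scale(c, v):
    """Multiply every entry of v by the scalar c."""
    return tuple(c * a for a in v)

def sub(v, w):
    """v - w is shorthand for v + (-1)w."""
    return add(v, scale(-1, w))

v = (1, 2, 3)   # a 3-tuple (a triple) in R^3
w = (4, 5, 6)
print(add(v, w))    # (5, 7, 9)
print(scale(2, v))  # (2, 4, 6)
print(sub(v, w))    # (-3, -3, -3)
print(sub(v, v))    # (0, 0, 0), the zero vector
```

The length check in `add` reflects that vectors from different spaces, say R^{2} and R^{3}, cannot be added.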

Page 12, polynomials as vectors: why are his 4 examples not in P_{3}(R)? The first exceeds degree 3; a rational exponent, i.e. a square root, is not a polynomial; e^{x} is an exponential function, not a polynomial; and in the last example, polynomials cannot have negative exponents, because x^{-3} is 1/x^{3} and no polynomial can contain division by a variable.
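One way to see why polynomials of degree at most 3 behave like vectors is to store each one as a 4-tuple of coefficients. The sketch below assumes the encoding (a, b, c, d) for a + bx + cx^{2} + dx^{3}; the function names are hypothetical. Adding the tuples elementwise gives the same result as adding the polynomials.

```python
def poly_add(p, q):
    """Add two polynomials stored as coefficient tuples."""
    return tuple(a + b for a, b in zip(p, q))

def poly_scale(c, p):
    """Multiply a polynomial by the scalar c."""
    return tuple(c * a for a in p)

def poly_eval(p, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0
    for coeff in reversed(p):
        result = result * x + coeff
    return result

p = (1, 0, 2, 0)    # 1 + 2x^2
q = (0, 3, 0, -1)   # 3x - x^3
s = poly_add(p, q)  # 1 + 3x + 2x^2 - x^3
print(s)                  # (1, 3, 2, -1)
print(poly_scale(2, p))   # (2, 0, 4, 0)

# Adding coefficient tuples agrees with adding the function values.
print(poly_eval(s, 2) == poly_eval(p, 2) + poly_eval(q, 2))  # True
```

Something like x^{4} or x^{-3} has no slot in this 4-tuple, which is another way of saying it is not in P_{3}(R).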

In the rest of these notes up to page 19, he goes through all the vector spaces you can imagine, because they all conform to the same properties as n-tuples: adding elementwise, scalar multiplication, and being closed under these operations. He describes closed as meaning you stay within the bounds of that vector space, so if it's R^{2}, you don't end up with some R^{3} result after addition or multiplication. There's an erratum on page 16, where he forgot to include the scalar 10, but you probably figured it out anyway. The point of this was to show there are infinitely many vector spaces, and any picture you see of linear algebra spaces is a metaphor, because these spaces often have too many dimensions for a picture we can comprehend. Case in point: in the machine learning workshop we deal with gigantic vectors with a ridiculous number of dimensions (rows) of data.

Did you get the proof on page 19 that W is a subspace of V as long as it is closed under addition and scalar multiplication? You don't need to check separately that W contains the zero vector: for any w in W, closure under scalar multiplication means 0 \(\cdot\) w is in W, and 0 \(\cdot\) w = 0, so the zero vector is de facto in W anyway (as long as W has at least one vector). The negative case is the same, since (-1)w = -w.

Stop reading at page 19 since the remainder is covered in the next 3Blue1Brown lecture (what a plane is, linear combinations, basis vectors).

### Linear combinations visualization

- Watch linear combinations and what span, linear combination, linear independence and basis are (10m).
- Watch abstract vector spaces (15m) starting at 1:50. This is the very last 3Blue1Brown lecture, but we're already at that point in Tao's notes, reasoning about abstract vector subspaces like polynomials. He talks about how *the space is really just parallel lines* (addition and scaling), and this is another reason why you should watch those Wildberger videos explaining all these concepts using only the affine parallel plane. @8:47 in this 3Blue1Brown video he is demonstrating exactly what Wildberger uses for polynomial encoding, which he calls polynumbers. The end of this video covers the vector subspace axioms we already read.

#### Linear Algebraic Geometry 2

Reminder these lectures are optional.

- Geometry w/Vectors - Wild Linear Algebra A 2 (44m)

All the laws of vector arithmetic: negative vectors (any vector that goes from the head of one to another vector's head), associative/commutative vectors, distributive law, 0 vector. Linear dependence/independence is also demonstrated. He also ties the geometry to the algebra by demonstrating some geometry theorems.

@32:48 if lambda = mu:

- lambda = 1 - mu
- mu + lambda = 1
- 2lambda = 1 (since mu = lambda)
- lambda = 1/2

### Tao Linear combinations

Read Tao's notes starting at the end of page 19 where we left off. The 'flat plane' he talks about is what you saw in the 3Blue1Brown visualization of a plane slicing through 3D space and passing through the origin (0,0,0). Among his examples of R^{3} subspaces, why is the set of vectors (x,y,z) satisfying x + 2y + 3z = 0 a subspace? (A single linear equation like this describes a plane through the origin in R^{3}, not a line.) Try scalar multiplication and vector addition. Scaling by k: k(x,y,z) = (kx, ky, kz), and kx + 2ky + 3kz = k(x + 2y + 3z) = k \(\cdot\) 0 = 0, so the scaled vector still satisfies the equation, and k = 0 shows the zero vector is included. Addition: if (a,b,c) also satisfies a + 2b + 3c = 0, then (x,y,z) + (a,b,c) = (x + a, y + b, z + c), and (x + a) + 2(y + b) + 3(z + c) = (x + 2y + 3z) + (a + 2b + 3c) = 0 + 0 = 0, so the set is closed under addition. You can also use a symbolic calculator to solve for whatever variable you want, i.e. x = -2y - 3z, and substitute: x + a = -2y - 2b - 3z - 3c, so (-2y - 2b - 3z - 3c) + 2y + 2b + 3z + 3c = 0.
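If you would rather experiment than push symbols, the closure argument can be spot-checked numerically. The sketch below is my own and the helper names are hypothetical: it builds random vectors satisfying x + 2y + 3z = 0 by choosing y and z freely and solving for x, then confirms that sums and scalar multiples still satisfy the equation. A random check like this is evidence, not a proof.

```python
import random
from fractions import Fraction

def in_plane(v):
    """True when v = (x, y, z) satisfies x + 2y + 3z = 0."""
    x, y, z = v
    return x + 2 * y + 3 * z == 0

def random_plane_vector(rng):
    """Pick y and z freely, then solve for x = -2y - 3z."""
    y = Fraction(rng.randint(-10, 10), rng.randint(1, 5))
    z = Fraction(rng.randint(-10, 10), rng.randint(1, 5))
    return (-2 * y - 3 * z, y, z)

rng = random.Random(0)  # fixed seed so the run is repeatable
for _ in range(1000):
    v = random_plane_vector(rng)
    w = random_plane_vector(rng)
    k = Fraction(rng.randint(-10, 10), rng.randint(1, 5))
    total = tuple(a + b for a, b in zip(v, w))   # v + w
    scaled = tuple(k * a for a in v)             # k * v
    assert in_plane(total)
    assert in_plane(scaled)

print("closed under addition and scalar multiplication on all samples")
```

Exact fractions are used so the `== 0` test is not thrown off by floating point rounding.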

Page 22: the intersection of two subspaces is another subspace. At this stage all we can do is reason about his example, the n x n diagonal matrices and the n x n matrices with tr(A) = 0. Apply a scalar to a matrix that is in both sets and you still end up with a matrix that is both diagonal and has tr(A) = 0, since scaling a diagonal that sums to zero still gives a diagonal that sums to zero. We can also reason about the intersection operator (AND): if two vectors are in the intersection of two subspaces, then they are in both subspaces, each of which is closed, so their sum and scalar multiples stay in both subspaces and therefore in the intersection. For union (OR), a vector in the union only has to be in one of the subspaces, so there's no guarantee that adding a vector from one subspace to a vector from the other lands in either. For example, in R^{2} the x-axis and the y-axis are both subspaces, but (1,0) + (0,1) = (1,1) is on neither axis. I'm sure there will be a proof assignment for this later.
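The intersection and union reasoning can be tried out on small matrices. This is a rough sketch with 2 x 2 matrices stored as nested tuples; all function names are my own, and a handful of examples is of course not a proof. It shows sums and scalings of diagonal trace-zero matrices staying in the intersection, and a pair of matrices from the union whose sum falls outside it.

```python
def is_diagonal(m):
    """True when every off-diagonal entry is zero."""
    n = len(m)
    return all(m[i][j] == 0 for i in range(n) for j in range(n) if i != j)

def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))

def mat_add(a, b):
    return tuple(tuple(x + y for x, y in zip(ra, rb)) for ra, rb in zip(a, b))

def mat_scale(c, a):
    return tuple(tuple(c * x for x in row) for row in a)

def in_intersection(m):
    """Diagonal AND trace zero."""
    return is_diagonal(m) and trace(m) == 0

def in_union(m):
    """Diagonal OR trace zero."""
    return is_diagonal(m) or trace(m) == 0

a = ((1, 0), (0, -1))   # diagonal and trace zero
b = ((3, 0), (0, -3))   # diagonal and trace zero
print(in_intersection(mat_add(a, b)))    # True
print(in_intersection(mat_scale(5, a)))  # True

d = ((1, 0), (0, 1))    # diagonal, but trace is 2
t = ((0, 1), (1, 0))    # trace zero, but not diagonal
print(in_union(d), in_union(t))  # True True
print(in_union(mat_add(d, t)))   # False: the sum is in neither subspace
```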

TODO