Patterns/Patterns-Content.tex

\chapter{Patterns}


The purpose of this chapter is to help you learn to create examples of patterns that others before you have noticed. Vector spaces are the most important pattern we will explore in this chapter.

\input{Patterns/Patterns-objectives}

\section{Recognizing and Naming Patterns}

Have you ever noticed while reading a mathematics or other science textbook that the book is filled with the following four things: (1) examples, (2) definitions, (3) theorems, and (4) proofs? One of the main goals of mathematics is to describe the patterns in the world around us, and these four things are often the ways a pattern is found and communicated.  

Most of the time, recognizing a pattern starts with \emph{examples}.  After working with lots of examples, we start to notice recurring patterns in the objects we deal with and the relationships between those objects. We create \emph{definitions} to explicitly describe the structures and objects that keep recurring.  When the patterns help us understand when and how these objects occur, or describe relationships between these objects, we create \emph{theorems}.  \emph{Proofs} provide the logical backbone to the theorems and show why the theorems are true in every case. Everything in this process starts with examples.

Learning mathematics requires learning to work with examples. The main goal of this chapter is to learn how to create examples to understand the definitions and theorems that others have discovered.  If you encounter a new idea in any scientific area, start by looking at examples.  After you have studied several simple examples, you will discover patterns that help you understand the new idea.  Sometimes, your exploration of simple examples will result in new discoveries (theorems). Examining, understanding, and generalizing from examples is one of the main tools of mathematical and scientific research.  Learning to create and look at examples is crucial.

The most important structure in linear algebra is the notion of a vector space. This is an abstract space that came from studying the properties of vectors in $\mathbb{R}^n$, and then realizing that these same properties hold in other places (like polynomials, functions, waves, and more). Before we formally define a vector space in general, we will practice creating examples to understand definition and theorems.  Most of what follows is a summary of the ideas introduced in the previous chapters. Once we have done this, we'll be ready to formally define general vector spaces. Then the real power of linear algebra begins.

As you encounter each new definition or theorem, your homework will be to create an example which illustrates the new idea.  To illustrate a definition, create several examples which satisfy the definition.  Make some of the examples simple examples, and make some examples wild and crazy examples of the definition.  Then make several examples that do not quite satisfy the definition (but are close).  Making examples of what a term means and what it does not mean will solidify the definition in your mind.

To illustrate a theorem, make a few examples that satisfy the conditions in the theorem and show that the result of the theorem is satisfied.  Again, make some examples that are simple and some that are wild and crazy.  Then make some examples that do not quite satisfy the conditions in the theorem and do not have the result hold (for example, you might make examples that satisfy all but one of the conditions in the theorem).

%
%Mathematical discoveries begin with examples. However, mathematical writing often begins with definitions. Those definitions are followed by theorems and proofs. If space permits, examples are often given to illustrate the definitions and theorems.  If appropriate, an application (example) is also given to show why a theorem is important. Those who write mathematical papers and textbooks around the nation have become accustomed to writing papers for peers in journals. Most of the time, these papers assume a lot of knowledge from the reader, and because of space limitations the papers are missing lots of examples. In order to read this kind of mathematics, you have to learn how to create examples to illustrate definitions and theorems whenever you see them.

%In this section, most of the definitions and theorems you have already seen and used in the previous chapters. The next three subsections present definitions, theorems, and then proofs. In the definitions and theorems section, I will provide examples of some of the ideas (or point you to examples from previous chapters). Your homework is to create examples to illustrate the definitions and theorems. In the proofs section, I will illustrate a few common techniques that are used to prove theorems.  The proofs in linear algebra are often shorter and easier to follow than proofs in other branches of mathematics. Those of you who plan to pursue any type of graduate study in engineering, physics, computer science, economics, statistics, and more will be expected to give proofs of the ideas you discover. In this book I will provide only some of the proofs and let you seek other references to find complete proofs to all the ideas. 



\subsection{Definitions}

Whenever you see a new definition, one of the most important things you can do is create a few examples which illustrate that definition.  Most of these words are review, but a few are new.  Make sure you know these definitions.

This first definition is a formal statement of what we mean when we write $\mathbb{R}^n$.  It is the foundation for the formal vector space definition we'll see later.
\begin{definition}
The symbol $\mathbb{R}$ is used to represent the real numbers.  
The real vector space $\mathbb{R}^n$ is a set together with a way of adding two elements (called \define{vector addition}) and a way of multiplying an element by a real number (called \define{scalar multiplication}).  
The set we use is the set of $n$-tuples $(v_1,v_2,\ldots, v_n)$ where each $v_i$ is an element of $\mathbb{R}$. 
We write this symbolically as 
$$
\{ (v_1,v_2,\ldots, v_n)\st v_i\in \mathbb{R} \text{ for } 1\leq i\leq n\}.$$
Each $n$-tuple $\vec v = (v_1,v_2,\ldots, v_n)$ is called a vector. 
The entries of the vector are called the \define[vector!components]{components} of the vector.  Vector addition is performed by adding the components of the vectors. Scalar multiplication is done by multiplying each component by the scalar. 
\end{definition}


\begin{definition} 
\marginpar{See example \ref{ex dot product}.}%
\label{def dot product}
The \define{dot product} of two vectors $\vec u = \left<u_1,u_2,\ldots,u_n\right>$ and $\vec v =\left<v_1,v_2,\ldots,v_n\right>$ is the scalar $\vec u\cdot \vec v = u_1v_1+u_2v_2+\cdots+u_nv_n = \sum u_iv_i$. We say that two vectors are \define{orthogonal} if their dot product is zero.
\end{definition}

\begin{definition} \label{def matrix product} 
A (real) \define{matrix} is a rectangular grid of real numbers.
We use the notation $a_{ij}$ to represent the entry in the $i$th row and $j$th column of the matrix $A$.  
If a matrix $A$ has $m$ rows and $n$ columns, we say the matrix has \define[matrix!size]{size} $m$ by $n$ and write $A_{mn}$. 
The \define[matrix!sum]{sum} of two matrices of the same size is performed by adding corresponding entries.  
The \define[matrix!product]{product} of two matrices $A_{m n} = [a_{ij}]$ and $B_{np} =[b_{jk}]$ is a new matrix $C=AB$ where the $ik$th entry of $C$ is $c_{ik}=\sum_{j=1}^n a_{ij}b_{jk}$ (the dot product of the $i$th row of $A$ and the $k$th column of $B$).
\end{definition}

\begin{definition} \label{def rref}
\marginpar{See example \ref{ex rref last}. Every matrix we obtain in the row reduction process of $A$ is row equivalent to all the other matrices obtained from $A$ by doing row operations, and to the reduced row echelon form of $A$. Each of the matrices in example \ref{ex rref last} is row equivalent to all the other matrices in the example.}%
 We say that two matrices are \define[matrix!row equivalent]{row equivalent} if one can be obtained from the other by a finite sequence of these three row operations: (1)~interchange two rows, (2) multiply one row by a nonzero constant, and (3) add to a row a multiple of a different row. 

A matrix is said to be in \define{row echelon form} (ref) if 
(1) each nonzero row begins with a 1 (called a \define{leading 1}), 
(2) the leading 1 in a row occurs further right than a leading 1 in the row above, and
(3) any rows of all zeros appear at the bottom.
The location in the matrix of each leading 1 is called a \define{pivot}, and the column containing the pivot is called a \define{pivot column}.
A matrix is said to be in \define{reduced row echelon form} (rref) if 
the matrix is in row echelon form, and each pivot column contains all zeros except for the pivot (the leading one).

\end{definition}

\begin{definition}[Homogeneous]
A linear system $A\vec x=\vec b$ is said to be either \define{homogeneous} if $\vec b=\vec 0$ or \define{nonhomogeneous} if $\vec b\neq \vec 0$.  
\end{definition}
\begin{definition}[Consistent]
A linear system is said to be \define{consistent} if it has at least one solution and  \define{inconsistent} if it has no solutions.
\end{definition}



\begin{definition}[Linear combination, Span]
\marginpar{See example \ref{ex span}.}%
A \define[vectors!linear combination]{linear combination} of vectors $\{\vec v_1,\vec v_2,\ldots,\vec v_n\}$ is a sum $$c_1 \vec v_1+c_2\vec v_2+\cdots+c_n\vec v_n = \sum_{i=1}^n c_i\vec v_i$$ where each scalar $c_1,c_2,\ldots, c_n$ is a real number. 
The \define{span} of a set of vectors $\{\vec v_1,\vec v_2,\ldots,\vec v_n\}$ is the set of all possible linear combinations of those vectors.
\end{definition}

\begin{definition}[Linear independence] \label{def linear independence}
\marginpar{See example \ref{first linearly independent example}. }%
We say a set $\{\vec v_1,\vec v_2,\ldots,\vec v_n\}$ of vectors is \define{linearly independent} if the only solution to the homogeneous equation $c_1 \vec v_1+c_2\vec v_2+\cdots+c_n\vec v_n=\vec 0$ is the trivial solution $c_1=c_2=\cdots=c_n=0$. If there is a nonzero solution to this homogeneous equation, then we say the set of vectors is \define{linearly dependent}, or we just say the vectors are linearly dependent (leave off ``set of'').
\end{definition}

\begin{definition}[Basis]
\label{def:basis for Rn}
A \define[basis!$\RR^n$]{basis} for $\RR^n$ is a set of vectors that are (1) linearly independent and (2) span all of $\RR^n$.
The \define[standard basis!$\RR^n$]{standard basis} for $\RR^n$ is the set 
$$\{\vec e_1 = (1,0,0,\ldots,0,0),\, 
\vec e_2 = (0,1,0,\ldots,0,0),\, 
\ldots,\, 
\vec e_n = (0,0,0,\ldots,0,1)\}.$$
\end{definition}

\begin{definition}[Column space, Rank] 
\marginpar{See examples \ref{colspace1ex} and \ref{colspace2ex}.}%
The \define{column space} of a matrix is the span of the column vectors. 
A \definestyle{basis for the column space} of a matrix is a set of linearly independent vectors whose span is the column space. 
The \definestyle{dimension} of the column space is the number of vectors in a basis for the column space. 
The \define{rank} of a matrix is the dimension of the column space.
\end{definition}


\begin{definition}[Row space] 
\marginpar{See examples \ref{colspace1ex} and \ref{colspace2ex}.}%
The \define{row space} of a matrix is the span of the row vectors. 
A \definestyle{basis for the row space} is a set of linearly independent vectors whose span is the row space. 
The \definestyle{dimension} of the row space is the number of vectors in a basis for the row space.
\end{definition}

\begin{definition}[Coordinates]
\label{def coordinates for Rn}
\marginpar{See examples \ref{colspace1ex} and \ref{colspace2ex}.}%
If $\beta=\{\vec b_1,\vec b_2,\ldots,\vec b_n\}$ is a basis for a vector space, and a vector $\vec v$ equals the linear combination 
$$ \vec v = c_1\vec b_1+c_2\vec v_2 + \cdots c_n\vec v_n, $$ then we call the scalars $c_1,\,c_2,\,\ldots,\, c_n$ the  \define{coordinates} of $\vec v$ relative to the basis $\beta$. We may write $\vec v = (c_1,c_2,\ldots, c_n)_\beta$ to represent the vector $\vec v$, or if the basis $\beta$ is understood, we may just write $\vec v = (c_1,c_2,\ldots, c_n)$.
\end{definition}

\begin{definition}[Diagonal, Identity, Triangular matrices]
A \define[matrix types!diagonal]{diagonal matrix} is a square matrix in which the entries off the main diagonal are zero ($a_{ij}=0$ if $i\neq j$). The \define[matrix types!identity]{identity matrix} is a diagonal matrix in which each entry on the main diagonal is 1 ($a_{ii}=1$). We often write $I$ to represent the identity matrix and $I_n$ to represent the $n\times n$ identity matrix when specifying the size is important. An \define[matrix types!upper triangular]{upper triangular matrix} is a square matrix in which every entry below the main diagonal is zero. A \define[matrix types!lower triangular]{lower triangular matrix} is a square matrix in which every entry above the main diagonal is zero.
$$\begin{array}{cccc}
\begin{bmatrix} 3&0&0\\0&-2&0\\0&0&8\end{bmatrix} 
&\begin{bmatrix} 1&0&0\\0&1&0\\0&0&1\end{bmatrix} 
&\begin{bmatrix} 0&3&2\\0&4&0\\0&0&-1\end{bmatrix}
&\begin{bmatrix} 0&0&0\\3&4&0\\2&0&-1\end{bmatrix} 
\\
\text{diagonal}
&\text{identity}
&\text{upper triangular}
&\text{lower triangular}
\end{array}$$
\end{definition}



\begin{definition}[Determinant] 
\marginpar{See example \ref{ex det}.}%
The \define{determinant} of a $1\times 1$ matrix is the entry $a_{11}$. A \define{minor} $M_{ij}$ of an $n\times n$ matrix is the determinant of the $(n-1)\times (n-1)$ submatrix obtained by removing the $i$th row and $j$th column. A \define{cofactor} of a matrix is the product $C_{ij}=(-1)^{i+j}M_{ij}$. The determinant of an $n\times n$ matrix ($n\geq 2$) is found by multiplying each entry $a_{1i}$ on the first row of the matrix by its corresponding cofactor, and then summing the result. Symbolically, we write  $$\det A = |A| = a_{11}C_{11}+a_{12}C_{12}+\cdots+a_{1n}C_{1n} = \sum_{j=1}^n a_{1j}C_{1j}.$$ 
\end{definition}


\begin{definition}[Inverse] 
\marginpar{See example \ref{ex inverse}}%
The \define{inverse} of a square matrix $A$ is a matrix $B$ such that $AB=I$ and $BA=I$, where $I$ is the identity matrix.   If a square matrix has an inverse, we say the matrix is \define{nonsingular} or \define{invertible}.  If a square matrix does not have an inverse, we say the matrix is \define{singular} or \define{noninvertible}.
\end{definition}

\begin{definition}[Transpose, Symmetric]
\marginpar{See example \ref{ex transpose}.}%
The \define[matrix!transpose]{transpose} of a matrix $A_{m\times n}=\{a_{ij}\}$ is a new matrix $A^T_{n\times m}$ in which the $i$th column of $A$ is the $i$th row of $A^T$.  A \define[matrix types!symmetric]{symmetric matrix} is a matrix $A$ such that $A=A^T$.
\end{definition}


\begin{definition}[Eigenvector, Eigenvalue] 
\marginpar{See examples \ref{ex eigen1} and \ref{ex eigen2}. 
  Note that $\vec x=\vec 0$ is never an eigenvector.}%
An \define{eigenvector} of a square matrix $A$ is a nonzero vector $\vec x$ with the property that $A\vec x = \lambda \vec x$ for some scalar $\lambda$.  We say that $\vec x$ is an eigenvector corresponding to the \define{eigenvalue} $\lambda$. 
\end{definition}





















%%%%%%%%%%%%%%%%Theorems
%%%%%%%%%%%%%%%%Theorems
%%%%%%%%%%%%%%%%Theorems
%%%%%%%%%%%%%%%%Theorems
%%%%%%%%%%%%%%%%Theorems
%%%%%%%%%%%%%%%%Theorems
%%%%%%%%%%%%%%%%Theorems
%%%%%%%%%%%%%%%%Theorems
%%%%%%%%%%%%%%%%Theorems
%%%%%%%%%%%%%%%%Theorems
%%%%%%%%%%%%%%%%Theorems
%%%%%%%%%%%%%%%%Theorems


\subsection{Theorems}

The definitions above summarize most of what we have done in the first two chapters. The list of properties below summarize many of the patterns we have already observed, together with many patterns that we will be exploring as the semester progresses.  Each is called a theorem because it is a statement that requires justification (a proof). In the homework, your job is to create examples to illustrate these theorems.  If a theorem is hard to remember, then create more examples to illustrate the theorem until you recognize the patterns that emerge.  Mathematics is all about finding patterns and then precisely stating the pattern in words. You do not have to memorize all of these theorems, but you should be able to create examples to illustrate them. In the next section, we will prove some of them.

Before stating the theorems, we first need to discuss the phrase ``if and only if.'' Theorem \ref{every column pivot iff independent} states 
\begin{quote}
The columns are linearly independent if and only if every column is a pivot column.
\end{quote}
We could write this statement symbolically by writing ``$p$ if and only if $q$.''
\marginpar{The statement ``$p$ if and only if $q$'' implies both 
\begin{enumerate}
	\item if $p$ then $q$, and 
	\item if $q$ then $p$.
\end{enumerate}
}
The words ``if and only if'' create two if-then statements, namely (1) if $p$ then $q$ and (2) if $q$ then $p$. This gives us the two statments:
\begin{enumerate}
	\item \emph{If} the columns are linearly independent \emph{then} every column is a pivot column.
	\item \emph{If} every column is a pivot column \emph{then} the columns are linearly independent.
\end{enumerate}
Every time you see the words ``if and only if,'' remember that this means you have two if-then statements.  That means that you need at least two examples to illustrate such a theorem (one for each if-then statement).



\subsubsection{Vector Properties}
\begin{theorem}[Vector space properties of $\mathbb{R}^n$]\label{rn vector space properties}
Vector addition and scalar multiplication in $\mathbb{R}^n$ satisfy the following properties:
\begin{enumerate}
	\item[($A_1$)] Vector addition is associative: $(\vec u+\vec v)+\vec w = \vec u +(\vec v+\vec w)$.
	\item[($A_2$)] Vector addition is commutative: $\vec u+\vec v= \vec v+\vec u$.
	\item[($A_3$)] The zero vector $\vec 0$ satisfies $\vec 0+\vec v = \vec v+\vec 0=\vec v$.
	\item[($A_4$)] The additive inverse of $\vec v$, denoted $-\vec v$, is the vector which satisfies $\vec v+(-\vec v)=0$.
	\item[($M_1$)] Scalar multiplication distributes across vector addition: $c(\vec u+\vec v)= c\vec u + c\vec v$.
	\item[($M_2$)] Scalar multiplication distributes across scalar addition: $(c+d)\vec v= c\vec v+ d\vec v$.
	\item[($M_3$)] Scalar multiplication is associative: $(cd)\vec v = c(d\vec v)$
	\item[($M_4$)] Scalar multiplication by 1 does nothing: $1\vec v=\vec v$
\end{enumerate}
\end{theorem}

\subsubsection{System Properties}
 
\begin{theorem}\label{thm unique rref}
Every matrix has a unique reduced row echelon form (rref). This implies that for any matrix, the location of the pivot columns is unique (the locations do not depend on how the matrix is reduced to rref). 
\end{theorem}

\begin{theorem}\label{thm row equivalent}
Two matrices are row equivalent if and only if they have the same reduced row echelon form.
\end{theorem}


\begin{theorem} \label{pivot columns iff} \label{thm solution iff augmented not pivot}
A linear system of equations $A\vec x=\vec b$ has a solution if and only if the number of pivot columns of $A$ equals the number of pivot columns of the augmented matrix $[A\mid b]$.
\end{theorem}

\begin{theorem}\label{thm no, one, or infinitely many solutions}
If a linear system of equations has more than one solution, it has infinitely many.  Solving a linear system has three possible outcomes: (1) no solution, (2) exactly one solution, or (3) infinitely many solutions. 
\end{theorem}

\begin{theorem}[Removing non-pivot columns does not alter the row operations involved in obtaining rref] Let $A$ be a matrix whose rref is $R$.  Let $B$ be the matrix obtained from $A$ by removing the $k$th column.  If the $k$th column of $A$ is not a pivot column, then the rref of $B$ is found by removing the $k$th column from $R$. 
\end{theorem}

\begin{theorem}\label{thm n-k free variables}
If a consistent linear system of equations $A\vec x=\vec b$ has $n$ variables and $A$ has $k$ pivot columns, then the system has $n-k$ free variables. 
\end{theorem}

\begin{theorem}[Superposition Principle] \label{thm superposition}\label{thm homogeneous}
For a homogeneous system $A\vec x = \vec 0$, any linear combination of solutions is also a solution.
\end{theorem}

\begin{theorem}\label{thm nonhomogeneous}
For a nonhomogeneous system $A\vec x=\vec b$, the difference $\vec x_1-\vec x_2$ between any two solutions is a solution to the homogeneous system $A\vec x=\vec 0$. 
\end{theorem}





\subsubsection{Matrix Properties}

\begin{theorem} [Vector space properties of $M_{m,n}$]\label{matrix vector space properties}
Let $A,B$ and $C$ be $m$ by $n$ matrices, and $c,d$ be real numbers.  Matrix addition and scalar multiplication satisfy the following properties:
\begin{enumerate}
	\item[($A_1$)] Matrix addition is associative: $(A+B)+C = A +(B+C)$.
	\item[($A_2$)] Matrix addition is commutative: $A+B=B+A$.
	\item[($A_3$)] The zero matrix $0$ satisfies $0+A = A+0=A$.
	\item[($A_4$)] The additive inverse of $A$ is $-A$ which satisfies $A+(-A)=0$.
	\item[($M_1$)] Scalar multiplication distributes across matrix addition: $c(A+B)= cA + cB$.
	\item[($M_2$)] Scalar multiplication distributes across scalar addition: $(c+d)A= cA+ dA$.
	\item[($M_3$)] Scalar multiplication is associative: $(cd)A = c(dA)$.
	\item[($M_4$)] Scalar multiplication by 1 does nothing: $1A=A$.
\end{enumerate}
\end{theorem}

\begin{theorem}[Transpose of a product]
\label{thm transpose of product}
The transpose of a product is found by multiplying the transposes in the reverse order:
$$(AB)^T=B^T A^T.$$
\end{theorem}

\begin{theorem}\label{linearcombcol}
The product $A\vec x$ of a matrix $A$ and column vector $\vec x$ is a linear combination of the columns of $A$. 
\end{theorem}

\begin{theorem}\label{linearcombrow}
The product $\vec x A$ of a row vector $\vec x$ and matrix $A$ is a linear combination of the rows of $A$.
\end{theorem}

\begin{theorem}\label{matrix multiplication two ways}
Consider the matrices $A_{m\times n} = \begin{bmatrix}\vec a_1 & \vec a_2 &\cdots &\vec a_n\end{bmatrix}$ and 
$B_{n\times p} = \begin{bmatrix}\vec b_1 & \vec b_2 &\cdots &\vec b_p\end{bmatrix}$, where $\vec a_i$ and $\vec b_i$ are the columns of $A$ and $B$.  The product $AB$ equals $AB = \begin{bmatrix}A\vec b_1 & A\vec b_2 &\cdots &A\vec b_p\end{bmatrix}$. In other words, every column of $AB$ is a linear combination of the columns of $A$.
Similarly, every row of $AB$ is a linear combination of the rows of $B$.
\end{theorem}



\subsubsection{Linear Independence Properties}

\begin{theorem}\label{thm rank equal pivot columns}
The rank of a matrix is the number of pivot columns, which equals the number of leading 1's. 
\end{theorem}

\begin{theorem}\label{every column pivot iff independent}
The columns of a matrix are linearly independent if and only if every column is a pivot column.
\end{theorem}

\begin{theorem}\label{dependentiff}
  A set (of two or more vectors) is linearly dependent if and only if at least one of the vectors can be written as a linear combination of the others. In particular, two vectors are linearly dependent if and only if one is a multiple of the other.  Definition~\ref{def linear independence} gives a test for determining if a set of vectors is linearly dependent.
\end{theorem}


\begin{theorem}
The system $A\vec x=\vec b$ is consistent if and only if $\vec b$ is in the column space of $A$. 
\end{theorem}


\begin{theorem}\label{rankrowcolumn}
The dimension of the column space (the rank) of a matrix equals the dimension of the row space of the matrix.
\end{theorem}


\begin{theorem}\label{thm colsp basis}
Consider a matrix $A$ whose columns are $\vec v_1,\vec v_2,\ldots,\vec v_n$. A basis for the column space is the set of pivot columns of $A$.  Let $R$ be the reduced row echelon form of $A$ with any rows of zeros removed. For each column $\vec v_k$ of $A$, the $k$th column of $R$ gives the coordinates of $\vec v_k$ relative to the basis of pivot columns.
  
In terms of matrix multiplication, if $C$ is the matrix obtained from $A$ by erasing the non-pivot columns, and $\vec r_k$ is the $k$th column of $R$, then $C\vec r_k=v_k$. The columns of $R$ tell us how to linearly combine the pivot columns of $A$ to obtain all columns of $A$.
\end{theorem}

\begin{theorem}\label{thm rowsp basis}
Consider a matrix $A$ whose rows are $\vec v_1,\vec v_2,\ldots,\vec v_m$. Let $R$ be the reduced row echelon form of $A$ with any rows of zeros removed. A basis for the row space is the set of nonzero rows of $R$.  Let $C$ be the matrix obtained from $A$ by erasing the non-pivot columns of $A$. For each row $\vec v_k$ of $A$, the $k$th row of $C$ gives the coordinates of $\vec v_k$ relative to the basis of nonzero rows of $R$.
  
In terms of matrix multiplication, if $\vec r_k$ is the $k$th row of $C$, then $\vec r_k R=v_k$. The rows of $C$ tell us how to linearly combine the rows of $R$ to obtain all rows of $A$.
\end{theorem}





\subsubsection{Inverse Properties}

The following theorems hold for invertible (square) matrices.


\begin{theorem}\label{invunique}
The inverse of a matrix is unique.
\end{theorem}

\begin{theorem} \label{thm inverse of inverse}
The inverse of $A\inv$ is $A$. This implies that $(A^k)\inv=(A\inv)^k$ and $(cA)\inv = \frac{1}{c}A\inv$.
\end{theorem}

\begin{theorem}\label{thm inverse of transpose}
The inverse of the transpose is the transpose of the inverse: $(A^T)\inv = (A\inv)^T$.
\end{theorem}

\begin{theorem}[Socks-Shoes property]\label{invprod}
The inverse of a product is $(AB)\inv = B\inv A\inv$. 
\end{theorem}

In the previous theorem, notice that the order of multiplication is reversed. You put on your socks and then your shoes. To undo this, you first take off your shoes, then your socks.

\begin{theorem}
If $A$ is invertible, then the solution to $A\vec x=\vec b$ is $\vec x=A\inv \vec b$.
\end{theorem}



\subsubsection{Determinant Properties}

The following theorems hold for square matrices.

\begin{theorem}
The determinant can be computed by using a cofactor expansion along any row or any column. Symbolically, we can write this as $\ds |A| = \sum_{i=1}^n a_{ij} C_{ij}$ for any column $j$,  or $\ds |A| = \sum_{j=1}^n a_{ij} C_{ij}$ for any row $i$.
\end{theorem}


\begin{theorem}\label{thm det triangular}
The determinant of a triangular matrix is the product of the entries along its main diagonal.
\end{theorem}

\begin{theorem}\label{thm det product}
The determinant of $AB$ is $|AB|=|A||B|$. This implies that $|cA|=c^n|A|$ and $|A\inv|=\frac{1}{|A|}$. 
\end{theorem}

\begin{theorem}\label{thm det transpose}
The determinant of $A$ and $A^T$ are the same: $|A^T| = |A|$.
\end{theorem}

\begin{theorem}\label{thm det zero}
A matrix is invertible if and only if its determinant is not zero.
\end{theorem}

\begin{theorem}\label{thm det multiple columns}
If one column of a matrix is a multiple of another column, then the determinant of the matrix is zero. 
If one row of a matrix is a multiple of another row, then the determinant of the matrix is zero. 
\end{theorem}

\begin{theorem}\label{thm adjugate}
Let $A$ be a square matrix. Define the adjugate of $A$, written $\adjugate(A)$, to be the matrix whose $ij$th entry is the $ji$th cofactor of $A$ (the transpose of matrix of cofactors).  
The product of the adjugate and the original matrix is $\adjugate(A) A = |A|I$, a multiple of the identity matrix. 
If the determinant of $A$ is nonzero, this implies that the inverse of $A$ is $A^{-1}=\frac{1}{|A|}\adjugate(A)$.
\end{theorem}




\subsubsection{Eigenvalues and Eigenvectors}
The following theorems hold for square matrices.

\begin{theorem}[How to compute eigenvalues and eigenvectors] \label{thm compute eigen}
The nonzero vector $\vec x$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda$ if and only if $(A-\lambda I)\vec x=\vec 0$ and $\det(A-\lambda I)=0$.
\end{theorem}

\begin{theorem}\label{thm characteristic degree n}
For an $n$ by $n$ real matrix, the characteristic polynomial is an $n$th degree polynomial. This means there are $n$ eigenvalues (counting multiplicity and complex eigenvalues). 
\end{theorem}

\begin{theorem}\label{thm eigen triangular}
For a triangular matrix (all zeros either above or below the diagonal), the eigenvalues appear on the main diagonal.
\end{theorem}

\begin{theorem}\label{thm eigen transpose}
The eigenvalues of $A^T$ are the same as the eigenvalues of $A$. 
\end{theorem}

\begin{theorem}\label{thm eigenvalues and sums}
If the sum of every row of $A$ is the same, then that sum is an eigenvalue corresponding to the eigenvector $(1,1,\ldots,1)$. 
If the sum of every column of $A$ is the same, then that sum is an eigenvalue of $A$ as well (because it is an eigenvalue of $A^T$). 
A stochastic matrix (e.g., a transition matrix for a Markov process) always has 1 as an eigenvalue because the columns sum to 1. 
\end{theorem}

\begin{theorem}\label{thm independent eigenvectors}
Let $A$ be an $n$ by $n$ matrix.  Suppose there are $n$ distinct eigenvalues of $A$, named $\lambda_1, \lambda_2, \ldots, \lambda_n$. Let $\vec x_i$ be an eigenvector corresponding to $\lambda_i$ for each $i$.  The set $\{\vec x_1,\vec x_2, \ldots,\vec x_n\}$ is linearly independent, and hence is a basis for $\mathbb{R}^n$.
\end{theorem}

\begin{theorem}\label{thm spectral}
If a matrix is symmetric ($A^T=A$), then the eigenvalues of $A$ are real. In addition, eigenvectors corresponding to different eigenvalues of a symmetric matrix are orthogonal (if $\vec x_1$ and $\vec x_2$ are eigenvectors corresponding to different eigenvalues, then $\vec x_1\cdot \vec x_2=0$).  
\end{theorem}











\subsection{Proving theorems}

In this section I'll illustrate various methods of justifying a theorem (giving a proof). You do not have to be able to prove every one of the theorems from the previous section. If you want a challenge, try to prove all of the theorems below.

\subsubsection{Prove by using a definition}
To prove theorem \ref{invprod} ($(AB)\inv=B\inv A \inv$), we just use the definition of an inverse. It often helps to rewrite a definition, perhaps changing the names of variables if needed. The inverse of a matrix $C$ is a matrix $D$ such that $CD=DC=I$ (multiplying on either side gives the identity). To show that the inverse of $AB$ is $B\inv A\inv$, we need to show that multiplying $AB$ on both the left and right by $B\inv A\inv$ results in the identity. We multiply $AB$ on the left by $B\inv A\inv$ and compute 
$$(B\inv A\inv)(AB) = B\inv (A\inv A)B = B\inv IB = BB\inv=I.$$
Similarly, we multiply $AB$ on the right by $B\inv A\inv$ and compute 
$$(AB)(B\inv A\inv) = A(BB\inv) A\inv = AIA\inv = AA\inv=I.$$
 Because multiplying $AB$ on both the left and right by $B\inv A\inv$ gives us the identity matrix, we know $B\inv A\inv$ is the inverse of $AB$. All we did was use the definition of an inverse.

\subsubsection{Proving something is unique}
To show something is unique, one common approach is to assume there are two things and then show the two things must be the same.  
We'll do this to show that inverses are unique (Theorem \ref{invunique}).  
Suppose that $A$ is a square matrix with an inverse $B$ and another inverse $C$.  
Using the definition, we know that $AB=BA=I$, and $AC=CA=I$.  
We now need to show that $B=C$ (show the two things are the same). 
If we multiply both sides of the equation $AB=I$ on the left by $C$, then we obtain $C(AB)=CI$. 
Since matrix multiplication is associative, we can rearrange the parentheses on the left to obtain $(CA)B=C$.  
Since $C$ is an inverse for $A$, we know $CA=I$, which means $IB=C$, or $B=C$, which shows that the inverse is unique.   



\subsubsection{Prove by using another theorem}
Sometimes using other theorems can quickly prove a new theorem.  
Theorem \ref{linearcombrow} says that the product $\vec x A$ is a linear combination of the rows of $A$.  
The two theorems right before this one (Theorems~\ref{thm transpose of product} and \ref{linearcombcol}) state that $(AB)^T=B^TA^T$ and that $A\vec x$ is a linear combination of the columns of $A$.  
Rather than prove Theorem \ref{linearcombrow} directly, we are going to use these other two theorems.  
The transpose of $\vec x A$ is $A^T\vec x^T$, which is a linear combination of the columns of $A^T$.  
However, the columns of $A^T$ are the rows of $A$, so $A^T\vec x^T$ is a linear combination of the rows of $A$.  
This means that $\vec x A$ (undoing the transpose) is a linear combination of the rows of $A$. 
Theorems which give information about columns often immediately give information about rows as well.


\subsubsection{If and only if}
The words ``if and only if'' require that an ``if-then'' statement work in both directions.  The following key theorem (Theorem~\ref{dependentiff}) has an if and only if statement in it. 
\begin{quote}
A set of two or more vectors is linearly dependent if and only if one of the vectors is a linear combination of others.  
\end{quote}
To prove this, we need to show two things:
\begin{enumerate}
	\item If the vectors are linearly dependent, then one is a linear combination of the others.
	\item If one of the vectors is a linear combination of the others, then the vectors are linearly dependent.
\end{enumerate}
We'll start by proving the first item.  If the vectors are dependent, then there is a nonzero solution to the equation $$c_1 \vec v_1+c_2\vec v_2+\cdots+c_n\vec v_n=\vec 0.$$ Pick one of the nonzero constants, say $c_k$. Subtract $c_k\vec v_k$ from both sides and divide by $-c_k$ (which isn't zero) to obtain 
$$\frac{c_1}{-c_1}\vec v_1+\cdots+\frac{c_{k-1}}{-c_k}\vec v_{k+1}+\frac{c_{k+1}}{-c_k}\vec v_{k+1}+\cdots+\frac{c_n}{-c_k}\vec v_n=\vec v_k,$$
which means that $v_k$ is a linear combination of the other vectors.

Now let's prove the second item. Suppose one of the vectors is a linear combination of the others (say the $k$th). 
Write 
$$v_k=c_1\vec v_1+\cdots+c_{k-1}\vec v_{k+1}+c_{k+1}\vec v_{k+1}+\cdots+c_n\vec v_n.$$  
Subtracting $v_k$ from both sides gives a nonzero solution ($c_k=-1$) to the equation $\vec 0=c_1\vec v_1+\cdots+c_{k-1}\vec v_{k+1}-v_k+c_{k+1}\vec v_{k+1}+\cdots+c_n\vec v_n$, which means the vectors are linearly dependent.

\subsubsection{Proof by contradiction}
Sometimes in order to prove a theorem, we make a contradictory assumption and then show that this assumption leads to an error (which means the assumption cannot be correct).  
We will do this to show that if $A$ is invertible, then $|A|\neq 0$. 
We will also use the fact that the determinant of a product is the product of the determinants, $|AB|=|A||B|$.
Let's start with a matrix $A$ which has an inverse and then assume that the determinant is $|A|=0$.  
The inverse of $A$ is $A\inv$ and the product $AA\inv=I$ is the identity. 
Taking determinants of both sides gives $|AA\inv|=|I|=1$, or $|A||A\inv|=1$ (remember the identity matrix has determinant 1).  
Since we assumed that $|A|=0$, we have $0|A\inv|=0=1$, which is absurd. 
Our assumption that $|A|=0$ must be incorrect. 
We must have $|A|\neq 0$.    

\subsubsection{Prove $a=b$ by showing $a\leq b$ and $b\leq a$}
If $a\leq b$ and $b\leq a$, then $a=b$. 
Sometimes it is easier to show an inequality between two numbers than it is to show the numbers are equal.  
We will use this to show that the rank of a matrix (the dimension of the column space, which is the number of pivot columns) equals the dimension of the row space.
Let the rank of $A$ be $r$, and let the dimension of the row space equal $s$. 
Write $A=CR$ where the columns of $C$ are the pivot columns of $A$ and the rows of $R$ are the nonzero rows of the rref of $A$. 
We know that the rank $r$ equals the number of columns in $C$. 
Since every row of $A$ can be written as a linear combination of the rows of $R$, the rows of $R$ span the rows space. 
We know that a basis for the row space cannot contain more than $r$ vectors since there are $r$ rows in $R$. 
This means that the dimension $s$ of the row space must satisfy $s\leq r$.
 
We now repeat the previous argument on the transpose of $A$. Write $A^T = C^\prime R^\prime$ where the columns of $C^\prime$ are the pivot columns of $A^T$ and the rows of $R^\prime$ are the nonzero rows of the rref of $A^T$. The columns of $C^\prime$ are linearly independent rows of $A$, so $C^\prime$ has $s$ columns, and $R^\prime$ has $s$ rows. Since every row of $A^T$ can be written as a linear combination of the rows of $R^\prime$, a basis for the row space of $A^T$ (i.e. the column space of $A$) cannot have more than $s$ vectors, or $r\leq s$. Combining both inequalities, since $r\leq s$ and $s\leq r$, we have $r=s$.

%\subsubsection{Proofs involving $n$ elements}
%We have been using the idea that a non-pivot column is a linear combination of the pivot columns preceding it (Theorem \ref{pivotcol}). To prove this, we need to work with summation notation. I'll start by repeating what we are trying to prove. Consider a matrix $A$ whose columns are $\vec v_1,\vec v_2,\ldots,\vec v_n$. Let $R$ be the reduced row echelon form of $A$ with any rows of zeros removed. If $\vec v_k$ is not a pivot column, then $\vec v_k$ is a linear combination of the preceding vectors. In particular, if $C$ is the matrix obtained from $A$ by erasing the non-pivot columns, and $\vec r_k$ is the $k$th column of $R$, then $C\vec r_k=v_k$. In other words, the columns of $R$ tell us how to linearly combine the pivot columns of $A$ to obtain columns of $A$.
%
%Consider the equation $c_1 \vec v_1 +c_2 \vec v_2 +\cdots \vec c_n \vec v_n=0$ (in matrix form, just list the vectors and augment by zero). Since $v_k$ is not a pivot column, $c_k$ is a free variable. Let $c_k=-1$ and all other free variables equal zero. This allows us to write $c_1 \vec v_1 +c_2 \vec v_2 +\cdots+(-1)\vec v_k+\cdots+ \vec c_n \vec v_n=0$ or $$\vec v_k=c_1 \vec v_1 +c_2 \vec v_2 +\cdots+c_{k-1}\vec v_{k-1}+c_{k+1}\vec v_{k+1}+\cdots+ \vec c_n \vec v_n,$$ which means that $\vec v_k$ is a linear combination of the other vectors. Since every free variable is zero, we see that $v_k$ is a linear combination of the pivot columns of $A$. Because each free variable other that $c_k$ is zero, for all $i>k$ we have $c_i=0$ (non free variables only depend on the free variables which follow them).  This means that $v_k$ is linear combination of the vectors which precede it. To simplify notation below, let $C = \begin{bmatrix}\vec u_1 & \cdots &\vec u_m\end{bmatrix}$ be the matrix whose columns are the pivot columns of $A$.  Rename the constants above to obtain $\vec v_k = d_1\vec u_1 + \cdots +\d_m\vec u_m$, which corresponds to the augmented system $\begin{bmatrix}\vec u_1& \cdots &\vec u_m & \vec v_k\end{bmatrix}$ whose reduced row echelon form has all pivot columns except has $(d_1,\ldots,d_n)$ in the augmented system. 
%
%I need more here.  I will add some soon.  The connection between non-pivot columns, linear combinations, and reduced row echelon form are crucial.


\note{add a section about proof by induction, with an easy example}















\section{A huge equivalence: Connecting all the ideas}

\begin{theorem}\label{huge equivalence}
 For an $n$ by $n$ matrix $A$, the following are all equivalent (meaning you can put an if and only if between any two of these statements):
\begin{enumerate}
	\item The system $A\vec x=\vec 0$ has only the trivial solution.
	\item The system $A\vec x=\vec b$ has a unique solution for every $\vec b$.
	\item The rref of $A$ is the identity matrix.
	\item Every column of $A$ is a pivot column. 
	\item The columns of $A$ are linearly independent. (The dimension of the column space is $n$ and the span of the columns of $A$ is ${\mathbb{R}}^n$.)
	\item The rank of $A$ is $n$.
	\item The rows of $A$ are linearly independent. (The dimension of the row space is $n$ and the span of the rows of $A$ is ${\mathbb{R}}^n$.)
	\item The vectors $\vec e_1 = (1,0,0,\ldots,0),\, \vec e_2 = (0,1,0,\ldots,0),\,\ldots,\, \vec e_n = (0,0,0,\ldots,1)$ are in the column space of $A$.
	\item $A$ is invertible.
	\item $|A|\neq 0$.
	\item The eigenvalues of $A$ are all nonzero.	
\end{enumerate}
\end{theorem}

We have been using this theorem periodically without knowing it. Because this theorem is true, there is an if and only if between any pair of items.  In addition, because there is an if and only if between every item, we can also negate every single statement and obtain an equivalence as well.  In this negated equivalence, we use the symbol $\iff$ to represent the words ``if and only if.''
\begin{theorem}\label{huge equivalence negated}
	There is more than one solution to  $A\vec x=0$ $\iff$ the system $A\vec x=\vec b$ does not have a unique solution (it may not have one at all) $\iff$ $\rref(A)\neq I$  $\iff$ at least one column is not a pivot column  $\iff$ the columns are linearly dependent (the dimension of the column space is less than $n$)  $\iff$ the rank of $A$ is less than $n$  $\iff$ the rows are linearly dependent  $\iff$ $A$ is not invertible $\iff$ $\det(A)=0$  $\iff$ zero is an eigenvalue.
\end{theorem}

The results of these two theorems involve two if-then statements (an if and only if) between every pair of statements in the same theorem, making 110 different theorems altogether. We can prove the theorem much more quickly than writing out 110 different proofs, though.  For example, if we want to show four statements are equivalent, then we can show that (1) implies (2), (2) implies (3), (3) implies (4), and (4) implies (1). Then we will have shown that (1) also implies (3) and (4) because we can follow a circle of implications to obtain an implication between any two statements.  To prove the theorem above, we need to show a chain of 11 implications that eventually circles back on itself.

\begin{proof}

[(1) implies (2)] If the system $A\vec x=\vec 0$ has only the trivial solution, then the reduced row echelon form of $[A\mid \vec 0]$ is $[I\mid \vec 0]$. The row operations used to reduce this matrix will be the same as the row operations used to reduce $[A\mid\vec b]$. Hence the rref of $[A\mid\vec b]$ will be $[I\mid \vec x]$, where $A\vec x=\vec b$, so there will be a unique solution.  

[(2) implies (3)]  With a unique solution to $A\vec x =\vec b$ for any $\vec b$, the solution to $A\vec x = \vec 0$ is $\vec x = \vec 0$.  This means that the rref of $[A|\vec 0]$ is $[I|\vec 0]$. Since removing the last column of zeros won't change any row operations, the rref of $A$ is $I$. 

[(3) implies (4)] Since the reduced row echelon form of $A$ is $I$, then each column contains a leading 1, and hence is a pivot column

[(4) implies (5)] Since each column is a pivot column, the only solution to $c_1 \vec v_1+c_2\vec v_2+\cdots+c_n\vec v_n=\vec 0$ (where $\vec v_i$ is the $i$th column of $A$) is the trivial solution $c_1=c_2=\cdots=c_n=0$. This is because in order for each column of $A$ to be a pivot column, each row of $A$ must have a leading 1 (since there are $n$ rows as well as $n$ columns).  Notice that we have just shown (4) implies (1) as well, which means that (1) through (4) are all equivalent.

[(5) implies (6)] The definition of rank is the dimension of the column space. Since there are $n$ pivot columns, the rank is $n$.

[(6) implies (7)] This is the result of Theorem \ref{rankrowcolumn}, which also shows that (7) implies (5).

[(7) implies (8)] Since (7) is equivalent to (5), we know that the span of the column space is all of ${\mathbb{R}}^n$. This means in particular that the vectors listed are in the span of the columns of $A$.

[(8) implies (9)]  To find the inverse of $A$, we need to solve $A B = I$ or $[A\vec b_1\ A\vec b_2\ \cdots\ A\vec b_n]=[\vec e_1\ \vec e_2\ \cdots\ \vec e_n]$. This means we need to solve the equation $A\vec b_i=\vec e_i$ for $\vec b_i$ for each $i$, and this can be done because each $\vec e_i$ is in the column space of $A$. The inverse of $A$ is thus $[\vec b_1 \ \vec b_2\ \cdots\ \vec b_n]$.

[(9) implies (10)] We already showed that if a matrix has an inverse, then the determinant is not zero (otherwise $|A||A\inv|=|I|$ gives $0=1$).

[(10) implies (11)] If the determinant is not zero, then $|A-0 I|\neq 0$, which means $0$ is not an eigenvalue.

[(11) implies (1)] If zero is not an eigenvalue, then the equation $A\vec x = 0\vec x$ cannot have a nonzero solution. This means that the only solution is the trivial solution.  
\end{proof}

At this point, if we want to say that (10) implies (9) (determinant nonzero implies there is an inverse), then the proof above shows that (10) implies (11) which implies (1) which implies (2) and so on until it implies (9) as well.  By creating a circle of implications, all of the ideas above are equivalent.


In the homework, you are asked to illustrate this theorem with a few matrices.  Your job is to show that your example matrix either satisfies every single one of the statements in Theorem~\ref{huge equivalence}, or satisfies every single one of the statements in Theorem~\ref{huge equivalence negated}.


\section{Vector Spaces}

The notion of a vector space is the foundational structure upon which we will build for the rest of the semester.  Part of the power of mathematics is generalizing patterns to handle more situations, drawing upon the similarities between situations to apply concepts from one situation to another.

\subsection{Vector Subspaces}

We have looked at sets of vectors that we call vector spaces.  This is a set (of vectors) that can be expressed as a span.  For example, we've seen that the set of all multiples of a single vector is a vector space.

The notion of a vector subspace allows us to quickly find new vector spaces inside of vector spaces we already understand.

\begin{definition}
Suppose $V$ is a vector space. If $U$ is a nonempty set of vectors in $V$ such that you can write $U$ as the span of some vectors, then $U$ is also a vector space.  We say $U$ is a \define{vector subspace} of $V$ and write $U\subset V$.   
\end{definition}

\begin{example}
We know that $\RR^3$, the set of all vectors with three components, is a vector space.  Here are some sets of vectors in $\RR^3$ that are vector subspaces of $\RR^3$.
\begin{enumerate}
\item  $\{c(1,2,3) \st c\in\RR\}$: all multiples (i.e., linear combinations) of the vector $(1,2,3)$.  The endpoints of these vectors form a line through the origin.
\item $\{(0,0,0)\}$: just the zero vector (i.e., all linear combinations of the zero vector).  This vector space has only one vector, as opposed to the others which have an infinite number of vectors.
\item $\{c_1(-1,0,1)+c_2(2,1,2) \st c_1,c_2\in\RR\}$: all linear combinations of $(-1,0,1)$ and $(2,1,2)$. The endpoints of these vectors form a plane through the origin.
\item $\{c_1(-1,0,1)+c_2(2,1,2)+c_3(-4,-1,0) \st c_1,c_2,c_3\in\RR\}$: all linear combinations of $(-1,0,1)$, $(2,1,2)$, and $(-4,-1,0)$.  Since the third vector $(-4,-1,0)$ is actually a linear combination of $(-1,0,1)$ and $(2,1,2)$, $(-4,-1,0)=2(-1,0,1)-(2,1,2)$, any linear combination of the three vectors can actually be written as a linear combination of just the first two vectors.  For example, 
\begin{align*}
3(-1,0,1)-2(2,1,2)+2(-4,-1,0)&=3(-1,0,1)-2(2,1,2)+2(2(-1,0,1)-(2,1,2))\\
&=7(-1,0,1)-4(2,1,2).
\end{align*}
From this, we can see that the set of all linear combinations of the three vectors is the same as the set of all linear combinations of just the first two vectors.  So this vector subspace is the same as the previous example.
\end{enumerate}
\end{example}

%In general, each line through the origin that lies in the plane is a vector subspace of the plane. Each line and plane in $\RR^3$ which passes through the origin is a vector subspace of $\RR^3$. 

\begin{example}
  Here are some examples of sets of vectors that are \emph{NOT} vector subspaces of $\RR^3$.  Can you see that each of these is impossible to express as the span of some vectors?
  \begin{enumerate}
  \item $\{ (1,2,3)\}$
  \item $\{c(-1,0,1)+(2,1,3)\st c\in\RR\}$
  \item $\{(x,y) \st x\geq 0 \text{ and } y\geq0\}$: all vectors in $\RR^2$ that have only positive components
  \item $\{(x,y)\st x=0 \text{ or } y=0\}$: all vectors in $\RR^2$ that are on either the $x$ or $y$ axes
  \item $\{(x,y,z)\st \sqrt{x^2+y^2+z^2}\leq 1\}$: all vectors in $\RR^3$ that have length less than or equal to 1
  \end{enumerate}
\end{example}

In order for a set of vectors to actually be the span of some vectors, the set must be closed under linear combinations.  In other words, if we take any vectors in the subspace, then every linear combination of the vectors must also be in the subspace.  To avoid trivialities, we insist that vector subspaces actually have at least one vector.  Since the vector space must have all multiples of any vectors, we know that every vector space at least has the zero vector.  These facts help us formulate a quick test to verify that a set of vectors is actually a vector space (i.e., the set of vectors is actually the span of some vectors).

\begin{theorem}\label{thm subspace iff closed}
Suppose $U$ is a set of vectors in a vector space $V$ and the all of the following three conditions hold:
\begin{enumerate}
  \item $\vec 0\in U$
	\item For every $\vec u,\vec v\in U$, we have $\vec u+\vec v\in U$ (we say $U$ is closed under vector addition)
	\item For every $\vec u \in U$ and every $c\in \RR$, we have $c\vec u\in U$ (we say $U$ is closed under scalar multiplication)
\end{enumerate}
If each of those three conditions hold, then $U$ is a vector subspace of $V$.  If any of the three conditions above are false, then $U$ is not a vector space, and so is not a vector subspace of $V$.
\end{theorem}

The first condition above just ensures that $U$ has at least one vector in it since a vector space will always have the zero vector.  The second two conditions are testing if $U$ is closed under linear combinations by testing to see if $U$ is closed under both operations involved in a linear combination (scalar multiplication and vector addition).

\begin{example}
  Let's use this test to show that the set of vectors $U=\{c(1,2,3)\st
  c\in\RR\}$ from the example above is a vector space.  First, $\vec
  0\in U$ since setting $c=0$ gives the vector $(0,0,0)=\vec 0$.
  Second, if we have any two vectors in $U$ (i.e., two multiples of
  $(1,2,3)$), say $c_1(1,2,3)$ and $c_2(1,2,3)$, then their sum
  $c_1(1,2,3)+c_2(1,2,3)=(c_1+c_2)(1,2,3)$ is also a multiple of
  $(1,2,3)$, so the sum is in $U$.  Finally, if we have any vector in
  $U$, say $c_1(1,2,3)$, and we multiply it by any real number $c$, we
  get $cc_1(1,2,3)$, which is a multiple of $(1,2,3)$ and therefore in
  $U$.  Thus $U$ contains the zero vector and is closed under linear
  combinations, and so $U$ is a vector space.

  Note that we need to show that the last two conditions hold no matter what two vectors we pick from $U$ (for the second condition) or what vector from $U$ and what scalar from $\RR$ we pick (for the third condition).  It is not enough to show the two conditions hold for specific examples.  The last two conditions must hold \emph{always}.
\end{example}

\begin{example} Now let's use the conditions in the theorem to show
  that the set $U=\{c(-1,0,1)+(2,1,3)\}$ is not a vector subspace.
  Note that you cannot pick a constant $c$ so that $\vec
  0=(0,0,0)=c(-1,0,1)+(2,1,3)$ since you'd have to pick $c=2$ to zero
  out the first component, but then the second component would be 1.
  That means that $U$ does not contain the zero vector, violating the first condition above.  That would be enough to show that $U$ is not a vector space.
  However, let's test the other two conditions anyway. 

  Let's pick two generic vectors from $U$, $\vec u=c_1(-1,0,1)+(2,1,3)$ and $\vec v=c_2(-1,0,1)+(2,1,3)$.  In order to show that the second condition fails, we only need to find one specific example where $\vec u+\vec v \notin U$.  However, we use variables $c_1$ and $c_2$ so that we can analyze what happens for \emph{any} pair of vectors in $U$, and that will make it easier to find the example we need.  The sum is
  \begin{align*}
    \vec u+\vec v=(c_1+c_2)(-1,0,1)+(4,2,6)
    =(-c_1-c_2+4, 2, c_1+c_2+6).
    \end{align*}
    Note that the second component of $\vec u+\vec v$ is 2, but any vector in $U$ will always have a second component equal to 1 (since it is of the form $c(-1,0,1)+(2,1,3)=(-c+2, 1, c+3)$).  This shows that $\vec u+\vec v$ is \emph{never} in $U$, no matter what $\vec u,\vec v$ we pick from $U$, so $U$ is not closed under vector addition.  Again, violating just this second condition is enough to show that $U$ is not a vector space, but we'll test the last condition anyway to see how it is done.

    Let's pick a vector from $U$, $\vec u=c_1(-1,0,1)+(2,1,3)$, and a scalar $c\in\RR$.  Again, notice that we use variables to analyze what happens for any vector in $U$ and any scalar in $\RR$.  The scalar multiplication $c\vec u$ is
    \begin{align*}
      c\vec u = c(c_1(-1,0,1)+(2,1,3)) = (-cc_1+2, c, cc_1+3).
    \end{align*}
    If $c\neq 1$, then the second component of $c\vec u$ is not 1, so $c\vec u$ is not in $U$.  This means that $U$ is not closed under scalar multiplication.

    Note that we only need to show that one of the conditions in the test fails to show that $U$ is not a vector space (and hence not a vector subspace).  Furthermore, since the second two conditions must hold always, if we can come up with a single example of two vectors $\vec u$ and $\vec v$ which are in $U$, but $\vec u+\vec v$ is not in $U$, that is enough to show that the second condition fails and $U$ is not a vector space.  If we can come up with a single example of a vector $\vec u\in U$ and a scalar $c\in\RR$ where $c\vec u$ is not in $U$, then the third condition fails and $U$ is not a vector space.

In this example, all three conditions failed.  Sometimes only one or two of the conditions will be false.  In those cases, $U$ is still not a vector space.
\end{example}


\subsection{Generalizing what a ``vector'' is}

The ideas behind linear combinations are very useful for things besides lists of numbers, which is what we've meant when we've said ``vector'' up to this point.  Many times we'd like to work with things other than lists of numbers.  In this section, let's look at how the concepts of subspace and linear combinations apply when we change what a ``vector'' is.

\begin{example}
  Here are some examples of sets of ``vectors'' where a ``vector'' is something other than a list of numbers.
  \begin{enumerate}
  \item A vector is a $3\times 2$ matrix.  $M_{3,2}$ is the set of all $3\times 2$ matrices and is a vector space.  All of the vectors in $M_{3,2}$ are linear combinations of the vectors
    \begin{equation*}
      \left\{
      \begin{bmatrix}1&0\\0&0\\0&0\end{bmatrix},
      \begin{bmatrix}0&1\\0&0\\0&0\end{bmatrix},
      \begin{bmatrix}0&0\\1&0\\0&0\end{bmatrix},
      \begin{bmatrix}0&0\\0&1\\0&0\end{bmatrix},
      \begin{bmatrix}0&0\\0&0\\1&0\end{bmatrix},
      \begin{bmatrix}0&0\\0&0\\0&1\end{bmatrix}
      \right\}
  \end{equation*}

  \item A vector is an $m\times n$ matrix.  $M_{m,n}$ is the vector space of all $m\times n$ matrices.  If $m=n$, then we abbreviate $M_{m,n}$ as just $M_n$, e.g., $M_3$ is the vector space, or set of vectors, of all $3\times 3$ matrices.
  \item A vector is a polynomial of degree less than or equal to 2\marginpar{The \define{degree} of a nonzero polynomial is the highest power of the variable.  The degree of $x^3-3x^4+1$ is four and the degree of the constant polynomial $2$ is zero.  The degree of the zero polynomial $0$ is either $-1$ or $-\infty$, depending on the convention used.}.  $P_2(x)$ is the set of all polynomials of degree less than or equal to 2 and is a vector space.  All of the vectors in $P_2(x)$ are linear combinations of the vectors $\{1,x,x^2\}$.
 \item A vector is a polynomial with degree less than or equal to $n$.  The set of all polynomials with degree less than or equal to $n$ is $P_n(x)$, and this is a vector space.
 \item A vector is a polynomial.  The set of all polynomials with variable $x$ is the vector space $P(x)$.
\end{enumerate}
\end{example}

\begin{example} Here are some examples of vector subspaces.
\begin{enumerate}
	\item  The span of a set of vectors in a vector space $V$ is always a subspace of $V$. This is because every linear combination is by definition in the span, so it is in the subspace.  Hence, if $M$ is an $m$ by $n$ matrix, the row space (the span of the rows) is a subspace of $\RR^n$ and the column space is a subspace of $\RR^m$.
	\item  The set of upper triangular $m$ by $n$ matrices is a subspace of $M_{m,n}$. The zero matrix is in $M_{m,n}$. The sum of two upper triangular matrices is upper triangular.  Multiplying each entry of an upper triangular matrix by a scalar will still result in an upper triangular matrix.
	\item  The set of polynomials of degree less than or equal to $n$, written $P_n(x)$, is a subspace of $P(x)$. The zero polynomial ($p(x)=0$, which has degree less than zero) is in the set.  If two polynomials each have degree less than or equal to $n$, then their sum also has degree less than or equal to $n$.  Also, multiplying a polynomial by a constant cannot increase the degree, so the set is closed under scalar multiplication.
	\item  More examples are given in Schaum's outlines on page 160 (examples 4.4, 4.5, and 4.6).
        \item See Example~\ref{RR subspace examples} for some examples we've seen before of sets which are vector subspaces of $\RR^n$.
\end{enumerate}
\end{example}

\begin{example}
Here are some examples of sets that are \emph{NOT} vector subspaces. To show a set of vectors in a vector space is not a vector subspace, you just need to find one example that shows the set is not closed under addition or scalar multiplication (or show that the set is empty).
\begin{enumerate}
        \item See Example~\ref{RR subspace nonexamples} for some examples we've seen before of sets which are not vector subspaces.
	\item The set of polynomials of degree two (each polynomial \emph{must} have an $x^2$ term) is not a vector subspace of the set of polynomials because the zero polynomial is not in this set. Also, the sum of $x^2$ and $1-x^2$ is $(x^2)+(1-x^2)=1$, which is a polynomial of degree zero instead of degree 2 (so the set is not closed under addition).
\end{enumerate}
\end{example}

% \begin{proof}
% The proof of this theorem is rather quick. Let $V$ be the larger vector space, and $U$ the subset of vectors that satisfy the 3 properties.  From the conditions in the theorem, we know that $U$ is nonempty and closed under vector addition and scalar multiplication. Because every vector in $U$ is also a vector in $V$, we immediately know that vector addition is associative ($A_1$) and commutative ($A_2$), scalar multiplication distributes across vector addition ($M_1$) and distributes across scalar addition ($M_2$), scalar multiplication is associative ($M_3$), and multiplication by 1 does nothing ($M_4$). The first condition in the theorem shows that $U$ has a zero vector ($A_3$). So all we need to show is that the additive inverse ($A_4$) of any vector in $U$ is also in $U$. If $\vec u\in U$, then because $c\vec u\in U$ for every $c\in \mathbb{R}$, we can pick $c=-1$ to obtain $-1 \vec u\in U$. This means that $u+(-1\vec u) = (1-1)\vec u = 0\vec u = \vec 0$, which means every $u\in U$ has an additive inverse $-1\vec u\in U$ ($A_4$). We have now verified that $U$ satisfies all the axioms required to be a vector space.
% \end{proof}


The first example of a vector subspace that we examined involved looking at the span of a set of vectors. Because any vector subspace is closed under linear combinations, the span of all vectors in the subspace is the subspace itself. This means that every vector subspace is the span of a set of vectors. This pattern will be used so often that we will state it as a theorem.

\begin{theorem}\label{thm subspace iff span}
If $W$ is a subset of vectors in a vector space $V$, then $W$ is a subspace of $V$ if and only if $W$ is the span of a set of vectors.
\end{theorem}


\begin{example}
Let's use the previous theorem to show that some sets are vector subspaces. 
\begin{enumerate}
	\item The set of 2 by 2 diagonal matrices is spanned by 
	$\begin{bmatrix}1&0\\0&0\end{bmatrix}$ and $\begin{bmatrix}0&0\\0&1\end{bmatrix}$, and so is a vector subspace of $M_{2,2}$.
	\item We already know that $P(x)$ is a vector space. The span of $\{1,x,x^2,x^3,x^4\}$ is the set of polynomials of the form $a+bx+cx^2+dx^3+ex^4$, which is the set of all polynomials of degree 4 or less, $P_4(x)$. Since $P_4(x)$ is the span of set of vectors, it is a vector subspace of $P(x)$. 
\end{enumerate}
\end{example}



\subsection{Generalizing Bases, Dimension, and Coordinates}
Vector spaces are often rather large sets. In the previous chapter, we found basis vectors for the column and row space of a matrix by row reducing a matrix. These basis vectors allow us to talk about the entire space by considering the span of the basis vectors. The coordinates of other vectors can then be given relative to the chosen basis. Let's review these ideas with an example of finding a basis for the column space of a matrix.

\begin{example}
In this example, a vector is a list of three numbers, i.e., an element of $\RR^3$.
The reduced row echelon form of 
$\begin{bmatrix}[ccc] 1&-1&1\\3&0&6\\5&1&11\end{bmatrix}$ 
is 
$\begin{bmatrix}[ccc] 1&0&2\\0&1&1\\0&0&0\end{bmatrix}$. 
The column space is the vector space spanned by the column vectors 
$\vec u_1= (1,3,5)$, 
$\vec u_2= (-1,0,1)$, and 
$\vec u_3= (1,6,11)$.
The reduced row echelon form tells us that the third column is 2 times the first ($u_1$) plus the second ($u_2$).   
Since the third column is already a linear combination of the first two columns, it does not contribute any new vectors to the span of just the first two vectors.  
Hence, the column space is the span of $\vec u_1$ and $\vec u_2$ (we can remove the dependent vector without shrinking the number of vectors in the span). 
Because the first two columns are linearly independent, we cannot remove either vector without reducing the number of vectors in their span.

A basis for the column space is a linearly independent set of vectors whose span is the the column space. 
Because $\vec u_1$ and $\vec u_2$ are linearly independent and they span the column space, they form a basis for the column space. 
The coordinates of the third vector relative to this basis are $(2,1)$, the numbers in the third column of the rref.  
The dimension of the column space is the number of vectors in a basis for the column space, so here the dimension of the column space is 2.  The column space is a plane of vectors in $\RR^3$.
\end{example}

The example above illustrates the key ideas we need to work with any finite dimensional vector spaces.  Let's now give the formal definitions of basis, dimension, and coordinates that will work no matter what things we are calling ``vectors.''  Compare this definition to Definitions~\ref{def:basis for Rn} and \ref{def coordinates for Rn} and see that the underlying concepts are exactly the same, no matter what sort of thing a ``vector'' is.

\begin{definition}[Basis, Dimension, Coordinates]
A \define[basis!vector space]{basis} for a vector space is a set of vectors that are 
(1) linearly independent and (2) span the vector space.
The \define{dimension} of a vector space is the number of vectors in a basis for the vector space. 
We say the dimension of the zero vector space (the vector space consisting of only the zero vector) is zero. 
If $\beta=\{\vec v_1,\vec v_2,\ldots,\vec v_n\}$ is a basis and $\vec v = c_1\vec v_1+c_2\vec v_2+\cdots+c_n\vec v_n$, then we call $(c_1,c_2,\ldots,c_n)_\beta$ the \define{coordinates} of $\vec v$ relative to the basis $\beta$. 
If the basis is understood, we just leave off the subscript $\beta$.
\end{definition}

Remember, once you have a new definition, you should always try to illustrate that definition with examples.  Let's now look at a few more examples. 
\begin{example} Let's start by looking at $\mathbb{R}^n$, a vector space we are familiar with.
The standard basis for ${\mathbb{R}}^2$ is $\vec e_1 = (1,0),\vec e_2=(0,1)$, so the dimension is 2. 
Similarly,  $\vec e_1 = (1,0,\ldots,0),\vec e_2=(0,1,\ldots,0), \ldots, \vec e_n=(0,0,\ldots,1)$ is a basis for ${\mathbb{R}}^n$, so the dimension is $n$.  
\end{example}

\begin{example}
Let's find a basis for a polynomial vector space. 
Consider the set of polynomials $\{1, x, x^2\}$. The span of these polynomials is $a_0(1)+a_1(x)+a_2(x^2)$, which is every polynomial of degree less than or equal to 2. 
To show that this set forms a basis for $P_2(x)$, we only need to show that these polynomials are linearly independent.  
To show this, we must show that $$a_0(1)+a_1(x)+a_2(x^2) = 0 \text{ implies } a_0=a_1=a_2=0$$ (the only linear combination which results in the zero polynomial is the trivial combination).  
Since the equation above holds for every real number $x$, we can simply plug in 3 different $x$ values to obtain 3 different equations. Letting $x=0,1,-1$ gives us the three equations $a_0=0$, $a_0+a_1+a_2=0$, and $a_0-a_1+a_2=0$. Solving this system yields $a_0=a_1=a_2=0$. Hence, the three polynomials $1, x, x^2$ are linearly independent.  
These three polynomials form a basis for $P_2(x)$, and we call this the standard basis for $P_2(x)$.  Using this basis, the coordinates of $a_0+a_1x+a_2x^2$ relative to the standard basis are $(a_0,a_1,a_2)$.
Since there are three basis vectors, the dimension of $P_2(x)$ is 3 (one more than the maximum degree of the polynomials). 
\marginpar{In general, the set $\{1, x, x^2, x^3, \ldots, x^n\}$ is called the standard basis for $P_n(x)$.}%
\end{example}


For vector spaces that are not ${\mathbb{R}}^n$, a simple way to work with vectors is to start by finding a basis for the vector space. Once you have a basis, you can give the coordinates of every vector in the space relative to that basis.  These coordinates can then be thought of as vectors in ${\mathbb{R}}^n$. Questions about the vector space can then be reduced to questions about vectors in $\mathbb{R}^n$.  As seen in the last example, the vector space $P_2(x)$ consists of all polynomials of the form $a_0+a_1x+a_2x^2$. If we use the standard basis for polynomials, then the coordinates are simply $(a_0,a_1,a_2)$. We can now use these coordinates to ask other questions about the vectors in $P_2(x)$. Here are two examples.

\begin{example}
In this example, a vector is a polynomial.  Let $V$ be the vector space $\vspan\{1+x,\,x+x^2,\, 4+x-3x^2\}$. We'll show it is a 2-dimensional vector subspace of $P_2(x)$, and provide a basis for this space in two different ways. The coordinates of these three polynomials relative to the standard basis for $P_2(x)$ are $(1,1,0)$, $(0,1,1)$, and $(4,1,-3)$. 
To study the span of the polynomials, we instead look at the column space of $A$.
$$
A=
\begin{bmatrix}
 1 & 0 & 4\\
 1 & 1 & 1\\
 0 & 1 & -3
\end{bmatrix} 
\xrightarrow{\rref}
\begin{bmatrix}
 1 & 0 & 4\\
 0 & 1 & -3\\
 0 & 0 & 0
\end{bmatrix}, \quad A=\begin{bmatrix}1&0\\1&1\\0&1\end{bmatrix}
\begin{bmatrix}1&0&4\\0&1&-3\end{bmatrix}.$$ Row reduction of $A$ shows us that the first two columns of $A$ are pivot columns, hence basis vectors for the column space of $A$.  This tells us that the corresponding set of polynomials $\{1+x, x+x^2\}$ is a basis for $V$.  From writing $A$ as $CR$, we see the coordinates of $4+x-3x^2$ relative to this basis are $(4,-3)$.

To find a different basis for $V$, we instead look at the row space of $A^T$.
$$A^T
=
\begin{bmatrix}
 1 & 1 & 0 \\
 0 & 1 & 1 \\
 4 & 1 & -3
\end{bmatrix}
\xrightarrow{\rref}
\begin{bmatrix}
 1 & 0 & -1 \\
 0 & 1 & 1 \\
 0 & 0 & 0
\end{bmatrix},\quad A^T=\begin{bmatrix}1&1\\0&1\\4&1\end{bmatrix}
\begin{bmatrix}1&0&-1\\0&1&1\end{bmatrix}.$$ 
Row reduction tell us that $(1,0,-1)$ and $(0,1,1)$ are basis vectors for the row space of $A^T$ (hence the column space of $A$).  The corresponding polynomials $1-x^2$ and $x+x^2$ are basis vectors for $V$. From the last row of the $C$ matrix in the $CR$ factorization above, the coordinates of $4+x-3x^2$ relative to this basis are $(4,1)$.
  
\end{example}

\begin{example}
Let's show the set of vectors $\{1,\, 1+x,\, 1+x+x^2\}$ is a basis for $P_2(x)$, and then write $2+3x-4x^2$ as a linear combination of these vectors. 
To do this, we place the coordinates of each vector (relative to the standard basis for $P_2(x)$) into the columns of a matrix, and then reduce 
$$\begin{bmatrix}
 1 & 1 & 1 & 2 \\
 0 & 1 & 1 & 3 \\
 0 & 0 & 1 & -4
\end{bmatrix}
\xrightarrow{\rref}
\begin{bmatrix}
 1 & 0 & 0 & -1 \\
 0 & 1 & 0 & 7 \\
 0 & 0 & 1 & -4
\end{bmatrix}.$$ 
Since the first 3 columns are pivot columns, they are linearly independent. 
This implies that the polynomials represented by these vectors are linearly independent. Since the three sets of coordinates span all of $\RR^3$, the three polynomials must span all of $P_2(x)$.
The last column of the rref tells us that the coordinates of $(2,3,-4)$ relative to $\beta=\{(1,0,0),(1,1,0),(1,1,1)\}$ are $(-1,7,-4)_\beta$. In terms of polynomials we write $2+3x-4x^2 = -1(1)+7(1+x)-4(1+x+x^2)$. To emphasize the vocabulary, we can now state two things:
\begin{enumerate}
	\item The coordinates of $2+3x-4x^2$ relative to $\{1,x,x^2\}$ are $(2,3,-4)$.
	\item The coordinates of $2+3x-4x^2$ relative to $\{1,1+x,1+x+x^2\}$ are $(-1,7,-4)$.
\end{enumerate}
\end{example}

We'll often start by using a standard basis to represent vectors, and then change the basis to a different one if the need arises.  Later we will explore this idea in more depth.  For now we'll focus on finding a basis and coordinates relative to that basis.

\begin{example}
As a final example, let's introduce the standard basis for the vector space of $m$ by $n$ matrices. We'll focus on 2 by 2 matrices, though this example generalizes to any size matrix. The set of matrices 
$$\left\{
\begin{bmatrix}
 1 & 0  \\
 0 & 0 
\end{bmatrix},
\begin{bmatrix}
 0 & 1  \\
 0 & 0 
\end{bmatrix},
\begin{bmatrix}
 0 & 0  \\
 1 & 0 
\end{bmatrix},
\begin{bmatrix}
 0 & 0  \\
 0 & 1 
\end{bmatrix}
\right\}$$
forms a basis for the vector space $M_{2,2}$ of 2 by 2 matrices (which means this space has dimension 4).  
To show it is a basis, we must show that (1) the matrices are linearly independent and (2) the matrices span $M_{2,2}$.
To show linearly independent, we solve 
$$c_1
\begin{bmatrix}
 1 & 0  \\
 0 & 0 
\end{bmatrix}
+c_2
\begin{bmatrix}
 0 & 1  \\
 0 & 0 
\end{bmatrix}
+c_3
\begin{bmatrix}
 0 & 0  \\
 1 & 0 
\end{bmatrix}
+c_4
\begin{bmatrix}
 0 & 0  \\
 0 & 1 
\end{bmatrix}
=
\begin{bmatrix}
 c_1 & c_2  \\
 c_3 & c_4 
\end{bmatrix}
=\begin{bmatrix}
 0 & 0  \\
 0 & 0 
\end{bmatrix}
.$$
The only solution is the trivial solution, so the matrices are linearly independent. 
These 4 matrices clearly span the space, as the coordinates of any matrix 
$\begin{bmatrix}
 a & b  \\
 c & d 
\end{bmatrix}$ 
relative to this basis are $(a,b,c,d)$.
We can now work with 2 by 2 matrices as a vector space by considering vectors in ${\mathbb{R}}^4$ instead. 
\end{example}

Whenever we encounter vector spaces of finite dimension, we will often start by finding a basis and then work with coordinates relative to this basis instead. Then we can work with the coordinates as vectors in ${\mathbb{R}}^n$ to perform calculations. This means that once we understand what happens in ${\mathbb{R}}^n$, we can understand what happens in any vector space of dimension $n$. 

\subsection{Generalizing vector operations}

Not only can we change what a ``vector'' is, but we can also change our definition of ``vector addition'' or ``scalar multiplication.''  Again, the power of mathematics is in understanding the key properties of our vector addition and scalar multiplication properties so that we can change what those operations are, but still retain the important properties that let us do useful things.

To understand what exactly are the key properties of vector addition and scalar multiplication, let's see what common properties those two operations have for various things we have considered as vectors.

Comparing Theorems~\ref{rn vector space properties} and \ref{matrix vector space properties}, we see that there are 4 important properties of addition and 4 important properties of scalar multiplication that are shared between vectors in $\RR^n$ and matrices (vectors in $M_{m,n}$).  These 8 properties also hold for normal polynomial addition and scalar multiplication (i.e., for vectors in $P_n(x)$).  These 8 properties are the key things that need to hold for vector addition and scalar multiplication to give us what we consider ``normal'' addition and scalar multiplication.  If you want to define a different way to combine two vectors together (vector addition) or a different way to combine a scalar and a vector (scalar multiplication), and your new vector addition and scalar multiplication satisfies these 8 properties, then you can use all the tools, patterns, and theorems we have discussed for vector spaces.

We finally now have the fundamental underlying properties that makes vectors in $\RR^n$, matrices, polynomials, and other structures similar.


\begin{definition}\marginpar{We use the word ``real'' to emphasize that the scalars are real numbers. For those of you who know about fields, you can replace ``real'' with any field, such as the complex numbers or finite fields used in computers or cryptography. For simplicity, we will stick to scalars being real numbers for now.}%
A (real) \define{vector space} is a set $V$ of things (which we will call ``vectors'') together with two operations:
\begin{enumerate}
	\item[(C1)] \define{vector addition} ($+$), which assigns to any pair $\vec u, \vec v\in V$ another vector $\vec u+\vec v\in V$,
	\item[(C2)] \define{scalar multiplication} ($\cdot$), which assigns to any $\vec v \in V$ and $c\in {\mathbb{R}}$ another vector $c\vec v\in V$.
\end{enumerate}
\marginpar{We say that the set $V$ is closed under the operations of vector addition and scalar multiplication because adding vectors and scaling vectors produces other vectors in $V$.}%
These two operations must satisfy the following axioms ($c,d\in {\mathbb{R}}$ and $\vec u,\vec v,\vec w\in V$):
\marginpar{Compare these properties to Theorems~\ref{rn vector space properties} and \ref{matrix vector space properties}.}%
\begin{enumerate}
	\item[($A_1$)] Vector addition is associative: $(\vec u+\vec v)+\vec w = \vec u +(\vec v+\vec w)$.
	\item[($A_2$)] Vector addition is commutative: $\vec u+\vec v= \vec v+\vec u$.
	\item[($A_3$)] There is a zero vector $\vec 0$ in $V$ which satisfies $\vec 0+\vec v = \vec v+\vec 0=\vec v$.
	\item[($A_4$)] Every $\vec v\in V$ has an additive inverse, called $-\vec v$, which satisfies $\vec v+(-\vec v)=\vec 0$.
	\item[($M_1$)] Scalar multiplication distributes across vector addition: $c(\vec u+\vec v)= c\vec u + c\vec v$.
	\item[($M_2$)] Scalar multiplication distributes across scalar addition: $(c+d)\vec v= c\vec v+ d\vec v$.
	\item[($M_3$)] Scalar multiplication is associative: $(cd)\vec v = c(d\vec v)$
	\item[($M_4$)] Scalar multiplication by 1 does nothing: $1\vec v=\vec v$
\end{enumerate}
\end{definition}
The definition above is the formal definition of a vector space. The definition generalizes the properties we already use about vectors in $\RR^2$ and $\RR^3$. This definition essentially means you can add and multiply as you would normally expect---no surprises will occur. 
 However, you can work with this definition if you think 
\begin{quote}
A vector space is a set of things (``vectors'') closed under vector addition and scalar multiplication, and these operations obey the usual rules of addition and multiplication.
\end{quote}


With vector addition and scalar multiplication properly defined, the span of any set of vectors from the vector space will always be a vector space. In other words, every linear combination of the vectors in that space must also be in the space.  Let's look at some examples of vector spaces, and explain why they are vector spaces.

\begin{example}[Examples of vector spaces]
Here are some examples of sets (and corresponding vector addition and scalar multiplication operations) that are vector spaces.
\begin{enumerate}
	\item Both ${\mathbb{R}}^2$ and ${\mathbb{R}}^3$ are vector spaces. The set of all lists of real numbers of length $n$, ${\mathbb{R}}^n$, is a vector space.  Vector addition is defined component-wise. Addition is associative, commutative, the zero vector is $(0,0,\ldots,0)$, and $(v_1,v_2,\ldots,v_n)+(-v_1,-v_2,\ldots,-v_n)=\vec 0$. Also, scalar multiplication distributes across vector and scalar addition, it is associative, and $1\vec v=\vec v$.  The definition of a vector space was defined to capture the properties of ${\mathbb{R}}^n$ which are useful in other settings.
	\item The set of $m$ by $n$ matrices, written $M_{m,n}$, with ``vector addition'' being normal matrix addition and ``scalar multiplication'' being normal matrix-scalar multiplication, is a vector space.
	\item The set of all polynomials, written $P(x)$, with ``vector addition'' being the normal sum of polynomials, and ``scalar multiplication'' being the normal number-times-polynomial operation, is a vector space.
	\item The set of continuous functions on an interval $[a,b]$, written $C[a,b]$, where ``vector addition'' is just adding the two functions together and ``scalar multiplication'' is just multiplying the function by a scalar, is a vector space.
	\item The set of continuously differentiable functions on an interval $(a,b)$, written $C^1(a,b)$, where ``vector addition'' is just adding the two functions together and ``scalar multiplication'' is just multiplying the function by a scalar, is a vector space.
\end{enumerate}
\end{example}
\begin{example}[Examples that are \emph{NOT} vector spaces] 
Here are some examples of sets that are not vector spaces along with an explanation of one of the properties they do not satisfy.
\begin{enumerate}
	\item A line that does not pass through the origin. The zero vector is not on the line.
	\item Polynomials of degree $n$.  There is no zero vector.
	\item Invertible Matrices. There is no zero matrix since it is not invertible.
	\item The nonnegative $x$ axis. There is no additive inverse for $(1,0)$, as $(-1,0)$ is not in the space.
	\item The line segment from $(-1,0)$ to $(1,0)$. The product $2(1,0)=(2,0)$ is not in the space.
\end{enumerate}	
\end{example}


\begin{example} [The set of all polynomials is a vector space]
To illustrate how to verify that a set is a vector space, let's formally verify that one of the examples above is indeed a vector space. Let's work with the set of all polynomials $P(x)$. Our vectors are polynomials, which are functions of the form
$$a(x) = a_0 +a_1 x+a_2 x^2 + \cdots + a_n x^n = \sum_{i=0}^n a_i x^i.$$
We must define vector addition and scalar multiplication.  
If $\ds a(x) =  \sum_{i=0}^n a_i x^i$ and $\ds b(x) =  \sum_{i=0}^m b_i x^i$ are two polynomials, define their sum to be a new polynomial $c(x)$ where $c_i = a_i+b_i$ for each $i$ (where $a_i=0$ if $i> n$ and $b_i=0$ if $i> m$). This is just normal polynomial addition.
For example, we compute the sum $(1+4x+3x^2)+(x-x^3) = (1+0)+(4+1)x+(3+0)x^2+(0-1)x^3$. If $c$ is a real number, then scalar multiplication $c\cdot a(x)$ is a new polynomial $b(x)$ where $b_i=ca_i$ for each $i$.  This is just the normal number-times-polynomial operation

First, we need to check that the ``vector addition'' of two polynomials is also in our set (i.e., the result is a polynomial).  This is clear from the definition.  We also need to check that the ``scalar product'' of a number and a polynomial is also a polynomial.  This again is clear from the definition of scalar multiplication.

Finally, we need to show the 8 axioms hold:
\begin{enumerate}
	\item[($A_1$)] Polynomial addition is associative: $(a(x)+b(x))+c(x) = a(x) +(b(x)+c(x))$. To verify this, we look at each coefficient of the sum.  The coefficients are $(a_i+b_i)+c_i = a_i+(b_i+c_i)$, and equality holds because addition of real numbers is associative.
	\item[($A_2$)] Polynomial addition is commutative: $a(x)+b(x)=b(x)+a(x)$. Again this follows because addition of real numbers is commutative. In particular, $a_i+b_i=b_i+a_i$ for each $i$.
	\item[($A_3$)] The zero polynomial $z(x)=0$ is the zero vector, as $0+a(x) = a(x)+0=a(x)$ for any polynomial.
	\item[($A_4$)] The additive inverse of $a(x) = a_0+a_1x+\cdots+a_nx^n$ is $-a(x) = -a_0-a_1x-\cdots-a_nx^n$.
	\item[($M_1$)] Scalar multiplication distributes across vector addition: $c(a(x)+b(x))= ca(x) + cb(x)$. For each $i$, we compute $c(a_i+b_i) = ca_i+cb_i$. 
	\item[($M_2$)] Scalar multiplication distributes across scalar addition: $(c+d)a(x)= ca(x)+ da(x)$. For each $i$, we compute $(c+d)a_i = ca_i+da_i$. 
	\item[($M_3$)] Scalar multiplication is associative: $(cd)a(x) = c(da(x))$. For each $i$, we compute $(cd)a_i = c(da_i)$.
	\item[($M_4$)] Scalar multiplication by 1 does nothing: $1a(x)=a(x)$. For each $i$, we compute $1a_i = a_i$.
\end{enumerate}
We have now shown that the set of polynomials is a vector space.
\end{example}



%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "../linear-algebra"
%%% End: