Commit 990114c

50008: Added Lecture 9
1 parent 5ae5334 commit 990114c

File tree: 6 files changed, +123 −0 lines
50008 - Probability and Statistics/50008 - Probability and Statistics.tex (+2)

@@ -55,6 +55,8 @@
 \addchapter{hypothesis_testing}

+\addchapter{maximum_likelihood_estimate}
+
 \addchapter{credit}

 \end{document}

50008 - Probability and Statistics/maximum_likelihood_estimate/code/.gitkeep (whitespace-only changes)
50008 - Probability and Statistics/maximum_likelihood_estimate/diagrams/.gitkeep (whitespace-only changes)
50008 - Probability and Statistics/maximum_likelihood_estimate/images/.gitkeep (whitespace-only changes)

@@ -0,0 +1,121 @@
\chapter{Maximum Likelihood Estimate}

Given some distribution with an unknown parameter $\theta$:
\[X \thicksim Distribution(\dots \theta \dots)\]
And a sample $\underline{X}$ taken from the distribution:
\[\underline{X} = (X_1, X_2, \dots, X_n)\]
We want to know the value of $\theta$ for which the likelihood of the sample occurring is highest.
\begin{definitionbox}{Likelihood Function}
    The likelihood of some observations $x_1, x_2, \dots, x_n$ occurring given some $\theta$ is:
    \[\begin{split}
        L(\theta) &= P(x_1, x_2, \dots, x_n|\theta) \\
        &= \prod_{i=1}^n f(x_i|\theta)
    \end{split}\]
    Here $f$ is the \keyword{probability mass function} (or density function, for a continuous distribution), and as the observations are independent we can multiply their probabilities.
\end{definitionbox}
\begin{definitionbox}{Log Likelihood Function}
    Used more often than the likelihood: it is easier to work with, and it converts a product of very small values into a sum of moderate negative values, avoiding floating-point underflow.
    \[l(\theta) = \ln L(\theta)\]
\end{definitionbox}
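The floating-point motivation can be seen directly: multiplying many small probabilities underflows to zero, while summing their logarithms stays representable. A minimal Python sketch with made-up probability values (not from the notes):

```python
import math

# Underflow illustration (hypothetical values): the product of 500
# probabilities of 0.01 underflows to 0.0, while the sum of their logs
# is a moderate negative number that floats represent comfortably.
probs = [0.01] * 500

likelihood = 1.0
for p in probs:
    likelihood *= p          # 0.01**500 = 1e-1000, below the float minimum

log_likelihood = sum(math.log(p) for p in probs)

print(likelihood)        # 0.0 due to floating-point underflow
print(log_likelihood)    # about 500 * ln(0.01), roughly -2302.6
```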
To do this, we construct the likelihood (or log likelihood) function from the distribution and sample in terms of $\theta$.
\\
\\ Then we can differentiate the function to determine the value of $\theta$ at the maximum.
\\
\\ This value of $\theta$ is the \keyword{Maximum Likelihood Estimate} ($\hat{\theta}$).
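The same procedure can be sketched numerically: instead of differentiating, scan a grid of $\theta$ values for the maximum of $l(\theta)$. A minimal Python sketch using a made-up exponential sample (the closed form $n / \sum x_i$ is derived in the next section):

```python
import math

# Numeric sketch of the procedure (hypothetical sample values): build
# l(theta) from the sample, then locate its maximum by scanning a grid
# rather than differentiating.
sample = [0.8, 1.3, 0.4, 2.1, 0.9]

def log_likelihood(theta):
    # l(theta) = n*ln(theta) - theta * sum(x_i) for Exp(theta)
    return len(sample) * math.log(theta) - theta * sum(sample)

grid = [i / 1000 for i in range(1, 5000)]   # theta in (0, 5)
theta_hat = max(grid, key=log_likelihood)

print(theta_hat)                     # grid maximum
print(len(sample) / sum(sample))     # closed form n / sum(x_i)
```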
\section{Common Maximum Likelihood Estimates}
Given a sample $\underline{x} = (x_1, x_2, \dots, x_n)$, we can derive formulas for the maximum likelihood estimate of common distributions.
\subsection{Exponential Distribution}
\[X \thicksim Exp(\theta) \Rightarrow f(x) = \theta e^{-\theta x}\]
First we determine the \keyword{likelihood} in terms of $\theta$.
\[\begin{split}
    L(\theta) &= \prod_{i=1}^n f(x_i) \\
    &= \prod_{i=1}^n \theta e^{-\theta x_i} \\
    &= \theta^n\prod_{i=1}^n e^{-\theta x_i} \\
    &= \theta^n e^{-\theta\sum_{i=1}^n x_i} \\
\end{split}\]
Next we derive the \keyword{log likelihood}.
\[\begin{split}
    l(\theta) &= \ln L(\theta) \\
    &= \ln \left(\theta^n e^{-\theta \sum_{i=1}^nx_i} \right)\\
    &= n\ln \theta -\theta\sum_{i=1}^n x_i \\
\end{split}\]
Next we differentiate and set equal to zero:
\[\begin{split}
    \cfrac{dl(\theta)}{d\theta} &= \cfrac{n}{\theta} - \sum_{i=1}^nx_i = 0\\
    \sum_{i=1}^nx_i &= \cfrac{n}{\theta} \\
    \theta &= \cfrac{n}{\sum_{i=1}^nx_i} \\
\end{split}\]
Hence the maximum likelihood estimate is the reciprocal of the sample mean.
\[\hat{\theta} = \sfrac{1}{\overline{x}}\]
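As a sanity check (not part of the notes), simulating an exponential sample with a known rate and applying $\hat{\theta} = \sfrac{1}{\overline{x}}$ approximately recovers the parameter. A Python sketch using the standard library's `random.expovariate`:

```python
import random

# Simulation check (hypothetical): draw an exponential sample with a
# known rate theta_true, then verify the MLE 1/mean roughly recovers it.
random.seed(0)
theta_true = 2.0
sample = [random.expovariate(theta_true) for _ in range(10_000)]

theta_hat = len(sample) / sum(sample)   # reciprocal of the sample mean
print(theta_hat)                        # close to theta_true = 2.0
```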
\subsection{Geometric Distribution}
\[X \thicksim Geo(\theta) \Rightarrow f(x) =\theta(1-\theta)^{x - 1}\]
\[\begin{split}
    L(\theta) &= \prod_{i=1}^n f(x_i) \\
    &= \prod_{i=1}^n \theta(1-\theta)^{x_i - 1} \\
    &= \theta^n \prod_{i=1}^n (1-\theta)^{x_i - 1} \\
    &= \theta^n (1 - \theta)^{\sum_{i=1}^n(x_i - 1)} \\
    &= \theta^n (1 - \theta)^{\left(\sum_{i=1}^nx_i\right) - n} \\
\end{split}\]
Now we find the \keyword{log likelihood}.
\[\begin{split}
    l(\theta) &= \ln L(\theta) \\
    &= \ln \left( \theta^n (1 - \theta)^{\left(\sum_{i=1}^nx_i\right) - n} \right) \\
    &= n\ln\theta +\left(\left(\sum_{i=1}^nx_i\right) - n\right)\ln\left(1 - \theta\right) \\
\end{split}\]
Now we differentiate, and set equal to zero to find $\hat{\theta}$.
\[\begin{split}
    \cfrac{dl(\theta)}{d\theta} &= \cfrac{n}{\theta} + \left(\left(\sum_{i=1}^nx_i\right) - n\right)\cfrac{1}{\theta - 1} = 0 \\
    0 &=\cfrac{n(\theta - 1)}{\theta(\theta - 1)} + \left(\left(\sum_{i=1}^nx_i\right) - n\right)\cfrac{\theta}{\theta(\theta - 1)} \\
    0 &=n(\theta - 1)+ \left(\left(\sum_{i=1}^nx_i\right) - n\right)\theta \\
    0 &=n\theta - n + \theta\sum_{i=1}^nx_i - n\theta \\
    0 &=\theta\sum_{i=1}^nx_i - n \\
    n &=\left(\sum_{i=1}^nx_i\right)\theta \\
    \cfrac{n}{\sum_{i=1}^nx_i} &=\theta \\
\end{split}\]
Hence the maximum likelihood estimate is the reciprocal of the sample mean.
\[\hat{\theta} = \sfrac{1}{\overline{x}}\]
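The closed form can be checked against a direct grid maximisation of the geometric log likelihood, here using a made-up sample of trial counts:

```python
import math

# Grid check of the geometric result (hypothetical sample of trial counts
# until first success): the maximiser of l(theta) should match n / sum(x_i).
sample = [3, 1, 4, 2, 2, 6, 1, 3]
n, s = len(sample), sum(sample)

def log_likelihood(theta):
    # l(theta) = n*ln(theta) + (sum(x_i) - n)*ln(1 - theta)
    return n * math.log(theta) + (s - n) * math.log(1 - theta)

grid = [i / 10000 for i in range(1, 10000)]   # theta in (0, 1)
theta_hat = max(grid, key=log_likelihood)

print(theta_hat)   # grid maximum
print(n / s)       # closed form: the reciprocal of the sample mean
```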
\subsection{Binomial Distribution}
\[X \thicksim Binomial(m, \theta) \Rightarrow f(x) =\begin{pmatrix}
    m \\ x
\end{pmatrix}\theta^x(1 - \theta)^{m-x}\]
\[\begin{split}
    L(\theta) &= \prod_{i=1}^n f(x_i) \\
    &= \prod_{i=1}^n \begin{pmatrix}
        m \\ x_i
    \end{pmatrix}\theta^{x_i}(1 - \theta)^{m-x_i} \\
    &= \prod_{i=1}^n \begin{pmatrix}
        m \\ x_i
    \end{pmatrix} \times \prod_{i=1}^n\theta^{x_i} \times \prod_{i=1}^n(1 - \theta)^{m-x_i} \\
    &= \prod_{i=1}^n \begin{pmatrix}
        m \\ x_i
    \end{pmatrix} \times \theta^{\sum_{i=1}^nx_i} \times (1 - \theta)^{\sum_{i=1}^n (m - x_i)} \\
    &= \prod_{i=1}^n \begin{pmatrix}
        m \\ x_i
    \end{pmatrix} \times \theta^{\sum_{i=1}^nx_i} \times (1 - \theta)^{mn - \sum_{i=1}^n x_i} \\
\end{split}\]
Now we find the \keyword{log likelihood}.
\[\begin{split}
    l(\theta) &= \ln L(\theta) \\
    &= \ln \left( \prod_{i=1}^n \begin{pmatrix}
        m \\ x_i
    \end{pmatrix} \times \theta^{\sum_{i=1}^nx_i} \times (1 - \theta)^{mn - \sum_{i=1}^n x_i} \right) \\
    &= \ln \prod_{i=1}^n \begin{pmatrix}
        m \\ x_i
    \end{pmatrix} + \ln \theta^{\sum_{i=1}^nx_i} + \ln (1 - \theta)^{mn - \sum_{i=1}^n x_i} \\
    &= \ln \prod_{i=1}^n \begin{pmatrix}
        m \\ x_i
    \end{pmatrix} + \sum_{i=1}^nx_i\ln \theta + \left( mn - \sum_{i=1}^n x_i \right)\ln (1 - \theta) \\
\end{split}\]
Now we differentiate, and set equal to zero to find $\hat{\theta}$ (the first term does not involve $\theta$, so its derivative is zero).
\[\begin{split}
    \cfrac{dl(\theta)}{d\theta} &= 0 + \sum_{i=1}^nx_i\cfrac{1}{\theta} + \left( mn - \sum_{i=1}^n x_i \right)\cfrac{1}{\theta - 1}= 0 \\
    0 & = \sum_{i=1}^nx_i\cfrac{\theta - 1}{\theta(\theta - 1)} + \left( mn - \sum_{i=1}^n x_i \right)\cfrac{\theta}{\theta(\theta - 1)} \\
    0 & = \sum_{i=1}^nx_i (\theta - 1) + \left( mn - \sum_{i=1}^n x_i \right)\theta \\
    0 & = \theta\sum_{i=1}^nx_i - \sum_{i=1}^nx_i + mn\theta - \theta\sum_{i=1}^n x_i \\
    0 & = - \sum_{i=1}^nx_i + mn\theta \\
    \cfrac{\sum_{i=1}^nx_i}{mn} & =\theta \\
\end{split}\]
Hence the maximum likelihood estimate is the sample mean divided by the number of trials $m$:
\[\hat{\theta} = \cfrac{\overline{x}}{m}\]
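Likewise for the binomial case: a grid maximisation with hypothetical success counts out of $m = 10$ trials lands on $\sfrac{\overline{x}}{m}$. As noted above, the binomial coefficients do not involve $\theta$, so they can be dropped from the objective being maximised:

```python
import math

# Grid check of the binomial result (hypothetical success counts out of
# m = 10 trials each). The binomial coefficients are constant in theta,
# so they are omitted from the objective.
m = 10
sample = [3, 5, 4, 6, 2, 4]
n, s = len(sample), sum(sample)

def log_likelihood(theta):
    # l(theta) up to a constant: sum(x_i)*ln(theta) + (mn - sum(x_i))*ln(1-theta)
    return s * math.log(theta) + (m * n - s) * math.log(1 - theta)

grid = [i / 10000 for i in range(1, 10000)]   # theta in (0, 1)
theta_hat = max(grid, key=log_likelihood)

print(theta_hat)       # grid maximum
print((s / n) / m)     # sample mean divided by m: 0.4 for this sample
```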
