\documentclass[a4paper,11pt,english]{article}
\usepackage{babel}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{newtxtext} % times
\usepackage[varqu,varl]{inconsolata} % typewriter
\usepackage{amsmath}
\usepackage{microtype}
\usepackage{xcolor}
\usepackage[bookmarks=false]{hyperref}
\hypersetup{%
bookmarksopen=true,
bookmarksnumbered=true,
pdftitle={Bayesian data analysis},
pdfsubject={Reading instructions},
pdfauthor={Aki Vehtari},
pdfkeywords={Bayesian probability theory, Bayesian inference, Bayesian data analysis},
pdfstartview={FitH -32768},
colorlinks=true,
linkcolor=blue,
citecolor=black,
filecolor=black,
urlcolor=blue
}
\urlstyle{same}
% if not draft, smaller printable area makes the paper more readable
\topmargin -4mm
\oddsidemargin 0mm
\textheight 225mm
\textwidth 160mm
%\parskip=\baselineskip
\DeclareMathOperator{\E}{E}
\DeclareMathOperator{\Var}{Var}
\DeclareMathOperator{\var}{var}
\DeclareMathOperator{\Sd}{Sd}
\DeclareMathOperator{\sd}{sd}
\DeclareMathOperator{\Bin}{Bin}
\DeclareMathOperator{\Beta}{Beta}
\DeclareMathOperator{\Invchi2}{Inv-\chi^2}
\DeclareMathOperator{\logit}{logit}
\DeclareMathOperator{\N}{N}
\DeclareMathOperator{\U}{U}
\DeclareMathOperator{\tr}{tr}
%\DeclareMathOperator{\Pr}{Pr}
\DeclareMathOperator{\trace}{trace}
\pagestyle{empty}
\begin{document}
\thispagestyle{empty}
\section*{Bayesian data analysis -- reading instructions}
\smallskip
{\bf Aki Vehtari}
\smallskip
\subsection*{Chapter 1 -- outline}
Outline of Chapter 1
\begin{list}{$\bullet$}{\parsep=0pt\itemsep=2pt}
\item 1.1-1.3 important terms, especially 1.3 for the notation
\item 1.4 an example related to the first exercise, and another
  practical example
\item 1.5 foundations
\item 1.6 good example related to visualisation exercise
\item {\color{gray}1.7 example which can be skipped}
\item 1.8 background material, good to read before doing the first assignment
\item 1.9 background material, good to read before doing the second assignment
\item 1.10 a point of view for using Bayesian inference
\end{list}
\subsection*{Chapter 1 -- most important terms}
Find all the terms and symbols listed below. Note that some of the
terms are only briefly introduced at this point and will be covered
in more detail later.
While reading the chapter, write down questions about anything that is
unclear to you or that you think might be unclear to others.
\begin{list}{$\bullet$}{\parsep=0pt\itemsep=2pt}
\item full probability model
\item posterior distribution
\item potentially observable quantity
\item quantities that are not directly observable
\item exchangeability
\item independently and identically distributed
\item $\theta, y, \tilde{y}, x, X, p(\cdot|\cdot), p(\cdot), \Pr(\cdot), \sim, H$
\item sd, E, var
\item Bayes rule
\item prior distribution
\item sampling distribution, data distribution
\item joint probability distribution
\item posterior density
\item probability
\item density
\item distribution
\item $p(y|\theta)$ as a function of $y$ or $\theta$
\item likelihood
\item posterior predictive distribution
\item probability as measure of uncertainty
\item subjectivity and objectivity
\item transformation of variables
\item simulation
\item inverse cumulative distribution function
\end{list}
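Two terms on the list, \textit{simulation} and \textit{inverse
cumulative distribution function}, meet in BDA Section 1.9: if $U \sim
\U(0,1)$ and $F$ is a cumulative distribution function, then
$F^{-1}(U)$ is a draw from the distribution with cdf $F$. For example,
for the exponential distribution with $F(y) = 1 - e^{-y}$, a draw can
be simulated as $y = -\log(1-u)$, where $u$ is a draw from $\U(0,1)$.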
\subsection*{Proportional to}
The symbol $\propto$ means \textit{proportional to}: the left-hand
side is equal to the right-hand side up to a constant multiplier. For
instance, if $y=2x$, then $y \propto x$. It's
\texttt{\textbackslash propto} in LaTeX. See
\url{https://en.wikipedia.org/wiki/Proportionality_(mathematics)}.
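In Bayesian inference, $\propto$ is typically used to drop
multiplicative terms that do not depend on $\theta$. For example,
Bayes rule
\begin{align*}
  p(\theta|y) = \frac{p(y|\theta)\,p(\theta)}{p(y)}
  \propto p(y|\theta)\,p(\theta)
\end{align*}
holds because the denominator $p(y)$ is constant with respect to
$\theta$.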
\subsection*{Model and likelihood}
The term $p(y|\theta,M)$ has two different names depending on the
situation. Due to the short notation used, there is a possibility of
confusion.
\begin{itemize}
\item[1)] The term $p(y|\theta,M)$ is called a \emph{model} (sometimes
  more specifically an \emph{observation model} or \emph{statistical
  model}) when it is used to describe uncertainty about $y$ given
  $\theta$ and $M$. The longer notation $p_y(y|\theta,M)$ makes
  explicit that it is a function of $y$.
\item[2)] In Bayes rule, the term $p(y|\theta,M)$ is called the
  \emph{likelihood function}. The posterior distribution describes the
  probability (or probability density) of different values of $\theta$
  given a fixed $y$, and thus when the posterior is computed, the
  terms on the right-hand side of Bayes rule are also evaluated as
  functions of $\theta$ with $y$ fixed. The longer notation
  $p_\theta(y|\theta,M)$ makes explicit that it is a function of
  $\theta$. The term has its own name (likelihood) to emphasize the
  difference from the model. The likelihood function is an
  unnormalized probability distribution describing uncertainty about
  $\theta$ (which is why Bayes rule includes a normalization term to
  obtain the posterior distribution); see the binomial example after
  this list.
\end{itemize}
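As a concrete illustration of the two roles (a binomial example; the
specific numbers are only for illustration), let $y$ be the number of
successes in $n=10$ trials, so that
\begin{align*}
  p(y|\theta) = \Bin(y|10,\theta)
  = \binom{10}{y}\theta^y(1-\theta)^{10-y}.
\end{align*}
As a function of $y$ with $\theta$ fixed (the model), this is a
normalized probability distribution: $\sum_{y=0}^{10} p(y|\theta) =
1$. As a function of $\theta$ with, say, $y=7$ fixed (the likelihood),
it is proportional to $\theta^7(1-\theta)^{3}$ and is not normalized:
$\int_0^1 \binom{10}{7}\theta^7(1-\theta)^3 \,d\theta = \frac{1}{11}
\neq 1$.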
\subsection*{Two types of uncertainty}
Epistemic and aleatory uncertainty are reviewed nicely in the article:
Tony O'Hagan, ``Dicing with the unknown,''
\textit{Significance} 1(3):132--133, 2004. \url{http://onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2004.00050.x/abstract}
There is one typo in the article: the word \textit{aleatory} is used
where \textit{epistemic} is meant (once you notice it, the intended
meaning is quite obvious).
\subsection*{Transformation of variables}
\begin{itemize}
\item BDA3 p. 21
\end{itemize}
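As a quick reminder of the result given there: if $\theta$ has density
$p_\theta$ and $\phi = f(\theta)$ is a one-to-one transformation, then
\begin{align*}
  p_\phi(\phi) = p_\theta\big(f^{-1}(\phi)\big)
  \left|\frac{d}{d\phi} f^{-1}(\phi)\right|.
\end{align*}
For example, for the logit transformation $\phi = \logit(\theta) =
\log\left(\theta/(1-\theta)\right)$ with inverse $\theta =
1/(1+e^{-\phi})$, we have $d\theta/d\phi = \theta(1-\theta)$, and thus
$p_\phi(\phi) = p_\theta(\theta)\,\theta(1-\theta)$ evaluated at
$\theta = 1/(1+e^{-\phi})$.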
\subsection*{Ambiguous notation in statistics}
\begin{itemize}
\item[] In $p(y|\theta)$
\begin{itemize}
\item[-] $y$ can be variable or value
\begin{itemize}
\item[] we could clarify by using $p(Y|\theta)$ or $p(y|\theta)$
\end{itemize}
\item[-] $\theta$ can be variable or value
\begin{itemize}
\item[] we could clarify by using $p(y|\Theta)$ or $p(y|\theta)$
\end{itemize}
\item[-] $p$ can be a discrete or continuous function of $y$ or $\theta$
\begin{itemize}
\item[] we could clarify by using $P_Y$, $P_\Theta$, $p_Y$ or $p_\Theta$
\end{itemize}
\item[-]
$P_Y(Y|\Theta=\theta)$ is a probability mass function, sampling distribution, observation model
\item[-]
$P(Y=y|\Theta=\theta)$ is a probability
\item[-]
$P_\Theta(Y=y|\Theta)$ is a likelihood function (can be discrete or continuous)
\item[-] $p_Y(Y|\Theta=\theta)$ is a probability density function, sampling distribution, observation model
\item[-] $p(Y=y|\Theta=\theta)$ is a density
\item[-] $p_\Theta(Y=y|\Theta)$ is a likelihood function (can be discrete or continuous)
\item[-] $y$ and $\theta$ can also be mix of continuous and discrete
\item[-] Due to this sloppiness, the term likelihood is sometimes used
  to refer to $P_{Y,\Theta}(Y|\Theta)$ or $p_{Y,\Theta}(Y|\Theta)$
\end{itemize}
\end{itemize}
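To make these distinctions concrete, consider a normal observation
model with unit variance (the specific values are only illustrative):
$p_Y(Y|\Theta=\theta) = \N(Y|\theta,1)$ is a probability density
function of $Y$; $p(Y=1.5|\Theta=0) = \N(1.5|0,1) \approx 0.13$ is a
density value; and $p_\Theta(Y=1.5|\Theta) = \N(1.5|\Theta,1)$ is a
likelihood function of $\Theta$.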
\subsection*{Exchangeability}
You don't need to understand or use the term exchangeability before
Chapter 5 and Lecture 7. Until then, it is sufficient that you know
that 1) independence is a stronger condition than exchangeability, 2)
independence implies exchangeability, 3) exchangeability does not
imply independence (for example, draws without replacement from an urn
are exchangeable but not independent), and 4) exchangeability is
related to what information is available, rather than to the
properties of the unknown underlying data-generating mechanism. If you
want to know more about exchangeability right now, read BDA Section
5.2 and BDA\_notes\_ch5.
\end{document}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: