Conditioning and Independence
Conditioning
Conditional probability is defined as \(\mathbb P(A\lvert B)=\frac{\mathbb P(A\cap B)}{\mathbb P(B)}\), assuming \(\mathbb P(B)>0\). Conditioning assigns zero probability to every outcome outside \(B\). Also note that \(A\subseteq B\implies\mathbb P(A\lvert B)=\frac{\mathbb P(A)}{\mathbb P(B)}\), since in that case \(A\cap B=A\).
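As a quick sanity check, here is a minimal Python sketch that computes \(\mathbb P(A\lvert B)\) by enumerating a finite sample space; the die and the two events are illustrative assumptions, not part of the notes.

```python
from fractions import Fraction

# Finite sample space: one roll of a fair six-sided die
# (the die and the events below are illustrative assumptions).
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event under the uniform law on omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b):
    """P(A|B) = P(A ∩ B) / P(B), defined only when P(B) > 0."""
    assert prob(b) > 0, "conditioning event must have positive probability"
    return prob(a & b) / prob(b)

A = {2, 4, 6}  # even outcome
B = {4, 5, 6}  # outcome at least 4
print(cond_prob(A, B))  # 2/3
# A ∩ B ⊆ B, so its conditional probability is just P(A ∩ B) / P(B):
assert cond_prob(A & B, B) == prob(A & B) / prob(B)
```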
Conditional probabilities share properties of ordinary probabilities.
Axiom | Formula |
---|---|
Non-negativity | \(\mathbb P(A\lvert B)\geq 0\) |
Normalization | \(\mathbb P(\Omega\lvert B)=1\) |
(finite) Additivity | \(A_i\cap A_j=\emptyset\ (i\neq j)\implies\mathbb P(A_1\cup A_2\cup\dots\cup A_n\lvert B)=\sum_{i=1}^n\mathbb P(A_i\lvert B)\) |
The three main tools
All three follow from the definition, rewritten as \(\mathbb P(A\cap B)=\mathbb P(A)\mathbb P(B\lvert A)\).
Name | Rule |
---|---|
Multiplication rule | \(\mathbb P(A_1\cap A_2\cap\dots\cap A_n)=\mathbb P(A_1)\prod_{i=2}^n\mathbb P(A_i\lvert A_1\cap\dots\cap A_{i-1})\) |
Total probability theorem | \(\mathbb P(B)=\sum_{i=1}^n\mathbb P(A_i)\mathbb P(B\lvert A_i)\), where \(A_1,\dots,A_n\) partition \(\Omega\) |
Bayes’ rule | \(\displaystyle\mathbb P(A_i\lvert B)=\frac{\mathbb P(A_i\cap B)}{\mathbb P(B)}=\frac{\mathbb P(A_i)\mathbb P(B\lvert A_i)}{\sum_{j=1}^n\mathbb P(A_j)\mathbb P(B\lvert A_j)}\) |
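As a worked instance of the multiplication rule (the card-drawing scenario is an illustrative assumption, not from the notes):

```python
from fractions import Fraction

# Multiplication rule on a concrete case: drawing three aces in a row
# from a standard 52-card deck, without replacement.
# P(A1 ∩ A2 ∩ A3) = P(A1) · P(A2 | A1) · P(A3 | A1 ∩ A2)
p_three_aces = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)
print(p_three_aces)  # 1/5525
```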
Bayesian inference is a three-step procedure (sketched in code below):
- Place an initial (prior) belief \(\mathbb P(A_i)\) on the possible causes \(A_i\) of \(B\)
- Model \(B\) under each cause, that is, specify \(\mathbb P(B\lvert A_i)\)
- Draw conclusions about the causes through the posterior \(\mathbb P(A_i\lvert B)\)
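A minimal sketch of the three steps on a hypothetical discrete example; every number below is an assumption made up for illustration.

```python
# Step 1: prior beliefs P(A_i) on the possible causes (a partition of Omega).
priors = [0.5, 0.3, 0.2]

# Step 2: model the observation B under each cause, P(B | A_i).
likelihoods = [0.9, 0.5, 0.1]

# Total probability theorem: P(B) = sum_i P(A_i) P(B | A_i).
p_b = sum(p * l for p, l in zip(priors, likelihoods))

# Step 3: Bayes' rule, P(A_i | B) = P(A_i) P(B | A_i) / P(B).
posteriors = [p * l / p_b for p, l in zip(priors, likelihoods)]

print(p_b)         # 0.62
print(posteriors)  # [0.7258..., 0.2419..., 0.0322...]; sums to 1
```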
When building a model, create a separate branch of the tree for every combination of events.
Independence
\(A\) is independent of \(B\) when \(\mathbb P(A\lvert B)=\mathbb P(A)\); the equivalent product form \(\mathbb P(A\cap B)=\mathbb P(A)\mathbb P(B)\) is taken as the general definition. Independence is often denoted \(A\ind B\); it is symmetric (\(A\ind B\iff B\ind A\)), allows \(\mathbb P(B)=0\) (as opposed to conditioning), and extends to complements (\(A\ind B\iff A\ind B^C\)).
Conditioning, however, affects independence: \(\mathbb P(A\cap B\lvert C)=\mathbb P(A\lvert C)\mathbb P(B\lvert C)\) does not imply \(\mathbb P(A\cap B\lvert C^C)=\mathbb P(A\lvert C^C)\mathbb P(B\lvert C^C)\), nor does conditional independence imply unconditional independence (or vice versa).
Also, pairwise independence, \(\mathbb P(A_i\cap A_j)=\mathbb P(A_i)\mathbb P(A_j)\) for \(i\neq j\), does not imply collective independence, which requires \(\mathbb P\left(\bigcap_{i\in S}A_i\right)=\prod_{i\in S}\mathbb P(A_i)\) for every subset \(S\subseteq\{1,\dots,n\}\).
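The classic two-coin counterexample makes this concrete; a minimal sketch (the coin experiment is illustrative, not from the notes):

```python
from itertools import product
from fractions import Fraction

# Two fair coin flips; uniform law on the 4 outcomes.
omega = set(product("HT", repeat=2))  # {('H','H'), ('H','T'), ...}

def prob(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == "H"}   # first flip is heads
B = {w for w in omega if w[1] == "H"}   # second flip is heads
C = {w for w in omega if w[0] == w[1]}  # the two flips agree

# Pairwise independence holds for every pair...
assert prob(A & B) == prob(A) * prob(B)
assert prob(A & C) == prob(A) * prob(C)
assert prob(B & C) == prob(B) * prob(C)

# ...but collective independence fails:
print(prob(A & B & C), prob(A) * prob(B) * prob(C))  # 1/4 vs 1/8
```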
Reliability
Reliability is a good example of joint probability of independent events. In the table below, \(U\) and \(D\) indicate that a link is up or down, respectively; links \(1\) and \(2\) are up with probabilities \(p\) and \(q\), independently of each other.
Mode | Probability |
---|---|
Serial | \(\mathbb P(U)=\mathbb P(U_1\cap U_2)=pq\) |
Parallel | \(\mathbb P(D)=\mathbb P(D_1\cap D_2)=(1-p)(1-q)\), hence \(\mathbb P(U)=\mathbb P(U_1\cup U_2)=p+q-pq\) (by the inclusion-exclusion formula) |
Assuming \(L_i\in\{0,1\}\) indicates whether the network stays up when link \(i\) has failed, and \(\frac{p_i}{\sum_{j=1}^n p_j}\) is the probability that the failed link is link \(i\), the total probability theorem gives the probability that the network is up despite one link failing: \(\mathbb P(L)=\sum_{i=1}^n \frac{p_i}{\sum_{j=1}^np_j}L_i\).
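A minimal sketch of the serial and parallel formulas above, with illustrative values for the link reliabilities \(p\) and \(q\):

```python
def serial_up(p, q):
    """Serial links: up only if both links are up, P(U) = p*q."""
    return p * q

def parallel_up(p, q):
    """Parallel links: down only if both links are down,
    P(U) = 1 - (1-p)(1-q) = p + q - p*q."""
    return 1 - (1 - p) * (1 - q)

p, q = 0.9, 0.8  # illustrative per-link up probabilities
print(serial_up(p, q))    # ≈ 0.72
print(parallel_up(p, q))  # ≈ 0.98
```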
Back-up
Logical operators
When assessing events or their combinations, logical operators offer an alternative (and somewhat more natural) way of expressing set-algebra operations. Indeed, the following expressions are equivalent.
description | logical operator | set algebra operator |
---|---|---|
\(A\) or \(B\) | \(A\lor B\) | \(A\cup B\) |
\(A\) and \(B\) | \(A\land B\) | \(A\cap B\) |
not \(A\) | \(\lnot A\) | \(A^C\) |
\(A\) implies \(B\) | \(A\implies B\) | \(A\subseteq B\) |
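A quick check of the correspondence using Python's built-in set and boolean operators (the sample space and events are illustrative assumptions):

```python
# The logical/set correspondence, checked outcome by outcome.
omega = set(range(10))
A = {0, 2, 4, 6, 8}   # even outcomes
B = {0, 1, 2, 3, 4}   # outcomes below 5

for w in omega:
    assert (w in A or w in B) == (w in A | B)    # or  <-> union
    assert (w in A and w in B) == (w in A & B)   # and <-> intersection
    assert (w not in A) == (w in omega - A)      # not <-> complement

# "A implies B" corresponds to A ⊆ B: whenever A occurs, B occurs too.
E = {0, 2}
assert E <= B and all(w in B for w in E)
```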
Sum of non-negative quantities
Assume you have a collection of \(k\) non-negative elements \(\{a_i\}\), with \(a_i\ge0\) for \(i=1,\dots,k\). Let's define the events \(A_i=\left(a_i\lt\frac1k\varepsilon\right)\), for \(i=1,\dots,k\), and the event \(C=\left(\sum_{i=1}^ka_i\lt\varepsilon\right)\); accordingly, in natural language we can say "if every \(a_i\) is smaller than \(\frac1k\varepsilon\), then the sum of the \(k\) elements \(a_i\) is smaller than \(\varepsilon\)".
\[\begin{align} \bigwedge\limits_{i=1}^k A_i\implies C&&\text{equivalent to}&&\bigcap_{i=1}^k A_i\subseteq C \end{align}\]Taking the contrapositive (complementing both sides) of the above expressions, we obtain the following statement: "if the sum of the \(k\) elements \(a_i\) is larger than or equal to \(\varepsilon\), then at least one \(a_i\) is larger than or equal to \(\frac1k\varepsilon\)".
\[\begin{align} \lnot C\implies\bigvee\limits_{i=1}^k\lnot A_i&&\text{equivalent to}&&C^C\subseteq\bigcup_{i=1}^k A_i^C \end{align}\]By computing the probabilities of the set-algebra expression (on the right), we obtain the following relation.
\[\begin{align} \mathbb P(C^C)&\le\mathbb P\left(\bigcup_{i=1}^k A_i^C\right)&&\text{(monotonicity of $\mathbb P$)}\\ \mathbb P\left(\sum_{i=1}^k a_i\ge\varepsilon\right)&\le\mathbb P\left(\bigcup_{i=1}^k\left(a_i\ge\frac1k\varepsilon\right)\right)\\ &\le\sum_{i=1}^k\mathbb P\left(a_i\ge\frac1k\varepsilon\right)&&\text{(union bound)}\\ &=k\,\mathbb P\left(a_1\ge\frac1k\varepsilon\right)&&\text{(if the $a_i$ are identically distributed)} \end{align}\]An alternative result can be obtained starting from the following statement: "if the sum of the \(a_i\) is larger than or equal to \(\varepsilon\), and one of them is smaller than \(\frac1k\varepsilon\), then the sum of the remaining elements is larger than or equal to \(\frac{k-1}k\varepsilon\)".
\[\begin{align} (\lnot C)\land A_i&\implies\left(\sum_{j\neq i}a_j\ge\frac{k-1}k\varepsilon\right)\\ \mathbb P\left(\sum_{i=1}^k a_i\ge\varepsilon\right)\mathbb P\left(a_i\lt\frac1k\varepsilon\right)=\mathbb P\bigl((\lnot C)\land A_i\bigr)&\le\mathbb P\left(\sum_{j\neq i}a_j\ge\frac{k-1}k\varepsilon\right)&&\text{(assuming $C\ind A_i$, hence $\lnot C\ind A_i$)}\\ \mathbb P\left(\sum_{i=1}^k a_i\ge\varepsilon\right)\left(1-\mathbb P\left(a_i\ge\frac1k\varepsilon\right)\right)&\le\mathbb P\left(\sum_{j\neq i}a_j\ge\frac{k-1}k\varepsilon\right) \end{align}\]
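A Monte Carlo sketch of the first bound, assuming i.i.d. exponential \(a_i\); the distribution, the constants, and the seed are illustrative assumptions, not from the notes.

```python
import random

# Monte Carlo check of P(sum a_i >= eps) <= k * P(a_1 >= eps/k)
# for i.i.d. non-negative a_i (exponential with rate 1 here).
random.seed(0)
k, eps, trials = 5, 15.0, 100_000

lhs_hits = 0  # counts the event {sum a_i >= eps}
rhs_hits = 0  # counts individual events {a_i >= eps/k}
for _ in range(trials):
    a = [random.expovariate(1.0) for _ in range(k)]
    lhs_hits += sum(a) >= eps
    rhs_hits += sum(x >= eps / k for x in a)

print(lhs_hits / trials)  # estimates P(sum >= eps), tiny here
print(rhs_hits / trials)  # estimates k * P(a_1 >= eps/k) ≈ 0.25; the bound holds
```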