Thursday, July 27, 2023

A categorical setting for elementary probability theory

(Early draft; I will be expanding this in the future.)

The basic idea is this:
Given a finite set $X$ and a subset $A\subseteq X$, 
we may regard the ratio of the size of $A$ to the size of $X$ 
as being the <I>probability</I> 
that a randomly chosen element of $X$ will be in $A$.

Thus given $A$ and its superset $X$, 
we have not only 
the inclusion relation between $A$ and $X$, which we may regard as being an arrow in the usual category $\Set$ of sets and functions, 
but also 
the positive rational number $\boxed{ |A| \over |X| }$, a number often called by probabilists <I>the probability of $A$</I> 
(when $X$ is assumed known).
Now, the positive rational numbers in fact form the objects of a discrete closed monoidal category under multiplication (details below).
Thus the above allows us to view the collection of finite sets as the objects of an enriched category, enriched in the discrete closed monoidal category of positive rational numbers.
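
As a quick sanity check of the composition law this enrichment requires, here is a minimal sketch of my own (not part of the original argument), using Python's exact rationals and made-up nested sets:

```python
from fractions import Fraction

def hom(A, B):
    """The "hom-object" from A to its superset B: the ratio |A|/|B|,
    i.e. the probability that a random element of B lies in A."""
    assert A <= B
    return Fraction(len(A), len(B))

# Hypothetical nested finite sets A ⊆ B ⊆ X.
X = set(range(12))
B = {0, 1, 2, 3, 4, 5}
A = {0, 1, 2}

# Composition of inclusions corresponds to multiplication of the ratios,
# which is exactly the monoidal structure (multiplication) referred to above.
assert hom(A, B) * hom(B, X) == hom(A, X)   # (1/2) * (1/2) == 1/4
```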

<hr />

That covers the notion of absolute probability. 
Let us now consider the notion of relative or conditional probability.

Suppose $X$ has two subsets, $A$ and $B$.
We may ask:
if $A$ holds (i.e., a randomly chosen element of $X$ lies in $A$), what is the probability that $B$ holds, and vice versa?
We may also ask:
what is the relation between those two probabilities?
(The equation giving that relation is called "Bayes' Theorem".)

Let us depict this situation with a diagram:

\[ \boxed{ \begin{array} {} & A\cap B & \xrightarrow{ \textstyle{ A\cap B \over B } } & B & \\ & \llap{ A\cap B \over A } \Bigg\downarrow & \llap{ \scriptstyle \text{Bayes} } \nearrow \rlap{ {A \over B} } & \Bigg\downarrow \rlap{ {B \over X} } & \\ & A & \xrightarrow[ \textstyle{ A \over X } ]{} & X \\ \end{array} } \]

In the diagram,
at the vertices, the symbols $X, A, B, A\cap B$ denote <I>sets</I>;
when labelling arrows, they denote <I>positive integers</I>, viz. the number of elements in the corresponding set (elsewhere often written $|A|$).
Thus the ratios shown are ratios of positive integers, thus rational numbers.

Thus in the diagram
the bottom and right arrows give the absolute probabilities of $A$ and $B$ respectively;
the left arrow gives the relative or conditional probability of $B$ relative to $A$;
the top arrow gives that of $A$ relative to $B$.
The commutativity of each triangle follows from elementary algebra.
The commutativity of the upper left triangle, when interpreted in terms of probabilities, is Bayes' Theorem:
\[ P(A|B) = P(B|A){P(A) \over P(B)} \]
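
A small numerical check of that commutativity, in a sketch of my own (the two subsets are made up):

```python
from fractions import Fraction

X = set(range(20))
A = {0, 1, 2, 3, 4, 5, 6, 7}          # |A| = 8
B = {5, 6, 7, 8, 9, 10, 11, 12, 13}   # |B| = 9, |A ∩ B| = 3

P    = lambda S: Fraction(len(S), len(X))           # absolute probability S/X
cond = lambda S, T: Fraction(len(S & T), len(T))    # conditional probability P(S|T)

# Bayes' Theorem, read off from the upper-left triangle of the diagram:
assert cond(A, B) == cond(B, A) * P(A) / P(B)       # 1/3 == (3/8) * (8/20)/(9/20)
```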

Another example: 
A bipartite partition $A,B$ of $X$ together with a third subset $W$.
Note that we have changed notation so that $A,B$ form the two parts of the bipartite partition.
We will diagram this situation as:

\[ \boxed{ \begin{array} {} {} \rlap{ \kern-3em {A\cap W \over A} {A\over X} +  {B\cap W \over B} {B\over X} \xlongequal{\text{cancel}} {A\cap W \over X} + {B\cap W \over X} \xlongequal{\text{distribute}} {(A\cup B)\cap W \over X } \xlongequal{A\cup B = X} {W\over X} } \\ \\ \hline \\ \kern5em & A\cap W & \xrightarrow{ \textstyle{A\cap W \over W} } & W & \xleftarrow{ \textstyle{B\cap W \over W} } & B\cap W & \kern5em \\ & \llap{ A\cap W \over A } \Bigg\downarrow && \Bigg\downarrow \rlap{ W \over X } && \Bigg\downarrow \rlap{ B\cap W \over B } & \\ & A & \xrightarrow[ \textstyle{A \over X} ] {} & X & \xleftarrow[ \textstyle{B \over X} ] {} & B & \\ \end{array} } \]

In several of the examples in the chapter on conditional (or relative) probability in <I>Fat Chance</I>, 
we are given:
0. The numerical values of the bottom two arrows and the two side arrows.
From those values we may compute (setting $|X| = 1$):
1. The numerical values of the upper two corners $A\cap W, B\cap W$.
2. Using the fact that those last two form a partition of $W$ (the pullback of a partition is a partition), we add those last two values to get $|W|$.
3. Now that this is known, we may compute any and all of the three remaining conditional probabilities given by the three remaining arrows.
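
Here is a small computational sketch of steps 0 through 3 (my own; the input values happen to be those of the Section 9.4 example worked out further below):

```python
from fractions import Fraction as F

# Step 0: the given data (normalizing |X| = 1).
P_A, P_B = F(1, 5), F(4, 5)                  # bottom two arrows: A/X and B/X
P_W_given_A, P_W_given_B = F(3, 4), F(2, 5)  # side arrows: (A∩W)/A and (B∩W)/B

# Step 1: the upper two corners, as fractions of X.
P_AW = P_W_given_A * P_A                     # (A∩W)/X = 3/20 = 15/100
P_BW = P_W_given_B * P_B                     # (B∩W)/X = 8/25 = 32/100

# Step 2: A∩W and B∩W partition W, so their measures add up to W/X.
P_W = P_AW + P_BW                            # 47/100

# Step 3: the two top arrows (the middle arrow W/X is P_W, already computed).
P_A_given_W = P_AW / P_W                     # (A∩W)/W = 15/47
P_B_given_W = P_BW / P_W                     # (B∩W)/W = 32/47
```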

Such diagrams are useful in displaying numerical information associated with probability situations.
For example, consider the "Monty Hall" situation described in Section 9.1 of <I>Fat Chance</I>.
(And whose precise assumptions are emphasized here:
https://en.wikipedia.org/wiki/Monty_Hall_problem#Standard_assumptions )
The various expressions mentioned in the book are shown in this diagram:
\[ \boxed{ \begin{array} {} \kern8em & A\land W & \xrightarrow{  } & W & \xleftarrow{ } & B\land W & \kern9em \\ & \llap{ P(W|A) = {k-1 \over n-2} } \Bigg\downarrow &&  && \Bigg\downarrow \rlap{ P(W|B) = {k \over n-2} } & \\ & \llap{ \text{your guess is right} = {} } A & \xrightarrow[ \textstyle{ P(A) = {k \over n} } ] {} & X & \xleftarrow[ \textstyle{ P(B) = {n-k \over n} } ] {} & B \rlap{ {} = \text{ your guess is wrong} } & \\ &&& {} \rlap{\kern-10em X = \text{set of doors;} \; |X| = n = \text{number of doors;} \; k = \text{number of cars} } \\ \end{array} } \]
To get the original, basic, Monty Hall situation, take $n=3$, $k=1$.
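
Under the standard assumptions (the host always opens an unchosen door hiding no car, and you then switch to one of the remaining unopened doors), the probability that the door you switch to hides a car can be computed by composing along the two paths of the diagram. Here is a sketch of that computation (mine, not the book's; "W" here is my reading of the diagram's event):

```python
from fractions import Fraction as F

def p_switch_wins(n, k):
    """Probability that the (randomly chosen) door you switch to hides a car,
    computed along the two paths of the diagram above."""
    P_A = F(k, n)                  # your original guess is right
    P_B = F(n - k, n)              # your original guess is wrong
    P_W_given_A = F(k - 1, n - 2)  # k-1 cars remain among the n-2 unopened doors
    P_W_given_B = F(k, n - 2)      # all k cars are among the n-2 unopened doors
    return P_W_given_A * P_A + P_W_given_B * P_B

print(p_switch_wins(3, 1))   # 2/3, the classical Monty Hall answer (vs 1/3 for staying)
```

For general $n$ and $k$ this sum simplifies to ${k(n-1) \over n(n-2)}$, which is always greater than $P(A) = {k\over n}$, so switching is always advantageous.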

Saturday, July 8, 2023

Bayes' Theorem -- relations between ratios

This is a preliminary draft.
$\def\bA{{\blue A}} \def\gB{{\green B}}$

First, we give a diagrammatic version of the Bayes situation, 
using the abstract letters $\bA$ and $\gB$ for two subsets of the larger, but finite, set $X$:

\[ \boxed{ \begin{array} {} && \bA \rlap{ \blue{ \xrightarrow [\kern9em] {\textstyle {A \over \black X}} } } &&&& X \\ & \blue{ \llap{ A \cap \gB \over A } \nearrow } \rlap{ \scriptstyle \kern.8em \red{\text{Bayes}} } && \searrow \rlap{\bA \over \gB} && \green{ \nearrow \rlap{ B \over \black X } } \\ \rightadj{ \boxed{\bA \mathrel{\rightadj \cap} \gB} } \rlap{ \green{ \xrightarrow [\textstyle {\bA \mathrel{\rightadj \cap} B \over B} ] {\kern9em} } } &&&& \gB \\ \end{array} } \]

Here the expressions $\bA, \gB, \bA \mathrel{\rightadj \cap} \gB, X$, 
when they are <I>vertices</I> of the diagram, represent sets, 
and the arrows between them represent set-theoretic inclusion.
On the other hand, 
when those expressions are part of <I>the labels</I> of the arrows of the diagram, 
they represent the natural numbers which are the number of elements in the corresponding set.
Thus the quotients shown are either absolute or relative (i.e., conditional) probabilities:
the two shown arrows going into $X$ are labeled with <I>absolute</I> probabilities, 
while the two shown arrows going out of $\bA \mathrel{\rightadj \cap} \gB$ are labeled with <I>relative</I> or <I>conditional</I> probabilities.

Now we present Bayes' Theorem as it is traditionally stated, using only numbers and elementary algebra.
In its simplest, clearest form, it is the top equation, Bayes-0 (which is trivially true), in the box below.
(For the time being, ignore the lower equation;
it differs from the top equation only in notation and a trivial equality.)

\[ \boxed{  \begin{array} {ccccc|l}  {H \cap D \over D} & \xlongequal{\text{Bayes-0}} & {H \cap D \over H} & \times & \red{ {H \over D} } & \text{ numbers} \\ \\ \hline \\ P(H|D) & \xlongequal{\text{Bayes}}  & P(D|H) & \times & \red{ P(H) \over P(D) } & \text{ (sub)sets or properties} \\  \end{array}  }  \]

That's it! That bit of very elementary algebra  
(which we might call "the switch of denominators, i.e. contexts", relative to ${H\cap D}$)  
in the top line, Bayes-0,
is the essence of Bayes' Theorem.  

What remains is to explain how that relates to the setup for, and traditional statement of, Bayes' Theorem.
I.e., what it all means.


Here is the setup for the situation. 
For concreteness, we will consider a special case of the general situation which is easily understood.

Suppose we have a population, a specific group of Americans.
We poll them with two questions: are you a Democrat (a member of the Democratic Party), and are you a homosexual?
We will call the set of people self-describing as Democrats set $\boxed D$, and similarly we let set $\boxed H$ consist of those who self-describe as homosexuals.
We also, for simplicity, let the symbols $D$ and $H$ denote the number of individuals in each set.
The context should make clear whether the symbol denotes a set or a number.

Next we have the intersection of sets $D$ and $H$, consisting of those individuals who described themselves as both Democrats and homosexual.
We denote this set by $\boxed{ H\cap D }$; so $H \cap D$ is the set of homosexual Democrats.
Again,  $H\cap D$ will also denote the number of individuals in that set.

With that setup and those definitions out of the way, we can now pose some questions. 

1. What fraction of the Democrats are homosexual? The answer is easy: it is the fraction $H\cap D \over D$.

2. What fraction of the homosexuals are Democrats? The answer is again easy: it is the fraction $H\cap D\over H$.

3. What relation, if any, is there between those two fractions?
The answer is given in the boxed equation Bayes-0 above (whose proof is a triviality), which is called "Bayes' Theorem".

4. What is the use of this equation? Well, if you know any two of those three ratios, it shows the third is determined, and gives you an easy way to calculate it.

Let us illustrate this with a hypothetical numerical example. 
Suppose 80% of homosexuals are Democrats, so ${H\cap D  \over H} = .8$, and
the ratio of Democrats to ALL homosexuals (20% of whom are not Democrats) is 10 to 1, so $\red{ {H\over D} = .1 }$. 
Then, instantiating the equation Bayes-0 with these two assumptions, we get 
\[ \boxed{ \begin{array} {} {H\cap D \over D} & \xlongequal{\text{Bayes-0}} & {H\cap D \over H} & \times & \red{ {H \over D} } \\ \\ \hline \\  {H\cap D \over D} & \xlongequal{\text{Bayes-0}} & .8 & \times & \red{.1} & = & .08 \end{array}  } \]
i.e. our two assumptions imply 8% of Democrats are homosexual.

For a diagrammatic presentation (with some possible numbers for population sizes), see the following:
\[ \boxed{ \begin{array} {} H = 10 & \xleftarrow{ \textstyle {H\cap D \over H} = .8 } & H\cap D = 8 \\ \downarrow & \red{ \searrow \rlap{ {H\over D} = .1 } } & \downarrow \rlap{ {H\cap D \over D} = .08 } \\ X & \xleftarrow{} & D = 100 \\ \end{array} } \]
You can visualize this geometrically.
Imagine a strip divided into three regions.
The first is two units long.
The second is eight units long.
The last is 92 units long.
Let $H$ be the union of the first two parts,
$D$ the union of the last two parts.
Then $H\cap D$ is the second part.
\[ \color{lightpink} { H=10 \atop \Rule{10mm}{5mm}{0mm} } {} \rlap{ \kern-6mm 8 } {} \rlap{ \color{blue} { \kern-10mm \lower5ex { \Rule{100mm}{5mm}{0mm} \atop D=100 } } } \]
(This is a form of Venn diagram.)

So under our two assumptions we have
80% of homosexuals are Democrats, while 
8% of Democrats are homosexuals.
Further, these two statistics IMPLY the ratio of Democrats to homosexuals is 10 to 1, 
since any two of the ratios in the Bayes equation determine the third.

The fact that we want to stress is:
If 80% of homosexuals are Democrats, 
that DOES NOT imply that
80% of Democrats are homosexuals. 
I.e., you can't invert conditional probabilities.
In words, the fact that $H\cap D$ is quite large relative to $H$ (80% in our hypothetical example) says nothing about how large $H\cap D$ is relative to $D$. 
That depends entirely on the ratio of $D$ to $H$.
Precisely, Bayes' Theorem says:
\[ \boxed{ \begin{array} {}  { {H\cap D}/H \over {H\cap D}/D } &  \xlongequal {\text{Bayes-0}} & \red{ {D \over H} } \\ \\ \hline \\ {80\% \over 8\%} & \xlongequal {\text{Bayes}}  & \red{ {10 \over 1} } \\  \end{array}  } \; . \]

There is a folk saying related to this phenomenon:
"A big fish in a small pond versus 
a small fish in a large pond."
Here clearly $H\cap D$ (the homosexual Democrats) plays the role of the "fish", 
while the "small pond" and "large pond" are respectively the homosexuals and the Democrats.
Bayes' Theorem states the relation between the various ratios involved: 
the fish to each pond, and the ponds to each other.
To put it in common sense terms:
If you know the ratio of the pond sizes, 
and the fraction of the small pond the fish takes up, 
then you can easily calculate the fraction of the large pond it would take up.
E.g.,
if the fish is half of the small pond, 
and the large pond is three times the size of the small pond, 
then the fish would be $1/6$ of the large pond.
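In symbols (the same "switch of denominators" as in Bayes-0):
\[ {\text{fish} \over \text{large pond}} \;=\; {\text{fish} \over \text{small pond}} \times {\text{small pond} \over \text{large pond}} \;=\; {1\over2} \times {1\over3} \;=\; {1\over6} \; . \]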

For another, perhaps hypothetical, example, 
if 90% of men with prostate cancer have high levels of PSA, 
that DOES NOT imply that
90% of men with high levels of PSA have prostate cancer.

There is a geometrical way of viewing this situation.
Imagine two rectangles, one with its long side horizontal, labeled $H$, 
and one with its long side vertical, labeled $V$.
Suppose they have some overlap, i.e., a non-empty intersection, labeled $H\cap V$.
Again, we let the same label stand for both 
the designated geometric region (a set of points) and 
its area (a number).

Considering the three numbers, i.e. the areas, 
we have as before the three ratios and the simple relation between those ratios

\[ \boxed{ \begin{array} {} H\cap V \over V & \xlongequal{\text{Bayes}} & H\cap V \over H & \times & \red{ {H \over V} } \\ \\ \hline \\ P(H|V) & \xlongequal{\text{Bayes}} & P(V|H) & \times & \red{ P(H) \over P(V) } \\ \end{array} } \]

relating the ratios of their areas.
In fact, this same equation applies to any two measurable regions with a measurable intersection, but it is easier to visualize for rectangles.


For the general situation, 3Blue1Brown has a good elementary discussion:

https://youtube.com/watch?v=HZGCoVF3YvM

Here we instantiate our original presentation of Bayes' Theorem into an instance relevant to that video.
Here $L$ = the set of librarians (or the number of such), and $B$ = the set of book-lovers (or the number of such).
\[ \boxed{ \begin{array} {ccccc|l} {B \cap L \over L} & \xlongequal{\text{Bayes-0}} & {B \cap L \over B} & \times & \red{ {B \over L} } & \text{ numbers} \\ \\ \hline \\ P(B|L) & \xlongequal{\text{Bayes}} & P(L|B) & \times & \red{ P(B) \over P(L) } & \text{ (sub)sets or properties} \\ \hline \text{high} && \text{low} & \times & \red{\text{large}} & \text{ qualitative description} \\ \hline .8 & = & .016 & \times & \red{50} & \text{ made-up, but plausible, numbers} \end{array} }  \]
Here we repeat our original description of Bayes' Theorem, but using the notation of that video, and introduce the terminology it uses for various parts of the equation. 
Here $H$ = hypothesis, $E$ = evidence.

\[ \boxed{  \begin{array} {ccccc|l}  H\cap E \over E & \xlongequal{\text{Bayes-0}} & H\cap E \over H & \times & \red{ H \over E } & \text{ numbers} \\ \\ \hline \\ P(H|E) & \xlongequal{\text{Bayes}}  & P(E|H) & \times & \red{ P(H) \over P(E) } & \text{ (sub)sets or properties} \\ \hline \text{posterior} & \xlongequal{\text{Bayes}} & \text{likelihood} & \times & \red{ \text{prior} \over {} \text{evidence} } & \text{ descriptive words} \end{array}  }  \]

There is also a five minute video:

https://youtu.be/XQoLVl31ZfQ

-----

Diagrams in preliminary states:

\[ \boxed{ \begin{array} {} \kern2em & {} \rlap{ \xrightarrow [\kern8em] {L\cap B = 8}  } && \kern1em && {} \rlap{ \xrightarrow [\kern8em] {L=10} } \\ && \llap{L = 10} \searrow && \llap{ {L \cap B \over L} \kern-.5em } \nearrow & \kern1em {} \rlap{ \lower3ex{ \scriptstyle \kern-2em \text{Bayes-0} } } & \llap{ {L \over B} \kern-.5em } \searrow && \nearrow \rlap{ B = 500 } & \kern4em \\ &&& {} \rlap{ \xrightarrow [ .016 = {8 \over 500} = {L \cap B \over B } = P(L|B) ] {\kern8em} } &&&&  \\  \end{array} } \]

\[ \boxed{ \begin{array} {} && \green{B \cap L = 4}  \\  & \green{ \llap{ \boxed{{L\over B\cap L } = {10\over 4} = 2.5 } } \swarrow } && \red{  \nwarrow \rlap { {B\cap L \over B\cap F} = {4\over 20} } } \\  \green{ \boxed{L = 10} } &    & \green\downarrow &&   \\  && & &&  \\ \green\downarrow & \red{ \nwarrow \rlap{ \scriptstyle {L\over F} = {10\over 200} } } & B & \xleftarrow {} & \green{B \cap F = 20} \\ & \swarrow && \swarrow \rlap{ \boxed{{F\over B\cap F} = {200\over 20} = 10} } \\ X & \xleftarrow{} & \boxed{F = 200} \\ \end{array} } \]
An equation:
\[ \boxed{ \begin{array} {} \red{4\over 20} & = & \green{4\over 10} & \times & \red{10\over 200} & \times & {200\over 20} \\ \hline  \red{B\cap L \over B\cap F} & = & \green{B\cap L \over L} & \times & \red{L \over F} & \times & {F \over B\cap F} \\ \hline & = & {{B\cap L} / L} \over {{B\cap F} / F} & \times & \red{L \over F}  \\ \hline & = & {4/10} \over {20/200} & \times & \red{10 \over 200} \\ \hline & = & 4 & \times & \red{1 \over 20} \\ \end{array} } \]

Two bipartite partitions of a set, say $X$, yield this diagram:

The general case, say $X=A+B$ and $X=W+L$:
\[ \boxed{ \begin{array} {} \kern1em & A\cap W & \xrightarrow{ \textstyle {A\cap W \over W} } & W & \xleftarrow{ \textstyle {B\cap W \over W} } & B\cap W & \kern1em \\ & \llap{ A\cap W \over A } \Bigg\downarrow & \raise1ex{ A\cap W \over X } & \Bigg\downarrow \rlap{ W\over X } & \raise1ex{ B\cap W \over X } & \Bigg\downarrow \rlap{ B\cap W \over B } & \\ & A & \xrightarrow { \raise0ex{ \smash{ \textstyle{ A\over X } } } } & X & \xleftarrow { \raise0ex { \smash{ \textstyle{B\over X} } } } & B & \\ & \llap{ {A\cap L \over A} } \Bigg\uparrow & A\cap L \over X & \Bigg\uparrow \rlap{ L\over X } &  B\cap L \over X & \Bigg\uparrow \rlap{ B\cap L \over B } & \\ & A\cap L & \xrightarrow [ \textstyle{A\cap L \over L} ] {} & L & \xleftarrow [ \textstyle{B\cap L \over L} ] {} & B\cap L & \\ \end{array} } \]

For the special case in Section 9.4 of <I>Fat Chance</I>, A=Left, B=Right and W=Tracy, L=Paul, 
we get this diagram:
\[ \boxed{ \begin{array} {} \kern4em & T\cap L = 15 & \xrightarrow{} & T = 47 & \xleftarrow{} & T\cap R = 32 & \kern4em \\ & \llap{ \boxed{ {T\cap L \over L} = {3\over 4} } } \Bigg\downarrow && \Bigg\downarrow && \Bigg\downarrow \rlap{ \boxed{  {T\cap R \over R} = {2\over 5} } }  & \\ & L = 20 &  \xrightarrow { \raise2ex{  \smash{ \boxed{  {L\over X} = {1\over 5} = .2 } } } } & X = 100 &  \xleftarrow { \raise2ex { \smash{ \boxed{ {R\over X} = {4\over 5} = .8 } } } } & R = 80 & \\ & \llap{ \boxed{ {P\cap L \over L} = {1\over 4} } } \Bigg\uparrow && \Bigg\uparrow && \Bigg\uparrow \rlap{ \boxed{ {P\cap R \over R} = {3\over 5} } }  & \\ & P\cap L = 5 & \xrightarrow{} & P = 53 & \xleftarrow{} & P\cap R = 48 & \\ \end{array} } \]

--------

The following pertains to the video https://youtu.be/R13BD8qKeTg .

The situation is this:
There is a certain disease $\boxed H$, and a certain test $\boxed E$ for that disease.
(See below for the letters.)

You have been tested for the disease,
and tested positive.
But that doesn't necessarily mean you have it; the test is not a perfect indicator. 
Some people without the disease test positive (false positives). $\lnot H \land E = E-H$.
And some people with the disease will not be detected by the test (false negatives). $H \land \lnot E = H-E$.
If the test were a perfect indicator we would have $H=E$ and both the above differences would be zero (or empty).

To analyze this situation,
let $\boxed H$ be the <I>hypothesis</I>, that you have the disease; $H$ will also denote the number representing the probability that you have it, that is, the fraction of the general population that has it.
Let $\boxed E$ be the <I>evidence</I>, that you tested positive under the test;
$E$ will also denote the number representing the probability that a general member of the population will test positive in that particular test.

So clearly what we are interested in is $\boxed{P(H|E) = {H\land E \over E}}$, that is, if you tested positive in that particular test (the E), what is the probability that you actually have the disease (the H)?

In the video, the narrator gives three items of numerical information (numbers), using words to describe what those numbers mean:

$\boxed{P(H) = .001 = {1 \over 1000}}$ = fraction of people with the disease.
$\boxed{P(E|H) = {H\land E \over H} = .99 = {99\over 100}}$ = fraction of those with the disease who test positive = the rate of valid positives.
$\boxed{P(E|\lnot H) = {\lnot H \land E \over \lnot H} = .01 = {1\over 100}}$ = fraction of those without the disease who test positive = the rate of false positives.
That is all he tells you.
Those are the knowns.
Note that the last two tell you what the testing probabilities will be IF you know whether you have the disease or not.
I.e., they are backward from what we want: to go from the test result to the disease probability.

From that we infer 
$P(\lnot H) = 1-P(H) = 1-.001 = .999$ (the fraction without the disease).

Now let's see how we can use that information to solve the problem, 
i.e. calculate $P(H|E)$.

First we must calculate $P(E)$:
\[ \begin{array} {} P(E) & = && P\big( E\land {(H\lor \lnot H)} \big) \\ & = && P\big( {(E\land H)} \lor {(E\land \lnot H)} \big) \\ & = & P(E\land H) & + & P(E\land{\lnot H}) \\ & = & \boxed{P(E|H)}\,\boxed{P(H)} & + & \boxed{P(E|\lnot H)}\,P(\lnot H) \\ & = & \boxed{.99} \times \boxed{.001} & + & \boxed{.01} \times .999 \\ & = & .00099 & + & .00999 \\ && \text{valid positives} && \text{false positives} \\ & = && .01098 \\ & \sim && 1.1\% \\ &&& \text{all positives} \\ \end{array} \]

With that calculation out of the way, now we may apply the simple, basic Bayes' Theorem to get the answer:

First a little recall:
Again, $H$ = hypothesis = you have the disease, 
$E$ = evidence = your test returned positive.

\[ \boxed{ \begin{array} {ccccc|l} H\cap E \over E & \xlongequal{\text{Bayes-0}} & H\cap E \over H & \times & \red{ H \over E } & \text{ numbers} \\ \\ \hline \\ P(H|E) & \xlongequal{\text{Bayes}} & P(E|H) & \times & \red{ P(H) \over P(E) } & \text{ (sub)sets or properties} \\ \hline \text{posterior} & \xlongequal{\text{Bayes}} & \text{likelihood} & \times & \red{ \text{prior = hypothesis} \over {} \text{evidence} } & \text{ descriptive words} \\ \\ \hline \\ P(H|E) & \xlongequal{\text{Bayes}} & .99 & \times & \red{ .001 \over .01098 } & \\ &  = & .99 & \times & \red{ 1 \over 10.98 } \\ & \sim & .99 & \times & \red{ 1 \over 11 } \\ &  = & .09 \\ & = & 9\% \\ \end{array} } \]
And that is the answer the narrator gives; but we went through the details, 
and found the various intermediate results, which have their own interest.
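
For readers who want to replay the arithmetic, here is a minimal sketch (mine, not the narrator's) that carries out the same two-step computation, first $P(E)$ and then Bayes-0, with exact rationals:

```python
from fractions import Fraction as F

def posterior(P_H, P_E_given_H, P_E_given_notH):
    """P(H|E): compute P(E) over the partition H, not-H, then apply Bayes-0."""
    P_notH = 1 - P_H
    P_E = P_E_given_H * P_H + P_E_given_notH * P_notH   # all positives
    return P_E_given_H * P_H / P_E                       # valid positives / all positives

print(posterior(F(1, 1000), F(99, 100), F(1, 100)))      # 11/122 (= 99/1098), about 9%
```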
-----------------
Let us put some of the above information in a diagram, where the information provided by the narrator is given in boxes:

\[ \begin{array} {} && \text{valid positives} && \leftadj{ \text{all positives} } && \text{false positives} \\ && H\land E \atop P(H\land E) = {99\over100,000} \sim {1\over1,000} & \xrightarrow [{99\over\leftadj{1,098}} \sim {1\over\leftadj{11}}] { H\land E \over \leftadj E} & \leftadj{ E \atop P(E) = {(\blue{99}+\blue{999})=1,098\over100,000} \sim {11\over 1,000} } & \xleftarrow [{999\over\leftadj{1,098}} \sim {10\over\leftadj{11}}] { \lnot H\land E \over \leftadj E} & \lnot H \land E \atop P(\lnot H \land E) = {999\over100,000} \sim {1\over100} = {10\over1,000} \\  && \llap{\text{rate of valid positives} \; \boxed{ {H\land E \over H} = {99\over100} } } \Bigg\downarrow &&&& \Bigg\downarrow \rlap{ \boxed{ {\lnot H \land E \over \lnot H} = {1\over100} } \; \text{rate of false positives} } \\  && \boxed{ H \atop P(H) = .001 = {1\over1000} = {100\over100,000} } &&&& \lnot H \atop P(\lnot H) = .999 = {999\over1000} = {99,900\over100,000} \\ && \text{have the disease} &&&& \text{don't have the disease} \\  \end{array} \]

Here the input parameters are varied to consider another case, where the test is much more successful:
it is 100% successful on those who have the disease,
while for those who don't have the disease, it gives a false positive only $1\over1K$ of the time:


\[ \begin{array} {} && \text{valid positives} && \leftadj{ \text{all positives} } && \text{false positives} \\ && H\land E \atop P(H\land E) = {1\over 1K} = {1K\over 1M} & \xrightarrow [{1K\over \leftadj{1,999}} \sim {1\over{\leftadj2}}] {H\land E \over \leftadj E} & \leftadj{ E \atop { P(E) = {(\blue{1K}+\blue{999} = 1,999)\over1M} \sim {2K\over1M} = {2\over1K} } } & \xleftarrow { \lnot H\land E \over \leftadj E} & \lnot H \land E \atop P(\lnot H \land E) = {999\over1M} \sim {1K\over 1M} \\ && \llap{ \text{rate of valid positives} \; \boxed{ {H\land E \over H} = 1 } } \Bigg\downarrow &&&& \Bigg\downarrow   \rlap{ \boxed{ {\lnot H \land E \over \lnot H} = {1\over1K} = .001 } \; \text{rate of false positives} } \\ && \boxed{ H \atop P(H) = .001 = {1\over 1K} } &&&& \lnot H \atop P(\lnot H) = .999 = {999\over 1K} \sim 1 \\ && \text{have the disease} &&&& \text{don't have the disease} \\ \end{array} \]
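
The sketch from above reproduces these numbers directly: under these assumptions `posterior(F(1, 1000), F(1, 1), F(1, 1000))` returns $1000/1999 \approx 1/2$. Even a test that never misses the disease and gives only one false positive per thousand leaves only about a 50% chance that a positive tester actually has this rare disease.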

Let us now tackle the issues of independence and correlation, whether positive or negative.
Here are several equivalent formulations of <I>positive correlation</I>, showing the symmetry of the formulations.
For <I>negative correlation</I> or <I>independence</I>, 
merely replace $\lt$, shown here in vertical mode as $\vee$, 
by $\gt$ or $=$.

\[ \boxed{ \begin{array} {ccccc|ccccccccc|ccccc} {H\cap D  = HD \over H} & = & {8\over10} & = & {4\over5} & \bA \cap \gB \over \bA &&  {\bA \cap \gB \over \bA} \times {\bA \over X} & = & \bA \cap \gB \over X & = &  {\bA \cap \gB \over \gB} \times {\gB \over X} &&  \bA \cap \gB \over \gB & {H\cap D = HD \over D} & = & {8\over100} & = & {2\over25} \\ \vee & \iff & \vee & \iff & \vee & \vee & \iff & \vee & \iff & \vee & \iff & \vee & \iff & \vee & \vee & \iff & \vee & \iff & \vee \\ {D\over X} & = & {100\over200} & = & {1\over 2} & \gB \over X && {\bA \over X} \times {\gB \over X} & = & {\bA \over X} \times {\gB \over X} & = &   {\bA \over X} \times {\gB \over X} &&  \bA \over X & {H\over X} & = & 10\over200 & = & 1\over20 \\ \hline &&&&& {(\bA\cap\gB)/\bA} \over \gB/X &&&&  (\bA\cap\gB)/X \over (\bA/X) \times (\gB/X) &&&& {(\bA\cap\gB)/\gB} \over \bA/X &&&&& 2/25 \over 1/20 \\ &&&& 8\over5 &&&&& (\bA\cap\gB) \times X \over \bA \times \gB &&&&&&&&& {40\over25} = {8\over5} \\ \end{array} } \]

In the above box, note that
the leftmost general inequality means a positive correlation between $B$ and $A$, while
the rightmost general inequality means a positive correlation between $A$ and $B$.
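
As a quick check that the formulations in the box really do stand or fall together, here is a sketch of my own using the $H$, $D$ sizes from earlier in the post:

```python
from fractions import Fraction as F

# Sizes used earlier: |X| = 200, |H| = 10, |D| = 100, |H ∩ D| = 8.
X, H, D, HD = 200, 10, 100, 8

left   = F(HD, H) > F(D, X)               # P(D|H) > P(D)
middle = F(HD, X) > F(H, X) * F(D, X)     # P(H ∩ D) > P(H) P(D)
right  = F(HD, D) > F(H, X)               # P(H|D) > P(H)

assert left == middle == right            # all three are True here: positive correlation
```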