Tuesday, December 19, 2023

The mathematics of paradoxes (barbers and diagonal arguments)


This document is repetitive in places. Editing is planned.

We look at some ways to formalize some of the popular “paradoxes”.  
This introduces some formalization techniques
in a setting that is, hopefully, enjoyable and easily understood.

Note: There are two interplays at work here:

The interplay between (informal language) and (mathematical formalisms), 

and, within mathematics, the interplay (in fact a bijection) between 

(predicates on a set $U$) and (subsets of that set $U$).

$\def\sigmarel{\mathrel\sigma}$

$\def\Irr{ \text{Irr} } \def\To{\Rightarrow} \def\rtimes{\mathop {\rightadj\times}} \def\rtop{{\rightadj\top}} \def\lbot{{\leftadj\bot}} \def\rlnot{{\rightadj\lnot}} \def\lDelta{{\leftadj\Delta}} \def\su{{\source u}} \def\tv{{\target v}}$

$\def\S{\mathrel S}$ $\Newextarrow{\xLongrightarrow}{3,2}{0x21D2}$

For the mathematics, let us start with some recollections.
Given any set $\boxed U$, 
there is a "diagonal function" (an example of what can be called a "comultiplication") $\boxed{ \lDelta_U : U \mathrel{\leftadj\to} U \rtimes U : u \mapsto \langle u,u \rangle }$.
In cases where $U$ is evident, the subscript $U$ can be omitted.
Also the set $\boxed{ \bftwo = \{\lbot, \rtop\} }$ has a "negation" endofunction $\boxed{ \rlnot : 2 \mathrel{ \rightadj\to } 2 : \lbot \mapsto \rtop, \rtop \mapsto \lbot }$ which interchanges $\lbot,\rtop$.
The negation endofunction $\rlnot$ is of course a permutation on a finite set, 
and may be written using cycle notation as a 2-cycle, the transposition $(\lbot,\rtop)$.
It will later be of significance that this endofunction does not have a fixed point, i.e., is what combinatorialists call "a derangement".

Given $\lDelta$ and $\rlnot$, for any relation $\boxed{ R: U\times U \to \bftwo }$, 
we can postcompose with $\rlnot$, precompose with $\lDelta$, or both:

\[ \boxed{ \begin{array} {ccccc|l|c|l} R & : & U\times U & \to & \bftwo & \text{basic relation} & \homst u R v & \text{$\su$ is $R$-related to $\tv$} \\ R\rlnot & : & U\times U & \to & \bftwo & \text{negated relation} & \rlnot {\homst u R v} & \text{$\su$ is not $R$-related to $\tv$} \\ \lDelta R & : & U & \to & \bftwo & \text{diagonal of the basic relation} & \hom u R u & \text{$u$ is $R$-related to itself} \\ \lDelta R \rlnot & : & U & \to & \bftwo & \text{diagonal of the negated relation} = {} & \rlnot {\hom u R u} & \text{$u$ is not $R$-related to itself} \\ &&&&& \text{negation of the basic diagonal} \\ \end{array} } \]
The first two rows represent <I>relations</I> on $U$;
the last two rows represent <I>predicates</I> on $U$.
It is the last row, (the negated diagonal) or (the diagonal of the negated relation), 
that is essential to the "diagonal arguments" to be discussed.
For want of a better name, I call it the "Irreflexivity" predicate for $R$, $\boxed{ \Irr_R = \lDelta R \rlnot }$.
This last plays such a large role in the development 
that it is useful to show in detail how it acts on elements $u\in U$:
\[ \boxed{ \begin{array} {} \boxed{\Irr_R} & : & U & \xrightarrow[\kern3em]{\textstyle \lDelta}& U \rtimes U & \xrightarrow[\kern3em]{\textstyle \boxed{R}} & \bftwo & \xrightarrow[\kern3em]{\textstyle \rlnot} & \bftwo \\ && u & \mapsto & \langle u,u \rangle & \mapsto & \hom u R u & \mapsto & \rlnot \hom u R u \\ \end{array} } \]

This pre- and post- composition business can most perspicuously be presented as a tetrahedron,
where the triple composite $\lDelta R \rlnot$, the Irreflexivity predicate, needs to be imagined as a long northeast diagonal arrow.
\[ \boxed{ \begin{array} {} && U \rtimes U \rlap{ \xrightarrow [\kern9em] {\textstyle R \rlnot} } &&&& \bftwo \\ & \llap{ \lDelta } \nearrow && \searrow \rlap{R} && \nearrow \rlap{\rlnot} \\  U \rlap{ \xrightarrow [\textstyle \lDelta R] {\kern12em} } &&&& \bftwo \\ \end{array} } \]
Note that the two relations are the two arrows whose source is $U \rtimes U$, 
while the two predicates are the lower two of the three arrows whose source is $U$.


The basic issue, which is the heart of many of the so-called "paradoxes",
can now be presented very cleanly and simply, using the Irreflexivity predicate.

<b>Theorem (Irreflexivity predicates are never representable)</b>
Given (any binary relation $\boxed {R : U \times U \to 2}$ (2.)) on (any set $\boxed U$ (1.)),
(its Irreflexivity predicate $\boxed{\Irr_R}$ (3.)) 
cannot be (represented (6.) by $R$).
In other words, ($R$-irreflexivity (3.)) is not ($R$-representable (6.)).

(The numbers in parentheses will be used below.
They not only give a time sequence for constructing the argument
but also link together the various ways of presenting the argument.)

There is a  well-defined framework within which all these so-called "diagonal arguments" can be placed.
Placing those arguments in this framework helps distinguish "the forest from the trees".

Here we give it in a series of numbered steps.

They assume we are working in the category of sets, but they apply in other suitable categories.

1. Consider a set $\boxed U$, the "universe".

2. Consider a binary relation, $\boxed{ R : U \rtimes U \to (2 = \{\lbot,\rtop\}) : \langle u,v \rangle \mapsto \hom u R v}$ on $U$.

3. Using (the diagonal map $\lDelta : U \leftadj\to U \rtimes U : u \mapsto \langle u,u \rangle$) and (the negation map $\rlnot : 2 \rightadj\to 2 : \lbot \mapsto \rtop, \rtop \mapsto \lbot$), 
define (the <b>irreflexivity predicate</b> $\boxed{ \Irr_R = \lDelta R \rlnot : U \to 2 : u \mapsto \rlnot \hom u R u }\;$) on $U$.

4. The question now is, "Can $\Irr_R$ be represented by an element of $U$?". 
Suppose it can. 
Let $\boxed{ b\in U }$ be the putative representing element.

5. Form the predicate $\boxed{ \hom b R - : U \to 2 }$ on $U$.

6. We now expand the representability assertion. 
The assertion that ($b$ represents $\Irr_R$) is 
the assertion that (there exists an equality $\boxed{ {(\hom b R - = \Irr_R)} : U \to 2 }$ in $\hom U \Set 2$ ), which is defined to mean 
the statement in first order logic ( $(\forall u\in U)(\hom b R u \iff \Irr_R u)$ ), which by the definition of $\Irr$ is 
( $(\forall u\in U)(\hom b R u \iff \rlnot \hom u R u )$ ).

7. Instantiate that universal quantification at $b\in U$, getting $\boxed{ \hom b R b \iff \rlnot \hom b R b }$, i.e., $\hom b R b \in 2$ is a fixed point for $\rlnot$.

8. But (the negation $\rlnot : 2 \to 2$) does not have a fixed point ($\rlnot\lbot = \rtop$ and $\rlnot\rtop = \lbot$), so we have reached a contradiction. (In the familiar, concrete, and easily understandable "barber paradox", if we give the English language meaning of $\hom b R b$ in this case, (representability evaluated at $b$) becomes "if the barber shaves himself, then he doesn't shave himself, and if he doesn't shave himself, then he shaves himself".)

9. Therefore such a representing $b\in U$ cannot exist.
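
Since everything in these steps is finitary when $U$ is finite, the theorem can also be machine-checked by brute force. Here is a minimal Python sketch of such a check (an illustration only, not part of the argument): it encodes each relation $R$ on a 3-element set as a boolean table, computes $\Irr_R$ as the negated diagonal, and verifies that $\Irr_R$ never equals any row $\hom b R -$.

```python
from itertools import product

U = range(3)
n = len(U)

# Enumerate every binary relation R on the 3-element set U,
# encoded as a dict (u, v) -> bool.
for bits in product([False, True], repeat=n * n):
    R = {(u, v): bits[u * n + v] for u in U for v in U}
    # Irr_R(u) = not (u R u): the diagonal of the negated relation.
    irr = [not R[(u, u)] for u in U]
    # The predicate (b R -) for each candidate representative b.
    rows = [[R[(b, v)] for v in U] for b in U]
    # The theorem: no row of R ever equals Irr_R, since equality
    # at v = b would force R(b,b) == not R(b,b).
    assert irr not in rows

print("checked all", 2 ** (n * n), "relations: Irr_R is never R-representable")
```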

Here is a diagram in the category $\Set$ of sets which illustrates that framework:
\[ \boxed{ \begin{array} {} && 1 & \xrightarrow{\textstyle \boxed{b} \; (4.)} & U \\ && \big\uparrow && \big\uparrow \\ && 1 \rtimes U & \xrightarrow{ \smash{\textstyle b \rtimes U} } & U \rtimes U \rlap{ \xrightarrow [\smash{\kern15em}] { \smash{\textstyle \boxed{R} \; (2.)} } } &&&& \bftwo \\ && {\wr\Vert} && \Vert &&&& \Vert \\ && \llap{(1.) \;} U \rlap{ \xrightarrow[\kern27em]{\textstyle \hom b R -  \; (5.) } } &&&&&& \bftwo \\ 1 & \xrightarrow { \smash{\textstyle \; \boxed{b} \; (7.) \;} } & \Vert & {} \rlap{ \kern-3.4em \text{? predicate $\Irr_R$ below $R$-representable by $b$ (6.)?} } &&&&& \Vert \\ && \llap{ \boxed{\Irr_R} \; (3.) : {} } U & \leftadj{ \xrightarrow[\textstyle \lDelta]{\kern3em} } & U \rtimes U & \xrightarrow[\textstyle \boxed{R} \; (2.)]{\kern3em} & \bftwo & \rightadj{ \xrightarrow[\textstyle \rlnot] {\kern3em} } & \bftwo \\ && u & \leftadj\mapsto & \langle u,u \rangle & \mapsto & \hom u R u & \rightadj\mapsto & \lnot {\hom u R u} \\ \end{array} } \]
(The numbers in parentheses show a time sequence for building this diagram.)

Now we apply that general result in some special cases.
 

First, the barber “paradox”, sometimes phrased as

(A town (1.)) has (a barber (4.)) 
who ((shaves (2.)) all those, and only those (6.))
(who do not shave themselves ("the irreflexives") (3.)).
Does (the barber (7.)) shave himself?
We can formalize this situation as follows:

  1. Let $\boxed U$ (for “universe”) be the population in question.
  2. Define a binary relation $\boxed\S$ on $U$ by $u\S v \iff u \mathrel{\text{shaves}} v$.
  3. Form (the irreflexivity predicate $\boxed{\Irr_S(u)} = \rlnot(u\S u)$ on $U$), in words, "$u$ does not shave himself".
  4. Suppose there exists $\boxed{ b\in U }$, the putative barber.
  5. Form the predicate $\hom b S -$, which characterizes the people whom $b$ shaves.
  6. Suppose that $\hom b S -$ represents $\Irr_S$, i.e., $(\forall u\in U)\bigl(b\S u \iff \lnot(u\S u)\bigr)$; we may say "$\Irr_S$ is $\S$-representable by $b$", or, in English, "the barber shaves precisely those who do not shave themselves".
  7. Instantiate the above universal quantification at $b\in U$, obtaining
    $\boxed{ (b\S b) \iff \rlnot(b\S b) }$,
    a contradiction, since $\rlnot$ does not have a fixed point; in particular, $(b\S b)$ cannot be a fixed point for $\rlnot$.
  8. Conclusion: There cannot exist a $b\in U$ which $S$-represents $\Irr_S$.
    In other words, such a barber cannot exist.
  9. However, such a barber can exist, if we strike the requirement that the barber be <i>in the town</i>, and allow him to be outside of the town, i.e. in a larger metropolitan area. Such a $b$ would be outside the scope of applicability of the universal quantification. This is the origin of the large/small distinction which permeates logical studies.

At a "meta"-level, we may note that there are three levels of language being used to describe this situation.

There is the level of category theory, see the diagram above.

I call this the "1-D" level.

Then there is the level of first-order logic, exemplified by such formula as "$(\forall u\in U)\bigl(b\S u \iff \lnot(u\S u)\bigr)$".

I call that the "0-D" level.

Finally, there is the English language.

Examples of that appear above.

<hr />

It is also worth noting that the problem is caused by the bidirectional nature of the "if and only if";

that is, if we replace the "iff" with a one-directional implication, in either direction, 

that one-directional implication can be satisfied.

\[ \boxed{ \begin{array} {ccccc|cccccc}  \kern2em &  &  {} \rlap{ \kern-7em \text{if you don't shave yourself,} }  &&&&&& {} \rlap{ \kern-5em \text{if the barber shaves you,} }  \\ & & {} \rlap{ \kern-7em \text{then the barber shaves you} } &&&&&& {} \rlap{ \kern-6em \text{then you didn't shave yourself} }   \\   \hline & \lnot {(\hom x S x )} & \To & \hom b S x & \kern2em && \kern2em & \hom b S x & \To & \rlnot {(\hom x S x )}    \\   \hline  & \rlnot {(\hom b S b )} & \To & \hom b S b &&&& \hom b S b & \To & \rlnot {(\hom b S b )} & \kern2em & \red X     \\     & \bot && \rtop  &&&&& \bot    \\   &   & \rtop  \\  \hline & && &&&&\rlnot {(\hom b S b )}  & \To & \rlnot {(\hom b S b )} \\ & &&&&&& \rtop && \rtop \\ & &&&&&&& \rtop \\  \end{array}  }  \]  

The range of possibilities, and their consequences, can be presented by a Venn diagram or an old-fashioned truth table:

Venn diagram (annotated) for $ \big( b S x \iff \rlnot (x S x) \big) $:

\[ \boxed{ \begin{array} {} \kern8em && \lbot \buildrel \rtop \over \Rightarrow \rtop && \kern8em \\ & & \lbot \buildrel \lbot \over \Leftarrow \rtop \\ && \boxed{ \lnot ( \hom b S b ) } \\ \\ & {} \rlap{ \kern-6em \boxed{ \begin{array} {} \\ \rtop \buildrel \rtop \over \Leftrightarrow \rtop \\ \rtop \To \hom b S x & \kern6em \\ \\ \end{array} } } & \begin{array} {} \rtop \buildrel \lbot \over \Rightarrow \lbot \\ \rtop \buildrel \rtop \over \Leftarrow \lbot \\ \boxed{ \hom b S b } \\ \end{array} & {} \rlap{ \kern-6em \boxed{ \begin{array} {} \kern6em \\ \\ & \lbot \buildrel \rtop \over \Leftrightarrow \lbot \\ & \rtop \To \hom x S x \\ \\ \\ \end{array} } } \\ \\ \end{array} } \]

Truth table:

\[ \boxed{ \begin{array} {cccc|cc|cc}  \hom b S b &  \hom b S x  &  \hom x S x  &  \rlnot {(\hom x S x )}  &  [ \hom b S x \Rightarrow \rlnot {(\hom x S x )} ]  &  [ \hom b S x \Leftarrow  \rlnot {(\hom x S x )}  ]  &  \hom b S x \Leftrightarrow \rlnot {(\hom x S x )}  \\  \hline  \rtop &  \rtop  & \rtop  &  \lbot & \leftadj{ \boxed\bot } & \rtop & \lbot  \\ &  \rtop & \lbot & \rtop &  \rtop &  \rtop &   \rtop \\  &  \lbot   &  \rtop & \lbot  & \rtop & \rtop & \rtop  \\ \lbot &  \lbot & \lbot & \rtop & \rtop & \leftadj{ \boxed\bot }  & \lbot   \\   \end{array}  }  \]

The paradox can be understood in terms of that truth table.

There are three salient facts with regard to that table:

1) the truth values in the third and fourth columns ($\hom x S x$ and $\rlnot(\hom x S x)$) are opposite by definition,

2) the one-directional implications (in both directions) are between the second and fourth columns, 

3) if we let $x=b$, then the truth values in the first two columns must be identical, thus only the first and last rows of the table are relevant, and for those cases (rows) the truth values in the second and fourth columns are opposite.

In those cases (rows), precisely one of the one-directional implications is satisfied, not both.

Thus in either case, the truth value of the if-and-only-if is $\lbot$, i.e., "false".


The above argument was presented in classical first-order logic terms.

The argument may also be presented (in terms of category theory), via (a diagram in the category of sets).

(This diagram precisely duplicates the above diagram, except that the letter $R$, for an arbitrary relation, is replaced by the letter $S$, for "shaves".)

\[  \boxed{  \begin{array} {} && 1 & \xrightarrow{\textstyle b \; (3.)} & U  \\  && \big\uparrow && \big\uparrow  \\  && 1 \times U  &  \xrightarrow{ \smash{\textstyle b \times U} }  & U \times U  \rlap{  \xrightarrow [\smash{\kern15em}] { \smash{\textstyle S \; (2.)} }  }  &&&& \bftwo  \\  &&  {\wr\Vert}  && \Vert &&&& \Vert  \\    &&  U \rlap{ \xrightarrow[\kern27em]{\textstyle \hom b S -} }  &&&&&&  \bftwo  \\  1 & \xrightarrow { \smash{\textstyle \; b \;  (4.) \;} } & \Vert  &  {} \rlap{ \kern-2.8em \Irr_S \text{ $\S$-representable by $b$ (3.) ???} }  &&&&&  \Vert  \\    &\Irr_S \rlap{\; :} &  U &  \xrightarrow[\textstyle \lDelta]{\kern3em} & U \times U &  \xrightarrow[\textstyle S]{\kern3em} & \bftwo & \xrightarrow[\textstyle \lnot] {\kern3em} & \bftwo  \\   &&  u &  \mapsto &  \langle u,u \rangle  &  \mapsto  &  \hom u S u  &  \mapsto  &  \rlnot {\hom u S u}  \\  \end{array}  }   \]

The (categorical) point is that 

the hypothesized equality (3) that was questioned in "predicate below $\S$-representable by $b$ ???", 

when evaluated (4) at $b$, instantly gives a contradiction.

(The parenthesized numbers refer to the text above.)

It is useful to state what we just proved abstractly and mathematically:


Theorem:

Given (any binary relation $\boxed S$ on any set $\boxed U$), 

using $\lDelta$ and $\rlnot$ we may form (a predicate $\boxed{ (\text{Irr}_S = \lDelta S \rlnot) }$ on $U$) 

which cannot be $S$-represented (i.e., represented by $S$).

<hr />


Leaving (our categorical aside) and (returning to the main text), 

that is about as fast as it is possible to dispose of the original question.


It is instructive, however, to compare this proof that (such a barber cannot exist)
to the proof that (there cannot be a truly universal set, i.e., “the set of all sets”),
sometimes called “Russell’s paradox”.
Recall the proof that such a universal set cannot exist:

  1. Let $U$ ("the universe") be the set of all sets.
  2. Let $\text{Irr}_{\in} = \{u\in U \mid u \notin u \}$ (the irreflexives relative to ${\in}$)  (axiom schema of separation).
  3. Assume $\text{Irr}_{\in} \in U$.
  4. Consider the proposition: $\text{Irr}_{\in} \in \text{Irr}_{\in}$. Assume it is either true or false. Consider each alternative:
    $\text{Irr}_{\in} \in \text{Irr}_{\in} \xLongrightarrow{\text{defn $\text{Irr}_{\in}$}} \text{Irr}_{\in} \notin \text{Irr}_{\in}$.
    $\text{Irr}_{\in} \notin \text{Irr}_{\in} \xLongrightarrow{\text{defn $\text{Irr}_{\in}$}} \text{Irr}_{\in} \in\text{Irr}_{\in} $.
    Thus in either case we have a contradiction.
  5. Conclusion: $\text{Irr}_{\in}\subseteq U$, but $\text{Irr}_{\in} \notin U$.
  6. Thus the need for a larger universe, which includes both the members of the original universe $U$, and $\text{Irr}_{\in} = \{ u\in U \mathrel | u\notin u \}$.


As a slight variation, we can express the above argument as follows:

Let $U$ (for “universe”) be the population in question, the collection of all sets.

Consider the binary relation $\S$ on $U$ defined by $u\S v \iff u \in v$, 

i.e., $S$ is just a name for the membership relation in $U$.

Assume (the predicate $\lnot(u\S u)$ on $U$) is $\S$-representable, 

i.e., $\exists X \in U$ such that

$(\forall u\in U)\bigl(u\S X \iff \lnot(u\S u)\bigr)$ 

(this is (the comprehension scheme) applied to (the predicate $\lnot(u\S u)$ on $U$)).

Instantiate the above universal at $X\in U$, obtaining

$\boxed{ \bigl(X\S X \iff \rlnot(X\S X)\bigr) }$,

a contradiction.


Conclusion: There cannot exist an $X\in U$ satisfying the condition asserted for $X$, i.e., $X= \{ u\in U | u\notin u \}$.

Thus, just as 

the barber who shaved all those in a town who did not shave themselves

cannot himself be a member of that town,

so $\{ u\in U | u\notin u \}$ cannot itself be a member of $U$.

Thus the need for another universe to which it can belong.

And if we want the new universe to include the original "universe", our new universe must perforce be a larger universe.

We can express in words parts of the above.

$X = \{u\in U |  \rlnot (\hom u S u) \}$ is the set of those $u\in U$ which do not self-$S$-relate, i.e., do not relate to themselves under $S$.

If that set $X = \{u\in U |  \rlnot \hom u S u \}$ is $S$-representable, 

then we have our contradiction as above.

<hr />

We can obtain a hybrid of those two proofs as follows:

  1. Let $U$ be the population in question.
  2. Define a binary relation $\S$ on $U$ by $u\S v \iff u \mathrel{\text{shaves}} v$.
  3. Let $X = \{u\in U \mid \rlnot(u\S u) \}$.
  4. Assume $X$ is representable by the relation $\S$, i.e.,
    $(\exists b\in U)(\forall u\in U)(u\in X \iff b\S u)$, i.e.,
    $X = \{u\in U \mid b\S u\}$.
  5. Ask the equivalent questions:
    Does $b\S b$? (This is the barber question.)
    Is $b\in X$?
    We have
    $b\S b \iff b\in X \iff \rlnot(b\S b)$.
    A contradiction.
  6. Conclusion: Such a $b$ cannot exist, i.e.,
    $X = \{u\in U \mid \rlnot(u\S u)\}$ exists as a legitimate subset of $U$,
    but it is not representable by $\S$.

All that is much more than is necessary merely to prove that the hypothesized barber cannot exist,
but contains reasoning, in this simple setting,
that occurs in more complex situations.


We can extend this argument to prove a key result in set theory,
that any set is strictly smaller than its power set (the set of subsets of the original set).
In fact, this is the application in which Georg Cantor originally introduced this argument,
now commonly known as the "Cantor diagonal argument."

Rather than proving the theorem in the form stated above,
we use the result that, given a set $X$,
the set of its subsets is in bijection with (so is of the same size as)
the set of functions from $X$ to $2$.
So we prove that,
for any set $X$, $|X| \lt |2^X|$,
where $2^X = \{\,\text{functions}\, X\to 2\}$.
This is proved by showing that there cannot be a surjection from $X$ to $2^X$,
because the existence of such a surjection would lead to a contradiction.

  1. Let $X$ be a set in $\Set$.
    Let $2^X = \hom X \Set 2 = \{\,\text{functions}\, X\buildrel f \over \to 2\}$.
    Let \[\begin{array}{} X \times 2^X & \xrightarrow{\text{eval}} & 2\\ \langle x, f \rangle & \mapsto & xf = (x)f\\ \end{array}\] be the evaluation function.
  2. Suppose there exists a surjection $\boxed{X\buildrel \sigma \over \twoheadrightarrow 2^X}$.
    (We will draw a contradiction from this assumption.)
  3. The exponential transpose of the function $X\buildrel \sigma \over \twoheadrightarrow 2^X$
    is a relation, which we shall also call $\sigma$, $\boxed{X\times X \buildrel \sigma \over \to 2}$.
    The hypothesis that (the original function $\sigma$ was a surjection)
    is equivalent to (the following assertion about the relation $\sigma$):
  4. For each $X \buildrel f \over \to 2$, there exists an $x_0\in X$ such that
    $f = - \mathrel\sigma x_0$ (using $-$ as a placeholder for the missing variable),
    i.e., $\boxed{ xf \iff x \mathrel\sigma x_0 }$.
    (We call this condition “$\sigma$-representability”.)
  5. Apply this to the function $X\to 2$
    which takes $x\in X$ to $\rlnot(x \mathrel\sigma x) \in 2$.
    By $\sigma$-representability, there exists an $x_0\in X$ such that
    $(\forall x\in X)\bigl(\rlnot (x \mathrel\sigma x) \iff x \mathrel\sigma x_0\bigr)$.
    Take $x=x_0$ to obtain the contradiction.
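
For finite sets, the same five steps can be run mechanically. The following Python sketch (an illustration with randomly generated $\sigma$; the set sizes are arbitrary choices) forms the "diagonal complement" $\{\,x \in X \mid \rlnot(x \mathrel\sigma x)\,\}$ and confirms that it never lies in the image of $\sigma$:

```python
import random

# X is a small finite set; sigma : X -> 2^X assigns to each x a subset of X.
X = range(5)
for trial in range(1000):
    sigma = {x: frozenset(y for y in X if random.random() < 0.5) for x in X}
    # The diagonal complement: D = { x in X : x not in sigma(x) }.
    D = frozenset(x for x in X if x not in sigma[x])
    # If D were sigma[x0] for some x0, then x0 in D <=> x0 not in D.
    assert all(sigma[x] != D for x in X)

print("in 1000 random trials, no sigma ever hit its own diagonal complement")
```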


$\def\P{{\bf P}} \def\twotoP{{2^{\P}}} $

This argument, with a little extra work, shows that the real numbers are not countable,
or, as it is sometimes said, not "denumerable."
To get this result we apply the above with $X=\P$, the set of positive integers $\{1,2,3,\ldots\}$.
Applying the above directly we have $|\P| \lt |\twotoP|$.

We can embed $\twotoP$ in $\R$, i.e., produce an injection $\twotoP \buildrel i \over \rightarrowtail \R$.
The idea is $\twotoP = \{ \text{functions} \; \P \to 2\}$,
so its elements, e.g. $\P \buildrel a \over \to 2$ may be written as
sequences, $a_1, a_2, a_3, \ldots$, indexed by $\{1,2,3,\ldots\}$, of elements of $2$.
Further, we now regard our handy-dandy set $2$ as consisting, not of truth values
but of the integers $0$ and $1$.
Given a sequence $a_1, a_2, a_3, \ldots$ we map it to the real number $$.a_1a_2a_3\ldots = \sum_{n=1}^\infty a_n10^{-n} \in \R ,$$ so we have the injection \[\begin{array}{} \twotoP & \buildrel i \over \rightarrowtail & \R\\ \langle a_1, a_2, a_3, \ldots \rangle & \mapsto & .a_1a_2a_3\ldots = \displaystyle\sum_{n=1}^\infty a_n10^{-n} \end{array}\]

Now, we know there cannot be a surjection of $\P$ onto $\twotoP$
and it would seem intuitive that then there cannot be a surjection from $\P$ onto $\R$,
which is even larger.
However it is worth giving a detailed proof of that,
as it illustrates several ideas which are generally useful and significant.

Suppose there were a surjection $\P \buildrel e \over \twoheadrightarrow \R$.
Then consider the diagram \[\begin{array}{} & & \P \\ & & \twoheaddownarrow \rlap e \\ \twotoP & \buildrel i \over \rightarrowtail & \R \\ \end{array}\] Expand the diagram to: \[\begin{array}{} \twotoP \times_\R \P & \buildrel i' \over \rightarrowtail & \P \\ \llap{e'}\twoheaddownarrow & \raise8pt\hbox{$\lrcorner$}\qquad & \twoheaddownarrow \rlap e \\ \twotoP & \buildrel i \over \rightarrowtail & \R \\ \end{array}\] where $\twotoP \times_\R \P$ is the pullback of $i$ and $e$,
thus $\twotoP \times_\R \P = \{ n\in\P \mid ne \in (\twotoP)i \}$.
(The small $\lrcorner$ is the standard categorical notation indicating that
what is "inside" the corner is a pullback.)
In $\Set$, the pullback of a surjection along any function is again a surjection, so $e'$ is a surjection onto $\twotoP$ from a set which, via the injection $i'$, is a subset of $\P$. Since $\twotoP$ is nonempty, $e'$ may be extended (arbitrarily) to all of $\P$, yielding a surjection $\P \twoheadrightarrow \twotoP$, which we proved above cannot exist. Hence there can be no surjection $\P \buildrel e \over \twoheadrightarrow \R$.


<hr />


We now apply the basic set-theoretical result that, for all sets $X$, $\boxed{ X \buildrel \scriptscriptstyle \text{Cantor} \over \lt \big( 2^X = [X,2] \big) }$ (<i>strictly</i> less than) 

to obtain a basic result in the theory of categories:

<b>Theorem</b> (Mac Lane CWM Proposition V.2.3) $\def\bfA{{\mathbf A}}$

If a category $\bfA$ has <i>all</i> limits, no matter how large, 

then $\bfA$ is a preorder (i.e., for all $a,b \in \bfA$, there is at most one arrow in $\hom a \bfA b$ ).


Proof:

We argue by contradiction: suppose for some $a, \rightcat b \in \bfA$ there are two distinct arrows $\boxed{ h,k : a \to \rightcat b }$ in $\hom a \bfA {\rightcat b}$, so $2 \cong \{h,k\} \subseteq \hom a \bfA {\rightcat b}$.

Thus for any set $X$, we have 

\[ \boxed{  \begin{array} {ccccccccccl} && 2 & \cong & \rightcat{ \{h,k\} } & \subseteq & \rightcat{ \hom {\leftcat a} \bfA b } &&&&&& \text{now apply} [X,-] : \Set\to\Set \\ && [X,2] & \cong & \rightcat{ [X,\{h,k\}] } &\subseteq & \rightcat{  [X, \hom {\leftcat a} \bfA {\rightcat b}]  }  & \rightadj{ \buildrel * \over \cong } &  \leftcat{   \hom a \bfA {  \rightadj{  (\prod_{\rightcat{x\in X}} {\rightcat b}) }  }   }  & \subseteq & \text{Arr}(\bfA) & \kern1em & \text{so, taking cardinalities,}  \\  |X| & \buildrel \text{Cantor} \over \lt & |[X,2]| & = & \rightcat{ |[X,\{h,k\}]| } &\le & \rightcat{  |[X, \hom {\leftcat a} \bfA {\rightcat b}]|  }   & \rightadj{ \buildrel * \over \cong } & \leftcat{  | \hom a \bfA { \rightadj{ (\prod_{\rightcat{x\in X}} {\rightcat b}) } } |   }   & \le & |\text{Arr}(\bfA)|  &&  \text{now take } X = \text{Arr}(\bfA)  \\   |{\text{Arr}(\bfA)}|  & \buildrel \text{Cantor} \over \lt & |[{\text{Arr}(\bfA)},2]| & = & \rightcat{ |[{\text{Arr}(\bfA)},\{h,k\}]| } &\le &  \rightcat{  |[\text{Arr}(\bfA), \hom {\leftcat a} \bfA {\rightcat b}]|  }  & \rightadj{ \buildrel * \over \cong } & \leftcat{  | \hom a \bfA { \rightadj{ (\prod_{\rightcat{f\in \text{Arr}(\bfA)}} {\rightcat b}) } } |   }  & \le & |\text{Arr}(\bfA)|   \\   \end{array}  } \]

Now recall the defining property of a categorical product, for $X$ a set (this is sometimes denoted $X \cotensor {\rightcat b}$, and called the <em>cotensor</em> of $X$ and $\rightcat b$, or <em>power</em> ${\rightcat b}^X$ of $\rightcat b$ by $X$):

$ \hom a \bfA { (\prod_{x\in X} {\rightcat b} ) } \cong [X, \hom a \bfA {\rightcat b} ]$.

Combining that property of the product with the above numerical inequality gives, for any set $X$, 

$|X| \lt |\hom a \bfA { (\prod_{x\in X} {\rightcat b}) }|$.

Now let $X = {\text{Arr}(\bfA)}$, the collection of <em>all</em> arrows in $\bfA$.

The result: 

$|\text{Arr}(\bfA)| \lt |\hom a \bfA { (\prod_{f \in \text{Arr}(\bfA)} {\rightcat b}) }|$,

But $|\hom a \bfA { (\prod_{f \in \text{Arr}(\bfA)} {\rightcat b}) }| \le |\text{Arr}(\bfA)|$, 

so $|\text{Arr}(\bfA)| \lt |\text{Arr}(\bfA)|$, clearly impossible.


We may sharpen the conclusion as follows:

If $\bfA,a, \rightcat b, h,k,X$ are as above, 

then a NECESSARY condition for $\prod_{x\in X} {\rightcat b} = {\rightcat b}^X$ to exist is that $|X| \lt |\text{Arr}(\bfA)|$.



The instance of the diagonal argument which is applicable here runs as follows:

If a function $\sigma : \text{Arr}(\bfA) \to [\text{Arr}(\bfA),2]$ were surjective, then there would have to exist an $f_0\in \text{Arr}(\bfA)$ such that \[ (\forall f \in \text{Arr}(\bfA)) \Big( (f_0 \sigmarel f) \iff  \neg (f \sigmarel f) \Big) \, .\] Then, in analogy to the previous cases, instantiating at $f_0$ delivers a contradiction.



<hr />

Lawvere has generalized this by generalizing the role of the endofunction $\rlnot : 2 \to 2$.

\[ \boxed{    \begin{array} {cccc|cc}  2 & \xrightarrow [\kern 3em] {\textstyle \rlnot} & 2 \\  \Omega & \xrightarrow [\kern 3em] {\textstyle \rlnot} & \Omega    &&& (\forall \omega \in \Omega) (\rlnot\omega \not= \omega)   \\   Y & \xrightarrow [\kern 3em] {\textstyle t}  & Y   &&&   (\forall y \in Y) (yt \not= y)    \\   \end{array}   }    \]

i.e., each of $\rlnot$ and $t$ lacks a <b>fixed point</b>,

i.e., in the parlance of combinatorial mathematics, each is a <I>derangement</I>.

\[ \boxed{ \begin{array} {} && 1 & \xrightarrow{\textstyle \boxed{b} \; (3.)} & U \\ && \big\uparrow && \big\uparrow \\ && 1 \times U & \xrightarrow{ \smash{\textstyle b \times U} } & U \times U \rlap{ \xrightarrow [\smash{\kern20em}] { \smash{\textstyle \boxed{R} \; (2.)} } } &&&& (\bftwo = \Omega = Y)    \\ && {\wr\Vert} && \Vert &&&& \Vert \\ && U \rlap{ \xrightarrow[\kern32em]{\textstyle \hom b R - \; (3.) } } &&&&&& (\bftwo = \Omega = Y) \\ 1 & \xrightarrow { \smash{\textstyle \; \boxed{b} \; (4.) \;} } & \Vert & {} \rlap{ \kern-3.4em \text{? predicate $\Irr_R$ below $R$-representable by $b$ (3.) ?} } &&&&& \Vert \\ && \llap{ \boxed{\Irr_R} : {} } U & \xrightarrow[\textstyle \lDelta]{\kern3em} & U \times U & \xrightarrow[\textstyle \boxed{R}]{\kern3em} & (\bftwo = \Omega = Y)  & \xrightarrow[\textstyle \rlnot = t] {\kern3em} & (\bftwo = \Omega = Y)   \\ && u & \mapsto & \langle u,u \rangle & \mapsto & \hom u R u & \mapsto & \rlnot {\hom u R u} = ({\hom u R u})t  \\ \end{array} } \]

Consider $\Irr_R = \lDelta R \rlnot$, the bottom long left-to-right arrow, which starts at $U$. 

The question is: 

Can (that arrow) be $R$-represented by (an element $b$)?

If so, we have (an arrow $\hom b R -$ as above it), 

and (an equality between those two arrows).

But then, precomposing (that pair of parallel arrows) with (the representing element $b$), 

we find (a fixed point for $t$), 

namely, ($\hom b R b = {(\hom b R b )} t$).

This is Theorem 6.5 of Noson S. Yanofsky, <I>Theoretical Computer Science for the Working Category Theorist</I>.

The earlier examples, the barber, Russell, and Cantor "paradoxes", use the contrapositive of this:

If $t: Y \to Y$ does not have a fixed point, 

 then $\Delta R t$ is not $R$-representable.

Specifically, they take $t: Y \to Y$ to be $\rlnot : 2 \to 2$.

Since $\rlnot \rtop = \lbot$ and $\rlnot \lbot = \rtop$, 

$\rlnot$ does not have a fixed point.

<hr />

Here are some useful references:

Yanofsky's excellent 23-page paper: 

"A Universal Approach to Self-Referential Paradoxes Incompleteness and Fixed Points",

http://arxiv.org/abs/math/0305282

which is both introductory and comprehensive!


John Baez:


https://golem.ph.utexas.edu/category/2006/12/classical_vs_quantum_computati_8.html


Vishal Lama:

https://topologicalmusings.wordpress.com/2010/07/11/self-referential-paradoxes-incompleteness-and-fixed-points/


<hr />

What is below is an alternative approach to the paradoxes.

I have a choice on how to present this.

I could present the usual "paradox", which involves negation at two levels.

Or I could initially just present the kernel of the argument, the "positive version", which does not use negation, and replaces

the familiar truth-value set $2 = \{\lbot,\rtop\}$

(with its logical negation endofunction $\rlnot: 2\to 2$)

with an arbitrary set, say $\boxed Y$ 

(with an arbitrary endofunction on $Y$ called in various sources $\boxed{t:Y\to Y}$ or $\boxed{\alpha:Y\to Y}$ ).

Then, from that positive version, the usual version of the paradoxes can be derived, 

 by introducing negatives, in three steps, at two levels.

So let us begin with a diagram which illustrates the positive version.

\[ \boxed{ \begin{array} {} && \langle b,b \rangle && \hom b R b \\ && \langle x,x \rangle && \hom x R x \\ && X\times X & \xrightarrow {\textstyle R} & Y \\ && \llap \lDelta \Bigg\uparrow & \Big |\Big | \rlap{  \scriptstyle \kern-3.7em \text{representation} } & \Bigg\downarrow \rlap{t \text{ or } \alpha} \\ 1 & \xrightarrow [\textstyle x,b] {\kern3em} & X & \xrightarrow [\textstyle \hom b R -] {\kern3em} & Y \\ &&&&& \boxed{ \begin{array} {ccl} {\hom x R x}\alpha & \rightadj{ {\hom b R b}\alpha } & \text{top leg} \\ || & \rightadj{ || } \\ \hom b R x & \rightadj{ \hom b R b } & \text{bottom leg} \\ \text{representation at $x$} & \rightadj{ \text{representation at $b$} } \\ \end{array} } \\ \end{array} } \]

Thus we see that

IF $\hom b R -$ equals $\lDelta R \alpha$ (i.e., $b$ $R$-represents $\lDelta R \alpha$)

THEN $\rightadj{ \hom b R b}$ is a $\rightadj{ \text{fixed point} }$ for $\alpha$.
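
Here is a finite Python sketch of that implication. The sets $X$ and $Y$, the endofunction $\alpha$, the relation $R$, and the representing element $b$ below are all made-up illustrative choices, arranged so that the representation hypothesis actually holds:

```python
# alpha : Y -> Y is an endofunction that *does* have fixed points
# (here, the identity), so representation is possible.
X = [0, 1]
Y = [0, 1]
alpha = lambda y: y

# R : X x X -> Y, chosen so that row b = 0, i.e. R(0, -),
# equals the twisted diagonal u |-> alpha(R(u, u)).
R = {(0, 0): 0, (0, 1): 1,
     (1, 0): 0, (1, 1): 1}

twisted_diagonal = [alpha(R[(u, u)]) for u in X]
b = 0
# The representation hypothesis: b R-represents Delta R alpha ...
assert twisted_diagonal == [R[(b, v)] for v in X]
# ... therefore R(b, b) is a fixed point of alpha.
assert alpha(R[(b, b)]) == R[(b, b)]
```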


1. Now consider the case $ \alpha = \rlnot : 2\to2$. (This is internal to the category $\Set$.)

2. $\rlnot$ has NO fixed points. (This statement, and the next, are ABOUT the category $\Set$, not internal to it.)

3. Thus $\lDelta R \rlnot$ CANNOT be $R$-represented.

One now obtains the various diagonal "paradoxes", 

e.g. the barber and Russell paradoxes,

by suitable selection of the set $X$ (or $U$), and the relation $R$ (or $S$) on it.

<hr /> 


Thursday, July 27, 2023

A categorical setting for elementary probability theory

(Early draft; I will be expanding this in the future.)

The basic idea is this:
Given a finite set $X$ and a subset $A\subseteq X$, 
we may regard the ratio of the size of $A$ to the size of $X$ 
as being the <I>probability</I> 
that a randomly chosen element of $X$ will be in $A$.

Thus given $A$ and its superset $X$, 
we have not only 
the inclusion relation between $A$ and $X$, which we may regard as being an arrow in the usual category $\Set$ of sets and functions, 
but also 
the positive rational number $\boxed{ |A| \over |X| }$, a number often called by probabilists <I>the probability of $A$</I> 
(when $X$ is assumed known).
Now, the positive rational numbers in fact form the objects of a discrete closed monoidal category (under multiplication) (details below).
Thus the above allows us to view the collection of sets as being the objects of an enriched category, enriched in the discrete closed monoidal category of positive rational numbers.

<hr />

That covers the notion of absolute probability. 
Let us now consider the notion of relative or conditional probability.

Suppose $X$ contains two subsets, $A$ and $B$.
We may ask:
if $A$ is true, what is the probability that $B$ is true, 
and vice versa?
We may also ask:
what is the relation between those two probabilities? (The equation that gives that relation is called "Bayes' Theorem".)

Let us depict this situation with a diagram:

\[ \boxed{ \begin{array} {} & A\cap B & \xrightarrow{ \textstyle{ A\cap B \over B } } & B & \\ & \llap{ A\cap B \over A } \Bigg\downarrow & \llap{ \scriptstyle \text{Bayes} } \nearrow \rlap{ {A \over B} } & \Bigg\downarrow \rlap{ {B \over X} } & \\ & A & \xrightarrow[ \textstyle{ A \over X } ]{} & X \\ \end{array} } \]

In the diagram,
at the vertices, the symbols $X, A, B, A\cap B$ denote <I>sets</I>;
when labelling arrows, they denote <I>positive integers</I>, viz the number of elements in the set, often denoted e.g. $|A|$.
Thus the ratios shown are ratios of positive integers, thus rational numbers.

Thus in the diagram
the bottom and right arrows give the absolute probabilities of $A$ and $B$ respectively;
the left arrow gives the relative or conditional probability of $B$ relative to $A$;
the top arrow gives that of $A$ relative to $B$. 
The commutativity of each triangle follows from elementary algebra. 
The commutativity of the upper left triangle, when interpreted in terms of probabilities, is Bayes' Theorem:
\[ P(A|B) = P(B|A){P(A) \over P(B)} \]

Another example: 
a bipartite partition $A,B$ of $X$ together with a third subset $W$.
Note that we have changed notation so that $A,B$ form the two parts of the bipartite partition.
We will diagram this situation as:

\[ \boxed{ \begin{array} {} {} \rlap{ \kern-3em {A\cap W \over A} {A\over X} +  {B\cap W \over B} {B\over X} \xlongequal{\text{cancel}} {A\cap W \over X} + {B\cap W \over X} \xlongequal{\text{distribute}} {(A\cup B)\cap W \over X } \xlongequal{A\cup B = X} {W\over X} } \\ \\ \hline \\ \kern5em & A\cap W & \xrightarrow{ \textstyle{A\cap W \over W} } & W & \xleftarrow{ \textstyle{B\cap W \over W} } & B\cap W & \kern5em \\ & \llap{ A\cap W \over A } \Bigg\downarrow && \Bigg\downarrow \rlap{ W \over X } && \Bigg\downarrow \rlap{ B\cap W \over B } & \\ & A & \xrightarrow[ \textstyle{A \over X} ] {} & X & \xleftarrow[ \textstyle{B \over X} ] {} & B & \\ \end{array} } \]

In several of the examples in the chapter on conditional (or relative) probability in <I>Fat Chance</I>, 
we are given:
0. The numerical values for the bottom two arrows and the two side arrows.
From those values we may compute (setting $|X| = 1$; a computational sketch follows this list):
1. The numerical values of the upper two corners $A\cap W, B\cap W$.
2. Using the fact that those last two form a partition of $W$ (the pullback of a partition is a partition), we add those last two values to get $|W|$.
3. Now that this is known, we may compute any and all of the three remaining conditional probabilities given by the three remaining arrows.
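
As promised, here is a small Python sketch of that recipe. The four input values are made up, standing in for the numbers a <I>Fat Chance</I> exercise would supply:

```python
# Given (bottom arrows): A, B partition X, so P(A) + P(B) = 1.
P_A, P_B = 0.3, 0.7
# Given (side arrows): the conditional probabilities of W.
P_W_given_A, P_W_given_B = 0.5, 0.2

# 1. The upper two corners (taking |X| = 1).
P_A_and_W = P_W_given_A * P_A
P_B_and_W = P_W_given_B * P_B

# 2. A-cap-W and B-cap-W partition W, so add to get |W|.
P_W = P_A_and_W + P_B_and_W

# 3. The remaining conditional probabilities, e.g. the two top arrows.
P_A_given_W = P_A_and_W / P_W
P_B_given_W = P_B_and_W / P_W

print(P_W, P_A_given_W, P_B_given_W)
```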

Such diagrams are useful in displaying numerical information associated with probability situations.
For example, consider the "Monty Hall" situation described in Section 9.1 of <I>Fat Chance</I>.
(And whose precise assumptions are emphasized here:
https://en.wikipedia.org/wiki/Monty_Hall_problem#Standard_assumptions )
The various expressions mentioned in the book are shown in this diagram:
\[ \boxed{ \begin{array} {} \kern8em & A\land W & \xrightarrow{  } & W & \xleftarrow{ } & B\land W & \kern9em \\ & \llap{ P(W|A) = {k-1 \over n-2} } \Bigg\downarrow &&  && \Bigg\downarrow \rlap{ P(W|B) = {k \over n-2} } & \\ & \llap{ \text{your guess is right} = {} } A & \xrightarrow[ \textstyle{ P(A) = {k \over n} } ] {} & X & \xleftarrow[ \textstyle{ P(B) = {n-k \over n} } ] {} & B \rlap{ {} = \text{ your guess is wrong} } & \\ &&& {} \rlap{\kern-10em X = \text{set of doors;} \; |X| = n = \text{number of doors;} \; k = \text{number of cars} } \\ \end{array} } \]
To get the original, basic, Monty Hall situation, take $n=3$, $k=1$.
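
As a sanity check on those values, here is a short Python simulation of the basic case $n=3$, $k=1$, under the standard assumptions linked above (the host always opens a goat door different from your pick, choosing at random when he has a choice):

```python
import random

def trial(switch):
    car = random.randrange(3)
    pick = random.randrange(3)
    # The host opens a door that is neither your pick nor the car.
    opened = random.choice([d for d in range(3) if d != pick and d != car])
    if switch:
        # Switch to the one remaining unopened door.
        pick = next(d for d in range(3) if d != pick and d != opened)
    return pick == car

N = 100_000
print("stay  :", sum(trial(False) for _ in range(N)) / N)  # about 1/3
print("switch:", sum(trial(True) for _ in range(N)) / N)   # about 2/3
```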

Saturday, July 8, 2023

Bayes' Theorem -- relations between ratios

This is a preliminary draft.
$\def\bA{{\blue A}} \def\gB{{\green B}}$

First, we give a diagrammatic version of the Bayes situation, 
using the abstract letters $\bA$ and $\gB$ for two subsets of the larger, but finite, set $X$:

\[ \boxed{ \begin{array} {} && \bA \rlap{ \blue{ \xrightarrow [\kern9em] {\textstyle {A \over \black X}} } } &&&& X \\ & \blue{ \llap{ A \cap \gB \over A } \nearrow } \rlap{ \scriptstyle \kern.8em \red{\text{Bayes}} } && \searrow \rlap{\bA \over \gB} && \green{ \nearrow \rlap{ B \over \black X } } \\ \rightadj{ \boxed{\bA \mathrel{\rightadj \cap} \gB} } \rlap{ \green{ \xrightarrow [\textstyle {\bA \mathrel{\rightadj \cap} B \over B} ] {\kern9em} } } &&&& \gB \\ \end{array} } \]

Here the expressions $\bA, \gB, \bA \mathrel{\rightadj \cap} \gB, X$, 
when they are <I>vertices</I> of the diagram, represent sets, 
and the arrows between them represent set-theoretic inclusion.
On the other hand, 
when those expressions are part of <I>the labels</I> of the arrows of the diagram, 
they represent the natural numbers which are the number of elements in the corresponding set.
Thus the quotients shown are either absolute or relative (i.e., conditional) probabilities:
the two shown arrows going into $X$ are labeled with <I>absolute</I> probabilities, 
while the two shown arrows going out of $\bA \mathrel{\rightadj \cap} \gB$ are labeled with <I>relative</I> or <I>conditional</I> probabilities.

Now we present Bayes' theorem as it is traditionally stated, using only  numbers and elementary algebra.
In its simplest, clearest, form, it is the top equation, Bayes-0, (which is trivially true) in the box below.
(For the time being, ignore the lower equation; 
it differs from the top equation only in notation and a trivial equality.)

\[ \boxed{  \begin{array} {ccccc|l}  {H \cap D \over D} & \xlongequal{\text{Bayes-0}} & {H \cap D \over H} & \times & \red{ {H \over D} } & \text{ numbers} \\ \\ \hline \\ P(H|D) & \xlongequal{\text{Bayes}}  & P(D|H) & \times & \red{ P(H) \over P(D) } & \text{ (sub)sets or properties} \\  \end{array}  }  \]

That's it! That bit of very elementary algebra  
(which we might call "the switch of denominators, i.e. contexts", relative to ${H\cap D}$)  
in the top line, Bayes-0,
is the essence of Bayes' Theorem.  

What remains is to explain how that relates to the setup for, and traditional statement of, Bayes' Theorem.
I.e., what it all means.


Here is the setup for the situation. 
For concreteness, we will consider a special case of the general situation which is easily understood.

Suppose we have a population, a specific group of Americans.
We poll them and ask two questions of them: are you a Democrat (a member of the Democratic Party), and are you a homosexual?
We will call the set of people self-describing as Democrats set $\boxed D$, and similarly we let set $\boxed H$ consist of those who self-describe as homosexuals.
We also, for simplicity, let the symbols $D$ and $H$ denote the number of individuals in each set.
The context should make clear whether the symbol denotes a set or a number.

Next we have the intersection $H \cap D$ of sets $D$ and $H$, consisting of those individuals who described themselves as both Democrats and homosexual.
We denote this set by $\boxed{ H\cap D }$. So $H \cap D$ is the set of homosexual Democrats.
Again,  $H\cap D$ will also denote the number of individuals in that set.

With that setup and those definitions out of the way, we can now pose some questions. 

1. What fraction of the Democrats are homosexual? The answer is easy, it is the fraction $H\cap D \over D$.

2. What fraction of the homosexuals are Democrats? The answer is again easy, it is the fraction $H\cap D\over H$.

3. What relation, if any, is there between those two fractions?
The answer is given in the boxed equation Bayes-0 above (whose proof is a triviality), which is called "Bayes' Theorem".

4. What is the use of this equation? Well, if you know any two of those three ratios, it shows the third is determined, and gives you an easy way to calculate it.

Let us illustrate this with a hypothetical numerical example. 
Suppose 80% of homosexuals were Democrats, so ${H\cap D  \over H} = .8$, and
the ratio of Democrats to ALL homosexuals (20% of whom are not Democrats) was 10 to 1, so $\red{ {H\over D} = .1 }$. 
Then, instantiating the equation Bayes-0 with these two assumptions, we get 
\[ \boxed{ \begin{array} {} {H\cap D \over D} & \xlongequal{\text{Bayes-0}} & {H\cap D \over H} & \times & \red{ {H \over D} } \\ \\ \hline \\  {H\cap D \over D} & \xlongequal{\text{Bayes-0}} & .8 & \times & \red{.1} & = & .08 \end{array}  } \]
i.e. our two assumptions imply 8% of Democrats are homosexual.

For a diagrammatic presentation (with some possible numbers for population sizes), see
\[ \boxed{ \begin{array} {} H = 10 & \xleftarrow{ \textstyle {H\cap D \over H} = .8 } & H\cap D = 8 \\ \downarrow & \red{ \searrow \rlap{ {H\over D} = .1 } } & \downarrow \rlap{ {H\cap D \over D} = .08 } \\ X & \xleftarrow{} & D = 100 \\ \end{array} } \]
You can visualize this geometrically.
Imagine a strip divided into three regions.
The first is two units long.
The second is eight units long.
The last is 92 units long.
Let $H$ be the union of the first two parts,
$D$ the union of the last two parts.
Then $H\cap D$ is the second part.
\[ \color{lightpink} { H=10 \atop \Rule{10mm}{5mm}{0mm} } {} \rlap{ \kern-6mm 8 } {} \rlap{ \color{blue} { \kern-10mm \lower5ex { \Rule{100mm}{5mm}{0mm} \atop D=100 } } } \]
(This is a form of Venn diagram.)

So under our two assumptions we have
80% of homosexuals are Democrats, while 
8% of Democrats are homosexuals.
Further, these two statistics IMPLY the ratio of Democrats to homosexuals is 10 to 1, 
since any two of the ratios in the Bayes equation determine the third.

The fact that we want to stress is:
If 80% of homosexuals are Democrats, 
that DOES NOT imply that
80% of Democrats are homosexuals. 
I.e., you can't invert conditional probabilities.
In words, the fact that $H\cap D$ is quite large relative to $H$ (80% in our hypothetical example) says nothing about how large $H\cap D$ is relative to $D$. 
That depends entirely on the ratio of $D$ to $H$.
Precisely, Bayes' Theorem says:
\[ \boxed{ \begin{array} {}  { {H\cap D}/H \over {H\cap D}/D } &  \xlongequal {\text{Bayes-0}} & \red{ {D \over H} } \\ \\ \hline \\ {80\% \over 8\%} & \xlongequal {\text{Bayes}}  & \red{ {10 \over 1} } \\  \end{array}  } \; . \]

There is a folk saying related to this phenomenon:
"A big fish in a small pond versus 
a small fish in a large pond."
Here clearly $H\cap D$ (the homosexual Democrats) is (are) the "fish", 
while the "small pond" and "large pond" are respectively the homosexuals and the Democrats.
Bayes theorem states the relation between the various ratios involved, 
the fish to each pond, and between the ponds.
To put it in common sense terms:
If you know the ratio of the pond sizes, 
and the fraction of the small pond the fish takes up, 
then you can easily calculate the fraction of the large pond it would take up.
E.g.,
if the fish is half of the small pond, 
and the large pond is three times the size of the small pond, 
then the fish would be $1/6$ of the large pond.

For another, perhaps hypothetical, example, 
if 90% of men with prostate cancer have high levels of PSA, 
that DOES NOT imply that
90% of men with high levels of PSA will get prostate cancer.

There is a geometrical way of viewing this situation.
Imagine two rectangles, one with its long side horizontal, labeled $H$, 
and one with its long side vertical, labeled $V$.
Suppose they have some overlap, i.e., a non-empty intersection, labeled $H\cap V$.
Again, we let the same label stand for both 
the designated geometric region (a set of points) and 
its area (a number).

Considering the three numbers, i.e. the areas, 
as before we have the three ratios and the simple relation between those ratios

\[ \boxed{ \begin{array} {} H\cap V \over V & \xlongequal{\text{Bayes}} & H\cap V \over H & \times & \red{ {H \over V} } \\ \\ \hline \\ P(H|V) & \xlongequal{\text{Bayes}} & P(V|H) & \times & \red{ P(H) \over P(V) } \\ \end{array} } \]

relating the ratios of their areas.
In fact, this same equation applies to any two measurable regions with a measurable intersection, but it is easier to visualize for rectangles.


For the general situation, 3Blue1Brown has a good elementary discussion:

https://youtube.com/watch?v=HZGCoVF3YvM

Here we instantiate our original presentation of Bayes' Theorem into an instance relevant to that video.
Here $L$ = the set of librarians (or the number of such), and $B$ = the set of book-lovers (or the number of such).
\[ \boxed{ \begin{array} {ccccc|l} {B \cap L \over L} & \xlongequal{\text{Bayes-0}} & {B \cap L \over B} & \times & \red{ {B \over L} } & \text{ numbers} \\ \\ \hline \\ P(B|L) & \xlongequal{\text{Bayes}} & P(L|B) & \times & \red{ P(B) \over P(L) } & \text{ (sub)sets or properties} \\ \hline \text{high} && \text{low} & \times & \red{\text{large}} & \text{ qualitative description} \\ \hline .8 & = & .016 & \times & \red{50} & \text{ made-up, but plausible, numbers} \end{array} }  \]
Here we repeat our original description of Bayes' Theorem, but using the notation of that video, and introduce the terminology it uses for various parts of the equation. 
Here $H$ = hypothesis, $E$ = evidence.

\[ \boxed{  \begin{array} {ccccc|l}  H\cap E \over E & \xlongequal{\text{Bayes-0}} & H\cap E \over H & \times & \red{ H \over E } & \text{ numbers} \\ \\ \hline \\ P(H|E) & \xlongequal{\text{Bayes}}  & P(E|H) & \times & \red{ P(H) \over P(E) } & \text{ (sub)sets or properties} \\ \hline \text{posterior} & \xlongequal{\text{Bayes}} & \text{likelihood} & \times & \red{ \text{prior} \over {} \text{evidence} } & \text{ descriptive words} \end{array}  }  \]

There is also a five minute video:

https://youtu.be/XQoLVl31ZfQ

-----

Diagrams in preliminary states:

\[ \boxed{ \begin{array} {} \kern2em & {} \rlap{ \xrightarrow [\kern8em] {L\cap B = 8}  } && \kern1em && {} \rlap{ \xrightarrow [\kern8em] {L=10} } \\ && \llap{L = 10} \searrow && \llap{ {L \cap B \over L} \kern-.5em } \nearrow & \kern1em {} \rlap{ \lower3ex{ \scriptstyle \kern-2em \text{Bayes-0} } } & \llap{ {L \over B} \kern-.5em } \searrow && \nearrow \rlap{ B = 500 } & \kern4em \\ &&& {} \rlap{ \xrightarrow [ .016 = {8 \over 500} = {L \cap B \over B } = P(L|B) ] {\kern8em} } &&&&  \\  \end{array} } \]

\[ \boxed{ \begin{array} {} && \green{B \cap L = 4}  \\  & \green{ \llap{ \boxed{{L\over B\cap L } = {10\over 4} = 2.5 } } \swarrow } && \red{  \nwarrow \rlap { {B\cap L \over B\cap F} = {4\over 20} } } \\  \green{ \boxed{L = 10} } &    & \green\downarrow &&   \\  && & &&  \\ \green\downarrow & \red{ \nwarrow \rlap{ \scriptstyle {L\over F} = {10\over 200} } } & B & \xleftarrow {} & \green{B \cap F = 20} \\ & \swarrow && \swarrow \rlap{ \boxed{{F\over B\cap F} = {200\over 20} = 10} } \\ X & \xleftarrow{} & \boxed{F = 200} \\ \end{array} } \]
An equation:
\[ \boxed{ \begin{array} {} \red{4\over 20} & = & \green{4\over 10} & \times & \red{10\over 200} & \times & {200\over 20} \\ \hline  \red{B\cap L \over B\cap F} & = & \green{B\cap L \over L} & \times & \red{L \over F} & \times & {F \over B\cap F} \\ \hline & = & {{B\cap L} / L} \over {{B\cap F} / F} & \times & \red{L \over F}  \\ \hline & = & {4/10} \over {20/200} & \times & \red{10 \over 200} \\ \hline & = & 4 & \times & \red{1 \over 20} \\ \end{array} } \]

Two bipartite partitions of a set, say $X$, yield this diagram:

The general case, say $X=A+B$ and $X=W+L$:
\[ \boxed{ \begin{array} {} \kern1em & A\cap W & \xrightarrow{ \textstyle {A\cap W \over W} } & W & \xleftarrow{ \textstyle {B\cap W \over W} } & B\cap W & \kern1em \\ & \llap{ A\cap W \over A } \Bigg\downarrow & \raise1ex{ A\cap W \over X } & \Bigg\downarrow \rlap{ W\over X } & \raise1ex{ B\cap W \over X } & \Bigg\downarrow \rlap{ B\cap W \over B } & \\ & A & \xrightarrow { \raise0ex{ \smash{ \textstyle{ A\over X } } } } & X & \xleftarrow { \raise0ex { \smash{ \textstyle{B\over X} } } } & B & \\ & \llap{ {A\cap L \over A} } \Bigg\uparrow & A\cap L \over X & \Bigg\uparrow \rlap{ L\over X } &  B\cap L \over X & \Bigg\uparrow \rlap{ B\cap L \over B } & \\ & A\cap L & \xrightarrow [ \textstyle{A\cap L \over L} ] {} & L & \xleftarrow [ \textstyle{B\cap L \over L} ] {} & B\cap L & \\ \end{array} } \]

For the special case in Section 9.4 of <I>Fat Chance</I>, A=Left, B=Right and W=Tracy, L=Paul, 
we get this diagram:
\[ \boxed{ \begin{array} {} \kern4em & T\cap L = 15 & \xrightarrow{} & T = 47 & \xleftarrow{} & T\cap R = 32 & \kern4em \\ & \llap{ \boxed{ {T\cap L \over L} = {3\over 4} } } \Bigg\downarrow && \Bigg\downarrow && \Bigg\downarrow \rlap{ \boxed{  {T\cap R \over R} = {2\over 5} } }  & \\ & L = 20 &  \xrightarrow { \raise2ex{  \smash{ \boxed{  {L\over X} = {1\over 5} = .2 } } } } & X = 100 &  \xleftarrow { \raise2ex { \smash{ \boxed{ {R\over X} = {4\over 5} = .8 } } } } & R = 80 & \\ & \llap{ \boxed{ {P\cap L \over L} = {1\over 4} } } \Bigg\uparrow && \Bigg\uparrow && \Bigg\uparrow \rlap{ \boxed{ {P\cap R \over R} = {3\over 5} } }  & \\ & P\cap L = 5 & \xrightarrow{} & P = 53 & \xleftarrow{} & P\cap R = 48 & \\ \end{array} } \]

--------

The following pertains to the video https://youtu.be/R13BD8qKeTg .

The situation is this:
There is a certain disease $\boxed H$, and a certain test $\boxed E$ for that disease.
(See below for the letters.)

You have been tested for the disease,
and tested positive.
But that doesn't necessarily mean you have it - the test is not a perfect indicator. 
Some people without the disease test positive (false positives). $\lnot H \land E = E-H$.
And some people with the disease will not be detected by the test (false negatives). $H \land \lnot E = H-E$.
If the test were a perfect indicator we would have $H=E$ and both the above differences would be zero (or empty).

To analyze this situation,
let $\boxed H$ be the <I>hypothesis</I>, that you have the disease. Also the number representing the probability that you have it, that is, the fraction of the general population that has it.
Let $\boxed E$ be the <I>evidence</I>, that you tested positive under the test.
Also the number representing the probability that a general member of the population will test positive in that particular test.

So clearly what we are interested in is $\boxed{P(H|E) = {H\land E \over E}}$, that is, if you tested positive in that particular test (the E), what is the probability that you actually have the disease (the H)?

In the video, the narrator gives three items of numerical information (numbers), using words to describe what those numbers mean:

$\boxed{P(H) = .001 = {1 \over 1000}}$ = fraction of people with the disease.
$\boxed{P(E|H) = {H\land E \over H} = .99 = {99\over 100}}$ = fraction of those with the disease who test positive = the rate of valid positives.
$\boxed{P(E|\lnot H) = {\lnot H \land E \over \lnot H} = .01 = {1\over 100}}$ = fraction of those without the disease who test positive = the rate of false positives.
That is all he tells you.
Those are the knowns.
Note that the last two tell you what the testing probabilities will be IF you know whether you have the disease or not.
I.e., they are backward from what we want: to go from test result to disease probability.

From that we infer 
$P(\lnot H) = 1-P(H) = 1-.001 = .999$ (the fraction without the disease).

Now let's see how we can use that information to solve the problem, 
i.e. calculate P(H|E).

First we must calculate P(E):
\[ \begin{array} {} P(E) & = && P\big( E\land {(H\lor \lnot H)} \big) \\ & = && P\big( {(E\land H)} \lor {(E\land \lnot H)} \big) \\ & = & P(E\land H) & + & P(E\land{\lnot H}) \\ & = & \boxed{P(E|H)}\,\boxed{P(H)} & + & \boxed{P(E|\lnot H)}\,P(\lnot H) \\ & = & \boxed{.99} \times \boxed{.001} & + & \boxed{.01} \times .999 \\ & = & .00099 & + & .00999 \\ && \text{valid positives} && \text{false positives} \\ & = && .01098 \\ & \sim && 1.1\% \\ &&& \text{all positives} \\ \end{array} \]

With that calculation out of the way, now we may apply the simple, basic Bayes' Theorem to get the answer:

First a little recall:
Again, $H$ = hypothesis = you have the disease, 
$E$ = evidence = your test returned positive.

\[ \boxed{ \begin{array} {ccccc|l} H\cap E \over E & \xlongequal{\text{Bayes-0}} & H\cap E \over H & \times & \red{ H \over E } & \text{ numbers} \\ \\ \hline \\ P(H|E) & \xlongequal{\text{Bayes}} & P(E|H) & \times & \red{ P(H) \over P(E) } & \text{ (sub)sets or properties} \\ \hline \text{posterior} & \xlongequal{\text{Bayes}} & \text{likelihood} & \times & \red{ \text{prior = hypothesis} \over {} \text{evidence} } & \text{ descriptive words} \\ \\ \hline \\ P(H|E) & \xlongequal{\text{Bayes}} & .99 & \times & \red{ .001 \over .01098 } & \\ &  = & .99 & \times & \red{ 1 \over 10.98 } \\ & \sim & .99 & \times & \red{ 1 \over 11 } \\ &  = & .09 \\ & = & 9\% \\ \end{array} } \]
And that is the answer the narrator gives, but we went through the details, 
and found the various intermediate results, which have their own  interest.
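
The whole computation fits in a few lines of Python; the three boxed inputs are the narrator's numbers, and everything else is derived from them:

```python
P_H = 0.001            # prior: fraction of people with the disease
P_E_given_H = 0.99     # rate of valid positives
P_E_given_notH = 0.01  # rate of false positives

P_notH = 1 - P_H
# All positives = valid positives + false positives.
P_E = P_E_given_H * P_H + P_E_given_notH * P_notH   # = 0.01098
# Bayes: posterior = likelihood x (prior / evidence).
P_H_given_E = P_E_given_H * P_H / P_E               # ~ 0.09 = 9%

print(P_E, P_H_given_E)
```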
-----------------
Let us put some of the above information in a diagram, where the information provided by the narrator is given in boxes:

\[ \begin{array} {} && \text{valid positives} && \leftadj{ \text{all positives} } && \text{false positives} \\ && H\land E \atop P(H\land E) = {99\over100,000} \sim {1\over1,000} & \xrightarrow [{99\over\leftadj{1,098}} \sim {1\over\leftadj{11}}] { H\land E \over \leftadj E} & \leftadj{ E \atop P(E) = {(\blue{99}+\blue{999})=1,098\over100,000} \sim {11\over 1,000} } & \xleftarrow [{999\over\leftadj{1,098}} \sim {10\over\leftadj{11}}] { \lnot H\land E \over \leftadj E} & \lnot H \land E \atop P(\lnot H \land E) = {999\over100,000} \sim {1\over100} = {10\over1,000} \\  && \llap{\text{rate of valid positives} \; \boxed{ {H\land E \over H} = {99\over100} } } \Bigg\downarrow &&&& \Bigg\downarrow \rlap{ \boxed{ {\lnot H \land E \over \lnot H} = {1\over100} } \; \text{rate of false positives} } \\  && \boxed{ H \atop P(H) = .001 = {1\over1000} = {100\over100,000} } &&&& \lnot H \atop P(\lnot H) = .999 = {999\over1000} = {99,900\over100,000} \\ && \text{have the disease} &&&& \text{don't have the disease} \\  \end{array} \]

Here the input parameters are varied to consider another case 
(where the test is much more successful), 
where the test is 100% successful on those who have the disease, 
but for those who don't have the disease, fails only $1\over1K$ of the time:


\[ \begin{array} {} && \text{valid positives} && \leftadj{ \text{all positives} } && \text{false positives} \\ && H\land E \atop P(H\land E) = {1\over 1K} = {1K\over 1M} & \xrightarrow [{1K\over \leftadj{1,999}} \sim {1\over{\leftadj2}}] {H\land E \over \leftadj E} & \leftadj{ E \atop { P(E) = {(\blue{1K}+\blue{999} = 1,999)\over1M} \sim {2K\over1M} = {2\over1K} } } & \xleftarrow { \lnot H\land E \over \leftadj E} & \lnot H \land E \atop P(\lnot H \land E) = {999\over1M} \sim {1K\over 1M} \\ && \llap{ \text{rate of valid positives} \; \boxed{ {H\land E \over H} = 1 } } \Bigg\downarrow &&&& \Bigg\downarrow   \rlap{ \boxed{ {\lnot H \land E \over \lnot H} = {1\over1K} = .001 } \; \text{rate of false positives} } \\ && \boxed{ H \atop P(H) = .001 = {1\over 1K} } &&&& \lnot H \atop P(\lnot H) = .999 = {999\over 1K} \sim 1 \\ && \text{have the disease} &&&& \text{don't have the disease} \\ \end{array} \]

Let us now tackle the issues of independence and correlation, whether positive or negative.
Here are several equivalent formulations of <I>positive correlation</I>, showing the symmetry of the formulations.
For <I>negative correlation</I> or <I>independence</I>, 
merely replace $\lt$ (shown here in vertical mode as $\vee$, read from bottom to top) 
by $\gt$ or $=$.

\[ \boxed{ \begin{array} {ccccc|ccccccccc|ccccc} {H\cap D  = HD \over H} & = & {8\over10} & = & {4\over5} & \bA \cap \gB \over \bA &&  {\bA \cap \gB \over \bA} \times {\bA \over X} & = & \bA \cap \gB \over X & = &  {\bA \cap \gB \over \gB} \times {\gB \over X} &&  \bA \cap \gB \over \gB & {H\cap D = HD \over D} & = & {8\over100} & = & {2\over25} \\ \vee & \iff & \vee & \iff & \vee & \vee & \iff & \vee & \iff & \vee & \iff & \vee & \iff & \vee & \vee & \iff & \vee & \iff & \vee \\ {D\over X} & = & {100\over200} & = & {1\over 2} & \gB \over X && {\bA \over X} \times {\gB \over X} & = & {\bA \over X} \times {\gB \over X} & = &   {\bA \over X} \times {\gB \over X} &&  \bA \over X & {H\over X} & = & 10\over200 & = & 1\over20 \\ \hline &&&&& {(\bA\cap\gB)/\bA} \over \gB/X &&&&  (\bA\cap\gB)/X \over (\bA/X) \times (\gB/X) &&&& {(\bA\cap\gB)/\gB} \over \bA/X &&&&& 2/25 \over 1/20 \\ &&&& 8\over5 &&&&& (\bA\cap\gB) \times X \over \bA \times \gB &&&&&&&&& {40\over25} = {8\over5} \\ \end{array} } \]

In the above box, note that
the leftmost general inequality means a positive correlation between $B$ and $A$, while
the rightmost general inequality means a positive correlation between $A$ and $B$.
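
As a quick check, here is a Python sketch verifying that the three formulations of positive correlation agree on the running example ($|X|=200$, $|H|=10$, $|D|=100$, $|H\cap D|=8$, with $A = H$ and $B = D$):

```python
# Counts from the running example.
X, A, B, AB = 200, 10, 100, 8   # AB = |A intersect B|

left   = AB / A > B / X         # 8/10  > 100/200
middle = AB * X > A * B         # 8*200 > 10*100   (the symmetric form)
right  = AB / B > A / X         # 8/100 > 10/200

# The three formulations agree (here, all report positive correlation).
assert left == middle == right == True
```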