Joint distribution function
The joint distribution function extends the concept of the CDF to two variables simultaneously. It gives the probability that \(X\) and \(Y\) each fall at or below a specified threshold at the same time, and is the foundation for computing any probability involving a pair of random variables.
Definition
The joint distribution function (joint CDF) of two random variables \(X\) and \(Y\) is:
\[ F_{X,Y}(x, y) = P(X \leq x,\ Y \leq y) \]
It gives the probability that \(X\) takes a value at most \(x\) and \(Y\) takes a value at most \(y\), simultaneously.
Properties
The joint CDF always satisfies:
- Non-decreasing: \(F_{X,Y}(x,y)\) is non-decreasing in both \(x\) and \(y\) separately.
- Boundary limits:
  - \(\lim_{x \to -\infty} F_{X,Y}(x,y) = 0\) for any fixed \(y\).
  - \(\lim_{y \to -\infty} F_{X,Y}(x,y) = 0\) for any fixed \(x\).
  - \(\lim_{x \to \infty,\, y \to \infty} F_{X,Y}(x,y) = 1\).
- Right-continuous in both arguments.
- Marginal recovery: letting one argument go to \(+\infty\) gives the marginal CDF of the other variable:
\[F_X(x) = \lim_{y \to \infty} F_{X,Y}(x,y), \qquad F_Y(y) = \lim_{x \to \infty} F_{X,Y}(x,y)\]
⚠️ The joint CDF gives corner probabilities, not rectangle probabilities
\(F_{X,Y}(x,y)\) is the probability of the lower-left quadrant \((-\infty, x] \times (-\infty, y]\). To get the probability of a rectangle \(P(a < X \leq b,\ c < Y \leq d)\), you need the inclusion-exclusion formula:
\[P(a < X \leq b,\ c < Y \leq d) = F_{X,Y}(b,d) - F_{X,Y}(a,d) - F_{X,Y}(b,c) + F_{X,Y}(a,c)\]
This is the two-dimensional analogue of \(P(a < X \leq b) = F(b) - F(a)\).
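The four-corner formula can be sketched as a small Python helper. The function name `rectangle_probability` and the example joint CDF (two independent Uniform(0, 1) variables) are assumptions made for illustration; any function returning \(F_{X,Y}(x,y)\) could be passed in instead.

```python
def rectangle_probability(joint_cdf, a, b, c, d):
    """P(a < X <= b, c < Y <= d) from the four corner values of a joint CDF."""
    return joint_cdf(b, d) - joint_cdf(a, d) - joint_cdf(b, c) + joint_cdf(a, c)


def uniform_joint_cdf(x, y):
    """Joint CDF of two independent Uniform(0, 1) variables (illustrative assumption)."""
    def clamp(t):
        return min(max(t, 0.0), 1.0)
    return clamp(x) * clamp(y)


# Rectangle (0.2, 0.5] x (0.1, 0.4]: side lengths 0.3 and 0.3.
p = rectangle_probability(uniform_joint_cdf, 0.2, 0.5, 0.1, 0.4)
print(round(p, 4))
```

For this uniform example the result equals the area of the rectangle, which gives a quick sanity check on the signs of the four terms.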
Discrete case
For discrete variables, the joint CDF is obtained by summing the joint PMF over all pairs \((x', y')\) with \(x' \leq x\) and \(y' \leq y\):
\[F_{X,Y}(x,y) = \sum_{x' \leq x} \sum_{y' \leq y} P(X = x',\ Y = y')\]
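When \(X\) and \(Y\) take integer values, the joint PMF can be recovered from the joint CDF using the two-dimensional difference formula (for other supports, replace \(x-1\) and \(y-1\) with the preceding support points):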
\[p_{X,Y}(x,y) = F_{X,Y}(x,y) - F_{X,Y}(x-1,y) - F_{X,Y}(x,y-1) + F_{X,Y}(x-1,y-1)\]
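A minimal sketch of the difference formula, assuming a toy joint CDF for two independent fair coin flips taking values 0 or 1. The example distribution and the function names are hypothetical and only serve to show the mechanics.

```python
def coin_joint_cdf(x, y):
    """Joint CDF of two independent fair coin flips with values 0 or 1 (toy example)."""
    if x < 0 or y < 0:
        return 0.0
    f_x = 0.5 if x < 1 else 1.0
    f_y = 0.5 if y < 1 else 1.0
    return f_x * f_y


def pmf_from_cdf(cdf, x, y):
    """Two-dimensional difference of the CDF at integer point (x, y)."""
    return cdf(x, y) - cdf(x - 1, y) - cdf(x, y - 1) + cdf(x - 1, y - 1)


# Each of the four outcomes should come back with probability 0.25.
recovered = {(x, y): pmf_from_cdf(coin_joint_cdf, x, y) for x in (0, 1) for y in (0, 1)}
print(recovered)
```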
Using the joint PMF from the previous section:
| | \(Y=1\) | \(Y=2\) | \(Y=3\) |
|---|---|---|---|
| \(X=0\) | 0.15 | 0.10 | 0.05 |
| \(X=1\) | 0.05 | 0.20 | 0.15 |
| \(X=2\) | 0.02 | 0.08 | 0.20 |
Computing \(F_{X,Y}(1, 2) = P(X \leq 1,\ Y \leq 2)\):
\[F_{X,Y}(1,2) = p(0,1) + p(0,2) + p(1,1) + p(1,2) = 0.15 + 0.10 + 0.05 + 0.20 = 0.50\]
Computing the full joint CDF table:
| | \(y=1\) | \(y=2\) | \(y=3\) |
|---|---|---|---|
| \(x=0\) | \(0.15\) | \(0.25\) | \(0.30\) |
| \(x=1\) | \(0.20\) | \(0.50\) | \(0.70\) |
| \(x=2\) | \(0.22\) | \(0.60\) | \(1.00\) |
Each cell is the sum of the joint probabilities in the rectangle extending up and to the left of it, the cell itself included.
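The accumulation can be sketched in plain Python: cumulate the joint PMF along \(y\) within each row, then along \(x\) down each column. The variable names are illustrative, and the PMF values are copied from the PMF table above.

```python
from itertools import accumulate

# Joint PMF: rows are X = 0, 1, 2 and columns are Y = 1, 2, 3.
pmf = [
    [0.15, 0.10, 0.05],
    [0.05, 0.20, 0.15],
    [0.02, 0.08, 0.20],
]

# Step 1: running sums along y inside each row.
row_cumulative = [list(accumulate(row)) for row in pmf]

# Step 2: running sums along x down each column, then transpose back to rows.
columns = [list(accumulate(col)) for col in zip(*row_cumulative)]
cdf = [list(row) for row in zip(*columns)]

for row in cdf:
    print([round(value, 2) for value in row])
```

The bottom-right entry must equal 1, and the last row and last column hold the marginal CDFs of \(Y\) and \(X\) respectively, which makes a convenient check on any hand-computed table.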
Figure 1: Joint CDF as a heatmap: values accumulate from the top-left corner, reaching 1 at the bottom-right
Continuous case
For continuous variables, the joint CDF is the double integral of the joint PDF:
\[F_{X,Y}(x,y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(x', y')\, dy'\, dx'\]
At points where the mixed partial derivative exists, the joint PDF is recovered by differentiating:
\[f_{X,Y}(x,y) = \frac{\partial^2 F_{X,Y}(x,y)}{\partial x\, \partial y}\]
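As a worked illustration of both directions (the unit-rate exponential distributions are an assumption chosen for this sketch), suppose \(X\) and \(Y\) are independent with joint PDF \(f_{X,Y}(x,y) = e^{-x-y}\) for \(x, y \geq 0\) and zero elsewhere. Integrating gives the joint CDF on that region:
\[F_{X,Y}(x,y) = \int_{0}^{x} \int_{0}^{y} e^{-x'-y'}\, dy'\, dx' = (1 - e^{-x})(1 - e^{-y})\]
Differentiating once in each argument returns the PDF:
\[\frac{\partial^2}{\partial x\, \partial y}\left[(1 - e^{-x})(1 - e^{-y})\right] = e^{-x} \cdot e^{-y} = e^{-x-y}\]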
Figure 2: Joint CDF of two independent standard normal variables: the surface rises from 0 at the bottom-left corner to 1 at the top-right
Let \(X\) and \(Y\) be independent standard normal variables. What is \(P(-1 \leq X \leq 1,\ 0 \leq Y \leq 1)\)?
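Because \(X\) and \(Y\) are continuous, the boundary of the rectangle has probability zero, so closed and half-open intervals give the same answer. Applying the inclusion-exclusion formula with \(F_{X,Y}(x,y) = F_X(x) \cdot F_Y(y)\) for independent variables, the four corner terms factor into a product: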
\[P(-1 \leq X \leq 1,\ 0 \leq Y \leq 1) = [F_X(1) - F_X(-1)] \times [F_Y(1) - F_Y(0)]\]
\[= [0.8413 - 0.1587] \times [0.8413 - 0.5000] = 0.6826 \times 0.3413 \approx 0.233\]
About 23% of the probability mass lies in that rectangle.
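The calculation can be checked numerically with Python's standard library. This is a sketch under the stated independence assumption; `phi`, `joint_cdf`, and the two result names are chosen here for illustration. It evaluates the rectangle both ways, from the four corners of the joint CDF and from the product of marginal differences.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal CDF


def joint_cdf(x, y):
    """Joint CDF of two independent standard normals: product of the marginals."""
    return phi(x) * phi(y)


# Four-corner inclusion-exclusion on the joint CDF.
p_corners = joint_cdf(1, 1) - joint_cdf(-1, 1) - joint_cdf(1, 0) + joint_cdf(-1, 0)

# Factorized form: product of the two marginal interval probabilities.
p_product = (phi(1) - phi(-1)) * (phi(1) - phi(0))

print(round(p_corners, 3), round(p_product, 3))
```

Both expressions agree up to floating-point error, which is the factorization step used in the hand calculation.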
Independence via the joint CDF
\(X\) and \(Y\) are independent if and only if their joint CDF factorizes:
\[F_{X,Y}(x,y) = F_X(x) \cdot F_Y(y) \quad \text{for all } x, y\]
This is equivalent to the factorization of the joint PMF or joint PDF, but expressed in terms of CDFs. The continuous example above uses this directly: since \(X\) and \(Y\) are independent normals, \(F_{X,Y}(x,y) = \Phi(x)\cdot\Phi(y)\).
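The factorization criterion can also be tested on a discrete table. The sketch below, with illustrative names, applies it to the joint PMF from the discrete example: it reads the marginal CDFs off the joint CDF by taking the largest value of the other variable, which stands in for the limit to \(+\infty\) on a finite support, and then measures the largest gap between \(F_{X,Y}\) and \(F_X \cdot F_Y\).

```python
# Joint PMF from the discrete example: rows X = 0, 1, 2; columns Y = 1, 2, 3.
pmf = [
    [0.15, 0.10, 0.05],
    [0.05, 0.20, 0.15],
    [0.02, 0.08, 0.20],
]
n_rows, n_cols = len(pmf), len(pmf[0])


def joint_cdf(i, j):
    """F at the i-th X value and j-th Y value: sum of all cells up to (i, j)."""
    return sum(pmf[r][c] for r in range(i + 1) for c in range(j + 1))


# Marginal CDFs: push the other index to its largest value.
marginal_x = [joint_cdf(i, n_cols - 1) for i in range(n_rows)]
marginal_y = [joint_cdf(n_rows - 1, j) for j in range(n_cols)]

# Largest deviation from the product form over the whole grid.
max_gap = max(
    abs(joint_cdf(i, j) - marginal_x[i] * marginal_y[j])
    for i in range(n_rows)
    for j in range(n_cols)
)
is_independent = max_gap < 1e-9
print(is_independent, round(max_gap, 3))
```

A gap clearly above floating-point noise means the joint CDF does not factorize, so the variables in that table are dependent.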