What Happens if You Add Randomness?

What happens if you add two independent random variables together? If $X$ is the result of one six-sided die and $Y$ is the result of another, what is the distribution of $Z = X + Y$?

In the real world, this is how we model total noise from multiple sources, the combined weight of items in a box, or the total return of a multi-asset portfolio.

Intuition
Convolution: The Sliding Window

To find $P(Z = z)$, we need to consider every possible combination of $x$ and $y$ that sums to $z$. For example, to get $Z = 7$ with two dice, we sum the probabilities of $(1,6)$, $(2,5)$, $(3,4)$, $(4,3)$, $(5,2)$, and $(6,1)$. Each pair has probability $1/36$, so $P(Z = 7) = 6/36 = 1/6$. For discrete variables, this operation is called a convolution.

$$P(Z = z) = \sum_{x} P(X = x)\, P(Y = z - x)$$
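
The sum above can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the helper name `convolve_pmf` and the dictionary representation of a PMF are assumptions made for this example, and exact fractions are used so the results are not affected by rounding.

```python
from fractions import Fraction


def convolve_pmf(pmf_x, pmf_y):
    """Return the PMF of Z = X + Y for independent X and Y.

    Each PMF is a dict mapping an outcome to its probability.
    (Hypothetical helper written for this example.)
    """
    pmf_z = {}
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            # Each pair (x, y) contributes P(X = x) * P(Y = y) to z = x + y.
            pmf_z[x + y] = pmf_z.get(x + y, 0) + px * py
    return pmf_z


# A fair six-sided die: every face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
two_dice = convolve_pmf(die, die)

for total in sorted(two_dice):
    print(total, two_dice[total], float(two_dice[total]))
```

Looping over pairs $(x, y)$ and accumulating into $z = x + y$ is the same computation as summing over $x$ with $y = z - x$; it simply avoids checking whether $z - x$ is a valid outcome.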

For continuous variables, the sum becomes an integral:

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z - x)\, dx$$
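
As a rough numerical check of the integral, the sketch below approximates it with a midpoint Riemann sum for two independent Uniform(0, 1) variables. The choice of distribution, the helper names `uniform_pdf` and `convolve_pdf`, and the integration bounds and step count are all assumptions made for illustration. The exact answer in this case is the triangular density: $f_Z(z) = z$ on $[0, 1]$ and $f_Z(z) = 2 - z$ on $[1, 2]$.

```python
def uniform_pdf(x):
    """Density of a Uniform(0, 1) random variable."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def convolve_pdf(f_x, f_y, z, lo=-1.0, hi=2.0, steps=30000):
    """Approximate the convolution integral at z with a midpoint Riemann sum.

    The bounds [lo, hi] must cover the region where f_x is non-zero.
    (Hypothetical helper written for this example.)
    """
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx  # midpoint of the i-th slice
        total += f_x(x) * f_y(z - x)
    return total * dx


for z in (0.5, 1.0, 1.5):
    approx = convolve_pdf(uniform_pdf, uniform_pdf, z)
    print(f"f_Z({z}) is approximately {approx:.3f}")
```

The printed values should be close to 0.5, 1.0, and 0.5, matching the triangular shape.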

Figure: Convolution: Sum of Two D6 Dice (probability of each sum)

Sum          2      3      4      5      6      7      8      9      10     11     12
Probability  0.027  0.055  0.083  0.111  0.138  0.166  0.138  0.111  0.083  0.055  0.027
Transformations and the Jacobian

When you transform a random variable (e.g., let $Y = X^2$ or $Y = e^X$), the "density" of the probability changes. For multi-dimensional transformations, like changing from Cartesian coordinates $(X, Y)$ to polar coordinates $(R, \Theta)$, the "area" of your probability density can be stretched or compressed.

Example
A Physical Analogy

Imagine drawing a grid on a sheet of rubber. When you stretch the rubber, the density of the grid lines changes. The absolute value of the Jacobian determinant, $|\mathbf{J}|$, is the mathematical factor that tells you exactly how much the density is squeezed or expanded at each point during the transformation. In the formula below, $\mathbf{J}$ is the Jacobian of the inverse map, that is, the matrix of partial derivatives of $(x_1, x_2)$ with respect to $(y_1, y_2)$.

$$f_{Y_1, Y_2}(y_1, y_2) = f_{X_1, X_2}\big(x_1(y_1, y_2),\, x_2(y_1, y_2)\big) \cdot |\mathbf{J}|$$
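
As a worked illustration of the change-of-variables formula, consider the Cartesian-to-polar case mentioned earlier. The specific choice of two independent standard Normal variables is an assumption made for this example, not something taken from the text above.

```latex
% Inverse map: x = r\cos\theta, \quad y = r\sin\theta, \quad r > 0, \; 0 \le \theta < 2\pi
\mathbf{J} =
\begin{pmatrix}
  \partial x / \partial r & \partial x / \partial \theta \\
  \partial y / \partial r & \partial y / \partial \theta
\end{pmatrix}
=
\begin{pmatrix}
  \cos\theta & -r\sin\theta \\
  \sin\theta & r\cos\theta
\end{pmatrix},
\qquad
|\mathbf{J}| = r\cos^2\theta + r\sin^2\theta = r

% General polar form of the density
f_{R,\Theta}(r, \theta) = f_{X,Y}(r\cos\theta,\, r\sin\theta) \cdot r

% Assumed example: X and Y independent standard Normal
f_{X,Y}(x, y) = \frac{1}{2\pi} e^{-(x^2 + y^2)/2}
\quad\Longrightarrow\quad
f_{R,\Theta}(r, \theta) = \frac{1}{2\pi}\, r\, e^{-r^2/2}
```

The extra factor $r$ is the "stretch" from the rubber-sheet picture: a small patch of fixed size in $(r, \theta)$ covers more Cartesian area the farther it is from the origin.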
Moment Generating Functions: The 'Fingerprint'

Finding the mean ($E[X]$), variance ($\mathrm{Var}(X)$), and higher-order moments (like skewness) can be algebraically intense. Moment Generating Functions (MGFs) provide a mathematical "fingerprint" that often reduces these calculations to taking derivatives.

$$M_X(t) = E[e^{tX}]$$

Why is it called 'Generating'?

By taking derivatives of the MGF with respect to $t$ and then evaluating them at $t = 0$, you literally "generate" the moments of the distribution.

1. Take the first derivative of $M_X(t)$: $\frac{d}{dt} E[e^{tX}] = E\left[\frac{d}{dt} e^{tX}\right] = E[X e^{tX}]$.
2. Evaluate at $t = 0$: $E[X e^{0 \cdot X}] = E[X \cdot 1] = E[X]$.
3. The pattern: the $n$-th derivative evaluated at $0$ gives the $n$-th moment, $E[X^n]$. To find the variance, compute $E[X^2]$ from the second derivative and then use $\mathrm{Var}(X) = E[X^2] - (E[X])^2$.
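
The steps above can be checked numerically. The sketch below uses a fair six-sided die as the example distribution and approximates the derivatives at $t = 0$ with central finite differences. The helper names and the step size `h` are assumptions made for illustration; a symbolic tool would give exact derivatives instead. For reference, the exact values for a fair die are $E[X] = 3.5$, $E[X^2] = 91/6$, and $\mathrm{Var}(X) = 35/12$.

```python
import math


def mgf_die(t):
    """MGF of a fair six-sided die: E[e^{tX}] = (1/6) * sum of e^{t*k}, k = 1..6."""
    return sum(math.exp(t * k) for k in range(1, 7)) / 6


def first_derivative(f, t=0.0, h=1e-3):
    """Central-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)


def second_derivative(f, t=0.0, h=1e-3):
    """Central-difference approximation of f''(t)."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / (h * h)


mean = first_derivative(mgf_die)            # approximates E[X]
second_moment = second_derivative(mgf_die)  # approximates E[X^2]
variance = second_moment - mean ** 2        # Var(X) = E[X^2] - (E[X])^2

print(f"mean ~ {mean:.4f}, E[X^2] ~ {second_moment:.4f}, variance ~ {variance:.4f}")
```

Because finite differences are approximations, the printed numbers agree with the exact values only to a few decimal places.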
Intuition
Uniqueness Property

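If a distribution's MGF exists (that is, it is finite in an open interval around $t = 0$), it determines the distribution uniquely: two distributions with the same MGF are the same distribution. If you find that the sum of two variables has the MGF of a Normal distribution, then that sum must be Normally distributed. This is especially useful for sums of independent variables, because $M_{X+Y}(t) = M_X(t)\, M_Y(t)$, and the same idea underlies a standard proof of the Central Limit Theorem.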
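
A short worked example of this argument, assuming $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$ are independent (the symbols $\mu_i$ and $\sigma_i$ are introduced here for illustration):

```latex
% MGF of a Normal distribution N(\mu, \sigma^2)
M_X(t) = \exp\!\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right)

% Independence turns the MGF of the sum into a product
M_{X_1 + X_2}(t) = E\!\left[e^{t(X_1 + X_2)}\right]
                 = E\!\left[e^{tX_1}\right] E\!\left[e^{tX_2}\right]
                 = M_{X_1}(t)\, M_{X_2}(t)

% Multiply the two Normal MGFs
M_{X_1 + X_2}(t) = \exp\!\left((\mu_1 + \mu_2)\, t + \tfrac{1}{2}(\sigma_1^2 + \sigma_2^2)\, t^2\right)
```

The result has exactly the form of a Normal MGF with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$, so by uniqueness $X_1 + X_2 \sim N(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2)$, with no convolution integral required.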