I’m interested in how students think about examples, and whether we can train them to “think better” about examples. At the moment this is all just hypothetical, and I’m hoping to formalise it in the future.
There’s a lot of research out there about example generation tasks. The book by Watson and Mason is the obvious starting point, and there are also loads of good papers.
I think I’m more interested in the idea of getting students to use examples to help them to understand a difficult new piece of mathematics. This is difficult to observe, and practically impossible to write a paper on, so is most likely just a good piece of advice to give them rather than anything I can publish.
Suppose you’re looking through some lecture notes, and you come across:
Theorem 2.15. Let \((X_k)_{k\geq 1}\) be i.i.d. random variables with values in \(\{0, 1, 2, \dots\}\) and let \(N \geq 0\) be an integer-valued random variable independent of \((X_k)_{k\geq 1}\).
Then \(S_N := X_1 + \dots + X_N\) has generating function
\[G_{S_N}(s) = G_N \big( G_X(s)\big).\]How should you unravel this? My advice is to come up with three examples, following the process “the good, the bad, and the ugly”.
First, we replace each mathematical object with the nicest example we can think of. This will help us to see roughly what’s going on, as long as everything is nice.
For instance, the first random variable I think of is a Bernoulli\((p)\) random variable, which takes value 1 with probability \(p\) and 0 with probability \(1-p\).
Its generating function is
\(G_X(s) = ps^1 + (1-p) s^0 = ps + (1-p)\).
For our integer-valued random variable, we can take another Bernoulli distribution, \(N \sim \text{Bernoulli}(q)\). This has the same generating function as before (swapping \(p\) for \(q\)). If \(N=0\), then \(S_N\) is also 0; but if \(N=1\), then \(S_N\) is equal to \(X_1\) (which is either 0 or 1). So we get:
\[S_N = \begin{cases} 1 & \text{with probability} \, pq \\ 0 & \text{with probability} \, 1-pq \end{cases}.\]Calculating the generating function two different ways, we have
\[G_{S_N}(s) = s^1 \times pq + s^0 \times (1-pq)\](directly from our distribution of \(S_N\)), or:
\[G_N \big(G_X(s)\big) = G_N ( ps + (1-p)) = q ( ps+ 1-p) + (1-q) = pqs -pq + 1.\]They’re the same: in this simple case, the theorem works.
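If you don’t quite trust your own algebra, you can ask a computer algebra system to do the same check. Here’s a minimal sketch using sympy; nothing in the notes requires this, it’s just one way of double-checking the calculation above.

```python
# Symbolic sanity check of the "nice" example, using sympy.
import sympy as sp

s, p, q = sp.symbols("s p q")

G_X = p * s + (1 - p)   # pgf of X ~ Bernoulli(p)
G_N = q * s + (1 - q)   # pgf of N ~ Bernoulli(q)

# pgf of S_N computed directly from its distribution: P(S_N = 1) = pq
direct = p * q * s + (1 - p * q)

# pgf of S_N according to the theorem: compose G_N with G_X
composed = G_N.subs(s, G_X)

print(sp.simplify(direct - composed))   # prints 0: the two expressions agree
```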
Now we stretch the theorem by using a “weird” example for each mathematical object. If we’re not feeling confident, we can test out one mathematical object at a time. This helps us to start to see how the conditions of the theorem apply, in increasingly detailed ways.
Here we’d have to do a bit more work. For instance, we could replace \(N\) with a dice roll: \(N \sim \text{Unif}\{1,2,3,4,5,6\}\). Then
\[G_N(s) = \frac{1}{6} \left( s^1 + s^2 + s^3 + s^4 + s^5 + s^6\right).\]Our sum \(S_N\) is equivalent to tossing \(N\) coins and counting how many heads we see, so (if we know what \(N\) is) it has a Binomial\((N, p)\) distribution, which has generating function \((ps + 1-p)^N\).
Using a bit of conditional expectation,
\[\begin{aligned} G_{S_N}(s) & = \mathbb{E}[s^{S_N}] = \mathbb{E}\big[ \mathbb{E}[s^{S_N} \vert N] \big] \\ & = \mathbb{E}\big[ (ps + 1-p)^N \big] \\ & = \frac{1}{6} \left( (ps + 1-p)^1 + (ps + 1-p)^2 + (ps + 1-p)^3 + (ps + 1-p)^4 + (ps + 1-p)^5 + (ps + 1-p)^6\right). \end{aligned}\]This is exactly \(G_N(G_X(s))\). Maybe you think it’s cheating to use conditional expectation – if you’d rather, you can work out the probability that \(S_N\) is equal to each of \(0,1,2,3,4,5,6\) by hand and see if you get the same thing.
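Or, if you’d rather let a computer do the counting, here’s a rough Monte Carlo sketch of the same check. The particular values of \(p\) and \(s\) are arbitrary choices of mine, not anything from the notes.

```python
# Monte Carlo check of the dice-roll example: estimate E[s^(S_N)] by simulation
# and compare it with G_N(G_X(s)).
import random

p, s = 0.3, 0.7          # arbitrary choices
trials = 200_000

def one_sample():
    N = random.randint(1, 6)                          # the dice roll
    S_N = sum(random.random() < p for _ in range(N))  # Binomial(N, p) given N
    return s ** S_N

empirical = sum(one_sample() for _ in range(trials)) / trials

G_X = p * s + (1 - p)                                 # pgf of Bernoulli(p) at s
G_N_of_G_X = sum(G_X ** k for k in range(1, 7)) / 6   # G_N evaluated at G_X(s)

print(empirical, G_N_of_G_X)   # the two numbers should agree to 2-3 decimal places
```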
Finally, we see what happens when the conditions of the theorem aren’t fulfilled. There are a few things we can try: we could choose \(X_k\)s which are not i.i.d., or which aren’t independent of \(N\), or we could let the \(X_k\)s or \(N\) take non-integer values. If we try this last one, the generating functions just don’t exist: earlier in the notes we’ve defined these functions only for integer-valued random variables.
We could try breaking independence, though. Firstly, let’s make the \(X_k\)s dependent by setting them all equal to a single random variable \(X_0\). Now
\[S_N = X_0 + X_0 + \dots + X_0 = N X_0\]is just \(N\) copies of \(X_0\), and we know this has generating function
\[G_{S_N}(s) = \mathbb{E}[s^{NX_0}] = \mathbb{E} \big[ (s^N)^{X_0}\big] = \mathbb{E}[G_{X}(s^N)],\]where the last step conditions on \(N\) and uses the fact that \(N\) is independent of \(X_0\). This is not the same thing as \(G_N \big( G_X(s)\big) = \mathbb{E}[(G_X(s))^N]\), in general.
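If you want to double-check that claim, sympy can confirm that the two expressions genuinely differ. In this sketch I’ve reused the Bernoulli(\(p\)) and dice-roll choices from earlier, which is my own choice rather than anything in the notes.

```python
# Check that E[G_X(s^N)] and G_N(G_X(s)) really are different expressions,
# with X_0 ~ Bernoulli(p) and N a dice roll.
import sympy as sp

s, p = sp.symbols("s p")

G_X = p * s + (1 - p)                                      # pgf of X_0
G_N = sp.Rational(1, 6) * sum(s**k for k in range(1, 7))   # pgf of a dice roll

lhs = sp.Rational(1, 6) * sum(G_X.subs(s, s**k) for k in range(1, 7))  # E[G_X(s^N)]
rhs = G_N.subs(s, G_X)                                     # G_N(G_X(s))

difference = sp.simplify(lhs - rhs)
print(difference)                                          # not identically zero
print(difference.subs({p: sp.Rational(1, 2), s: sp.Rational(1, 2)}))  # a nonzero number
```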
Next we could use \(X_k\)s with different distributions, say Bernoulli(\(p\)) and Bernoulli(\(1-p\)). Or we could use an \(N\) that depends on the \(X_k\)s. Or both! By this point we have a fairly good grasp of what the theorem is telling us, when it works, and when (and why) it doesn’t, and we’re ready to move on to the proof.
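As one last sanity check before the proof, here’s a rough simulation of that dependent-\(N\) idea. The rule \(N = X_1 + 1\) is just something simple I’ve made up to break independence; it isn’t from the notes.

```python
# Simulation of a case where N depends on the X_k's: here N = X_1 + 1.
import random

p, s = 0.5, 0.5          # arbitrary choices
trials = 200_000

def one_sample():
    X1 = int(random.random() < p)                              # X_1 ~ Bernoulli(p)
    N = X1 + 1                                                 # N depends on X_1
    S_N = X1 + sum(random.random() < p for _ in range(N - 1))  # X_1 + X_2 + ... + X_N
    return s ** S_N

empirical = sum(one_sample() for _ in range(trials)) / trials   # estimate of E[s^(S_N)]

G_X = p * s + (1 - p)                     # pgf of Bernoulli(p) at s
G_N_of_G_X = (1 - p) * G_X + p * G_X**2   # G_N(t) = (1-p)t + p t^2, since N = X_1 + 1

print(empirical, G_N_of_G_X)   # about 0.69 versus 0.656: the theorem's formula fails here
```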