© 2011 *by Alexei Kourbatov, JavaScripter.net/math*

*Appendix*)

**Abstract.** We derive a simple formula expressing the
** expected maximal interval**
between rare random events in terms of the average interval:

whereE(max interval) =aln(T/a) +O(a)(1 << a<<T),

The formula holds for random events occurring at
exponentially distributed (real-valued) intervals, as well as
for events at
geometrically distributed discrete (integer-valued) intervals.
(For more information on extreme value distributions of random sequences see [1–4];
for extreme value distributions of integer-valued random sequences,
such as head runs in coin toss sequences, see [5,6] and further references therein.)
We expect that similar formulas would also be applicable to maximal intervals between terms
in certain other kinds of random or pseudorandom sequences.
Moreover, a similar heuristic formula
*a*(ln(*x*/*a*) − *b*)*x*
as well as
maximal gaps between prime constellations
or prime *k*-tuples [7].

**Notation.** In all formulas hereafter:

ln *x* denotes the *natural logarithm of x*

log_{b}*x* denotes the *base-b logarithm of x*

*E*(*x*) denotes the *expected value of a random variable x*

SD *x* denotes the *standard deviation of x*

Var *x* denotes the *variance of x*

*y* = *O*(*x*) is shorthand for *y* is such that |*y*| ≤ *C**x* for some constant *C*.

*y* = *o*(*x*) is shorthand for *y* is such that *y*/*x* tends to zero.

**Problem A.** Consider a non-stop toll bridge with very light traffic.
Let *p* > 1/2*q* = 1 − *p**T* seconds, where *T* is large, while *p* is constant.

**Problem B.** Consider a biased coin with a probability of heads *p* > 1/2*q* = 1 − *p**T* times, where *T* is large.

In both problems, we want to answer the following questions:

(1) What is the expected *total number* of rare events (cars or tails) in the observation series of length *T*?

(2) What is the expected *average interval* *a* between events (i.e. between cars/tails)?

(3) What is the expected *maximal interval* between events, as a function of the average interval *a*?

Before you read on, try to answer the questions yourself. (Notice that the first and the second question are so much easier than the third!)

Here are the easy answers:

(1) Because the probability of the event is *q* at any given second/toss,
we expect a total of *nq* events after *n* seconds/tosses,
and a total of *Tq* events at the end of the entire observation series of length *T*.

(2) To estimate the expected *average* interval *a* between events, we divide the total
length *T* of our observation series by the expected total number of events *Tq*.
So the expected average interval between events is
*a* ≈ *T*/(*Tq*) = 1/*q*

(3) Quite obviously, we can predict that the expected maximal interval is less than *T*, but not less than *a*:

The expected maximal interval will likely depend on botha≤E(max interval) <T.

It is also reasonable to expect that ƒ(E(max interval) = ƒ(a,T).

This is nice – but can we say anything more specific about the expected maximal interval?

**In problem A**, to estimate the most probable maximal interval between cars we proceed as follows:

After *n* seconds of observations, we would have seen about *nq* cars, hence
about *nq* intervals between cars. The intervals are independent of each other and real-valued.
A known good model for the distribution of these intervals is the
exponential distribution:

with probability 1, each interval must be at least 0 seconds;

with probability *p*, any given interval is at least 1 second;^{1}

with probability *p*^{2}, any given interval is at least 2 seconds;

with probability *p*^{3}, any given interval is at least 3 seconds;`...`

with probability *p*^{t}, any given interval is at least *t* seconds.

Thus, after *n* seconds of observations and about *nq* carless intervals,
we would reasonably expect that at least one interval is
*no shorter than t seconds* if we choose

Now it is easy to estimate the most probable maximal intervalp^{t}× (nq) ≥ 1.

p^{tmax}≈ 1/(nq)

(1/p)^{tmax}≈nq

t_{max}≈ log_{1/p}(nq).

**In problem B** we can estimate the longest run of heads *R _{n}* after

In both problems, the estimates for the most probable maximal interval (as a function ofR≈ log_{n}_{1/p}(nq).

or, omitting theln(1/ p) = ln(1/(1 −q)) = ln(1 +q+o(q)) =q+o(q)

ln(1/ p) ≈ ln(1 +q) ≈q,and therefore

1 / ln(1/ p) ≈ 1/q≈a.

This allows us to transform our previous estimate log_{1/p}(*nq*) as follows:

For a long series of observations, with the total length or durationlog _{1/p}(nq) = ln(nq) / ln(1/p) ≈ (1/q) ln(nq) ≈aln(n/a).

**(A) Exponential initial distribution.**
Fisher and Tippett [1], Gnedenko [2], Gumbel [3] and
other authors
showed that, for initial distributions of *exponential type* (including the
exponential distribution as a special case)
the *limiting distribution* of maximal terms in a random sequence is
the double exponential distribution –
often called the Gumbel distribution.
If the initial exponential distribution has the mean *a* and the probability density function (PDF)
*p*^{t} = 1 − *e*^{−t/a}

where γ = 0.5772... is the Euler-Mascheroni constant. The mean value of observed maximal intervals in problem A will converge almost surely to the meanScale=a= 1/ln(1/p) ≈ 1/q

Mode= μ = log_{1/p}(Tq) ≈aln(T/a)

Mean= μ + γa

E(max interval) = μ + γa= log_{1/p}(Tq) + γ/ln(1/p)or, expressing the result in terms of aandT,

E(max interval) ≈aln(T/a) + γa=aln(T/a) +O(a).

*Historical notes:*
Interestingly, Gumbel was not the original discoverer of his namesake distribution.
Fisher and Tippett (1928) [1]
described three types of limiting extreme-value distributions and
showed that the double exponential distribution
is the limiting extreme-value distribution for a certain wide class of random sequences.
They also computed, among other parameters, the *mean-to-mode distance* in the
double exponential distribution (see [1], p.186; it is this result that allows one to conclude that the mean is
*a**domain of attraction* of a given type of limiting distribution.
The three possible types of limiting distributions are the
Gumbel (double exponential),
Fréchet, and
Weibull distributions.

**(B) Geometric initial distribution.**
Somewhat surprisingly, in this case the limiting extreme-value distribution *does not exist*
(see [5], p.203, for a detailed explanation).
For the longest run of heads *R _{n}* in a series of

where the first term is the same as in Problem A (up to a substitutionE(R) = log_{n}_{1/p}(nq) + γ/ln(1/p) − 1/2 +r_{1}(n) + ε_{1}(n)

E(R) =_{n}aln(n/a) +O(a).

**(A) Exponential initial distribution.**
Here the limiting distribution of maximal intervals is
the Gumbel distribution
with the scale *a* = 1/ln(1/*p*)*a*/√6
=
π/(√6 ln(1/*p*))
= *O*(*a*).

**(B) Geometric initial distribution.**
For the longest run of heads *R _{n}* in a series of

where the first term grows asVar R= π_{n}^{2}/(6 ln^{2}(1/p)) + 1/12 +r_{2}(n) + ε_{2}(n) ([5], p.202)

SD R= π/(√6 ln(1/_{n}p)) +O(1) =O(a).

(A) rare events occurring at exponentially distributed intervals;

(B) discrete rare events at geometrically distributed intervals.

These two situations are somewhat different:

in case (A) maximal intervals have a limiting distribution
(the Gumbel distribution), while

in case (B) no limiting distribution exists
(here the Gumbel distribution is simply a good approximation).

Nevertheless, *in both cases* the expected maximal interval between events is

whereE(max interval) =a_{ }ln(T/a) + γa+ lower-order term =a_{ }ln(T/a) +O(a),with standard deviation ≈ π a/√6 =O(a),

*Note.*
A remarkably similar heuristic formula,
*a*(ln(*x*/*a*) − *b*)*−ba**a*,
describes

1. R.A. Fisher and L.H.C. Tippett, Limiting forms of the frequency distribution of the largest and smallest member of a sample,
*Proc. Camb. Phil. Soc.*, 24 (1928), 180-190.

Online: http://digital.library.adelaide.edu.au/dspace/bitstream/2440/15198/1/63.pdf

2. B.V. Gnedenko, Sur la distribution limite du terme maximum d'une série aléatoire.
*Ann. Math.*, 44 (1943), 423-453.

English translation: On the limiting distribution of the maximum term in a random series. In:
*Breakthroughs in Statistics, Volume 1: Foundations and Basic Theory.* Springer, New York (1993), 185-225.

Online: http://books.google.com/books?id=tNKkI3QX-UoC&pg=PA185

3. E.J. Gumbel, Statistical theory of extreme values and some practical applications.
*Applied Mathematics Series*, 33. US National Bureau of Standards (1954)

4. E.J. Gumbel, Statistics of Extremes, Columbia University Press, New York (1958)

5. M. F. Schilling, The longest run of heads. *The College Mathematics Journal*, v.21 (1990), No.3, 196-207.

Online: http://gato-docs.its.txstate.edu/mathworks/DistributionOfLongestRun.pdf

6. L. Gordon, M.F. Schilling, and M.S. Waterman, An extreme value theory for long head runs. *Probability Theory and Related Fields*, v.72 (1986), 279-297.

Online: http://www.cmb.usc.edu/papers/msw_papers/msw-070.pdf

7. A. Kourbatov, Maximal gaps between prime *k*-tuples. In: *Doing math with JavaScript* (2011)

Online: http://www.javascripter.net/math/primes/maximalgapsbetweenktuples.htm

8. A. Kourbatov, Maximal gaps between Cramer's random primes (2014)

Online: http://www.javascripter.net/math/statistics/maximalprimegapsincramermodel.htm

Copyright © 2011-2014, Alexei Kourbatov, JavaScripter.net.