Inspired by / based on:

Consider a number $m$. If we consider the sequence $q_n = 2p_n - 1$ where $p_n$ denotes the $n$th prime number, how frequently is $m$ the smallest factor (other than $1$) of $q_n$?

One thing to note immediately is that the smallest factor of a number (A020639) is always a prime number. Further, in our case as $q_n$ is always odd, this least prime factor can’t be even. So the answer is $0$ unless $m$ is one of the odd prime numbers ($3, 5, 7, 11, \dots$). (The sequence of values $m$ is A023585; we want to know the frequency with which the different odd primes occur in the sequence.)

If one wants a more formal statement of the problem: let

What is the value of $f(m) = \lim\limits_{N \to \infty} f_N(m)$?

Let’s fix a particular value of $m$, say $m = 3$. How frequently is $3$ the smallest prime factor of a number $q_n = 2p_n - 1$?

$3$ divides $q_n$ if and only if $2p_n - 1 \equiv 0 \pmod 3$ which is the same as $p_n \equiv 2 \pmod 3$. As all primes (other than $3$ itself) are either $1$ or $2$ mod $3$, and are not expected to favour either of these congruence classes over the other*, in the long run exactly half the numbers $p_n$ are $2$ modulo $3$, so we should expect to see the number $3$ about half the time as the least prime factor.

[*An important footnote here: the fact that the primes fall into each possible residule class modulo a number $n$ (i.e. those $\phi(n)$ residue classes relatively prime to $n$) with equal frequency is known as the prime number theorem for arithmetic progressions. This has been proved; the frequency of primes in each of the classes is exactly the same. But if we care about the exact value down to which residue class is the “winner” (not a bigger ratio in the limit, but a bigger absolute number at some finite point), then there exists such a thing as the Chebyshev bias. But that is a second-order effect and we don’t have to care about it here…]

Next consider $m = 5$. Again, $5$ divides $q_n$ if and only if $2p_n - 1 \equiv 0 \pmod 5$, which is the same as $p_n \equiv 3 \pmod 5$. Now, again, of all the primes (other than $5$ itself), on average exactly $1/4$th of them fall into each residue class. However, for $5$ to be the smallest prime factor, we first need the number $q_n$ to not be divisible by $3$, which only happens half the time. Thus, $f(5) = (1 - f(3))/4$, or

Next consider $m = 7$. Half the numbers $q_n$ are divisble by $3$. Of the remaining numbers, $1/4$th are divisible by $5$. (That is, $1/4$th of all the numbers $q_n$ are divisible by $5$, and this remains the case even if we consider just the numbers $q_n$ not divisible by $3$.) So by a similar argument, $f(7) = (1 - \frac12)(1 - \frac14)/6$, thus

By exactly a similar argument, we get

and in general, when $m$ is a prime number,

where the product is taken over prime numbers $p$.

(In case one is wondering: when $f(m)$ is expressed as a fraction, the denominator is not always a power of $2$ as it was in the above early examples; for example $f(47) = \frac{788307}{192937984}$ where the denominator is divisible by $23$.)

This same formula for $f(m)$ holds if the sequence $q_n = 2p_n - 1$ is replaced by the sequence $q_n = 2p_n + 1$ or $q_n = 2^kp_n + 1$ or $q_n = 2^kp_n - 1$ for any integer $k$.

And a similar formula holds (and computation can be carried out) for the frequency of least prime divisors of the sequence $q_n = ap_n + b$ for any integers $a$ and $b$:

  • if either $\gcd(a, b) > 1$ or they are both odd, the question becomes trivial: all $q_n$ (at least after the first one) are divisible by a common prime,
  • else, the calculation is as above, and the only difference is that one must exclude factors in the product for primes $p$ that divide either $a$ or $b$.

A note on the asymptotics of $f(m)$: Mertens proved in 1874 that $\displaystyle \prod_{p \leq x} (1-\frac{1}{p}) = \frac{e^{-\gamma}+o(1)}{\log x}$. Here we have $\prod_{2 < p < m}\left(1 - \frac{1}{p-1}\right)$ which we can expect will have a similar rate of growth: namely, $\frac{c}{\log m}$ for some $c$, so that $f(m) \sim \frac{c}{(m-1)\log m}$. And (thanks to asking about it on math.SE), it turns out that it is indeed the case: we have

where $\gamma \approx 0.57721566490153286…$ is the Euler-Mascheroni constant, and $C_2 \approx 0.66016181584686957…$ is the twin prime constant. For example, for $m = 91781$, we have

(Convergence is slow: near $m=100000$ we still barely get three significant digits of accuracy. But we have proof.)