# The distribution of least prime divisors

Inspired by / based on: https://manasataramgini.wordpress.com/2018/05/20/a-note-on-the-least-prime-divisor-sequences-of-2p-plus-or-minus-1/

Consider a number $m$. If we consider the sequence $q_n = 2p_n - 1$ where $p_n$ denotes the $n$th prime number, how frequently is $m$ the smallest factor (other than $1$) of $q_n$?

One thing to note immediately is that the smallest factor of a number (A020639) is always a prime number. Further, in our case as $q_n$ is always odd, this least prime factor can’t be even. So the answer is $0$ unless $m$ is one of the odd prime numbers ($3, 5, 7, 11, \dots$). (The sequence of values of smallest prime factors of $q_n$ is A023585; we want to know the frequency with which the different odd primes $m$ occur in the sequence.)

If one wants a more formal statement of the problem: let

What is the value of $f(m) = \lim\limits_{N \to \infty} f_N(m)$?

Let’s fix a particular value of $m$, say $m = 3$. How frequently is $3$ the smallest prime factor of a number $q_n = 2p_n - 1$?

$3$ divides $q_n$ if and only if $2p_n - 1 \equiv 0 \pmod 3$ which is the same as $p_n \equiv 2 \pmod 3$. As all primes (other than $3$ itself) are either $1$ or $2$ mod $3$, and are not expected to favour either of these congruence classes over the other*, in the long run exactly half the numbers $p_n$ are $2$ modulo $3$, so we should expect to see the number $3$ about **half** the time as the least prime factor.

_{[*An important footnote here: the fact that the primes fall into each possible residule class modulo a number $n$ (i.e. those $\phi(n)$ residue classes relatively prime to $n$) with equal frequency is known as the prime number theorem for arithmetic progressions. This has been proved; the frequency of primes in each of the classes is exactly the same. But if we care about the exact value down to which residue class is the “winner” (not a bigger ratio in the limit, but a bigger absolute number at some finite point), then there exists such a thing as the Chebyshev bias. But that is a second-order effect and we don’t have to care about it here…]}

Next consider $m = 5$. Again, $5$ divides $q_n$ if and only if $2p_n - 1 \equiv 0 \pmod 5$, which is the same as $p_n \equiv 3 \pmod 5$. Now, again, of all the primes (other than $5$ itself), on average exactly $1/4$th of them fall into each residue class. *However*, for $5$ to be the smallest prime factor, we first need the number $q_n$ to *not* be divisible by $3$, which only happens half the time. Thus, $f(5) = (1 - f(3))/4$, or

Next consider $m = 7$. Half the numbers $q_n$ are divisble by $3$. Of the remaining numbers, $1/4$th are divisible by $5$. (That is, $1/4$th of all the numbers $q_n$ are divisible by $5$, and this remains the case even if we consider just the numbers $q_n$ not divisible by $3$.) So by a similar argument, $f(7) = (1 - \frac12)(1 - \frac14)/6$, thus

By exactly a similar argument, we get

and in general, when $m$ is a prime number,

where the product is taken over prime numbers $p$.

(In case one is wondering: when $f(m)$ is expressed as a fraction, the denominator is not always a power of $2$ as it was in the above early examples; for example $f(47) = \frac{788307}{192937984}$ where the denominator is divisible by $23$.)

This same formula for $f(m)$ holds if the sequence $q_n = 2p_n - 1$ is replaced by the sequence $q_n = 2p_n + 1$ or $q_n = 2^kp_n + 1$ or $q_n = 2^kp_n - 1$ for any integer $k$.

And a similar formula holds (and computation can be carried out) for the frequency of least prime divisors of the sequence $q_n = ap_n + b$ for any integers $a$ and $b$:

- if either $\gcd(a, b) > 1$ or they are both odd, the question becomes trivial: all $q_n$ (at least after the first one) are divisible by a common prime,
- else, the calculation is as above, and the only difference is that one must exclude factors in the product for primes $p$ that divide either $a$ or $b$.

A note on the asymptotics of $f(m)$: Mertens proved in 1874 that $\displaystyle \prod_{p \leq x} (1-\frac{1}{p}) = \frac{e^{-\gamma}+o(1)}{\log x}$. Here we have $\prod_{2 < p < m}\left(1 - \frac{1}{p-1}\right)$ which we can expect will have a similar rate of growth: namely, $\frac{c}{\log m}$ for some $c$, so that $f(m) \sim \frac{c}{(m-1)\log m}$. And (thanks to asking about it on math.SE), it turns out that it is indeed the case: we have

where $\gamma \approx 0.57721566490153286…$ is the Euler-Mascheroni constant, and $C_2 \approx 0.66016181584686957…$ is the twin prime constant. For example, for $m = 91781$, we have

(Convergence is slow: near $m=100000$ we still barely get three significant digits of accuracy. But we have proof.)

(Thanks for reading! If you have any feedback or see anything to correct, contact me or edit this page on GitHub.)