(Don’t be fooled by some well-formatted sections below; this is still an incoherent draft. This is my second draft attempt to write this up; the previous in 2016-17. But I think I have all the pieces now and on the third attempt I may complete it!)

(Outline of what I imagine the article will look like:

a quotation of Bell’s story, with proper references to where it was published and republished
some links to the many mutations of the story
saying we should first understand the mathematics, in its proper historical context
(I don’t have this yet so maybe not…) A history of Mersenne numbers and what was known by 1903
a history of the ideas of factorization, leading up to Cole:
- Narayana/Fermat, difference of squares, with worked examples / demo
- Legendre(+Gauss), each residue gives 1/2 exclusion, CFRAC, with worked examples / demo
- Seelhoff’s multiple ideas (how to find small residues, multiple polynomials, when to stop factoring, CoS, probabilistic test)
- translation of 2 main papers (move elsewhere), with multiple worked examples / demo
- Cole’s new exclusion which cuts 3/4th with each residue, with worked examples / demo
- Cole’s paper, with annotation
- After Cole: Kraitchik, Lehmer, … maybe also Dixon, Pomerance, etc. Present day.
summary of mathematical content: Cole built mostly on Seelhoff + added an idea of his own.
history of Bell (Constance Reid’s book is AWESOME! – him in 1903, FNC advisor)
history of Cole (relationship with AMS and its meetings/reports, FNC prize)
plausible reconstruction of events: what is known, what is believable (Sundays, applause?)

Instead this jumble posted here looks like the following.)

How Cole factored $2^{67} - 1$ in 1903

How Cole factored $2^{67} - 1$ in 1903

Apart from Cole’s paper (On the factoring of large numbers, Bulletin of the American Mathematical Society, Volume 10, Number 3 (1903), 134-137) and the only other paper it cites (that of Seelhoff), this is based on:

Factoring Integers Before Computers (1994) by H. C. Williams and J. O. Shallit, in Mathematics of Computation, 1943–1993: A Half-century of Computational Mathematics (which is Proceedings of Symposia in Applied Math., Vol 48, ed. Walter Gautschi, publ. AMS)
Historical Background of the Number Field Sieve Factoring Method (1996), by R.-M. Elkenbracht-Huizing, in Nieuw Arch. Wisk, Volume 14, 375-389.

There’s also a good well-motivated read in:

Smooth numbers and the quadratic sieve (2008), by Carl Pomerance, in Algorithmic Number Theory, MSRI Publications, Volume 44, 2008

Also relevant:

Cole’s report of the meeting: https://projecteuclid.org/download/pdf_1/euclid.bams/1183417790 (T H E OCTOBER MEETING OF TH E AMERICAN MATHEMATICAL SOCIETY., Jan 1904)
https://www.ams.org/journals/bull/1927-33-06/S0002-9904-1927-04477-9/S0002-9904-1927-04477-9.pdf (Cole obituary and bibliography, his place in the AMS)

[Reference for later: see around https://archive.org/details/thoriedesnombres02krai/page/146 ]

Firstly, there was a long history of interest in the numbers of the form $2^n - 1$, and much was known:

In 1856, Reuschle published […] a table of factors of $2^n - 1$ and $10^n - 1$ for $n < 42$, and in 1869, Landry published a table of factors of $2^n ± 1$ for $n < 64$. The “holes” in these and other tables provided a source of challenges in factoring and primality testing that occupied the energies of many workers over several decades …

There’s a fair bit in the paper (and a book by Williams) about Édouard Lucas, who did a lot of great work on primality testing in 13 papers between January 1876 and January 1878 (while writing at least 70 papers on other subjects), before abruptly leaving the field, being soured by a harsh review of his work. Anyway, in 1881 Lucas learned that $M_{61} = 2^{61} -1$ is a prime, missing from Mersenne’s list:

Seelhoff had shown this in 1886, and Hudelot, after 54 hours of labour, in 1887 confirmed Seelhoff’s result. This confirmation was very much needed, as Seelhoff’s proof is certainly not complete.

So there was some doubt already whether $M_{67}$ (listed by Mersenne) was indeed prime. (Also: Pervushin had shown the same in 1883, though references for how he showed this have been surprisingly hard to find.)

Narayana (1356) / Fermat (1643)

The history of factoring proper starts with “Fermat’s method” (actually known to Nārāyaṇa in 1356): to factor a large number $N$, if we can find a number $x$ ($\ge \sqrt{N}$) such that $x^2 - N = y^2 \tag 1$ is a square, then $N = x^2 - y^2 = (x-y)(x+y)$.

Given a number $N$, we let $m = \lfloor\sqrt{N}\rfloor$ and try values $(m + \alpha)^2 - N$ for $\alpha = 1, 2, 3, \dots$ until we find one that is a square. For numbers $N = pq$ that are a product of two primes, this method is useful (faster than trial division) only when $p$ and $q$ happen to be close, i.e. their difference is roughly on the order of $N^{1/4}$.

Legendre (1798)

Legendre observed that if we find integers $(x, y, z, k, a)$ such that $x^2 - kNz^2 = ay^2\tag 2$ (we can take $a$ to be square-free), then this imposes congruence conditions on any prime factor $p$ of $N$: as $x^2 \equiv ay^2 \pmod p$, we can say that $p$ must be such that $a$ is a quadratic residue modulo $p$. For example if $a = 2$ then $p \equiv \pm 1 \pmod 8$; if $a = 3$ then $p \equiv \pm 1 \pmod {12}$, etc — Legendre compiled a table of such conditions. This cuts down on how many primes we need to trial-divide $N$ by. Note that this is equivalent to finding a quadratic residue $ay^2$ mod $N$, and removing squares in it to keep the square-free part $a$.

So already it was known that to factor $N$, it is helpful to find many small quadratic residues mod $N$: find solutions to $x^2 \equiv a \pmod N\tag 3$ This is still the task carried out by the fastest algorithms today.

To actually find such solutions $(x, y, z, k, a)$ as in (2), Legendre proposed using the continued fraction of $\sqrt{kN}$, and I believe this is the idea that was rediscovered to become part of the Continued fraction factorization method (CFRAC).

Gauss (1801)

A simple but important observation, of using exclusion moduli: to solve an equation $f(x) = g(y)$ (for example $22 + 97x = y^2$), we can solve it modulo several different primes $E_1, E_2, \dots$, and each of them imposes some conditions on $x$. If each prime $E_i$ rules out half the possible values modulo $E_i$ (as will be typical for the cases we consider), then after considering $k$ primes we only have to try about one in every $2^k$ values for $x$.

Seelhoff (1886)

Paul Peter Heinrich Seelhoff (1829–1896), high school teacher in Mannheim (earlier Bremen?)

His work (which had many errors) appears to have been largely ignored, but he had a couple of important ideas that Cole used (and cited vaguely).

Let me first try to collect a list of his publications:

1884, Befreundete Zahlen. Arch Math. Phys. 70, 1884, 75–89, vii-viii.
1885, Uber die volkommenen Zahlen, ins-besondere über die bis jetzt zweifelhaften Fälle $2^{40}(2^{41}-1)$, $2^{46}(2^{47} - 1)$ und $2^{52}(2^{53}-1)$. Hoppe Arch (2) II, 327–329 [also cited as Archiv der Mathematik und Physik]
1885 April: Prüfung Grösserer Zahlen auf Ihre Eigenschaft als Primzahlen, http://jstor.org/stable/2369272, American Journal of Mathematics, Vol. 7, No. 3 (Apr., 1885), pp. 264-269 (6 pages) link
1885 September (continued): Prüfung Grösserer Zahlen auf Ihre Eigenschaft als Primzahlen, https://www.jstor.org/stable/2369356, American Journal of Mathematics, Vol. 8, No. 1 (Sep., 1885), pp. 26-38 (13 pages). [Same two papers also listed as Newcomb Am J. VII 264–270, VIII, 26–38]
1885 September (different paper): Nova Methodus Numeros Compositos a Primis Dignoscendi Illorumque Factores Inveniendi, American Journal of Mathematics link Vol. 8, No. 1 (Sep., 1885), pp. 39-44 (6 pages)
1886 Notes mathématiques 6, Mathesis 6 (1886), 100–101
1886: Die Auflösung grosser Zahlen in ihre Factoren, Zeitschrift für Mathematik und Physik, Volume 31, page 166, googlebooks, [This is the paper cited by Cole, and also the one whose French translation appeared in Kraitchik’s(?) journal Sphinx-Oedipe.]
1886: XXII. Ein neues Kennzeichen für die Primzahlen, Zeitschrift fur Mathematik und Physik, Volume 31, page 306, googlebooks
1886 Zur Analyse sehr grosser Zahlen. Hoppe Arch. (2) II, 239–332.
1886 Die Zahlen von der Form $k \cdot 2^n + 1$, Zeitschrift fur Mathematik und Physik, vol. 31 (1886) p. 380, “A short list of primes of the form k(2^n+1) was given by Seelhoff in 1886”
Search this book for Seelhoff: https://books.google.com/books?id=qfPxAAAAMAAJ

Let’s look at the 1886 paper cited by Cole, because it’s still an open question how Cole found so many residues. The title of the paper is “Die Auflösung grosser Zahlen in ihre Factoren” which translates to:

The dissolution of large numbers into their factors.

The method, of breaking up large numbers into their factors, which I would like to discuss in the following, has proved itself in all cases for which I used it, right up to $2^{64} + 1$. Just this number, whose factorization by Mr. Landry was unknown to me at the time, has given reason to further develop and bring to a certain conclusion the method which I had already applied in the main for numbers such as $2^{47}-1$, $2^{53}-1$, etc. In the form thus obtained, I present it.

A note on notation: in the entire paper, it appears that whenever (or at least most of the time) he writes $x \equiv y \pmod p$ — actually in the original he writes simply $x \equiv y\ (p)$ — he means something like “$y$ is the remainder when $x$ is divided by $p$”, similar to the programmer’s definition of $x \bmod p = y$.

Let the number whose factors are to be found be called N. Then set $N = m^2 + r.$ [That is, define $m$ as $\lfloor\sqrt N\rfloor$ and $r$ as $N-m^2$.] Now for a prime $p$, if $N \equiv ϱ \pmod p$ where $ϱ$ is a quadratic residue for $p$, so that $m_1^2 \equiv ϱ \pmod p$ takes place [for some $m_1$], then we can form the following equation: $N = m_1^2 + (m+m_1)(m-m_1) + r.$ If we call the second and third term above together as $b$, then $b = m^2 + r - m_1^2.$ [Paper has typo: writes $b = m^2 + ϱ - m_1^2$.] But you have $m^2 + r \equiv ϱ \pmod p$ and $-m_1^2 \equiv -ϱ \pmod p$ therefore (adding) $b = m^2 + r - m_1^2 \equiv 0 \pmod p.$ [Again, paper has a typo here and writes $b = m^2 + ϱ - m_1^2$ instead.]

If one extends the root $m_1$ to $m_1 + py$, it becomes $\begin{align}N &= (m_1 + py)^2 + (m + (m_1 + py))(m - (m_1 + py)) + r \\ &=a^2 + b\end{align}$ and $b \equiv 0 \pmod p.$ [This $b$ is different from earlier; the point is that we can write $N = a^2 + b$ for various $a = m_1 + py$ and corresponding $b$, by changing $m_1$ to $m_1 + py$, such that $b \equiv 0 \pmod p$ always.]

If one now determines, for the prime numbers $p$ from $2$ on for which $\left(\frac{N}{p}\right) = +1$, up to a limit to be determined by the size of the number, and also for their second powers the values $m_1$ [i.e. if one characterizes for which $m_1$ we can write $N = m_1^2 + b$ with $b$ divisible by $p$ or $p^2$], it is possible, by itself or by their combinations [i.e. use either a single $p$ or pairs of them], to form a larger number of simple binary quadratic representations of the number N. For numbers up to 15 digits it is sufficient to take the relevant prime numbers up to 97; in prime 2, I go to the tenth power, 3 to the sixth, and 5 to the fourth. [Here, correspondingly Cole for the 21-digit $N = 2^{67} - 1$ says it’s enough to take the smallest $70$ primes “available”.] [This is a rather intensive process: e.g. if we have about $30$ such prime powers, then apart from trying say a handful for each of them individually, we also want to try at least one for each pair of them, so at least a thousand computations!]

Now we have the following.

As one considers such decompositions [representations] of $N$, one soon finds either two representations with the same determinant [residue] and hence two factors, or a pair may be formed by eliminating common factors of different determinants in two or more representations, like $a_1^2 + mc_1^2 = \mu N, \quad a_2^2 + mc_2^2 = \nu N,$ belonging to different positive or negative roots of the congruence $Z^2 \equiv -m \pmod N$ and thus lead again to the determination of two factors. [This $m$ is unrelated to $\sqrt{N}$; this is an unfortunate clash of notation that I introduced, by changing Seelhoof’s $\omega$ for $\sqrt{N}$ to $m$ instead. Oops.]

If, on the other hand, $N$ is a prime number, then though it is easy to arrive at such eliminations; these, however, always lead to the same root $\pm Z$. At the same time, the fact that one is dealing with a prime number is to some extent explained [or: we can be reasonably confident that we are dealing with a prime number] if there are a number of determinants consisting of only one factor, and also one repeatedly gets the same determinant $\Delta$ with both + and - , so $\Delta = - 1$ is got as a [residue]. In order to be certain, the determinants obtained are usually more than sufficient to eliminate the primes that are not quadratic residues, which thus can not occur as divisors.

For the execution [i.e. in practice / practical implementation], notice that if $m \mp (m_1 + py) = \alpha$ is set, $m_1 + py = \pm (m - \alpha)$ and either $m + (m_1 + py) \quad\text{or}\quad m - (m_1 + py) = 2m - \alpha,$ [and the other is $\alpha$] so $N = (m - \alpha)^2 + (2m-\alpha)\alpha + r$ [I think he’s pointing out some symmetry here: for $±\alpha$ we get a similar expression?] Then let $2m \equiv \pm 2\beta \pmod p, \quad r \equiv \gamma \pmod p$ thus the congruence $(\pm 2\beta - \alpha)\alpha \equiv -\gamma \pmod p \quad\text{or}\quad \alpha^2 \mp 2\beta\alpha \equiv \gamma \pmod p$ has solutions $\alpha = \pm \beta + Z$ satisfying the congruence $Z^2 - (\beta^2 + \gamma) \equiv 0 \pmod p.$ But as $\beta^2 \equiv m^2 \pmod p$ and $\gamma \equiv r \pmod p$ therefore [adding] $\beta^2 + \gamma \equiv m^2 + r \equiv ϱ \pmod p$ so, as before, $Z^2 - ϱ \equiv 0 \pmod p,\quad\text{i.e.}\quad Z=m_1.$

Further, since $b$ after the first congruence $\pm(m - py_1)$ is put, so is $\alpha = \pm \beta + m_1 = \pm (m - py_1) + m_1 = m \mp (m_1 + py),$ like before.

The fundamental equations and congruences for the method are accordingly $N = m^2 + r;$ $N \equiv ϱ_1 \pmod p,$ $m_1^2 = ϱ_1 \pmod p,$ $m \equiv \pm \beta_1\pmod p;$ $\alpha \equiv \pm \beta \pm m_1,$ $N = (m - a)^2 + \overbrace{(2m - \alpha)\alpha + r}^{b}$ [that is, these are the defining equations for $m$, $r$, $ϱ_1$, $m_1$, $\beta_1$, $\alpha$ and $b$, finally giving the expression $N = (m-a)^2 + b$] and also $N \equiv ϱ_2 \pmod {p^2},$ $m_2 \equiv ϱ_2 \pmod {p^2},$ $m \equiv \pm \beta_2 \pmod {p^2};$ $\alpha = \pm \beta_2 \pm m_2,$ $N = (m-\alpha)^2 + \overbrace{(2m-\alpha)\alpha + r)}^{b}.$

For $p = 2,3,5$, as already mentioned, even higher powers should be considered.

If $N$ is not of the form $8n + 1$, then these are modified so that the factor $2^n$ for $b$ is not lost, insofar as one sets, if e.g., $N$ has the form $8n + 3$, then $N = 3m^2 + r,$ $m_1^2 \equiv \frac{ϱ_1 + px}{3} \pmod p$ and $m_2^2 \equiv \frac{ϱ_2 + p^2x}{3} \pmod {p^2};$ similar for $8n + 5$ and $8n + 7$.

It follows from this, while the rest remains unchanged, $N = 3(m-\alpha)^2 + 3(2m-\alpha)\alpha + r$ etc.

As far as the resolution of the congruences $m_1^2 \equiv ϱ_1 \pmod p$ and $m_2^2 \equiv ϱ_2 \pmod {p^2}$ is concerned, it is easy to produce tables for the former; for the latter, I have created such tables to $47^2$, as well as for the modulus $2^3$ to $2^{10}$, $3^1$ to $3^6$ and $5^1$ to $5^4$. In other cases, the following procedure may be the most appropriate.

If according to the above description $N \equiv ϱ_1 \pmod p$ and $\equiv ϱ_2 \pmod {p^2},$ then let $ϱ_2 = q p + ϱ_1$.

From $m_1^2 \equiv ϱ_1\pmod p$ one has $m_1^2 \equiv q_0 p + ϱ_1$.

If we solve for a given $p$ for all $ϱ_1$ the congruence $2m_1 u \equiv 1 \pmod p$ and also determine the associated $p_0$ [I think he means $q_0$] for each $ϱ$, so can be assembled for each $p$ small tables with four vertical columns, which contain in turn $ϱ_1$, $m_1$, $q_0$ and $u$.

The application is easy. Multiply $(p-p_0)$ by $u$ and determine for the product the smallest (in absolute value) remainder modulo $p$, that is $± \delta$; then $± \delta p + m_1 = m_2$. Two examples may ultimately serve to explain the method.

Let $N=7\cdot 2^{34} + 1 = 120259084289,$ $N = 346783^2 + 635200 \quad\text{so}\quad m = 346783,$ $\left(\frac{N}{7}\right) = +1,$ so you set $N \equiv 1\pmod 7\quad\text{and}\quad N\equiv 15\pmod {7^2},$ from which $m_1 = 1, \quad m_2 = 8$ follows. Further, from $m \equiv 3 \pmod 7 \text{ and } m \equiv 10 \pmod {7^2}, \quad \text{get } \beta_1 = 3 \text{ and } \beta_2 = 10,$ $\alpha = 3 \pm 1 = 2 \text{ and } 4 \text{ for } 7,\quad \alpha = 10 ± 8 = 2 \text{ and } 18 \text{ for } 7^2$ or, extended, $\alpha = 7y+2, 4\text{ and } =7^2 + 2, 18.$ Thus one finds $α = 2^3y + 0, 2, 4, 6$; $2^4y+0,6,8,14$; $2^5y + 0,14,16,30$; $2^6y + 0,30,32,62$; … [look at paper]

In addition, 127 had occurred in a determinant [residue] and [we used it?], so $\alpha = 127y + 49, 97; 127^2y + 1748, 14400.$

For $\alpha = 1950 \quad (5y + 0 \text { with } 37^2y + 581),$ [text has typo and prints $37^0$] $\alpha = 143432 \quad (127^2y + 14400 \text{ with } 11y + 3)$ and $\alpha = -3836 \quad (37^2y + 271 \text{ with } 11y + 3)$ you get

$N = 344833^2 + 2·7·11·(2^4·5·37)^2$

$N = 203351^2 + 7·(2^2·11·19·127)^2$

$N = 350619^2 - 2·11·(2·37·149)^2$

From (1) and (2) you get … [by multiplying the first equation by $11·19^2·127^2$, the second by $2·2^4·5^2·37^2$, and subtracting the now common second term] and from this in connection with (3), ….

The largest common factor of the difference on the left side and the number N is one of the sought-after factors, namely $317306291$; the other factor, which must result from the sum, is $379$ [paper has typo and writes $397$].

For the second example, take the number N = 2971215073 (the 48th number of the Lamé series 0,1,1,2,3,5,8,…). It is $N = …$ and you have $N = …$.

The sum of the two last terms is again $b$.

This then gives, among other things:

for 1, …. … for 29, …

a) From (15) and (19) you get $N = ...$

It follows …

and this representation leads to the congruence

This latter congruence is also obtained from the representation

…

which is derived from (27).

Several such compilations can be made from the above values.

b) It is equally easy to obtain representations whose determinant consists of only one prime number.

c) In 4th and 10th the determinants are +3.7.17 and -3.7.17, …

The number of these cases can be increased by elimination.

d) By means of a number of the above determinants one finds that no prime up to √N can be a divisor of N; the number is therefore a prime number.

In conclusion, I may point out two considerations, which could easily be overlooked in the first use of the given method.

One of these considerations concerns the approximate determination of the remaining factors of a determinant, if some of them have already been established, and then, if they prove too great, to refrain from continuing the calculation. An example may serve for explanation.

For $N = 4810…$

you have, among other things,

$\alpha = …$.

From the first Werther results for y = -1 special a = -8, and the two other values yield the factors 11 and 64 for the determinant.

If one now estimates (1384134 + 8) .8: (11.64.49), the approximate result is 1400 for the remaining factors and the square root of 37. If the determinant thus still has a smaller factor, this could at most be 37 his. However, only o = 7y + 1.6 and = 29 * / + 16, 29 are available, both of which are useless. So the last factor will be around 1400, so it’s probably out of the question. In fact, for a = - 8, the value of b = - 11 is X1153.282.

But that does not mean — and this is the subject of the second recital — that one should renounce such larger pactors under all circumstances. Since one can immediately deduce the corresponding values of o from the calculation, their examination leads in individual cases to a new value for b, which, in conjunction with the first found, permits the elimination of the one large factor and leads to a useful determinant. For example: For the same number N, as above, one also has a = 592y + 2094, and the combination with a = Uy + 3 yields “= 9056, and b = 463.5192 *. Since 9056 = 259 (463) and 2o> = 1387134 = 449 (463), one has

…

And from $\alpha = 190$ you get $b = …$.

If one eliminates the factor 463 from the representations corresponding to the two values of a and b, then one arrives at …

Anderweit was already found for …

From 1) and 2) follows …

and in conjunction with 3) one arrives at a representation like

from which $N = …$

is. Another resolution is incidentally impossible, since both factors are primes.

Well the translation quality dropped out towards the end, but the following is what I understand to be Seelhoff’s method. We are given a number $N = m^2 + r$, and want to find many expressions of the form $N = (m - \alpha)^2 + b$ where $b$ (either positive or negative) has only small prime factors. (Why do we want this? Because for each such expression, $-b \equiv (m-\alpha)^2 \pmod N$ is a quadratic residue modulo $N$.) This is how we go about this:

For each of the small primes and their powers, determine the values of $\alpha$ for which $b$ will be divisible by that prime power. This amounts to solving the quadratic equation $(m - \alpha)^2 \equiv N \pmod {p^k}$ for $\alpha$, which gives a congruence condition on $\alpha$.
By picking a few values of $\alpha$ satisfying either a single or multiple of these congruence conditions, we can hope to get many values of $-b$ that factor into small primes — and keep only the square-free part.
Further, we can “eliminate common factors” from these residues, e.g. if $-2 \cdot 7 \cdot 11$ and $-7$ are both quadratic residues, then so is $2 \cdot 11$.
Ultimately we get either a congruence of squares, or… see Cole’s paper for what else we can do with it.

Finding small residues

Recall that we want solutions to (3), so we want to write some multiple of $N$ (say $N$ itself) as $x^2 - b$ where after removing squares from $b = at^2$ we’re left with a small square-free part $a$. Let $m = \lfloor \sqrt{N} \rfloor$.

We want to write $N$ as $N = (m - \alpha)^2 + r_\alpha$ for various values of $\alpha$, then for each $\alpha$ we want to factor $r_\alpha$ to identify its square-free part which is hopefully small. Which $\alpha$s to try? Seelhoff’s idea was to ask when $r_\alpha$ would be divisible by a certain prime power $p^k$ (so that we get a head-start on factoring it): if $p^k$ divides $r_\alpha$, then $N \equiv (m-\alpha)^2$ must be a quadratic residue mod $p^k$. Specifically, if we find the number $\rho$ modulo $p^k$ such that $\rho^2 \equiv N \pmod {p^k}$, then $(m-\alpha)^2 \equiv \rho^2$ gives $\alpha \equiv m \pm \rho \pmod {p^k}$.

In other words, the method is:

Identify small primes $p$ (say the first few dozen primes) such that $N$ is a quadratic residue mod $p$.
Assemble a set $P$ consisting of: these primes, their squares, and for really small primes some higher powers as well.
For each number $p^k$ in $P$, solve $\rho^2 \equiv N \pmod {p^k}$ to get a condition of the form $\alpha \equiv m + \rho \pmod {p^k}$.
Pick various larger $\alpha$ satisfying multiple of these conditions, and factor $N - (m - \alpha)^2$, only checking for divisors in $P$) — if you obtain a number with all factors in $P$ then retain it, else discard this computation and try again with different $\alpha$.

This gives a set of quadratic residues with all factors in $P$.

(This method was not stated so clearly or explicitly by Seelhoff. Though noticed and used by Cole, it was Kraitchik who resurrected and explained them clearly during 1911 to 1952, and through him these ideas contributed to the modern computational methods still in use today.)

Combining residues

But there’s more! The next part of Seelhoff’s method, after getting these small residues, was to get nontrivial congruences out of them. For example, from three congruences like $2 \cdot 7 \cdot 11 \equiv a_1^2 \pmod N$, $7 \equiv a_2^2 \pmod N$ and $2 \cdot 11 \equiv a_3^2 \pmod N$, one has a good chance of getting a congruence $x^2 \equiv y^2 \pmod N$ for which $\gcd(N, x - y)$ or $\gcd(N, x + y)$ is a nontrivial factor of $N$. Something similar to this is the idea used in the quadratic sieve today, still the fastest known factorization algorithm for numbers around 100 digits.

A new criterion for the prime numbers.

Seelhoff gives more examples of this method in a later paper (whose main contribution is a probabilistic test for the primes, but he’s made a calculation error in the example he chose which is actually composite. Oops.)

Google-Translate OCR+translation below. Useful German references at here, maybe here, here, here.

XXII. Ein neues Kennzeichen für die Primzahlen.

XXII. A new criterion for the primes.

Unter Bezugnahme auf das Verfahren, grössere Zahlen zu analysiren, welches ich in XXXI, 3 dieser Zeitschrift mittheilte, beabsichtige ich, in dem Folgenden den Fall einer besonderen Besprechung zu unterziehen, wenn die zu untersuchende Zahl N eine Primzahl ist, da sich inzwischen bei einer grossen Zahl von Analysen, die ich gemacht habe, als ausnahmslose Thatsache erwiesen hat, dass sich die Primzahlen, welche benutzt wurden, als Reste oder Nichtreste von N ohne Schwierigkeit einzeln absondern liessen und zwar in Uebereinstimmung mit dem Gesetze der Reciprocität. Wie dies zu verwerthen ist, um ein sicheres Kennzeichen für die Primzahlen zu gewinnen, soll unten weiter ausgeführt werden; vorher möchte ich, um diese Arbeit möglichst selbstständig zu erhalten, einige einleitende Worte vorausschicken, wobei freilich Wiederholungen des Früheren nicht ganz zu vermeiden sind.

With reference to the method, of analyzing [factoring] larger numbers, which I communicated in XXXI, 3 of this journal [this is the “Die Auflösung grosser Zahlen in ihre Factoren” paper cited by Cole], I intend, in the following to subject the case to a special discussion [to discuss the special case?], when the number N to be examined is a prime number, since in the meantime, in a large number of analyses [factorings] which I have made, it has proved to be an invariable fact, that the primes which were used, as residues or nonresidues of N, could be individually separated without difficulty, and in accordance with the law of reciprocity. How to exploit this, in order to obtain a reliable [safe/positive] criterion for the primes, will be further explained below; before that I’d like, in order to keep this work as independent as possible, to preface a few introductory words, in which of course some repetitions of the former [earlier work] can not be completely avoided.

Soll die Differenz zwischen einem Quadrate und einer Zahl N durch eine bestimmte Zahl, insbesondere durch eine Primzahl p theilbar sein, so müssen beide bei der Division durch diese denselben Rest liefern. Daraus folgt zunächst, dass N durch p dividirt, einen quadratischen Rest geben muss, ferner geht daraus hervor, dass man die Wurzel des Quadrates so zu wählen hat, um bei der Division desselben durch p denselben Rest zu erhalten, welchen N giebt. Will man mehrere Quadrate nach einander subtrahiren und geht von dem Quadrate aus, welches kleiner als N ist, aber dieser Zahl am nächsten kommt, so sei dessen Wurzel $a$ und irgend eine andere Wurzel $a - 𝛼$; dann handelt es sich um die Differenz $N - (a - 𝛼)^2$ oder um den Werth, welcher ihr gleich kommt, nämlich $(2a - 𝛼)𝛼 +r$, wenn $N= a^2+r$ ist. Damit nun dieser Ausdruck $(2a - 𝛼)𝛼 + r$, der durch $b$ bezeichnet werden soll, durch $p$ theilbar ist, muss $a$ so gewählt werden, dass $(a- 𝛼)^2$ denselben Rest $ϱ$, wie $N$ selbst giebt, wenn man durch $p$ dividirt.

If the difference between a square and a number $N$ is divisible by a certain number, in particular by a prime number $p$, then both [the square and $N$] give the same remainder when divided by it [$p$]. From this it follows, first, that N when divided by $p$, must give a quadratic residue [mod $p$], and further it follows that the root of the square must be such that it [the square of this root] gives the same remainder when divided it by $p$ which $N$ gives. [In modern notation, everything up to here is a tautology: basically, if $N \equiv \square \pmod p$ then $N \equiv x^2 \pmod p$.] If one wants to subtract several squares one after the other and starts from the square which is smaller than $N$, but comes closest to this number, let its root be $a$ and some other root $a - 𝛼$; then the difference is $ N - (a - α) ^ 2 $, or equivalently, $(2a - 𝛼)𝛼 +r$, if $N = a^2 + r$. If we want this expression $(2a - 𝛼)𝛼 + r$, which we can denote by $b$, to be divisible by $p$, then $a$ must be chosen such that $(a- 𝛼)^2$ gives the same remainder $ϱ$ as $N$ itself, if we divide by $p$.

Schreibt man dementsprechend die Gleichung N - (a - q)2= b, so muss also N für jeden Factor von b quadratischer Rest sein. Umgekehrt, wenn man die Plätze von N und b vertauscht, also die Gleichung - b-J (a– u)’ = - N bildet, muss auch für jeden Factor von N der Werth - b quadratischer Rest sein. Man kann aber b immer in der Form mc” darstellen, und wenn -b Rest für jeden Factor von N sein muss, so gilt dies auch von -m, da ja c” an und für sich ein quadratischer Rest und da der zweite Factor -m also auch Rest sein muss. Schreibt man nun N=(a- aF-l-mci. so ist -m bekanntlich die Determinante der quadratischen Darstellung auf der rechten Seite, und diese muss mithin quadratischer Rest für jeden Factor von N sein. Ist das Quadrat kleiner als die Zahl, so ist die Determinante negativ, im anderen Falle positiv. Statt der Differenz zwischen einem einfachen Quadrate und der Zahl kann man auch die Differenz zwischen dem Vielfachen eines Quadrates und der Zahl in Betracht ziehen; es sei diese N-k(a-u)’*‘ und die entsprechende Gleichung N-k (a- a)2=k(2a– a) a+r, wenn N=ka2+r ist. Für die rechte Seite der Gleichung setze man b, dann müssen für jeden Theiler von b die Zahl N und k(a—a)2 gleiche Reste liefern; demnach sind N und k zugleich Reste oder Nichtrcste für diese Theiler. Und werden in der Gleichung N—k(a-a)2=b die Zahl N und der Werth b vertauscht, schreibt man also —b—k(a-a)’=—N‚ so kann man statt b setzen m0’, und man erhält, wenn beide Seiten noch mit k multiplicirt werden, —kmc2 — k’ (a —— a)”: — kN, wo dann — km für jeden Factor von N quadratischer Rest sein muss. —km ist aber die Determinante in der quadratischen Darstellung N =k(a—a)’+ mc”. Also wie vorher folgt schliesslich, dass diese Determinante quadratischer Rest für jeden Factor von N sein muss.

If one writes accordingly the equation $N - (a - 𝛼)^2 = b$, then $N$ must be a quadratic residue for every factor of $b$. Conversely, if one interchanges the places of $N$ and $b$, that is, forms the equation $-b - (a- 𝛼)^2 = -N$, then for every factor of N the value $-b$ must be a quadratic residue. But $b$ can always be represented in the form $mc^2$, and if $-b$ must be a quadratic residue for every factor of $N$, then this also applies to $-m$, since $c^2$ in and of itself is a quadratic residue and thus the second factor $-m$ must also be a residue. If we now write $N = (a- 𝛼)^2 + mc^2$, then $-m$ is known to be the determinant [square-free part?] of the quadratic representation on the right side, and this must therefore be a quadratic residue for every factor of $N$. If the square is smaller than the number [that is, $(a- 𝛼)^2$ is smaller than $N$], in this case the determinant is negative, in the other case positive. [This seems like a good place to insert a paragraph break…] Instead of the difference between simple squares and the number, one can also consider the difference between the multiple of a square and the number; namely $N - k(a- 𝛼)^2$ and the corresponding equation $N - k(a- 𝛼)^2 = k(2a- 𝛼)𝛼 + r$, if $N = ka^2 + r$. For the right side of the equation put $b$, then for every factor of $b$ the number $N$ and $ k(a- 𝛼)^2$ give identical residues, therefore $N$ and $k$ are either both residues or both non-residues for these factors. And in the equation $N - k(a- 𝛼)^2 = b$ if the number $N$ and the value $b$ are interchanged, so that one writes $-b-k(a- 𝛼)^2 = - N$, then one can substitute $mc^2$ for $b$, and one obtains, if both sides are multiplied by $k$, [the equation] $-kmc^2 - k^2(a- 𝛼)^2 = -kN$, where then $-km$ must be for every factor of $N$ a quadratic residue. $-km$ is the determinant in the quadratic representation $N = k(a - 𝛼)^2+ mc^2$. So, as before, it follows that this determinant must be the quadratic residue for every factor of $N$.

[Footnote: * Diese Differenz ist einfacher, als die zuerst von L, Euler vorgeschlagene und stets gebrauchte kN-(a — a)‘, wie sie in einem Schreiben an mich auch noch vor Kurzem von einem namhaften französischen Mathematiker gebraucht wurde.]

[Footnote: * This difference [$N - k(a - 𝛼)^2$] is simpler than the $kN - (a - 𝛼)^2$ first proposed by L. Euler and always used, as recently used in a letter to me by a well-known French mathematician.]

Sind dann nach dem früher angegebenen Verfahren in dem einen oder dem anderen Falle, d. h. ob man die Differenz N— (a-a)’ oder N—k(i1—-u)’ benutzt, mehrere einfache Determinanten gefunden, so sollen zu diesen etwa -1, 2, 5 und 13 gehören. Dementsprechend müssen die Factoren von Ndie Form Sn-l-l besitzen, sie müssen durch 5 dividirt die Reste 1 oder 4 und durch 13 dividirt die Reste 1, 3, 4, 9, 1G, 12 geben; derallgemeine Ausdruck für sie ist mithin Ö20u+ 1, 9, 49, 81, 121, 129, 209, 289, 321, 329, 361, 441. Berechnet man nach und nach die Werthe dieses Ausdrucks, wenn nöthig bis l/N, und schliesst unter Benutzung der übrigen Determinanten die Primzahlen aus, die keine Factoren von N sein können, so wird durch jede Determinante die Zahl der noch möglichen Factoren ungefähr auf die Hälfte gebracht, so dass bei einer Gesammtzahl von r Determinanten die Anzahl der noch verbleibenden 1/2^r von der Anzahl aller Primzahlen, die kleiner als sind, beträgt. Dies Verfahren der Ausschliessung, vermittelst dessen also im gegebenen Falle ermittelt werden kann, ob N eine Primzahl ist, leidet jedoch an Eintönigkeit und erfordert grosse Aufmerksamkeit. Bei den Versuchen mit den einzelnen Determinanten darf man keine Division durch die ihr angehörigen Divisoren ohne Probe lassen, sonst ist man nur dann seiner Sache sicher, wenn man einen Divisor der Zahl N findet. Lassen die gewonnenen Determinanten mit einiger Sicherheit annehmen, dass N eine Primzahl sei, so kann man auch dazu übergehen, Gebrauch von dem Fermaflschen Satze zu machen und zu versuchen, ob x“—’Il(zn0d N) ist. Zu diesem Zwecke ist es aber wieder unbedingt nöthig, N -1 in seine Factoren zu zerlegen. Denn sei N zusammengesetzt und die Anzahl der relativen Primzahlen < N gleich q: (N), so muss ja xWNl E 1 (mod N) sein. Haben in diesem Falle q: (N) und N —l einen gemeinsamen Factor z: und ist schon W51 (mod N ), so ist auch xN”1E1(mod N); aber man hat keine Primzahl vor sich. So ist z. B. 42;] (15), q: (N) = 8 und N— l= I4 haben den gemeinsamen Factor 2, für welchen schon die Congruenz 4’E1(m0d I5) besteht, aus welcher auch 4“ E 1 (mod 15) folgt. Die Factoren von N - l zu bestimmen. ist aber in vielen Fällen nicht viel leichter, als die Factoren von N selbst zu ermitteln.

Then following the procedure given earlier in the one or the other case, e.g. whether one uses the difference $N - (a - 𝛼)^2$ or $N - k (a - 𝛼)^2$, several simple determinants may found, [This is a good place to insert a paragraph break, let alone a sentence break!] say these are $-1$, $2$, $5$ and $13$ respectively. Correspondingly, the factors of N must have the form $8n+1$, they must when divided by $5$ give the remainders $1$ or $4$, and when divided by $13$ the remainders $1, 3, 4, 9, 10, 12$; the general expression for them is therefore $520u + 1, 9, 49, 81, 121, 129, 209, 289, 321, 329, 361, 441$. If one calculates the value of this expression gradually, if necessary, up to $1/\sqrt{N}$, and excludes, using the remaining determinants, the primes which can not be factors of $N$, thus, by every determinant, the number of possible factors is reduced to about half, so that with a total of $r$ determinants, we’re left with $1/2^r$ of the number of primes that are less than $\sqrt{N}$. However, this method of exclusion, by means of which it can therefore be determined in the given case whether $N$ is a prime number, suffers from monotony and requires great attention. In the trials with the individual determinants one must not skip any division by the divisors belonging to it without a trial, otherwise one can be sure of one’s work only if one finds a divisor of the number $N$. If the determinants allow one to assume with some certainty that $N$ is a prime, one may also resort to making use of Fermaf’s theorem and trying to find out whether $x^{N-1} \equiv 1 \pmod N$ is true. For this purpose, however, it is absolutely necessary again to break $N-1$ into its factors. For if $N$ is composite and the number of relative primes $<N$ is equal to $\phi(N)$, then we must have $x^{\phi(N)} \equiv 1 \pmod N$. If, in this case, $\phi(N)$ and $N-1$ have a common factor $\pi$ and if already $x^\pi \equiv 1 \pmod N$, then $x^{N-1} \equiv 1 \pmod N$, but you do not have a prime in front of you. [Wow this is like Pratt certificates! What Lehmer used to call “the converse of Fermat’s little theorem”.] For example, $4^2 \equiv 1\pmod {15}$, $\phi(N)=8$ and $N-1=14$ have the common factor $2$, for which the congruence $4^2 \equiv 1 \pmod {15}$ already exists, from which also $4^{14} \equiv 1 \pmod {15}$. However, determining the factors of $N-1$ is in many cases not much easier than determining the factors of $N$ itself. [This is also the problem with Pratt certificates.]

Gegenüber diesen beiden Methoden ist nun das neue Verfahren ein einfaches und zuverlässiges. Ich nehme zuerst an, dass man Gebrauch macht von der Gleichung $N = (a - 𝛼)^2 + (2a - 𝛼)𝛼+r$. Wenn dann N eine Primzahl ist, so gelingt es immer, zuweilen in ganz kurzer Zeit, alle benutzten Primzahlen entweder direct, oder durch Elimination als Reste für N einzeln hinzustellen. Für die Elimination hat man nur zu beachten, dass eine ungerade Anzahl negativer Determinanten eine negative Determinante liefern; im Uebrigen hat man die gemeinsamen Factoren paarweise wegzustreichen. Ist nun die Menge der Primzahlen <,}/N==n und die Anzahl der benutzten Primzahlen p gleich r, so muss man soviele Primzahlen für die Determinanten benutzen, dass ä wesentlich kleiner als I ist. Das macht gar keine Schwierigkeiten, wenn man einmal annähernd den Werth ä = I erreicht hat. Es kommt aber als nothwendige Bedingung noch hinzu für Zahlen von der Form 8n+ 1, dass — 1 und 2 Reste sind; für 8124-3, dass — 1 Nichtrest und -2 Rest, für Sn-l-Ö, dass —l Rest und 2 Nichtrest, für 8n+ 7, dass — 1. Nichtrest und 2 Rest ist.

Compared to these two methods, the new method is now a simple and reliable one. First, assume that one makes use of the equation $N = (a - 𝛼)^2 + (2a - 𝛼)𝛼+r$. If N is a prime number, then it is always possible, sometimes in a very short time, to represent all the primes used, either directly or by elimination, as residues for N individually. For the elimination one has only to note that an odd number of negative determinants give a negative determinant; besides, one has to strike away the common factors in pairs. If the number of prime numbers $< √N$ is $n$ and the number of used prime numbers $p$ is $r$, one has to use as many primes for the determinants so that $n/2^r$ is much smaller than $1$. This is not much of a problem once one has approximately reached the value $n/2^r$. But, as a necessary condition, it is necessary for numbers of the form $8n + 1$ that $-1$ and $2$ are residues; for $8n+3$, that $-1$ is a non-residue and $-2$ a residue, for $8n+5$, that $-1$ is a residue and $2$ a non-residue, for $8n+7$, that $-1$ is a non-residue and $2$ a residue.

Auf einen Fall muss man besonders merken. Ist nämlich N = 811 + I und aus zwei Factoren 4124-3 zusammengesetzt, so müssen beide zugleich für die gebrauchten Primzahlen Reste oder Nichtreste sein, weil N Rest für sie ist; also ist umgekehrt jedes einzelne p für beide Factoren Rest oder Nichtrest und sämmtliehe gebrauchte Primzahlen treten einzeln mit + oder — 1 als Reste auf. Aber ein Schluss auf N als Primzahl wäre verkehrt. Es fehlt ja in diesem Falle die Determinante — 1 ‚ die entscheidend ist.

You have to remember a case in particular. Namely, if $N = 8n + 1$ and composed of two factors $4n+3$, then both must at the same time be residues or non-residues for the used prime numbers, since $N$ is a residue for them; so, conversely, every single $p$ is residue or non-residue for both factors, and all used prime numbers appear singly with $+$ or $-1$ as residues. But a conclusion that $N$ is a prime number would be wrong. In this case the determinant $-1$, which is crucial, is missing.

Benutzt man die Gleichung N= k (a— a)2+ k (2a — a) a + r, so erhält man alle benutzten p, welche nach dem Reciprocitätsgesetze Reste von N sind, einzeln als Determinanten, diejenigen, welche Nichtreste sind, aber paarweise und zwar in der Weise, dass alle Paare einen Nichtrest als gemeinsamen Factor haben. Sind diese Bedingungen erfüllt und hat man noch —-1 als Rest oder Nichtrest nach den Formen 4-n+1 oder 4104-3, und tritt auch 2 in richtiger Weise als Determinante auf, so ist der Nachweis, dass N eine Primzahl ist, mit voller Sicherheit geführt. Zum Beleg für das Gesagte füge ich zwei Beispiele an, von welchen sich das erste auf eine kleine Zahl, das zweite auf eine elfstellige Zahl bezieht. Erstere habe ich ganz willkürlich herausgegriffen, das zweite Beispiel habe ich erst für die vorliegende Arbeit berechnet, ohne zu wissen, dass ich dasselbe schnell zu Ende führen konnte.

Using the equation $N = k(a - 𝛼)^2 + (2a - 𝛼)𝛼 + r$, [I think the $k$ here is typo and shouldn’t be there] one obtains all the used $p$, according to the reciprocity law, those which are residues of $N$ individually as determinants, and those which are non-residues in pairs but in such a way that all pairs have a non-residue as a common factor. If these conditions are fulfilled and one still has $-1$ as a residue or non-residue for [$N$ being of] the forms $4n+1$ or $4n+3$ respectively, and if $2$ also occurs correctly as a determinant, then the proof that N is a prime number, is obtained with complete confidence. [Another good place for a paragraph break.] As proof of what has been said, I give two examples, of which the first is a small number, the second refers to an eleven-digit number. The former I picked out quite arbitrarily, the second example I have calculated only for the present work, without knowing that I could finish it up quickly .

Für die Zahl 457 nehme man beispielsweise 11:3, da. =+l ist. Subtrahirt man von ihr die Quadrate, deren Wurzel 33/ i 1 ist, so enthält der Rest stets den Factor 3. Für 331i 1 = 1 bis 331i 1 =16 ergeben sich der Reihe nach die Determinanten —-2.3.19, —3.lÖ1, — 1, —3‚ -2.3.17‚ —3.13l, —3.7.17, -—3.7, —29‚ —3.67, also sind Reste von 457 zunächst —1, —3, —29, ferner 151, 131, 7, 17, 2, 19, 67 und es ist ersichtlich, dass 457 eine Primzahl sein muss.

For the number $457$, for example, take $p = 3$, since $\left(\frac{457}{3}\right) = 1$. If one subtracts from it the squares whose roots are $3y ± 1$, the remainder always contains the factor $3$. For $3y ± 1 = 1$ to $3y ± 1 = 16$, in turn one obtains the determinants -2.3.19, -3.151, -1, -3, -2.3.17, -3.131, -3.7.17, -3.7, -29, -3.67, thus the residues of 457 are firstly -1, -3, -29, further 151, 131, 7, 17, 2, 19, 67 and it can be seen that 457 must be a prime number. [This is slick so let me go slower: the set contains -1, -3, -29 by themselves, then these paired with 2.19, 151, 2.17, 131, 7.17, 7, 67, so 151, 131, 7, 67 are residues, and further cancelling them, we’re left with 2.19, 2.17, 17, so 2 and therefore 19 must be residues as well. Nothing is paired anymore; every prime occurs by itself.]
sage: N = 457
....: a = int(sqrt(N))
....: for alpha in range(17):
....:     if alpha % 3 == 0: continue
....:     s = alpha**2 - N
....:     print(alpha, s, factor(squarefree_part(s)))
....:     
(1, -456, -1 * 2 * 3 * 19)
(2, -453, -1 * 3 * 151)
(4, -441, -1)
(5, -432, -1 * 3)
(7, -408, -1 * 2 * 3 * 17)
(8, -393, -1 * 3 * 131)
(10, -357, -1 * 3 * 7 * 17)
(11, -336, -1 * 3 * 7)
(13, -288, -1 * 2)
(14, -261, -1 * 29)
(16, -201, -1 * 3 * 67)
Der zweite Factor von $2^{43}-1$ ist $N=20408568497= (142858-𝛼)^2 + (285716-𝛼)𝛼 + 160333$.

The second factor of $2^{43}-1$ is $N=20408568497= (142858-𝛼)^2 + (285716-𝛼)𝛼 + 160333$.

Man erhält für:

[table goes here, same in English]

One gets for: [Recall that $b = N - (a-𝛼)^2= (285716-𝛼)𝛼 + 160333$, and $D$ is the square-free part of $-b$.]

$𝛼=9$, $b=17.83.44^2$, $D=-17.83$

[etc: putting here in code form because I’m too lazy to copy the table from the paper]
  sage: a = int(sqrt(N))
  ....: printed = 0
  ....: for alpha in [9, 26, -29, 2083, -61, 243, -821, 447, -3599, 2204, 531, -23, -76083, 90666, 873, 1329]:
  ....:     printed += 1
  ....:     b = N - (a - alpha)**2
  ....:     D = squarefree_part(-b)
  ....:     fs = [p for (p, e) in factor(D)]
  ....:     print(printed, alpha, ('-' if D < 0 else '') + '.'.join(str(f) for f in fs))
  ....: 
  (1, 9, '-17.83')
  (2, 26, '-7.31')
  (3, -29, '7.11.17.97')
  (4, 2083, '-17')
  (5, -61, '19.53.67')
  (6, 243, '-2.7.17.19')
  (7, -821, '7.11.19.83')
  (8, 447, '-2.23')
  (9, -3599, '7.11.17.43')
  (10, 2204, '-7.43')
  (11, 531, '-2.7.53')
  (12, -23, '7.19.23.131')
  (13, -76083, '2.17.131')
  (14, 90666, '-11.17.19')
  (15, 873, '-43.61')
  (16, 1329, '-11.97.113')
Zunächst ist +43 oder -43 ein Rest für etwaige Factoren von N, da sie zu dem Binom $2^{43}-1$ gehören. Dann findet man in 4) -17 und erhält aus 1) +83. Verbunden mit 7) und 9) ergiebt sich -19 oder +19, je nachdem 43 positiv oder negativ ist. Aus 10) erhält man ebenso -7 oder +7 und aus 9) +11 oder -11. Benutzt man diese Deteriminanten, so erhält man 14) unter beiden Annahmen +43 oder -43 die Determinante -1. Fährt man in ähnlicher Weise fort, so gewinnt man ausser -1 als Determinanten noch die 15 folgenden: 2, 7, 11, 17, 19, 23, 31, 43, 53, 61, 67, 83, 97, 113, 131. Die Anzahl der Primzahlen $<√N$ ist geringer als 1600; durch die obigen Determinanten ist die Zahl derjenigen, welche Divisoren von N sein könnten, auf 1/2^16=1/65536 zurückgebracht, einen Werth, den man noch beliebig verringern könnte durch Hinzunahme weiterer Primzahlen p, und N ist als Primzahl erwiesen.

First, $+43$ or $-43$ is a residue for any factors of $N$, since they belong to the binomial $2^ {43} -1$ [the correct one happens to be -43]. Then one finds in 4) -17 and obtains from 1) [which was -17.83] +83. With this, together (multiplied), 7) [7.11.19.83 so 7.11.19 now] and 9) [7.11.17.43] give -19 or +19, depending on whether 43 is positive or negative. [This is true too.] From 10) [-7.43] one gets -7 or +7 [respectively, depending on whether it’s +43 or -43 that’s the residue], and from 9) [7.11.17.43] +11 or -11 [the error is here: discarding -17, from the remaining -7.43.11 one actually obtains +11 unconditionally]. If one uses these determinants, one obtains from 14) [-11.17.19] under both assumptions +43 or -43 the determinant -1. [This is false! Alas… this gives the determinant +1, and I think we may actually have a congruence of squares that we can use for factoring.] If one continues in a similar way, then, apart from -1, one still wins the following 15 as determinants: 2, 7, 11, 17, 19, 23, 31, 43, 53, 61, 67, 83, 97, 113, 131.The number of prime numbers $<√N$ is less than 1600 [actually it’s 13252 as pointed out by Lehmer… I suspect Seelhoff accidentally computed $π(π(√N)) = 1575$ instead of $π(√N) = 13252$]; by the above determinants, the number of those who could be divisors of $N$ is returned to $\frac{1}{2^{16}} = \frac{1}{65536}$, a value that could be reduced arbitrarily by adding further prime numbers $p$, and $N$ has proved to be a prime number.

Das Entscheidende liegt eben darin, dass die Determinanten, weil sie einfache Primzahlen sind, vollständig unabhängig von einander sind und dass nicht etwa zwei Factoren existiren können, für welche alle diese Primzahlen zugleich Reste oder Nichtreste sind.

The point is that the determinants, being simple primes, are completely independent of each other, and that there can not exist two factors for which all these primes are at the same time residues or non-residues.

Fasst man das Gesagte zusammen, so ergiebt sich folgendes Resultat: „Wird eine Zahl N dargestellt entweder durch (a- a)2+ b oder k( — a)‘ +1), bestimmt man dann Werthe von a in der Weise, dass b durch eine hinreichende Anzahl von Primzahlen p nach und nach theilbar wird, so erhält man eine Reihe einfacher Determinanten. Erhält man aus diesen direct oder durch Elimination sämmtliche benutzten p als Reste oder Nichtreste von N in der Weise, wie es das Reciprocitätsgesetz verlangt, ist endlich für N=4n+l die Zahl - 1 Rest und für N==4n+3 Nichtrest, so ist N eine Primzahl.“

Summing up what has been said, the result is: “If a number $N$ is represented either by N $(a - 𝛼)^2 + b$ or $k(a - 𝛼)^2 + b$, one then determines the value of $𝛼$ such a way that $b$ becomes divisible by a sufficient number of prime numbers $p$, this gives a series of simple determinants. If we obtain from these either directly or by elimination all the used $p$ used as residues or non-residues of $N$ in this way, as required by the Reciprocity Law, and finally for $N = 4n + 1$ the number $-1$ is a residue and for $N = 4n + 3$ a non-residue, then $N$ is a prime.”

P. Seelhoff

Here ends the paper, and it is unfortunate that a slip here might have made people not take the result seriously — only Cole in 1903 (some 17 years later) and Kraitchik in 1929 (some 43 years after 1886) seem to have followed up, and even Lehmer threw water on that too.

Instead let’s see if we can use the above residues for factoring. We have these residues, with their serial number in the order listed by Seelhoff:

#n, alpha_n, D
(4, 2083, '-17') # So -17
(1, 9, '-17.83') # So 83
(7, -821, '7.11.19.83') # So 7.11.19
(9, -3599, '7.11.17.43') # So -7.11.43 => -19.43 => 19 => 11
(14, 90666, '-11.17.19') # So 11.19 => 7
(10, 2204, '-7.43') # So -43

So far, we’ve got 7, 11, -17, 19, -43 and 83 as residues. In modern notation, let’s write the above with certain primes as the columns:

  |  n |   𝛼_n | 7 | 11 | 17 | 19 | 43 | 83 | -1 |
  |----+-------+---+----+----+----+----+----+----|
  |  4 |  2083 |   |    |  1 |    |    |    |  1 |
  |  1 |     9 |   |    |  1 |    |    |  1 |  1 |
  |  7 |  -821 | 1 |  1 |    |  1 |  1 |    |    |
  |  9 | -3599 | 1 |  1 |  1 |    |  1 |    |    |
  | 14 | 90666 |   |  1 |  1 |  1 |    |    |  1 |
  | 10 |  2204 | 1 |    |    |    |  1 |    |  1 |

Giving the matrix:

sage: A = matrix([
....: [ 0 , 0 , 1 , 0 , 0 , 0 , 1 ],
....: [ 0 , 0 , 1 , 0 , 0 , 1 , 1 ],
....: [ 1 , 1 , 0 , 1 , 1 , 0 , 0 ],
....: [ 1 , 1 , 1 , 0 , 1 , 0 , 0 ],
....: [ 0 , 1 , 1 , 1 , 0 , 0 , 1 ],
....: [ 1 , 0 , 0 , 0 , 1 , 0 , 1 ]
....: ])
sage: A = A.change_ring(GF(2))
sage: A.left_kernel()
Vector space of degree 6 and dimension 0 over Finite Field of size 2
Basis matrix:
[]

Oops, we don’t have a “cycle” yet. Let’s look at more. Reminder: we have 7, 11, -17, 19, -43 and 83 as residues, after “using up” rows 1, 4, 7, 9, 10, 14. What remains:

(2, 26, '-7.31') # So -31 (we won't need this)
(3, -29, '7.11.17.97') # So -97 (we won't need this)
(5, -61, '19.53.67') # So 53.67 => 67 (we won't need this)
(6, 243, '-2.7.17.19') # So 2
(8, 447, '-2.23') # So -23
(11, 531, '-2.7.53') # So -53 (we won't need this)
(12, -23, '7.19.23.131') # So -131
(13, -76083, '2.17.131') # Ah we knew this already!
(15, 873, '-43.61') # So 61
(16, 1329, '-11.97.113') # So 113

Let’s collect all the relevant ones so far:

(4, 2083, '-17') # So -17
(1, 9, '-17.83') # So 83
(14, 90666, '-11.17.19') # So 11.19 (effectively: 19)
(7, -821, '7.11.19.83') # So 7
(10, 2204, '-7.43') # So -43
(9, -3599, '7.11.17.43') # So 11
(6, 243, '-2.7.17.19') # So 2
(8, 447, '-2.23') # So -23
(12, -23, '7.19.23.131') # So -131
(13, -76083, '2.17.131') # Ah we knew this already!

And write it as a table again:

 |  n |    𝛼_n | -1 | 2 | 7 | 11 | 17 | 19 | 23 | 43 | 83 | 131 |
 |----+--------+----+---+---+----+----+----+----+----+----+-----|
 |  4 |   2083 |  1 |   |   |    |  1 |    |    |    |    |     |
 |  1 |      9 |  1 |   |   |    |  1 |    |    |    |  1 |     |
 |  7 |   -821 |    |   | 1 |  1 |    |  1 |    |    |  1 |     |
 |  9 |  -3599 |    |   | 1 |  1 |  1 |    |    |  1 |    |     |
 | 14 |  90666 |  1 |   |   |  1 |  1 |  1 |    |    |    |     |
 | 10 |   2204 |  1 |   | 1 |    |    |    |    |  1 |    |     |
 |  6 |    243 |  1 | 1 | 1 |    |  1 |  1 |    |    |    |     |
 |  8 |    447 |  1 | 1 |   |    |    |    |  1 |    |    |     |
 | 12 |    -23 |    |   | 1 |    |    |  1 |  1 |    |    |   1 |
 | 13 | -76083 |    | 1 |   |    |  1 |    |    |    |    |   1 |

and then as a matrix:

sage: matrix(GF(2), [
....: [ 1 , 0 , 0 , 0 , 1 , 0 , 0 , 0 , 0 , 0 ],
....: [ 1 , 0 , 0 , 0 , 1 , 0 , 0 , 0 , 1 , 0 ],
....: [ 0 , 0 , 1 , 1 , 0 , 1 , 0 , 0 , 1 , 0 ],
....: [ 0 , 0 , 1 , 1 , 1 , 0 , 0 , 1 , 0 , 0 ],
....: [ 1 , 0 , 0 , 1 , 1 , 1 , 0 , 0 , 0 , 0 ],
....: [ 1 , 0 , 1 , 0 , 0 , 0 , 0 , 1 , 0 , 0 ],
....: [ 1 , 1 , 1 , 0 , 1 , 1 , 0 , 0 , 0 , 0 ],
....: [ 1 , 1 , 0 , 0 , 0 , 0 , 1 , 0 , 0 , 0 ],
....: [ 0 , 0 , 1 , 0 , 0 , 1 , 1 , 0 , 0 , 1 ],
....: [ 0 , 1 , 0 , 0 , 1 , 0 , 0 , 0 , 0 , 1 ]
....: ]).left_kernel()
Vector space of degree 10 and dimension 1 over Finite Field of size 2
Basis matrix:
[1 1 1 1 0 1 0 1 1 1]

It’s always embarrassing and feels like something one ought to have done by hand. Anyway, let’s double-check that keeping all but two of the above rows is enough:

 |  n |    𝛼_n | -1 | 2 | 7 | 11 | 17 | 19 | 23 | 43 | 83 | 131 |
 |----+--------+----+---+---+----+----+----+----+----+----+-----|
 |  4 |   2083 |  1 |   |   |    |  1 |    |    |    |    |     |
 |  1 |      9 |  1 |   |   |    |  1 |    |    |    |  1 |     |
 |  7 |   -821 |    |   | 1 |  1 |    |  1 |    |    |  1 |     |
 |  9 |  -3599 |    |   | 1 |  1 |  1 |    |    |  1 |    |     |
 | 10 |   2204 |  1 |   | 1 |    |    |    |    |  1 |    |     |
 |  8 |    447 |  1 | 1 |   |    |    |    |  1 |    |    |     |
 | 12 |    -23 |    |   | 1 |    |    |  1 |  1 |    |    |   1 |
 | 13 | -76083 |    | 1 |   |    |  1 |    |    |    |    |   1 |

Yep, all columns add up to $0$ mod $2$. Now the fun with large numbers begins.

sage: def short_factor(n):
....:     fs = factor(n)
....:     fss = []
....:     for (p, e) in fs:
....:         fss.append('%s' % p if e == 1 else '%s^%s' % (p, e))
....:     return '%s = %s' % (n, ('-' if n < 0 else '') + '·'.join(fss))
....: 
sage: a = int(sqrt(N))
....: for alpha in [2083, 9, -821, -3599, 2204, 447, -23, -76083]:
....:     root = abs(a - alpha)
....:     mb = root**2 - N
....:     print('$%s^2 ≡ %s$' % (root, short_factor(mb)))
....:     
$140775^2 ≡ -590967872 = -2^6·11^2·17·67^2$
$142849^2 ≡ -2731696 = -2^4·11^2·17·83$
$143679^2 ≡ 235086544 = 2^4·7·11^3·19·83$
$146457^2 ≡ 1041084352 = 2^6·7·11·17^3·43$
$140654^2 ≡ -625020781 = -7·11^2·43·131^2$
$142411^2 ≡ -127675576 = -2^3·7^4·17^2·23$
$142881^2 ≡ 6411664 = 2^4·7·19·23·131$
$218941^2 ≡ 27526592984 = 2^3·11^2·17·113^2·131$

namely:

$140775^2 ≡ -590967872 = -2^6·11^2·17·67^2$

$142849^2 ≡ -2731696 = -2^4·11^2·17·83$

$143679^2 ≡ 235086544 = 2^4·7·11^3·19·83$

$146457^2 ≡ 1041084352 = 2^6·7·11·17^3·43$

$140654^2 ≡ -625020781 = -7·11^2·43·131^2$

$142411^2 ≡ -127675576 = -2^3·7^4·17^2·23$

$142881^2 ≡ 6411664 = 2^4·7·19·23·131$

$218941^2 ≡ 27526592984 = 2^3·11^2·17·113^2·131$

Multiplying the squares on the left gives: $265157168639098812516314119677233805390450^2$.

And those on the right: $2^{30} · 7^8 · 11^{12} · 17^8 · 19^2 · 23^2 · 43^2 · 67^2 · 83^2 · 113^2 ·131^4$ = $2358944449737458333250153453420544^2$

And taking gcd of the sum and difference with $N$ gives the factors:

sage: (265157168639098812516314119677233805390450^2 - 2358944449737458333250153453420544^2) % N
0
sage: gcd(265157168639098812516314119677233805390450 - 2358944449737458333250153453420544, N)
2099863
sage: gcd(265157168639098812516314119677233805390450 + 2358944449737458333250153453420544, N)
9719

So the method works, though one can see reasons some would not have been its fans.

There’s an erratum on page 320:

In einer Anmerkung zu dem Artikel XIII der Kleineren Mittheilungen im laufenden Jahrgang S. 174 dieser Zeitschrift gab ich auf Grund einer von mir gemachten Notiz über frühere Analysen an, dass der zweite Factor von 257-1 bis 259-1 eine Primzahl sei. Ich habe nunmehr das in der Mittheilung S. 303 angegebene Verfahren angewandt und gefunden, dass für 247-1 der zweite Factor 59862819377 und für 255-1 der zweite Factor 129728784761 zusammengesetzt sein mussten. In der That fand sich für den erstgenannten das Product 451313264529 und für den zweiten das Product 636120394401. Weiter geht die Zusammensetzung aber nicht. Für die übrigen angegebenen Fälle fand ich volle Bestätigung. Ein solches Uebersehen ist nach dem neueren Verfahren ausgeschlossen.

In a note to article XIII of the Minor Communications in this year’s journal, p. 174, on the basis of a note I have made on earlier analyzes, I indicated that the second factor of $2^{37}-1$ to $2^{59}-1$ was a prime number. I have now applied the method given in the message p. 303 and found that for $2^{47}-1$ the second factor 59862819377 and for $2^{55}-1$ the second factor 129728784761 had to be composite. In fact, the product was found to be 4513.13264529 for the former, and 6361.20394401 for the second. But the composition does not go any further. For the other cases I found full confirmation. Such oversight is excluded by the newer method.

Let’s stop here.

The main things to look at, for his influence on Cole:

Dickson devotes a paragrapy to Seelhoff: here (Vol 1. p. 364. Of Cole’s paper he just says “discussed Seelhoff’s method”.
G. B. Mathews in “Theory of Numbers” volume 1 (1892), devotes about a page (267-268) to Seelhoff’s method, explaining it reasonably clearly, and ending with “especially valuable when m [the number to find quadratic residues of] is very large”. link, link1,

Some further references to Seelhoff in other literature:

Dickson’s “History of the Theory of Numbers: Divisibility and Primality, Volume 1”, on p. 25 refers to his demonstration that $2^{61}-1$ is prime, Cole’s reference to it, his factorization of $M_n$ for $n=37, 47, 53, 59$, and proof that 2^{31}-1$ is prime. link, link, link. On p.48 he refers to his 3 new amicable pairs found, his correction of Bernoulli not factorizing $10^{11} + 1 = 11^2 \cdot 23 \cdot 4093 \cdot 8779$ (?), and on p. 214 a brief mention of Seelhoff’s method of using various quadratic residues of $p$ to solve $x^2 \equiv r \pmod p$. On p. 353 mentions $k\cdot 2^n + 1$ for $k \le 100$.
The same Dickson volume 1, in chapter on factoring methods, refers to Seelhoff’s earlier method (p. 363) in some detail (i.e. an entire paragraph), including “eliminating common factors”; Cole’s paper as simply “F. N. Cole discussed Seelhoff’s method of factoring”. Also p. 377 where Seelhoff gave the factor $5\cdot2^{39} + 1$ of $F_{36}$.
D. H. Lehmer’s “Guide to tables in the theory of numbers” (1941) refers to Seelhoff’s earlier tables
Richard A. Mollin, An Introduction to Cryptography footnote on p. 199 cites Williams (pp 127-129) to say that “Paul Seelhoff (1829-1896), a high school teacher in Mannheim, Germany, had the basic idea for the Quadratic Sieve before Kraitchik, but the import of Kraitchik’s work and the details of development ensure that it deserves to be called Kraitchik’s method”. [This cited book is “Edouard Lucas and Primality Testing”]
Walter William Rouse Ball in Mathematical Recreations and Essays cites Seelhoff for finding $M_{61}$ prime. link
This May 1921 paper on amicable numbers says Seelhoff gave two. Another mention, also David Eugene Smith, Tattersall
Hardy and Wright (p. 16) refer to Seelhoff’s and Pervushin’s discovery of $M_{61}$ being prime. (link). Thomas Heath mentions “the ninth perfect number was found by P. Seelhoff” link, so does Cajori. Also this book. And Oystein Ore. A mention in the interesting “Notes and Queries: A monthly magazine of History, Folk-lore, Mathematics, Mysticism, Art, Science, etc”. Something here
Something about determinants. Another paper (note?) by Seelhoff
Lucas mentions “D’ailleurs, on ne connait pas d’autre demonstration de ce curieux resultat, du a M. Seelhoff, de Breme” – Besides, we do not know any other proof of this curious result, by Mr. Seelhoff, of Bremen”. He seems to be refering to Seelhoff finding the factor $2748779069441 = of $2^{2^36} + 1$. Another reference. Also here with exact reference = here, here with typo.
Lehmer’s (unfortunate for the “list of references” remark) review of Dickson mentions Seelhoff’s tables link
Interestingly, Kraitchik seems to only refer to him in one place, for factor of $F_{36}$.
This seems to contain reviews (mentions) of the French translations of his papers! Oh wait, actually just French translations of titles of his 1886 German (etc.) papers.

Cole (1903)

Finally we come to Cole. Already in 1894 Fauqembergue had declared $M_{67}$ composite, and though Cole doesn’t cite it, he does mention Lucas’s declaration that it was composite. So he would have had some expectation that it wouldn’t be a fruitless search, i.e. there do exist factors to be found.

His 1903 paper On the Factoring of Large Numbers is IMO one of those that are perfectly clear only if you already know what is being explained, but unreadable otherwise. With the above context it makes sense though, and we can do a close reading of the entire text of the paper.

(Read before the American Mathematical Society, October 31, 1903.)

This was in New York.

In resolving a large number $N$ into its prime factors, a table of quadratic remainders of $N$ can be made to render efficient service in several different ways.

Having a list of small quadratic residues mod $N$ helps both in ruling out prime factors (because of the imposed congruence conditions), and in “building” the factors by combining congruences. Cole completely omits how to obtain such a table, but see Seelhoff’s paper (above).

For a twenty-two place $N$, the remainders of the table may be restricted to products of about seventy of the smallest prime numbers available.

As $349$ is the $70$th prime, he’s proposing only keeping a table of quadratic residues that are “$349$-smooth” i.e. those for which all their prime factors are no larger than $349$. (The number $349$ is not very important; just given here concretely as an example.) But actually, perhaps he means $70$ primes for which $N$ happens to be a quadratic residue, so a bigger limit. Either way, there’s a lot of work here already that he’s omitted dwelling on.

If, for example, $N$ is always expressed in the form $N= x^2 \pm a$, with variable $x$ and $a$, the remainders $\mp a$ will contain only those primes of which $N$ is quadratic remainder.

The “for example” is somewhat misleading: what he means is that instead of simply keeping quadratic residues $a \equiv x^2 \pmod N$, i.e. numbers $a$ such that $a - x^2 = kN$ for some $k$, if we in fact (a stronger condition namely $k = 1$) keep numbers $a$ such that $a - x^2 = N$ (where $a$ is a positive or negative number small in magnitude) then we can say that modulo any prime $p$ dividing $a$, as $N \equiv x^2 + a \equiv x^2 \pmod p$, that $N$ is a quadratic residue mod $p$. This imposes conditions on $p$.

By a gradual elimination of common factors from the remainders $\mp a$, we finally obtain a table of remainders not admitting further reduction.

This is the method of Seelhoff: by picking appropriate $x$, we can arrange for $a = x^2 - N$ to have several known small factors (among the first $70$ primes, for example). We see what an enormous reduction in computation the method makes: to find quadratic residues if for $x$ we blindly tried values $m + \alpha$, then even after trying all values $\lvert\alpha\rvert \le 1000000$, we’d only find six $349$-smooth residues:

def isprime(n): return n > 1 and all(n % d for d in range(2, n))
primes = []
p = 2
while len(primes) < 70: 
    if isprime(p): primes.append(p) 
    p += 1
def factor_small(n):
    known_factors = []
    for p in primes:
        while n % p == 0:
            known_factors.append(p)
            n //= p
    return (known_factors, n)

N = 2**67 - 1
m = int(math.sqrt(N))

for alpha in range(-1000000, 1000000):
    fs, rem = factor_small((m + alpha)**2 - N)
    if rem not in [1, -1]: continue
    print(alpha, fs, rem)

namely the following:

$\alpha = -768681$ gives $- 3^{13} \cdot 23 \cdot 61 \cdot 181 \cdot 193 \cdot 239$
$\alpha = -281121$ gives $- 3 \cdot 7 \cdot 13^{2} \cdot 53 \cdot 83 \cdot 97 \cdot 113 \cdot 167 \cdot 239$
$\alpha = -256954$ gives $- 2 \cdot 3 \cdot 13^{2} \cdot 23^{2} \cdot 37^{2} \cdot 157 \cdot 173 \cdot 313$
$\alpha = -122508$ gives $- 2 \cdot 3^{2} \cdot 7 \cdot 13 \cdot 61 \cdot 83 \cdot 89 \cdot 113 \cdot 127 \cdot 281$
$\alpha = 177038$ gives $2 \cdot 3 \cdot 7^{2} \cdot 13^{2} \cdot 97^{2} \cdot 137 \cdot 239 \cdot 281$
$\alpha = 337366$ gives $2 \cdot 7^{2} \cdot 13 \cdot 23 \cdot 37 \cdot 53 \cdot 67 \cdot 71 \cdot 157 \cdot 191$

Performing two million computations of $(m + \alpha)^2 - N$, then removing small factors, and all that for just six results, is too much.

On the other hand, with the smarter method, the very first requirement (that $N$ must be a quadratic residue mod $p$) rules out $37$ of the first $70$ primes, leaving only $33$. For each of those primes $p$, there are only two valid $\alpha$ mod $p$… let’s look at this later.

Other forms for expressing $N$, such as $N = 2x^2 ± b$, are of course often advantageous, according to circumstances.

I think the point is that then $\mp 2b \equiv 4x^2 \pmod N$ is a quadratic residue mod $N$. Seelhoff also mentions taking $N = 3x^2 + b$ if $N \equiv 3 \pmod 8$. The idea seems to be a precursor of using multiple polynomials as in the multiple-polynomial quadratic sieve.

If the final table of remainders consists entirely of the individual primes employed, each with the proper sign + or - , $N$ is undoubtedly a prime number*; in fact a much smaller sequence of prime remainders would suffice to justify this conclusion, the wide range specified above being required only to ensure the success of the preliminary elimination process. If, on the other hand, it is found impossible, on attempt, to isolate the prime numbers as remainders, there is a strong presumption amounting almost to certainty that N is compound.

The footnote is a citation to Seelhoff’s paper. I think the “undoubtedly” here should be taken as “we can make a good guess”, as otherwise it’s not true. See Lehmer’s “fallacious principle” paper for further discussion. (Incidentally, the other Seelhoff paper cited by it, Ein neues Kennzeichen für die Primzahlen, is also worth reading, even though its main idea is wrong, as it contains examples of finding residues and eliminating common factors.)

A digression: I think Lehmer misunderstands the idea to some extent. Though it’s not rigorous, it’s not entirely empty. The point is that when $N$ is prime and $-1$ is not a square mod $N$, then for every prime $p$ at least one of $p$ and $-p$ is a square mod $N$. On the other hand, when $N$ is composite, this happens for at most about half the primes $p$, so if we do see this happening for every $p$, it is quite likely that $N$ is in fact prime.

By the aid of the quadratic remainders of the table it is now theoretically possible to construct the factors of N synthetically.

Incidentally, though Seelhoff had pointed out that we can use these residues to obtain a nontrivial congruence of squares $x^2 \equiv y^2 \pmod N$ with $x \not\equiv \pm y \pmod N$, almost everyone at the time (Cole, Lehmer etc) seems to have been concentrating on the Legendre/Gauss idea of combining the congruence conditions imposed by each residue (I think this is what the Lehmers’ bicycle-chain sieve was also doing). So that’s what he’s talking about here, as he elaborates:

Occasionally the form of these factors is so far known in advance that it is not an over serious task to sift all possible cases up to several millions. In the sifting process each known quadratic remainder reduces the number of possible factors of N by one-half. Thus, with twenty-four remainders at hand, sixteen million numbers would be reduced to one or two.

$2^{24} \approx 16 \text{ million}$.

In the practice a few of the known remainders would be employed to establish linear series $lx + m$ with variable $x$; the other remainders would then be used to sift the values of $x$.

Each quadratic residue gives some constraints of the form $p$ must be one of $m_1, m_2, \dots$ modulo $l$ for some $l$. Combining these for different $l$ gives bigger $l$ but also a big list of $m$s that can get unwieldy to do by hand (that’s why the bicycle-chain sieves that the Lehmers came up with). This is saying that we go up to as much as we can deal with by hand, then we use future constraints to rule out $x$s.

If, however, the smallest factor of N has nine or more places, a much more expeditious method is to employ the known quadratic remainders to reduce $N$ to the form
\[x^2 -y^2 = (\frac12(u+v))^2 - (\frac12(u - v))^2 = uv,\]
the factors u and v being so chosen that their difference $u - v$ is as small as possible.

Finally it appears he’s mentioning the congruence-of-squares method, but actually he’s not! All he’s doing is, instead of considering possible values of $u$ in a factorization $N = uv$, equivalently considering possible values of $u+v$ in a factorization $N = uv$ leading to the above equation. To put it differently: any factorization $N = uv$ can be written as $N = uv = \left(\frac{u+v}{2}\right)^2 - \left(\frac{u-v}{2}\right)^2 = x^2 - y^2$. Not sure what he’s saying about $u-v$ being as small as possible though… it’s not as if we’re spoiled for choice among factorizations of $N$.

The advantage of this method is the following: If the table of quadratic remainders is constructed solely from resolutions $N = x^2 + a$, [then] $u$ and $v$ are quadratic remainders of every prime remainder $r$ of the table,

Not sure what he means here, as with the natural understanding I have, the statement isn’t true. For example, with $u = 11$ and $v = 71$ take $N = uv = 781$. Then writing $N = 28^2 - 3$, we get $r = 3$ as a remainder in the table, but neither $u$ nor $v$ is a quadratic residue mod $r$. (Other examples are $19 \times 47$ with prime $7$, or $43 \times 83$ with prime $31$.) Or if the negative sign is significant, take $u = 19$ and $v= 31$ and write $N = uv = 589$ as $24^2 + 13$, but again neither $u$ nor $v$ is a quadratic residue mod $13$.

It’s possible he may have meant it the other way around: every quadratic residue $r$ modulo $N$ should also be a quadratic residue mod $u$ and $v$. But this neither requires the table to be constructed from $N = x^2 + a$ (rather than the general $kN = x^2 + a$) nor does it require $r$ to be a prime. So its really unclear.

But a clarification by Kraitchik cited in Williams and Shallit seems to be correct: the statement is that if $N = uv$ and $N$ is a quadratic residue mod $r$ (so either $N = x^2 + kr$ or $N = x^2 - kr$ will do; what won’t do is $kN = x^2 + r$) and $(-1)^{(r-1/2)} r$ is a quadratic residue mod $N$ (that is, $r$ if $r$ is $1$ mod $4$ and $-r$ if $r$ is $3$ mod $4$), then both $u$ and $v$ must be quadratic residues mod $r$.

To make this concrete:

let $p$ be a prime that is $1$ mod $4$. Then for any odd number $u$ such that $p$ is a quadratic residue mod $u$, it is true that $u$ is a quadratic residue mod $p$
let $p$ be a prime that is $3$ mod $4$. Then for any odd number $u$ such that $-p$ is a quadratic residue mod $u$, it is true that $u$ is a quadratic residue mod $p$

So in this case for such primes $r$ satisfying either of the conditions above, both $u$ and $v$ must be quadratic residues mod $r$.

and as the product $uv = N$ is known, the symmetric function $u + v$ admits at most $\frac14(r + 3)$ values, mod $r$.

This is something about which Williams and Shallit say “He made the important observation (refined later by Kraitchik) that if a prime $r$ (taken with the proper sign) is a quadratic residue of $N$, then there can be at most $(r+ 3)/4$ possible values of $a$ modulo $r$ satisfying $N \equiv a^2 - b^2 \pmod r$. Notice that this is an improvement on the upper bound of $(r + 1)/2$ mentioned [earlier].”

~~Again, note the important conditions on $r$ mod $4$ just mentioned above. From p. 518 of the Williams and Shallit paper, a few clarifications:~~

~~$p$ must be an odd prime, as it obviously doesn’t apply to $p = 2$ — both $0^2 - 1^2 \equiv 1$ and $1^2 - 0^2 \equiv 1$ so there are always $2$ solutions $a$ to $a^2 - b^2 \equiv 1 \pmod 2$.~~
~~We need $N$ also to be a quadratic residue mod $p$, which is where it matters that $N = x^2 \pm p\cdot\text{something}$ (rather than $kN$ for some $k > 1$).~~
We need not $p$ but $(-1)^{(p-1)/2}p$ to be a quadratic residue mod $N$. That is, for primes that $1$ mod $4$ we need $p$ to be a residue, but for primes that are $3$ mod $4$ we need $-p$ to be a residue, mod $N$.

~~But this is still not enough for the conclusion; e.g. for $N = 19$ and $p = 17$, there are $8$ solutions $a$ (or $9$ including $a=0$) for $a^2 - b^2 \equiv 19 \pmod {17}$. So what’s going on?~~

~~We know that $u$ and $v$ are quadratic residues mod $p$, and we know $uv = N$ too.~~

Forget all that number theory; it’s a simple counting argument about the number of distinct possible $(u + v)$ given their product $uv$. Both $u$ and $v$ come from the same set of $k \approx r/2$ elements. For each fixed product $uv$, there is (at most) one solution for each $u$, so at most $k$ solutions total. Of these, there are at most two in which $u$ and $v$ are the same (this is the place where we use the fact that we’re looking for solutions to $u^2 = N$), and for all the other (up to) $k-2$ solutions, they can be paired as $(u, v)$ and $(v, u)$ having the same value of $(u + v)$. So the number of distinct values of $(u + v)$ possible is at most $(k-2)/2 + 2 = k/2 + 1.$ Further, as $k$ is $(r - 1)/2$, this works out to $(r-1)/4 + 1 = (r+3)/4$.

That Cole was good at combinatorics is shown by his paper on Kirkman parades, among others.

This $(r+3)/4$ principle was elaborated by Kraitchik in pages 150–153 of his chapter 12, starting here https://archive.org/details/thoriedesnombres02krai/page/144

Every known prime remainder $r$ therefore cuts down the number of possible values of $u+v$ to about one-fourth their number. Other limitations on the value of $u+v$ and $u-v$ may also occur to facilitate the process.

Thus, instead of using known squares mod $N$ to impose conditions on its factors $u$ and $v$ modulo various $p$, we can better use this knowledge to impose conditions on $u+v$ modulo various $p$.

This concludes the “theory” part of Cole’s paper, before he turns to $M_{67}$ specifically. I won’t quote part 2 in great detail, except to point out:

the author [Lucas] announces that his calculations indicate that $2^{67}-1$ and $2^{89}-1$ are compound.

This is another reason to be mindful of the impression given by Bell’s account: it was already “known” that $M_{67}$ is not prime — perhaps not with the certainty that a factorization would give (as one needs to trust that another author carried out computations correctly without error), but still, it’s not as if Cole’s factorization was the first flaw to be found in Mersenne’s list, as Bell seems to suggest.

There was also Pervushin’s number (that even Seelhoff wrote a paper on: “the ninth perfect number”).

Then in the third part of the paper, Cole shows how he did the work.

After reading everything, we can summarize it as follows. The (mechanical) “work” (calculations/computations) that Cole did for the factorization can be divided into three phases:

The first phase is finding many small quadratic residues. This can be quite intensive and must have taken quite a lot of effort. The method is Seelhoff’s: for various prime powers $p^k$, try a few $\alpha$ such that when one writes $N = (m - \alpha)^2 + r_\alpha$, then $r_\alpha$ is divisble by $p^k$. (Each of these gives a congruence condition on $\alpha$.) Try picking $\alpha$ satisfying both single ones of these congruences and, if that’s not enough, multiple of them. Try to factorize each $r_\alpha$ you obtain (trying only small prime factors) and “eliminate common factors” i.e. do Gaussian elimination on the exponent vectors to obtain simpler residues.

Seelhoff’s point was that with enough effort, this is all that is needed — eventually you’ll obtain a nontrivial congruence of squares. But no one until Kraitchik had faith in this method. Instead Cole used this method only to find about $24$ small quadratic residues (he missed nine namely $-5$, $11$, $19$, $31$, $43$, $103$, $107$, $131$, $163$, but found sixteen $2$, $-3$, $-7$, $13$, $37$, $41$, $61$, $-67$, $-71$, $89$, $97$, $101$, $-127$, $137$, $173$, $181$ — so he found all the primes $p \le 181$ for which $\pm p$ is a quadratic residue and also $1 \bmod 4$). I suspect that in doing this work he would have already obtained a nontrivial congruence.
The second phase is using the obtained quadratic residues to establish congruence conditions on any potential prime factor of $N$. For example, if $-3$ is a quadratic residue mod $N$, then any factor $u$ of $N$ must be $1 \pmod 3$. And so on. We also obtain two further congruences from the fact that $N = 2^{67} - 1$ namely $u \equiv 1 \pmod {67}$ and $u \equiv \pm 1 \pmod 8$. We can combine a few of these congruences — specifically, the ones for $67$, $8$, and $3$ — to get $u \equiv 1 \text{ or } 1207 \pmod {1608}$, i.e. either $u = 1608x + 1$ for some $x$, or $u = 1608x + 1207$ for some $x$. Then each other known quadratic residue imposes conditions on $x$, e.g. the fact that $-7$ is a quadratic residue mod $u$ implies that $u$ is $1, 2, \text{ or }4 \pmod 7$. If we look modulo $7$ at $u = 1608x + 1 \equiv 5x + 1$, then for this to be one of $1, 2, 4$, correspondingly $x$ must be one of $0, 2, 3$ modulo $7$. Similarly for $u = 1608x + 1207 \equiv 5x + 3$, we get that $x$ must be one of $1, 3, 4$ modulo $7$. Each successive quadratic residue imposes more and more conditions on $x$, until very few valid values remain and we can check whether each of them divides $N$. (This can be a painful process which led to the Lehmers’ bicycle-chain sieves for example.)

Cole used this method to rule out factors of $N$ below 16 million, but beyond that it was too much work to continue with this method.
The third phase, which is Cole’s innovation, is establishing congruence conditions instead on possible values of $u+v$ in $N = \left(\frac{u+v}{2}\right)^2 - \left(\frac{u-v}{2}\right)^2$. This is similar to the second phase above, except that each quadratic residue cuts down the set of possible values by roughly a factor of $4$, instead of a factor of $2$ as above.

From looking at it mod $2^3$, $67^2$, $3^4$, $5$, $7$ and $13$, he obtained that $\frac12(u+v)$ belongs to one of about $48$ linear progressions; e.g. the right one is $132353676x + 1160932384$ (the coefficient of $x$ here is missing a factor of $2$, so I imagine this may have been special already?). Then each successive quadratic residue imposes constraints on $x$ — just looking at five of those residues leaves $x = 287$ as the smallest possible value, which in fact works (though you wouldn’t try to compute the corresponding $u-v$ via a square root until doing a few more tests) and gives a factor.