Pandigital Polydivisible Numbers

I recently read the excellent Things to Make and Do in the Fourth Dimension by Matt Parker. In the second chapter he discusses his attempt to find pandigital polydivisible numbers in different bases. I decided to pick up where he left off.

The book outlines the known values in bases up to 16:

  • Base 2: 1
  • Base 4: 123, 321
  • Base 6: 14325, 54321
  • Base 8: 3254167, 5234761, 5674321
  • Base 10: 381654729
  • Base 12: No solutions
  • Base 14: 9C3A5476B812D
  • Base 16: No solutions

Higher bases were left as an open problem to the reader. Several other readers have given it a go but no more solutions have been found. Perhaps none exist but so far no-one has managed to prove it.

Spoiler alert: I haven't found any more solutions. Nor do I have a proof that they don't exist.

However, I will try to argue that there probably aren't any more.

What Does Polydivisible Mean?

Let's use the base 10 solution as an example. We consider the 9 numbers created by removing digits from the end. The full number is called polydivisible if all of these prefixes are divisible by their lengths:

  • 3 is divisible by 1
  • 38 is divisible by 2
  • 381 is divisible by 3
  • 3816 is divisible by 4
  • 38165 is divisible by 5
  • 381654 is divisible by 6
  • 3816547 is divisible by 7
  • 38165472 is divisible by 8
  • 381654729 is divisible by 9

So when the number is truncated to k digits it must be divisible by k.

As an aside, the longest polydivisible number in base 10 is 3608528850368400786036725, with 25 digits.

What Does Pandigital Mean?

Pandigital refers to a number that has each of the possible digits once. Definitions vary as to whether 0 is required or not. For this particular problem it doesn't matter because the polydivisibility constraint ensures that if 0 were included it would always be the final digit. So, using the base 10 example, we can extend it to 3816547290 to include the zero. The solutions are equivalent no matter which definition we choose, we just add or remove a zero accordingly.

The sections that follow will assume that zero is not included.

Only Even Bases Can Have Solutions

In base 10 it's possible to test whether a number is divisible by 9 just by adding up the digits and checking whether that sum is divisible by 9. There are various ways to prove this but it falls out relatively painlessly if you're happy with modular arithmetic. As an example, consider the number 5342. We can expand out the digits of this number to get:

5342 = [5 × 1000] + [3 × 100] + [4 × 10] + [2 × 1]

As 10 is congruent to 1 modulo 9 we get:

5342 ≡ [5 × 1] + [3 × 1] + [4 × 1] + [2 × 1] (mod 9)

which is just:

5342 ≡ 5 + 3 + 4 + 2 (mod 9)

The result follows because a number is divisible by 9 if and only if it is congruent to 0 modulo 9.

This applies to numbers in any base n, but instead of divisibility by 9 it'd be divisibility by n - 1.

What does this have to do with pandigital polydivisible numbers?

Well, because those numbers are pandigital we know that they will have the digits 1 to n exactly once. For base 10 this gives us a sum of 45. This is divisible by 9, so it follows that any pandigital number in base 10 will be divisible by 9.

Can we extend this to other bases?

The digit sum in base n is ½⁢n⁢(n - 1) and we need to know whether this is divisible by n - 1. At first glance it might appear that it is but the division by 2 puts a spanner in the works. If n is even then everything works out fine but if n is odd we'll end up with a factor of 2 missing.

This is why the list of solutions listed earlier only has solutions for even bases. It is never possible for a pandigital number in an odd base to be divisible by n - 1, so it can't be polydivisible.

Last Digit Divisibility Tests

There are other divisibility rules in base 10. If a number ends in an odd digit then we can immediately rule out divisibility by any even number. There's also divisibility by 5 to consider. These rules are a consequence of 10 being 2 × 5. Similar to the divisibility by 9 example we saw earlier, considering numbers modulo 2 or modulo 5 will immediately discard most of the digits such that only the last digit matters.

From these divisibility rules we can conclude that the fifth digit must be a 5 and the even-placed digits must be even numbers. That leaves four odd digits to go in the four remaining positions. A solution must be of the form oeoe5eoeo, which 381654729 is.

Equivalent rules exist for other bases. For base n the digit d in position k must satisfy gcd(n, d) = gcd(n, k). As an example consider the sixth digit in base 8. So n is 8 and k is 6. The value of gcd(n, k) is 2. If we then check through all the possible values of d we find that gcd(n, d) is only 2 if d is either 2 or 6. So the sixth digit must be either 2 or 6, there's no need to try any other digits. Importantly, the number 4 isn't in that list. That's because in base 8 the 4 has to go in the middle. It's the equivalent of putting the 5 in the middle in base 10. Using gcd turns out to be the appropriate magic to filter down the list of suitable digits in each position.

So in base 8:

  • Digit 1: Must be 1, 3, 5 or 7
  • Digit 2: Must be 2 or 6
  • Digit 3: Must be 1, 3, 5 or 7
  • Digit 4: Must be 4
  • Digit 5: Must be 1, 3, 5 or 7
  • Digit 6: Must be 2 or 6
  • Digit 7: Must be 1, 3, 5 or 7

So the 4 has to go in the middle and the 2 and 6 have to go in positions 2 and 6 in some order. That leaves four other digits to go in the remaining four positions. As mentioned earlier, the solutions are 3254167, 5234761 and 5674321, which are clearly in the required form.

Using gcd filtering on the digits dramatically reduces the size of the search space for a brute-force computer attack on the problem.

Computer Searches

I have personally checked all even bases up to 60, as well as 64, 66 and 72. The code I used is on GitHub. The gcd filtering works best for numbers with more factors. For 58 I had to distribute the search across multiple computers to get it finished in a reasonable length of time. It still took days.

Update 01-Jul-2020: I have now also checked base 70.

  • 58 - No solutions
  • 60 - No solutions
  • 62 - Not yet checked
  • 64 - No solutions
  • 66 - No solutions
  • 68 - Not yet checked
  • 70 - No solutions
  • 72 - No solutions
  • 74 - Not yet checked

The worst case for the gcd filtering is the semi-primes. 46 is a semi-prime but at that stage the numbers are still small enough to be able to test relatively quickly. The next semi-prime after 46 is 58, which is much more difficult to check. 62 is also semi-prime, as is 74.

As we approach the limits of what can be checked by brute force we might start to ask questions like:

  • Are some numbers more likely to yield solutions that others?
  • How many more solutions should we expect to exist?

Obviously a formal proof would be even better but the questions above are a little more within reach.

The search algorithm gradually builds up numbers from left to right. Only digits passing the gcd filter are considered and then the relevant divisibility check is used to decide whether to continue up to the next digit. We also exclude any numbers that use a digit that has already been used.

We can make an estimate of how many candidates will make it to each stage. Let's consider base 10. At a length of 3 we might expect that roughly one third of candidates would be divisible by 3. For some divisors it gets more complicated. For the fourth digit we will only ever be considering even values, so we already know the resulting number is divisible by 2. So while it might seem obvious to reduce the size of the candidate pool by a factor of 4, it would actually only be a factor of 2.

Likewise, the middle digit is always guaranteed. It has to be 5 and the resulting number will always be divisible by 5.

This extends more generally to other bases. The formula to calculate the number of candidates at each stage is:

candidates_next = candidates_previous × possible_digits × gcd k

As an example in base 18:

  • Digit 1: There are 6 digits with gcd(18, d) = 1. So 6 candidates to start with.
  • Digit 2: There are 6 digits with gcd(18, d) = 2. That gives 36 candidates. They will all be even, so are all suitable candidates. Using the formula this is 6 × 6 × gcd ÷ 2, which gives 36.
  • Digit 3: For gcd(18, d) = 3 there are only 2 values for d, 3 and 15. These guarantee divisibility by 3 and the formula takes this into account. 32 × 2 × gcd ÷ 3. That's 72.
  • Digit 4: For gcd(18, d) = 2 we've already used up one of the digits, so there are only 5 digits left. Divisibility by 2 is guaranteed but we need divisibility by 4. So 72 × 5 × gcd ÷ 4, which is 180.
  • Digit 5: gcd(18, d) = 1, so 180 × 5 × 1 ÷ 5, which is 180.
  • Digit 6: gcd(18, d) = 6, so 180 × 2 × 6 ÷ 6, which is 360.

So it continues. When we get to the end we need to make a special case for the final digit. The digit sum guarantees that we'll always have the divisibility we require so all candidates from the previous stage will work.

How good are these estimates? Well, here's a graph that shows the number of candidates at each stage for a given base. One line shows the estimate and the other shows the actual value.


k:
Est:
Real:

Yes, this was largely just an excuse to play with animating SVGs.

The vertical scale changes dramatically depending on the base. The maximum value is shown in the legend and you can edit that value if you'd prefer to keep the scale fixed between bases.

The scale is linear. The multiplicative nature of the estimate calculations does suggest using a log scale but the real values tend to end up at 0, which is not log friendly.

The estimates for the semi-prime bases seem particularly prone to being too large, though the general shape of the curve and the orders of magnitude involved are not significantly different. I have not made any attempt to study the distributions of the errors and how they combine, though I suspect this is the reason for the exaggerated estimates.

In summary, the chart above is intended to illustrate that the estimates are pretty close to the true values.

Finding an Equation

With that in mind we can consider the final value to be the expected number of solutions for a given base. Yes, we've entered smoke, mirrors and hand-waving territory. The usual caveats apply around the use of probability terminology to explore deterministic properties of numbers.

Using those final values as estimates we can create another chart. This shows the even bases from 2 up to 200. The blue dots are the semi-primes.


Base:
Est:

This time the vertical scale is logarithmic and the horizontal lines represent powers of 10. These numbers get small very quickly.

The semi-primes have significantly larger values than the bases around them. The reason is closely related to why they take longer to check: the gcd filtering is not very effective so more candidate numbers make it through to each stage.

You may have noticed that the semi-primes form something close to a straight line. The yellow line makes it hard to miss. That line isn't quite straight though, especially at the left of the chart.

The equation used to generate that yellow line is:

y = π n 3 2 n + 1

The square root sign is somewhat misleading. Writing it that way looks tidier but it hides what's really going on.

The general form of this equation is:

y = c n v w n

The constants have values c = π 2, v = 3 2 and w = 1 2, which combine to give the previous equation. The code I used to calculate those constants can be found on GitHub. Essentially it just uses a few large values for n, a bit of division and some logs.

The Elephant in the Room

Yes, there's a π in that equation.

Initially it wasn't clear to me whether it really was π but after staring at the calculations for a bit it dawned on me that the estimation algorithm is really just calculating factorials, so we might expect π to make an appearance via Stirling's approximation.

For semi-prime bases the estimation algorithm yields the same value as the following formula:

y = 2 n2 -2 n [ ( n 2 - 1 ) ! ] 2 ( n - 2 ) !

This formula is exact. Sure, the values it's calculating are ultimately estimates, but this formula yields the exact values for those estimates.

Stirling's approximation is:

n ! 2 π n ( n e ) n

Plugging that in and cancelling gives:

y π n 2 ( n - 2 ) 2 n + 1

For large n the n - 2 can be replaced with n to match the equation from earlier.

Wrapping Up

So what was the point of all that?

Earlier I posed a couple of questions:

  • Are some numbers more likely to yield solutions that others?
  • How many more solutions should we expect to exist?

Using the age-old technique of drawing a chart and bodging in a wonky line we can now attempt to answer those questions:

  • Generally, as the base gets larger the chance of finding another solution diminishes. However, numbers with fewer factors, such as the semi-primes, are the most likely to yield a solution if there is one.
  • While the 'yellow line' equation only applies to semi-primes, the other bases all lie below that line, so their expected values will all be much smaller than the equation predicts. While it isn't quite a geometric series, the exponential part does dominate, so the expected total number of solutions across the infinitely many remaining bases is very small indeed. The odds of finding another solution are still slightly better than winning most lotteries but that will flip once a few more bases have been ruled out by computer searches.

Of course this proves nothing. My analysis is built on various assumptions and there's plenty of scope for mistakes to creep in. But let's suppose that I haven't totally mucked it up...

It would have been more interesting if the equation had implied that more solutions do exist. But it didn't. I've instead provided an argument, however informal, for why we shouldn't expect any more solutions. It may not be an exciting result but it's the result I got. It's better than just looking at dozens of consecutive zeroes and guessing that everything is zero from then on.

Most of the relevant code is available on GitHub. If you have any further insights to add to this article then feel free to file an issue against the GitHub repository.