The research path followed these steps:
First tests trying approaches on numbers 2-100,000
Language | Description | CPU Utilization | Time (min:sec) |
Rexx | External Factoring Program | 9% | 15:9 |
Rexx | Internal Factoring Brute Force | 100% | 0:25 |
PHP | External Factoring Program | 9% | 3:15 |
PHP | Internal Factoring Brute Force | 100% | 0:5.6 |
Java | Internal Factoring Brute Force | 100% | 0:1.8 |
Java | Internal Factoring Table of Primes(1) | 100% | 0:1.15 |
PHP | Internal Factoring Table of Primes(2) | 100% | 34:18 |
(1)Table of primes up to 10,000,000 supporting factoring numbers up to 1014
(2) Only on numbers 2-1000. PHP support for large arrays such as a table of primes is very slow.
Java Using table of Primes: Using a table of primes is most valuable with larger numbers. Summary of tests proving how valuable using Java:
Case | Range | Time (min:sec) |
PHP | 2-1,000,000 | 1:16 |
Java Table of Primes | 2-1,000,000 | 0:7.9 |
Java Brute Force | 1,000,000-1,500,000 | 0:7.8 |
Java Table of Primes | 1,000,000-1,500,000 | 0:5.3 |
Table of Minutes per Million Numbers checked using 1.8Gz Processor.
Language | Range(millions) | Minutes/Million | |
PHP | 1-11 | 2.08 | |
PHP | 500-600 | 9.52 | |
PHP | 1500-1750 | 15.67 | |
Java | 1-11 | 0.14 | |
Java | 2500-2750 | 1.02 | |
Java | 6250-7250 | 1.48 | |
Java | 9660-9999 | 1.71 | |
Java | 10000-10100 | 1.34(1) |
(1) Optimized version of factoring.
Java code was instrumented to show time spent factoring versus determining if the number is a Hahn number.
Range | Time Factoring(sec) | Time determining if Hahn |
2-1,000,000 | 5.2 | 4.87 |
1,000,000-2,000,000 | 6.7 | 5.21 |
5,000,000,000-5,001,000,000 | 85.004 | 6.05 |
Conclusion: With larger numbers, only the factoring time grows significantly.
To obtain the speed in Java, "long" types were used, rather than arbitrary length arithmetic.
The largest number which can be processed using "long" types is: 264 -1 or approx 9.22 * 1018.
Different optimizations were used.
Optimization | Effectiveness |
Use a file of primes up to SQRT of largest number planned. Put this is a program array. There are 664579 primes less than 10,000,000 which supports factoring numbers up to 14 digits.(1) | Order of magnitude or more improvement. |
Each time a factor is found, see if its digits are in the number being tested. If not, then the number is not a Hahn number so stop factoring. | Significant, even with large numbers. 10% or more of numbers tested avoid full factoring. |
During factoring, when a power of 2 or more is found, see if that digit(s) are in the number. If not, the number is not a Hahn number so stop factoring. | 5% or more improvement is run time. |
Before even dividing by a prime, check if digits of the prime are in the number. If not, then even if this prime is a factor, the number won't be a Hahn number. This check can be fast using bitmaps.(2)(3) | Several % improvement in performance. |
1. To use this method for all Java long
(8 byte
integers) numbers requires primes up 3,037,000,493. There are
146,144,318 such primes. An array of this many numbers is practical
and its use is far better than "brute force" (dividing by every odd
number).
2. Using the algorithm runs the risk that we don't correctly factor a number and get a false positive Hahn number. Thus when find a Hahn number, we have to go back and recheck by factoring without this optimization.
3. Using this algorithm with small numbers is self-defeating because small factors are so common. Experimentation said that it was optimal to use this only for factors more than about 27000.
Using Java with multi-core systems, experiments were run with both multiple processes and multiple threads. The advantage of Threads in Java is that they share memory. In particular, this means only one copy of the array of primes is in memory. Total RAM is never stressed so this is not an issue. However, it was discovered that on some machines, the threaded version ran up to twice as fast as the multiple process version. It is believed, but not proven, that the difference is in CPU cache invalidation. However, it was not possible to predict which systems preferred Threads by their different cache sizes.
Factoring using an in-memory array of primes uses the algorithm:
0. Start with the first prime.