Computing Challenges and Opportunities

  1. Factoring is a numeric computing process. Thus, numeric speed is important.
  2. Factoring requires working with numbers, determining if a Hahn number works with digits.
  3. Determine if a number is a Hahn number is most straightforward using character processing treating digits as characters. (Could have been done using many divides to get digits as numbers).
  4. Problem is ready-made for parallel processing - searching one range of numbers for Hahn numbers is independent of searching a different range.

Research Path

The research path followed these steps:

  1. "Quick" program in convenient language with good text processing (REXX)

  2. Investigate using external, optimized, factoring program

  3. Re-write in PHP, also with no types so automatic number/string conversion.

  4. Run on quad-core server to exploit parallel processing

  5. Compute many numbers but became slower with large numbers

  6. Tried Java which has strong typing so more difficult to program. But 10 times as fast.

  7. Found all numbers up to 9,999,999,999 (plus some)

  8. Wrote analysis program analyze results

  9. Refined Java code but only modest speed improvements.

Computational Results

First tests trying approaches on numbers 2-100,000
LanguageDescriptionCPU UtilizationTime (min:sec)
RexxExternal Factoring Program9%15:9
RexxInternal Factoring Brute Force100%0:25
PHPExternal Factoring Program9%3:15
PHPInternal Factoring Brute Force100%0:5.6
JavaInternal Factoring Brute Force100%0:1.8
JavaInternal Factoring Table of Primes(1)100%0:1.15
PHPInternal Factoring Table of Primes(2)100%34:18

(1)Table of primes up to 10,000,000 supporting factoring numbers up to 1014

(2) Only on numbers 2-1000. PHP support for large arrays such as a table of primes is very slow.

Java Using table of Primes: Using a table of primes is most valuable with larger numbers. Summary of tests proving how valuable using Java:
CaseRangeTime (min:sec)
PHP2-1,000,0001:16
Java Table of Primes2-1,000,0000:7.9
Java Brute Force1,000,000-1,500,0000:7.8
Java Table of Primes1,000,000-1,500,0000:5.3

Effect of Large Numbers

Table of Minutes per Million Numbers checked using 1.8Gz Processor.
LanguageRange(millions)Minutes/Million
PHP1-112.08
PHP500-6009.52
PHP1500-175015.67
Java1-110.14
Java2500-27501.02
Java6250-72501.48
Java9660-99991.71
Java10000-101001.34(1)

(1) Optimized version of factoring.

Instrumented Program

Java code was instrumented to show time spent factoring versus determining if the number is a Hahn number.
RangeTime Factoring(sec)Time determining if Hahn
2-1,000,0005.24.87
1,000,000-2,000,0006.75.21
5,000,000,000-5,001,000,00085.0046.05

Conclusion: With larger numbers, only the factoring time grows significantly.

Perfomance by CPU type

Computational Limitation Facts

To obtain the speed in Java, "long" types were used, rather than arbitrary length arithmetic.

The largest number which can be processed using "long" types is: 264 -1 or approx 9.22 * 1018.

Optimization

Different optimizations were used.
OptimizationEffectiveness
Use a file of primes up to SQRT of largest number planned. Put this is a program array. There are 664579 primes less than 10,000,000 which supports factoring numbers up to 14 digits.(1) Order of magnitude or more improvement.
Each time a factor is found, see if its digits are in the number being tested. If not, then the number is not a Hahn number so stop factoring.Significant, even with large numbers. 10% or more of numbers tested avoid full factoring.
During factoring, when a power of 2 or more is found, see if that digit(s) are in the number. If not, the number is not a Hahn number so stop factoring. 5% or more improvement is run time.
Before even dividing by a prime, check if digits of the prime are in the number. If not, then even if this prime is a factor, the number won't be a Hahn number. This check can be fast using bitmaps.(2)(3) Several % improvement in performance.

1. To use this method for all Java long (8 byte integers) numbers requires primes up 3,037,000,493. There are 146,144,318 such primes. An array of this many numbers is practical and its use is far better than "brute force" (dividing by every odd number).

2. Using the algorithm runs the risk that we don't correctly factor a number and get a false positive Hahn number. Thus when find a Hahn number, we have to go back and recheck by factoring without this optimization.

3. Using this algorithm with small numbers is self-defeating because small factors are so common. Experimentation said that it was optimal to use this only for factors more than about 27000.

Threads vs. Processes

Using Java with multi-core systems, experiments were run with both multiple processes and multiple threads. The advantage of Threads in Java is that they share memory. In particular, this means only one copy of the array of primes is in memory. Total RAM is never stressed so this is not an issue. However, it was discovered that on some machines, the threaded version ran up to twice as fast as the multiple process version. It is believed, but not proven, that the difference is in CPU cache invalidation. However, it was not possible to predict which systems preferred Threads by their different cache sizes.

Factoring Algorithm

Factoring using an in-memory array of primes uses the algorithm:

0. Start with the first prime.

  1. See if number divisible by current prime.
    1. If not divisible, go to next prime. If that prime more than square root of number, done.
    2. If divisible, save factor, return to step 1.