# HG changeset patch
# User Arun Giridhar
# Date 1633482588 -32400
# Node ID 39a4ab124fd09cce50c787b370c172b2add087e0
# Parent  b00ff462e0f291bca599b4c8607080a001075dc9
factor.m: significant speedup for large input quantities (> 1e14) (bug #61129).

- scripts/specfun/factor.m: overhaul function for performance.
- NEWS: announce speedup.

Co-authored-by: Michael Leitner

diff -r b00ff462e0f2 -r 39a4ab124fd0 NEWS
--- a/NEWS	Fri Oct 01 15:25:51 2021 -0400
+++ b/NEWS	Wed Oct 06 10:09:48 2021 +0900
@@ -76,6 +76,12 @@
   same number of digits for each value such as
   `[0x00_00_01; 0x00_01_00; 0x01_00_00]`.
 
+- The `factor` function has been overhauled for performance.  For large
+  input quantities (> 1e14) with small prime factors, e.g.,
+  `intmax ("uint64")`, the function can be up to 10,000 times faster.
+  Even for large input quantities with large prime factors, the function
+  can still be up to 4 times faster.
+
 - The `mldivide` function (i.e., the `\` operator) now uses an LU
   decomposition to solve nearly singular full square matrices.  This is
   Matlab-compatible and yields results which more nearly minimize `norm
diff -r b00ff462e0f2 -r 39a4ab124fd0 scripts/specfun/factor.m
--- a/scripts/specfun/factor.m	Fri Oct 01 15:25:51 2021 -0400
+++ b/scripts/specfun/factor.m	Wed Oct 06 10:09:48 2021 +0900
@@ -37,7 +37,7 @@
 ## @code{prod (@var{pf} .^ @var{n}) == @var{q}}.
 ##
 ## Implementation Note: The input @var{q} must be less than @code{flintmax}
-## when the input is a floating point class (double or single).
+## when the input is a floating-point class (double or single).
 ## @seealso{gcd, lcm, isprime, primes}
 ## @end deftypefn
 
@@ -63,26 +63,109 @@
     error ("factor: Q too large to factor (> flintmax)");
   endif
 
+  ## The basic idea is to divide by the prime numbers from 1 to sqrt (q).
+  ## But primes (sqrt (q)) can be very time-consuming to compute for
+  ## q > 1e16, so we divide by smaller primes first.
+  ##
+  ## This won't make a difference for prime q, but it makes a big (100x)
+  ## difference for large composite q.  Since there are many more composites
+  ## than primes, this leads overall to a speedup.
+  ##
   ## There is at most one prime greater than sqrt(q), and if it exists,
   ## it has multiplicity 1, so no need to consider any factors greater
-  ## than sqrt(q) directly. [If there were two factors p1, p2 > sqrt(q),
-  ## then q >= p1*p2 > sqrt(q)*sqrt(q) == q. Contradiction.]
-  max_p = feval (cls, sqrt (q));  # restore cls after sqrt conversion to double
-  p = primes (max_p);
-  pf = [];
-  while (q > 1)
-    ## Find prime factors in remaining q.
-    p = p(rem (q, p) == 0);
-    if (isempty (p))
-      ## Can't be reduced further, so q must itself be a prime.
-      p = q;
-    endif
-    pf = [pf, p];
+  ## than sqrt(q) directly.  If there were two factors p1, p2 > sqrt(q), then
+  ##
+  ##   q >= p1*p2 > sqrt(q)*sqrt(q) == q,
+  ##
+  ## which is a contradiction.
+  ##
+  ## The following calculation of the transition point and the number of
+  ## divisors to use was determined empirically.  As of now (October 2021)
+  ## it gives the best overall performance over the range
+  ## 1 <= q <= intmax ("uint64").
+  ##
+  ## For future programmers: check periodically for performance improvements
+  ## and tune this transition as required.  Trials that did not yield success
+  ## (as of October 2021):
+  ##
+  ## 1.) persistent smallprimes = primes (FOO)
+  ##
+  ##     For various fixed FOO in the range 10 <= FOO <= 10e6.
+  ##     (FOO is independent of q.)  The thought had been that making it
+  ##     persistent would cache the result so it did not need to be
+  ##     recomputed for subsequent calls, but it slowed things down overall.
+  ##     It seems that calling primes twice with smaller arguments is still
+  ##     faster than one persistent call with a large argument.
+  ##
+  ## 2.) smallprimes = primes (q ^ FOO)
+  ##
+  ##     For various values of FOO.  For FOO >= 0.25 or FOO <= 0.16, the
+  ##     performance is very poor.  FOO needs to be roughly in the
+  ##     0.17 to 0.24 range.
+  ##     Benchmark experiments indicate that it should increase gently
+  ##     from 0.18 to 0.21 as q goes from 10^11 to 10^18.
+  ##
+  ## But using such an expression would require calculating the log of q,
+  ## which defeats any performance improvement.  The same goes for a
+  ## step-wise approximation like
+  ##
+  ##   foo = 0.18 + 0.01 * (q > 1e12) + 0.01 * (q > 1e14) ...
+  ##              + 0.01 * (q > 1e16);
+  ##   smallprimes = primes (feval (cls, q^foo));
+  ##
+  ## where the RHS of foo would go from 0.18 to 0.21 over several orders
+  ## of magnitude without calling the log.  That is overly empirical, so
+  ## using q^0.2 seems to be the most robust choice overall for 64-bit q.
 
-    ## Reduce q.
-    q /= prod (p);
-  endwhile
-  pf = sort (pf);
+  ## Lookup table for sufficiently small values of q.
+  if (q < 10e9)
+    ## Using a lookup table up to primes (100) is about 3% faster than the
+    ## previously tried value of primes (30).  The same holds for very
+    ## small q < 1e6.
+    ##
+    ## For 1e9 < q < 10e9 the lookup approach is about 7% faster.
+
+    smallprimes = feval (cls, ...
+      [2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]);
+
+    ## Statements like
+    ##
+    ##   smallprimes(smallprimes > q) = [];
+    ##
+    ## are relevant only for really small values of q, and they slow
+    ## execution down significantly for large values of q.
+  else
+    ## For sufficiently large q, go up to the 5th root of q for now.
+    smallprimes = primes (feval (cls, q^0.2));
+  endif
+
+  ## pf is the list of prime factors, returned with the class of the input.
+  pf = feval (cls, []);
+  [pf, q] = reducefactors (q, pf, smallprimes);
+
+  ## pf now contains all prime factors of q within smallprimes, including
+  ## repetitions, in ascending order.
+  ##
+  ## q itself has been divided by those prime factors and is now smaller,
+  ## unless q was prime to begin with.
+
+  ## Now go all the way to sqrt (q), where q is smaller than the original
+  ## q in most cases.
+  ##
+  ## Note: Do not try to weed out the smallprimes from largeprimes, whether
+  ## by using length (smallprimes) or max (smallprimes) -- it slows things
+  ## down!
+  largeprimes = primes (sqrt (q));
+  [pf, q] = reducefactors (q, pf, largeprimes);
+
+  ## At this point, all prime factors <= the sqrt of the original q have
+  ## been pulled out in ascending order.
+  ##
+  ## If q = 1, then no further primes are left.
+  ## If q > 1, then q itself must be prime, and it must be the single
+  ## prime factor that was larger than the sqrt of the original q.
+  if (q > 1)
+    pf(end+1) = q;
+  endif
+
+  ## At this point, all prime factors have been pulled out of q in
+  ## ascending order.  There is no need to call sort (pf).
 
   ## Determine multiplicity.
   if (nargout > 1)
@@ -93,6 +176,26 @@
 
 endfunction
 
+function [pf, q] = reducefactors (qin, pfin, divisors)
+
+  pf = pfin;
+  q = qin;
+
+  ## The following line is a few milliseconds faster than
+  ## divisors (mod (q, divisors) ~= 0) = [];
+  divisors = divisors (mod (q, divisors) == 0);
+
+  for pp = divisors  # for each factor in turn
+    ## Keep extracting all occurrences of that factor before moving on to
+    ## larger factors.
+    ##
+    ## Note: mod () was marginally faster than rem () when assessed over
+    ## 10e6 trials of the whole factor () function.
+    while (mod (q, pp) == 0)
+      pf(end+1) = pp;
+      q /= pp;
+    endwhile
+  endfor
+
+endfunction
+
 ## Test special case input
 %!assert (factor (1), 1)
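For reviewers who want to experiment with the algorithm outside Octave, here is a minimal Python sketch of the two-stage trial division the patch describes: divide out primes up to q^0.2 first, then finish with primes up to sqrt of the (usually much smaller) remainder. It is illustrative only; the lookup-table fast path for q < 10e9, the integer/float class handling via `feval (cls, ...)`, and the pre-filtering of divisors are omitted, and all names are hypothetical.

```python
from math import isqrt

def primes_below(n):
    """Sieve of Eratosthenes: all primes strictly below n."""
    if n < 3:
        return []
    sieve = [True] * n
    sieve[0] = sieve[1] = False
    for i in range(2, isqrt(n - 1) + 1):
        if sieve[i]:
            sieve[i*i::i] = [False] * len(sieve[i*i::i])
    return [i for i, is_p in enumerate(sieve) if is_p]

def reduce_factors(q, pf, divisors):
    """Divide out every occurrence of each divisor that divides q,
    appending the extracted factors to pf in ascending order."""
    for p in divisors:
        while q % p == 0:
            pf.append(p)
            q //= p
    return pf, q

def factor(q):
    pf = []
    # Stage 1: small primes up to roughly the 5th root of q.
    small = primes_below(max(3, int(q ** 0.2) + 1))
    pf, q = reduce_factors(q, pf, small)
    # Stage 2: primes up to sqrt of the reduced q.
    large = primes_below(isqrt(q) + 1)
    pf, q = reduce_factors(q, pf, large)
    # Any remainder > 1 is the single prime factor above sqrt(original q).
    if q > 1:
        pf.append(q)
    return pf
```

Because stage 1 removes every factor found among the small primes, the factors appended in stage 2 and the final remainder are necessarily larger, so the result comes out sorted without an explicit sort, matching the patch's "no need to sort (pf)" observation.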