Pairwise summation

From Wikipedia, the free encyclopedia

In numerical analysis, pairwise summation, also called cascade summation, is a technique to sum a sequence of finite-precision floating-point numbers that substantially reduces the accumulated round-off error compared to naively accumulating the sum in sequence.[1] Although there are other techniques such as Kahan summation that typically have even smaller round-off errors, pairwise summation is nearly as good (differing only by a logarithmic factor) while having much lower computational cost: it can be implemented so as to have nearly the same cost (and exactly the same number of arithmetic operations) as naive summation.

In particular, pairwise summation of a sequence of n numbers x_n works by recursively breaking the sequence into two halves, summing each half, and adding the two sums: a divide and conquer algorithm. Its worst-case roundoff errors grow asymptotically as at most O(ε log n), where ε is the machine precision (assuming a fixed condition number, as discussed below).[1] In comparison, the naive technique of accumulating the sum in sequence (adding each x_i one at a time for i = 1, ..., n) has roundoff errors that grow at worst as O(εn).[1] Kahan summation has a worst-case error of roughly O(ε), independent of n, but requires several times more arithmetic operations.[1] If the roundoff errors are random, and in particular have random signs, then they form a random walk and the error growth is reduced to an average of O(ε√(log n)) for pairwise summation.[2]

A very similar recursive structure of summation is found in many fast Fourier transform (FFT) algorithms, and is responsible for the same slow roundoff accumulation of those FFTs.[2][3]

The algorithm


In pseudocode, the pairwise summation algorithm for an array x of length n ≥ 0 can be written:

s = pairwise(x[1...n])
    if n ≤ N, base case: naive summation for a sufficiently small array
        s = 0
        for i = 1 to n
            s = s + x[i]
    else, divide and conquer: recursively sum two halves of the array
        m = floor(n / 2)
        s = pairwise(x[1...m]) + pairwise(x[m+1...n])
    end if
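The pseudocode above translates directly into, for example, Python. The following is an illustrative sketch, not a reference implementation; the function name and the default base-case size N are arbitrary choices:

```python
def pairwise_sum(x, N=8):
    """Sum the sequence x by pairwise (cascade) summation.

    N is the base-case size below which a plain loop is used; production
    implementations typically use a much larger N to amortize the
    recursion overhead.
    """
    n = len(x)
    if n <= N:
        # Base case: naive left-to-right summation of a small array.
        s = 0.0
        for v in x:
            s += v
        return s
    # Divide and conquer: recursively sum the two halves and add them.
    m = n // 2
    return pairwise_sum(x[:m], N) + pairwise_sum(x[m:], N)
```

Note that, as in the pseudocode, exactly n − 1 floating-point additions are performed regardless of how the sequence is split.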

For some sufficiently small N, this algorithm switches to a naive loop-based summation as a base case, whose error bound is O(Nε).[4] The entire sum has a worst-case error that grows asymptotically as O(ε log n) for large n, for a given condition number (see below).

In an algorithm of this sort (as for divide and conquer algorithms in general[5]), it is desirable to use a larger base case in order to amortize the overhead of the recursion. If N = 1, then there is roughly one recursive subroutine call for every input, but more generally there is one recursive call for (roughly) every N/2 inputs if the recursion stops at exactly n = N. By making N sufficiently large, the overhead of recursion can be made negligible (precisely this technique of a large base case for recursive summation is employed by high-performance FFT implementations[3]).

Regardless of N, exactly n − 1 additions are performed in total, the same as for naive summation, so if the recursion overhead is made negligible then pairwise summation has essentially the same computational cost as naive summation.

A variation on this idea is to break the sum into b blocks at each recursive stage, summing each block recursively, and then summing the results, which was dubbed a "superblock" algorithm by its proposers.[6] The above pairwise algorithm corresponds to b = 2 for every stage except for the last stage, which is b = N.
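The superblock idea can be sketched as follows. This is a hypothetical illustration of the b-way splitting described above, not the cited paper's actual algorithm, whose details differ; the parameter names are arbitrary:

```python
import math

def superblock_sum(x, b=4, N=32):
    """Sum x by splitting it into b blocks at each recursive stage.

    Setting b = 2 recovers ordinary pairwise summation; N is the size
    below which a plain loop is used.
    """
    n = len(x)
    if n <= N:
        # Base case: naive summation of a small block.
        s = 0.0
        for v in x:
            s += v
        return s
    # Split into b (nearly) equal blocks, sum each recursively,
    # then accumulate the b partial sums.
    step = math.ceil(n / b)
    total = 0.0
    for i in range(0, n, step):
        total += superblock_sum(x[i:i + step], b, N)
    return total
```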

Accuracy


Suppose that one is summing n values x_i, for i = 1, ..., n. The exact sum is:

S_n = Σ_{i=1}^{n} x_i

(computed with infinite precision).

With pairwise summation for a base case N = 1, one instead obtains S_n + E_n, where the error E_n is bounded above by:[1]

|E_n| ≤ (ε log₂n / (1 − ε log₂n)) · Σ_{i=1}^{n} |x_i|

where ε is the machine precision of the arithmetic being employed (e.g. ε ≈ 10^−16 for standard double precision floating point). Usually, the quantity of interest is the relative error |E_n| / |S_n|, which is therefore bounded above by:

|E_n| / |S_n| ≤ (ε log₂n / (1 − ε log₂n)) · (Σ|x_i| / |Σ x_i|)

In the expression for the relative error bound, the fraction (Σ|x_i| / |Σ x_i|) is the condition number of the summation problem. Essentially, the condition number represents the intrinsic sensitivity of the summation problem to errors, regardless of how it is computed.[7] The relative error bound of every (backwards stable) summation method by a fixed algorithm in fixed precision (i.e. not those that use arbitrary-precision arithmetic, nor algorithms whose memory and time requirements change based on the data) is proportional to this condition number.[1] An ill-conditioned summation problem is one in which this ratio is large, and in this case even pairwise summation can have a large relative error. For example, if the summands x_i are uncorrelated random numbers with zero mean, the sum is a random walk and the condition number will grow proportional to √n. On the other hand, for random inputs with nonzero mean the condition number asymptotes to a finite constant as n → ∞. If the inputs are all non-negative, then the condition number is 1.

Note that the denominator 1 − ε log₂n is effectively 1 in practice, since ε log₂n is much smaller than 1 until n becomes of order 2^(1/ε), which is roughly 10^(10^15) in double precision.

In comparison, the relative error bound for naive summation (simply adding the numbers in sequence, rounding at each step) grows as O(εn) multiplied by the condition number.[1] In practice, it is much more likely that the rounding errors have a random sign, with zero mean, so that they form a random walk; in this case, naive summation has a root mean square relative error that grows as O(ε√n) and pairwise summation has an error that grows as O(ε√(log n)) on average.[2]
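This difference in error growth is easy to observe empirically. The following small experiment is illustrative (the seed and array size are arbitrary; math.fsum provides a correctly rounded reference value):

```python
import math
import random

def naive_sum(x):
    # Naive left-to-right accumulation, rounding at each step.
    s = 0.0
    for v in x:
        s += v
    return s

def pairwise_sum(x, N=128):
    # Pairwise summation with a large base case, as described above.
    n = len(x)
    if n <= N:
        return naive_sum(x)
    m = n // 2
    return pairwise_sum(x[:m], N) + pairwise_sum(x[m:], N)

random.seed(12345)
# Non-negative inputs, so the condition number is 1 and the error
# bounds above apply directly.
x = [random.random() for _ in range(10**6)]

exact = math.fsum(x)  # correctly rounded reference sum
rel_err_naive = abs(naive_sum(x) - exact) / exact
rel_err_pairwise = abs(pairwise_sum(x) - exact) / exact
# For n of this size, rel_err_pairwise is typically several orders of
# magnitude smaller than rel_err_naive.
```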

Software implementations


Pairwise summation is the default summation algorithm in NumPy[8] and the Julia technical-computing language,[9] where in both cases it was found to have comparable speed to naive summation (thanks to the use of a large base case).

Other software implementations include the HPCsharp library[10] for the C# language and the standard library summation[11] in D.

References

  1. ^ Higham, Nicholas J. (1993), "The accuracy of floating point summation", SIAM Journal on Scientific Computing, 14 (4): 783–799, CiteSeerX 10.1.1.43.3535, doi:10.1137/0914050
  2. ^ Manfred Tasche and Hansmartin Zeuner, Handbook of Analytic-Computational Methods in Applied Mathematics (Boca Raton, FL: CRC Press, 2000).
  3. ^ S. G. Johnson and M. Frigo, "Implementing FFTs in practice," in Fast Fourier Transforms, edited by C. Sidney Burrus (2008).
  4. ^ Higham, Nicholas (2002). Accuracy and Stability of Numerical Algorithms (2nd ed.). SIAM. pp. 81–82.
  5. ^ Radu Rugina and Martin Rinard, "Recursion unrolling for divide and conquer programs," in Languages and Compilers for Parallel Computing, chapter 3, pp. 34–48, Lecture Notes in Computer Science vol. 2017 (Berlin: Springer, 2001).
  6. ^ Anthony M. Castaldo, R. Clint Whaley, and Anthony T. Chronopoulos, "Reducing floating-point error in dot product using the superblock family of algorithms," SIAM J. Sci. Comput., vol. 32, pp. 1156–1174 (2008).
  7. ^ L. N. Trefethen and D. Bau, Numerical Linear Algebra (SIAM: Philadelphia, 1997).
  8. ^ ENH: implement pairwise summation, github.com/numpy/numpy pull request #3685 (September 2013).
  9. ^ RFC: use pairwise summation for sum, cumsum, and cumprod, github.com/JuliaLang/julia pull request #4039 (August 2013).
  10. ^ HPCsharp nuget package of high performance C# algorithms, https://github.com/DragonSpit/HPCsharp
  11. ^ "std.algorithm.iteration - D Programming Language". dlang.org. Retrieved 2021-04-23.