fredrikj.net / blog /

New partition function record: p(1020) computed

March 2, 2014

The partition function p(n) counts the number of partitions of n, i.e. the number of ways n can be written as a sum of positive integers when disregarding the order of the terms. For example, 4 has the five distinct partitions 4, 3+1, 2+2, 2+1+1, 1+1+1+1, so p(4) = 5. This week, I set a new record by computing the exact number of partitions of n = 1020 = 100,000,000,000,000,000,000 (100 quintillion). The number p(1020) has more than 10 billion digits (11,140,086,260 digits to be precise), starting with

1838176508344882643646057515196394970366128860187133818794921830680916179355851922605087258953579721...

...9597661250174602479861524302262001955970770703287582462984472325700899198905833521126231756788091448

and ending with

The computation was done using git versions of Arb (slightly tweaked) and FLINT, built against MPIR 2.6.0 and MPFR 3.1.1. It took four and a half days on a system with a Xeon E5-2650 CPU and 256 GB of RAM (the "popeye" server at RISC).

Why?

In 1946, banknotes of 1020 Hungarian pengő were issued as a result of hyperinflation. Now, the world finally has the answer to the pressing question: in how many different ways can you make change for a 1020-pengő banknote using stacks of 1-pengő coins?

= +



+



+ 100000000000000000000 1 3 3 99999999999999999993 Illustration: one of the approximately 1.8381765 ×1011,140,086,259 partitions of 1020. Image credits: 1, 2 (CC-BY-SA).

The skeptical scientist might be tempted to verify the result experimentally, by explicitly counting all the possible ways to make change. The pre-war 1-pengő coin has a diameter of 23 mm and a thickness of 1.5 mm. A single vertical stack of 1020 coins requires 16 light-years of headroom, so you would have to choose a location for the experiment that stays clear of various celestial bodies such as Proxima Centauri (4.24 light-years away). Even if you split the stacks up, the total coin volume of 68720 cubic kilometers (assuming a circle-packing ratio of 0.9069) is sufficient to fill Lake Superior five times over and still have change left. The silver content of the coins alone would make the budget for such a project very difficult to justify to many funding agencies. Besides, even using your favorite cryptocurrency instead of physical coins, you would not have time to go through a significant fraction of the (more than) 1011,140,086,259 partitions before the presumed heat death of the universe sets in 10150 years from now.

More seriously, a motivation for computing the partition function is to study congruences. Ramanujan famously discovered the identities $$p(5k+4)\equiv 0 \pmod {5}$$ $$p(7k+5)\equiv 0 \pmod {7}$$ $$p(11k+6)\equiv 0 \pmod {11}$$ when inspecting a table of p(n) for n up to 200, which MacMahon had computed by hand! Today an extensive body of knowledge exists about partition congruences (they are closely related to modular forms), but several important problems are open, and experimental investigations could potentially lead to new insights. With current technology, the most efficient way to determine p(n) modulo a small integer is to compute the full value p(n) and then reduce it.

Calculating the partition function is also a quite nice benchmark problem for number theory software. The basic idea is to compute a sufficiently precise numerical approximation of p(n) from an infinite series (the Hardy-Ramanujan-Rademacher formula), and rounding to the nearest integer. The Hardy-Ramanujan-Rademacher formula states that $$p(n)=\frac{1}{\pi \sqrt{2}} \sum_{k=1}^\infty \sqrt{k}\, A_k(n)\,

\frac{d}{dn} \left(

\frac {1} {\sqrt{n-\frac{1}{24}}}

\sinh \left[ \frac{\pi}{k}

\sqrt{\frac{2}{3}\left(n-\frac{1}{24}\right)}\right]

\right)$$ where $A_k(n)$ is a certain sum over roots of unity with a complicated definition.

This requires arbitrary-precision arithmetic, including the ability to compute $\pi$ and the exponential function $\exp(x)$ to high precision (in this case over 10 billion digits). In fact, the computation requires a small number of arithmetic operations at very high precision and a large number of operations at decreasing precision, with roughly equal time spent doing arithmetic with numbers of all sizes. It also requires doing a lot of arithmetic with word-size integers (modular arithmetic, factoring, GCDs, modular square roots, etc.). The same core operations are used for all kinds of tasks in computational number theory, so optimizing and stress-testing them is useful in a general sense.

Correctness

Since numerical approximations are used, it's a challenge to guarantee that the result actually is correct. For example, past versions of Maple gave the wrong value of p(n) for n = 11269, 11566, 12376, 12430..., presumably due to floating-point roundoff error and/or too few terms in the Hardy-Ramanujan-Rademacher series being used.

To avoid such problems, my implementation uses ball (mid-rad interval) arithmetic to bound the contribution of all rounding errors. The error that results from truncating the infinite series after a finite number of terms is also accounted for (using Rademacher's rigorous error bound). The final numerical result is a ball $[m-\varepsilon, m+\varepsilon]$ guaranteed to contain p(n), and the code checks that this interval contains exactly one integer, which then provably is the correct value. Of course, "provably correct" comes with the usual caveat that there could be bugs in the code (or even hardware errors). In this computation, the distance between the approximation $m$ and the nearest integer was smaller than 10−10, which provides a strong consistency check.

With computations of this magnitude, all kinds of things risk breaking. For example, after dumping p(1020) to a file, reading it back gave a different value! It turns out that the size information written by the GMP/MPIR mpz_out_raw function overflows when the number takes more than 232 bytes. Luckily, mpz_out_raw still writes all the extra bytes, so the correct value could be recovered (just not directly with mpz_inp_raw ).

Software improvements

The computation was done with the same general algorithm as the previous record of p(1019), described in my 2012 paper, but with completely rewritten code for numerical evaluation. The implementation used for the 2011 record (still available in FLINT and easily accessed via Sage) uses floating-point arithmetic along with (incomplete) a priori error bounds. Deriving error bounds this way was cumbersome, and interfered with code and algorithm optimizations (changing the algorithm required doing a new analysis). This was one of my main motivations for developing Arb and reimplementing the partition function there.

The algorithm has time and space complexity O(n0.5+ε), which is softly optimal since p(n) has Θ(n0.5) bits. In theory, p(1020) is therefore 100.5 ≈ 3.16 times more expensive to compute than p(1019). If you account for the nε factor, the time increases by a factor closer to 4 in practice when n increases by a factor 10. Memory increases quite precisely by a factor 100.5. In fact, the computation of p(1020) only required 110 hours and 130 GB of memory, comparable to the 100 hours and 150 GB used for the computation of p(1019) done in 2011. Thus the efficiency (both time and space) has improved by more than a factor three. Computing p(1019) is now about as expensive as computing p(1018) used to be, and so on. Some of the speedup (about 30%) is simply due to using faster hardware, but most of it is due to the following improvements:

I wrote a new high-precision implementation of the exponential function (commit 1, 2), requiring 25% less time and 75% less memory than the old implementation (which called MPFR). This change reduced the memory requirements for the partition function by two thirds, making p (10 20 ) achievable on a system with 256 GB memory without swapping. There is also slightly faster code in Arb for low-precision exponentials.

(10 ) achievable on a system with 256 GB memory without swapping. There is also slightly faster code in Arb for low-precision exponentials. Another speedup came from using a more efficient algorithm to compute roots of unity.

Since ball arithmetic is used instead of a priori error estimates to check that the final numerical approximation is close enough to an integer, the number N of terms in the Hardy-Ramanujan-Rademacher series is chosen slightly less conservatively (further optimization is possible here).

error estimates to check that the final numerical approximation is close enough to an integer, the number of terms in the Hardy-Ramanujan-Rademacher series is chosen slightly less conservatively (further optimization is possible here). I also used the latest version (2.6.0) of MPIR, benefiting from Bill Hart's new Schönhage-Strassen FFT.

Finally, I cut the wall time nearly in half by using two threads (one thread for the first 16 terms in the Hardy-Ramanujan-Rademacher series, and one for all the remaining terms). I kept randomly getting "NaN"s as output when I tried writing multithreaded code 2011, but this time everything seems to work!

Not everything has gotten faster. Computing the error bounds automatically at runtime adds some overhead. The original FLINT implementation also uses hardware floating-point arithmetic and libm transcendental functions for low-precision calculations, which Arb avoids in order to guarantee the error bounds (actually, the libm log is used in one place right now, but in a safe way). These de-optimizations only turn out to make a significant difference for small n, up to 108 or so.

Since 1020 exceeds the maximum value of a 64-bit unsigned integer (264−1 ≈ 1.8 × 1019), I had to change the implementation to take a multiprecision integer as input. This is a trivial edit and doesn't cause any slowdown, because all "index" variables in the algorithm are smaller than some fixed multiple of n0.5 and thus comfortably fit in a single word on a 64-bit system. As a byproduct, the implementation now works for input greater than 232−1 ≈ 4.3 × 109 on 32-bit systems. I tested computing p(1010), p(1011), ..., p(1016) on a 32-bit system, and all came out with the correct value. The fact that nothing broke when n crossed the word size boundary on a 32-bit system provided some reassurance that the word size boundary wouldn't pose a problem on a 64-bit system either. I haven't tested what p(1017) does to a 32-bit system yet; presumably, malloc will fail at some point since the computation requires more than 4 GB of memory.

I will push the latest modifications to the partition function code to the Arb git repository in the near future.

Going further

At the moment, my implementation can't be used to compute p(1021). The limitation is the GMP/MPIR mpz type, which overflows around 40 billion digits. The output would have 35 billion digits, and the internal calculations involve numbers up to twice as large.

To compute bigger values, you would also want better parallelization (the edit-run-debug cycle gets cumbersome when the "run" step starts taking weeks). The "tail" in the Hardy-Ramanujan-Rademacher series parallelizes embarrassingly well, but the first term takes nearly 1/2 of the total CPU time, so by Amdahl's law, it's only possible to gain roughly a factor two by distributing the terms. There is an "easy" way to parallelize the computation of the first term, but it would use much more memory. What you really want is parallel (and memory-efficient) FFT code to split individual bignum multiplications across several cores. You also need multithreaded code for the leaf computations in the binary splitting algorithms for $\pi$ and exp, but that's trivial to implement.

With a more sophisticated implementation, present-day hardware should allow computing numbers at least as large as p(1024) (1.1 trillion digits). The $\pi$ record is 12 trillion digits (set by Shigeru Kondo and Alexander Yee over 94 days in 2013). Computing the partition function to a similar number of digits as $\pi$ requires slightly more memory (or swap space), and 10-20 times more computation time.

Computation statistics

The partition function was computed for a range of powers of ten to get an accurate picture of the efficiency, and to compare results with the previous computation up to n = 1019 (see the paper for the old timings).

n Digits N Mem (MB) Wall (s) CPU (s) T1 (s) T2 (s) Pi (s) Exp (s) 108 11132 3328 2.8 0.037 0.07 0.031 0.032 0.004 0.016 109 35219 9721 3.05 0.237 0.29 0.061 0.227 0.008 0.035 1010 111391 28601 4.89 0.441 0.75 0.307 0.429 0.039 0.184 1011 352269 84632 9.82 1.834 3.29 1.449 1.805 0.168 0.841 1012 1113996 251600 19.82 6.247 10.39 6.169 4.159 1.004 3.475 1013 3522791 750925 61.31 25.637 43.06 25.436 17.435 3.282 15.834 1014 11140072 2248842 202.81 104.07 188.33 103.57 84.346 12.389 66.326 1015 35228031 6754864 553.77 441.603 845.42 404.175 440.417 47.035 263.662 1016 111400846 20343453 1469.73 1614.84 2858.31 1611.65 1244.68 194.188 1099.58 1017 352280442 61413562 4593.42 7398.2 13643.7 6251.35 7389.06 762.067 4167.94 1018 1114008610 185795691 12937.41 23354.6 44330.3 23327.1 20997 2913.12 16171.8 1019 3522804578 563188404 38881.84 87825.3 170934 87742.6 83184.9 10860.9 61796.3 1020 11140086260 1710193158 131071 399206 738425 339610 398959 42394.2 239099

Digits: number of decimal digits in p(n)

N: number of terms used in the Hardy-Ramanujan-Rademacher series

Mem: peak memory usage (resident set) during the computation

Wall: total wall time of the computation

CPU: total CPU time of the computation

T1: time for thread 1 (terms 1 to 16)

T2: time for thread 2 (terms 17 to N)

Pi: time to compute $\pi$ to full precision (T1)

Exp: time to compute $\exp(x)$ to full precision (T1)



Edit: Ondrej Certik has put up some plots of the timing data.

Data about p(1020)

The full number is not currently available for download, as it is 4.6 GB in packed binary format. If you really want a copy, contact me.

First (most significant) 1000 digits:

18381765083448826436460575151963949703661288601871 33818794921830680916179355851922605087258953579721 40223563322445693766015017321098501101538956018970 56750704881757794605360897959220562692957211960097 79165931101694124929410280437321791578706853243973 22343094930548700253436959560776678370296907881368 37733272436267730567242440339288196974095792769740 01925654844091988401959507036970411771619993123017 55657669949970363935278042362879420060693386042760 13262928827958224606797122816605218568004925911126 29907354977685310316372900346545431298736807328236 50133058574692900439957866102150361523427469350908 59399440163148946448629438752730130574754978538031 93701800260362511405139309272932783176699960093172 44722089087499304505113827984945896266787952281116 32461474616064076042784673307459319183309808217053 62780971602345567755068535553357149769564868882234 06599266703753282060975654469606040406400277593325 73615936642247999221376534428162036372745061861599 90865952198639022438268000053417742912214985581755

Last 1000 digits:

78197224010162778696625071481567773251610572738944 98214946475632569813210181967142533195900338936713 00729210224771807814522142198236195554260002554854 55685193690894876043353240892759544284884999526365 17052321124780138610958503741195160905409982613603 99012436018076726035176887245814122589275259511505 76750908721608880594390177630345942812168691609194 42278684582644873256506162684010823277784461258669 33049338620374335600858723383185679297378002523565 35242780532986925919014811158854151260764076718769 12107459394370915408731078173696015862413625559934 64357221260733772591049732717331101157274865785996 97365738341459496183914647648616383023616407003926 60829384832774130551031058595191226473452491491856 86443513570173392572278853185935691354917765496681 36185969884296385252170324208992786041520483060786 38273697959512945079613267334079249909549771893343 24588609360545342374268332247926111640460856581625 95976612501746024798615243022620019559707707032875 82462984472325700899198905833521126231756788091448

Digit distribution (fun fact: radix conversion takes over 3 hours):

Digit Occurrences 0 1114003393 1 1114004671 2 1114044302 3 1113995927 4 1114027762 5 1114020496 6 1113993310 7 1113982144 8 1114004097 9 1114010158

First 1000 hexadecimal digits (total hex length: 9251641382):

116045344298e737ba5da19152a5285e35ca8d47406081d91a 0289604dbb4c5d1ebe513da006ddeaa116a22b76903822c9db 1797a70dcc6a5a6976ce89a7fd4bb8c2db5192b5e7ff04fcc5 a571dd6250eae6be39c2f0c1b22616e171300d16b362ac4fc7 051c6e5a1115c075316c51425c666a84467557ab501e85d193 1499a5f2ba1df7d3f1ad05b6a124275ab9a8ae68770190e559 d1c23b23d6a483cc8d1413a99bb4e4f066a0cd803bed7ab832 a6f2eaa2fbd0fdbf867967b4cd57d4346705746c4a179788d7 4a9ccafac78be12016fd5855afade01dfb9945bdc746f8ee7c 1190dfc06ef7ef7a2b6bf747ff68e6296225e450bb97db4748 32d3eb270c028857f511b197e7b25ae0ab946eaf8aa5b08ce2 68d4a5c633d699ace4f38b9b12ecbdd7be643323f3532a863a e99bfe37416761902a6a5aebc62b9e9e310d87fb3436cf4b1d 0c37c67f7aa6daf052214d6768a0e9ab93f2dab3d98d6e7954 f39475e11b9ca54cad2fac43a4b44353422654a8cf70805bda d66e161a5b0eb3834d24db0d8a7dc9d02a861337ebec308b41 ab8704d940850892f55fddbe15676981f70d4ddc2c713e37c5 cb1417f15322ea5993f268227251fc43f5de135a876fd42fdc 028bc371ec5164bd01b8be057822692c8ca5bafbd08f7f4910 04e294464ec7f4f55a8a38e3656997765b5e19ae774b6cdc9d

Last 1000 hexadecimal digits:

73ba2f6029363349801fc09a4b0273afd1a29296b76867b997 c28461ae194cd690d2f32b73d4e4798f42c27549927d2a4d72 4616b066c4d681f89275601d4a2a74054ca372bf994d590142 a3a8ca2fde857e286fbdca0636f22b315406da61738d94ef9b 758be0447d67e032b7901867143cf9041b63986faab0cec017 ed9c8c26157be06d8eeb44572402db6f4681e9158fea0e97e7 1e760ecbf494c6399b6bb74b8c49872a855f3dc2207a3f1426 ad2c5b0629104e594a4c87810473a9774d592e0981e1a22109 5ba77121fb3cac8dc1346d37dd8ea1a6b2c1c499965621f4fa b8e98e94c0faf557860c74ce133dcc22859363920fb1bbe05c 2d31ae71cf4c7874c1c8dc39b32ad6e077b7805c1dd500f659 5993c92f0d65192b126ba993d596b7c1360c4b38338b1f9092 da283348e016e98b8c50874967dadb892f1d35ce4bd7a1bd95 8f5e62ede86157db408e0264c3bf27f4724c76b1ab1b80f58b 94cd970e119be5eaf6d4a4c2b88f2846e81294e8b98c5cf7fe 924213c507a14ac7f23a86ac3e7c953d5a9e931e28ea50c1ae 77499d3afd1e43653b29fd563045da4f08ff0a47462ec262cc 8a1492b9b6ac7d04d59d5c30c5d8ed60c43882e1662f3ad862 29ee775302d477370b97e7297fbb498915f8e9e2661f839cf1 bc2a572e10f9d1908f4d9f0c219892c3bc6274570698de4a38

Edit: replaced table of limbs with hex digits to avoid confusion.

Value modulo small primes:

2 0 3 0 5 3 7 3 11 10 13 7 17 1 19 9 23 22 29 25 31 26 37 14 41 19 43 38 47 7 53 10 59 6 61 36 67 62 71 53 73 29 79 23 83 63 89 51 97 79 101 52 103 9 107 64 109 43 113 81 127 9 131 96 137 21 139 132 149 132 151 11 157 123 163 129 167 42 173 8 179 161 181 173 191 162 193 106 197 66 199 40 211 24 223 181 227 20 229 111 233 85 239 8 241 7 251 25 257 33 263 202 269 222 271 100 277 264 281 176 283 109 293 190 307 277 311 308 313 217 317 248 331 16 337 233 347 56 349 342 353 79 359 146 367 6 373 294 379 44 383 362 389 103 397 49 401 80 409 63 419 65 421 320 431 318 433 295 439 401 443 54 449 394 457 99 461 281 463 43 467 18 479 273 487 372 491 392 499 453 503 362 509 483 521 30 523 219 541 527 547 254 557 213 563 499 569 420 571 358 577 174 587 332 593 228 599 573 601 334 607 533 613 352 617 263 619 226 631 88 641 361 643 201 647 126 653 520 659 614 661 178 673 171 677 274 683 532 691 58 701 286 709 601 719 716 727 489 733 546 739 610 743 707 751 690 757 495 761 71 769 385 773 237 787 602 797 590 809 293 811 668 821 258 823 394 827 801 829 642 839 288 853 750 857 51 859 730 863 312 877 801 881 335 883 120 887 232 907 355 911 728 919 145 929 15 937 154 941 859 947 186 953 311 967 455 971 872 977 963 983 93 991 572 997 152

Numberwang: yes