
Cats2D solver performance

The Cats2D frontal solver, written by Ralph Goodwin, uses nested dissection with multi-threaded CBLAS operations to achieve frightening speed with extraordinary economy of memory usage. Here is a fine slide presentation he put together explaining how it works.

Shown below are wall clock times per Newton iteration (top left) and modified Newton iteration (top right), maximum memory usage (bottom left), and speedup ratio of new to old solver (bottom right), for the classic lid-driven cavity problem (Re = 100, square mesh of uniform Q2/P-1 finite elements) solved on a 3 GHz Intel Core i7 MacBook Pro laptop (Retina, 13-inch, Mid 2014).

[Figure: Solver performance plots]

Note that wall clock time for a Newton iteration computed by the new Cats2D scales nearly linearly (power law slope 1.11) with problem size, compared to nearly quadratic scaling (slope 1.89) with the old Cats2D, up to 1 million degrees of freedom. The power law slopes of modified Newton steps and memory usage fall even closer to linear, at 1.07 and 1.06, respectively. Extending these curves to 4 million unknowns barely degrades the scaling: the slope for the Newton iteration rises slightly to 1.16, and the slopes of the other curves do not change.
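For readers who want to check a scaling exponent of this kind themselves, here is a minimal sketch of a power law fit: a least-squares line through log(time) versus log(problem size). It uses only the three Newton iteration timings quoted later in this section for the new solver, so it is an illustration of the method and need not reproduce the slopes read from the full curves in the plots. The names `timings` and `power_law_slope` are made up for this example and are not part of Cats2D.

```python
import math

# (unknowns, seconds per Newton iteration) for the new solver,
# taken from the three timings quoted in the text.
timings = [
    (110_802, 0.66),
    (992_402, 9.1),
    (3_964_802, 45.0),
]

def power_law_slope(points):
    """Least-squares slope of log(time) versus log(size).

    If time ~ C * size**p, the returned value estimates p.
    """
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(t) for _, t in points]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx

slope = power_law_slope(timings)
print(f"fitted exponent from three quoted points: {slope:.2f}")
```

A slope near 1 means the cost grows about in proportion to the number of unknowns, and a slope near 2 means that doubling the problem size roughly quadruples the cost.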

The sharp rise in wall clock time for a modified Newton step using the old Cats2D (upper right plot) occurs when the code goes out of core because memory usage exceeds the available DRAM, in this case 10 GB. This doesn't happen with the new Cats2D because its memory usage has been reduced by 89%, to 1.1 GB, for this case.

At 110,802 unknowns (100x100 elements), a mere 0.66 seconds is required per Newton iteration. At 992,402 unknowns (300x300 elements), 9.1 seconds does the job. At 3,964,802 unknowns (600x600 elements), it takes only 45 seconds to solve this huge problem, and it needs less than 8 GB of memory to do it. These are full and accurate factorizations by Gaussian elimination on a 2-core laptop. Compare your solver today.
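The unknown counts above can be reproduced from the mesh size with a short calculation. The sketch below assumes that Q2/P-1 means biquadratic velocities (two components at each node of a (2n+1) x (2n+1) grid) and a discontinuous linear pressure with three unknowns per element, and that no unknowns are removed for boundary conditions. These assumptions are not stated in the text, but they match the quoted numbers exactly. The function name `q2p1_unknowns` is invented for this example.

```python
def q2p1_unknowns(n):
    """Unknowns for an n x n square mesh of Q2/P-1 elements (assumed layout)."""
    velocity_nodes = (2 * n + 1) ** 2   # corner, mid-side, and center nodes
    velocity_dofs = 2 * velocity_nodes  # u and v at every node
    pressure_dofs = 3 * n * n           # linear pressure, 3 unknowns per element
    return velocity_dofs + pressure_dofs

for n in (100, 300, 600):
    print(f"{n}x{n} elements: {q2p1_unknowns(n):,} unknowns")
# 100x100 elements: 110,802 unknowns
# 300x300 elements: 992,402 unknowns
# 600x600 elements: 3,964,802 unknowns
```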

Fast code beats fast machines.