G. W. Zumbusch.
Adaptive Parallel Multilevel Methods for Partial Differential
Habilitation, Universität Bonn, 2001.
[ bib ]
In this text we propose a space-filling curve enumeration scheme for the load balancing problem. It is based on the principles of self-similarity and scaling invariance. It provides even multilevel locality, i.e. as much locality on each scale as possible. We introduce the space-filling curve schemes and prove some of the properties of the partitions. The scheme is cheap, deterministic, incremental, can be parallelised and provides acceptable partitions. However, even more striking, it seems to be one of the few partitioning methods where quasi-optimal estimates can be shown. We are able to derive sharp estimates both on the partition and on the multilevel algorithms on the partition, which is more than is known about competing graph partitioning load balancing methods so far. Furthermore, we give a survey of the three main aspects of the efficient treatment of PDEs, that is, discretisation, multilevel solution and parallelisation. We will treat the main features of each of the three distinct topics and cover the historical background and modern developments. We demonstrate how all three ingredients can be put together to give an adaptive and parallel multilevel approach for the solution of PDEs. Error estimators and adaptive grid refinement techniques for ordinary and for sparse grid discretisations are presented. Different types of additive and multiplicative multilevel solvers are discussed with respect to parallel implementation and application to adaptive refined grids. Efficiency issues are treated both for the sequential multilevel methods and for the parallel version by hash table storage techniques. Furthermore, space-filling curve enumeration for parallel load balancing and processor cache efficiency are discussed. We will apply the method to elliptic boundary value problems. We are able to derive estimates for the quality of the partitions by space-filling curves and the load balancing of the numerical algorithms on the grids. Even for adaptive grid refinement within certain ranges we are able to prove that the partitions are quasi-optimal, i.e. the cut sizes of the dual graph are only a constant factor away from optimum independent of the mesh size. Hence we obtain asymptotic optimality of the parallel algorithms. This seems to be remarkable in comparison to graph based heuristics, where little is known about the quality. Furthermore we were able to demonstrate the performance of the method on a range of the world's largest parallel computers, namely ASCI Blue Pacific and a prototype Cray T3E (now presumably at NSA), which are each larger than any non-US system. We complement this data by simulations run on Parnass2, which was the first non-US self-made cluster in the list of the world's largest 500 computers (TOP500). We also demonstrate that this cluster is able to outperform many other commercial parallel computers on a per processor base.