Participant Gerhard Zumbusch
Key words  Sparse Grids, Hyperbolic Crosspoints, Tensor Product, Approximation, Finite Differences, Adaptive Grid Refinement, Dynamic Load-Balancing, Distributed Memory, Message Passing, Space-Filling Curves, Key Based Addressing, Hash Storage
Description

### Sparse Grids

We consider the solution of initial-boundary value problems for partial differential equations, both parabolic and hyperbolic, on sparse grid discretizations. Sparse grids can be constructed from tensor products of one-dimensional spaces, where specific parts of the space are cut off: based on the hierarchical basis functions (or other pre-wavelet bases), products of functions whose support is smaller than a tolerance tol are cut off. This is depicted on the right-hand side.

The sparse grid construction is related to `hyperbolic cross points' and other approximation schemes. We are now interested in the discretization of PDEs. Besides the Galerkin approach (FEM) and the `combination technique' based on multivariate extrapolation, we propose a finite difference type of discretization. It is based on dimensional splitting and the (nodal to) hierarchical basis transform H. Each differential operator Dj along a direction j is discretized with finite differences. It is combined with a hierarchical basis transform and back-transform along all directions but the direction of interest. The global differential operator reads D = sum_j Hj^-1 Dj Hj, where Hj denotes the hierarchical basis transform along all directions but j.
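As an illustration of the transform H along one direction (a minimal serial sketch, not the project's actual code), consider a 1-D dyadic grid with 2^L + 1 nodes. The nodal-to-hierarchical transform replaces each interior node value by its hierarchical surplus, i.e. the deviation from the interpolant of its two parents; both directions run in linear time:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Nodal -> hierarchical basis transform H on a 1-D dyadic grid over [0,1]
// with 2^L + 1 nodes (boundary nodes stay nodal). In place, O(n).
void to_hierarchical(std::vector<double>& u, int L) {
    int n = 1 << L;                        // number of intervals
    for (int l = L; l >= 1; --l) {         // finest level first
        int s = n >> l;                    // index distance to the parents
        for (int i = s; i < n; i += 2 * s) // odd multiples of s = level-l nodes
            u[i] -= 0.5 * (u[i - s] + u[i + s]);
    }
}

// Inverse transform H^-1: coarsest level first, parents already nodal.
void to_nodal(std::vector<double>& u, int L) {
    int n = 1 << L;
    for (int l = 1; l <= L; ++l) {
        int s = n >> l;
        for (int i = s; i < n; i += 2 * s)
            u[i] += 0.5 * (u[i - s] + u[i + s]);
    }
}
```

For a quadratic function the surpluses shrink like the squared mesh width of each level, which is the decay the sparse grid cut-off exploits.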

A typical stiffness matrix of this discretization is not sparse, but a matrix-vector multiply can still be implemented in linear time O(n).

In the presence of singularities of the solution, the convergence of standard discretizations is not as rapid as for smooth, regular solutions. One way to overcome this problem is adaptivity: the grid is refined during the computation where indicated by an error indicator:

 repeat
   solve equation system
   refine grid adaptively
 until tolerance is reached

The parallelization of adaptively refined grids is simple in one dimension: given a refined grid, the domain is partitioned into intervals of equal workload, so that each processor holds the same number of nodes:

The partition can be stored as (p-1) separators (numbers) for p processors. Now let us look at the higher dimensional case.

We can map the domain to an interval by a space-filling curve. This mapping induces an order of the nodes and elements, so we can proceed as in the one-dimensional case using (p-1) separators,

which results in the following domain decomposition.

Note that each processor is assigned the same workload (number of nodes), but the subdomain boundaries are sub-optimal. Solving both the partition problem and minimizing the boundaries, which are a measure of the communication cost during the parallel computation, is an NP-hard problem.
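The separator-based partition can be sketched as follows. This is an illustrative serial sketch: it uses the simple Morton (bit-interleaving) order as the space-filling curve, whereas better-localized curves such as the Hilbert curve are typical in practice; the function names are ours:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Morton (Z-order) key: interleave the bits of the integer grid coordinates.
uint64_t morton2d(uint32_t x, uint32_t y) {
    uint64_t key = 0;
    for (int b = 0; b < 32; ++b) {
        key |= (uint64_t)((x >> b) & 1) << (2 * b);
        key |= (uint64_t)((y >> b) & 1) << (2 * b + 1);
    }
    return key;
}

// Partition the nodes among p processors: sort by curve position and cut
// into p pieces of (nearly) equal size. The p-1 separator keys describe
// the whole partition.
std::vector<uint64_t> separators(std::vector<uint64_t> keys, int p) {
    std::sort(keys.begin(), keys.end());
    std::vector<uint64_t> sep;
    for (int q = 1; q < p; ++q)
        sep.push_back(keys[q * keys.size() / p]);
    return sep;
}

// Owner lookup is just a binary search over the separators.
int owner(uint64_t key, const std::vector<uint64_t>& sep) {
    return std::upper_bound(sep.begin(), sep.end(), key) - sep.begin();
}
```

Knowing the p-1 separators, any processor can compute the owner of any node locally, without stored neighbourhood tables.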

The main advantages of this partition method are:

• it is fast compared to graph partitioning heuristics
• it runs in parallel
• it requires no administration and no storage of processor neighbourhoods. Knowing the separators is enough to compute where to find a node and which processor to ask for it.

### Key based implementation:

We need to store a hierarchically nested sequence of adapted grids, which includes the nodes (= degrees of freedom), their geometric relationship on the grid, and their relation to nodes on different grid levels. In contrast to standard data structures based on pointers and trees, we choose a key-based addressing scheme and hash storage. A hash table T contains all nodes (on a processor).

Each node can be accessed via its coordinates, which are encoded uniquely by its position on the space-filling curve. This unique key is mapped to the hash table via a surjective hash function. Collisions are resolved by chaining in the C++ STL implementation in use.

This results in expected constant-time access O(1) and a substantial saving in computer memory. An even greater improvement comes from the fact that we do not have to administer pointers, which cause trouble on distributed memory parallel computers, where one would otherwise have to create and maintain copies of many entities.
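A minimal sketch of such key-based storage, using the standard std::unordered_map in place of the project's own hash table (the Node fields and function names here are illustrative assumptions):

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// A node (degree of freedom) stored under its unique key, e.g. its
// position on the space-filling curve. There are no pointers between
// nodes: neighbours and parents are found by computing their keys and
// looking them up, with expected O(1) access.
struct Node {
    double value = 0.0;   // solution coefficient
    int    level = 0;     // refinement level
};

using Grid = std::unordered_map<uint64_t, Node>;

// Adaptive refinement just inserts new keys; coarsening erases them.
void refine(Grid& g, uint64_t key, int level) {
    g[key].level = level;   // inserts the node if absent
}

bool exists(const Grid& g, uint64_t key) {
    return g.find(key) != g.end();
}
```

Because a node's existence is a pure key lookup, no tree traversal or pointer bookkeeping is needed when the grid changes.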

### Parallelization:

The components of our adaptive multigrid code have to be parallelized. A single data decomposition is used on the distributed memory computer; however, each component requires a different treatment.

 repeat
   solve equation system in parallel
   refine grid adaptively in parallel
   re-partition the sparse grid
 until tolerance is reached

For the iterative solution of the equation system we need matrix-vector multiplies Ax. This can be done in several steps: the operator is defined by D = sum_j Hj^-1 Dj Hj (with Hj the hierarchical transform along all directions but j), so the parallelization is based on parallel versions of H, Dj and H^-1. The one-dimensional finite difference stencil Dj and the transformation H from nodal to hierarchical basis can both be implemented by a single communication step to fill the necessary ghost nodes, followed by a single computation step:
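As a serial sketch of one such computation step (assuming, for illustration, a three-point second-difference stencil for Dj; the ghost parameters stand for the values received in the preceding communication step):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One-dimensional second-difference stencil applied to a processor's
// local strip of nodes. ghost_left/ghost_right are the neighbouring
// processors' boundary values, filled by one communication step.
std::vector<double> apply_stencil(const std::vector<double>& u,
                                  double ghost_left, double ghost_right,
                                  double h) {
    std::vector<double> y(u.size());
    for (size_t i = 0; i < u.size(); ++i) {
        double ul = (i == 0) ? ghost_left : u[i - 1];
        double ur = (i + 1 == u.size()) ? ghost_right : u[i + 1];
        y[i] = (2.0 * u[i] - ul - ur) / (h * h);   // -u'' discretized
    }
    return y;
}
```

After the one-time ghost exchange, the loop is purely local, which is what makes a single communication step per multiply sufficient.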

The inverse transformation H^-1 requires some more communication. This can either be done by a communication step at each level, which probably does not scale very well, or alternatively by a single communication step involving some more ghost nodes. In the latter case some parent nodes' values are computed on several processors independently.

The load balancing based on space-filling curves can be implemented as a one-stage parallel sort (radix or bucket sort) based on the previous partition. The parallel sort itself is then well balanced, and the number and volume of data transfers are low.
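The communication pattern of this one-stage sort can be sketched serially as follows (a sketch under the assumption that keys are bucketed by their owner under the previous separators, so each datum moves at most once):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One-stage bucket sort for re-partitioning: every key goes directly to
// the processor that owns it under the given separators. In the parallel
// code each bucket corresponds to one message to processor q.
std::vector<std::vector<uint64_t>> bucket_by_owner(
        const std::vector<uint64_t>& keys,
        const std::vector<uint64_t>& sep) {
    std::vector<std::vector<uint64_t>> bucket(sep.size() + 1);
    for (uint64_t k : keys) {
        int q = std::upper_bound(sep.begin(), sep.end(), k) - sep.begin();
        bucket[q].push_back(k);
    }
    return bucket;
}
```

Since the previous partition was already balanced, each processor sends and receives only the nodes that change owner, keeping transfer volume low.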

### Results:

Here are some three-dimensional results:
regular sparse grids, solution of a convection-diffusion equation, mapped to 8 processors

computing times for the example above, on Parnass2

  nodes   1/h |  p=1    p=2    p=4    p=8    p=16   p=32   p=64
     81     4 | 0.03   0.03   0.07   0.08   0.11
    225     8 | 0.12   0.09   0.09   0.11   0.16   0.20   0.20
    593    16 | 0.63   0.41   0.33   0.32   0.38   0.44   0.53
   1505    32 | 3.78   2.29   1.60   1.34   1.26   1.33   1.53
   3713    64 | 22.1   13.3   8.79   6.39   5.17   4.47   4.47
   8961   128 | 68.1   40.7   24.8   16.2   11.9   8.89   7.56
  21249   256 | 201    119    66.1   40.1   28.0   18.6   13.5
  49665   512 | 575    379    169    106    71.6   28.0
 114689  1024 | 1630   275    179    62.6

and on a Cray T3E-600

  nodes   1/h |  p=1    p=4    p=16   p=32   p=64   p=128  p=256  p=512
     81     4 | 0.04   0.06   0.10   0.14   0.16
    225     8 | 0.25   0.20   0.38   0.54   0.57   0.59
    593    16 | 1.31   0.71   0.89   1.24   1.53   1.87   2.34   3.26
   1505    32 | 10.0   3.66   3.03   3.67   5.75   5.28   7.17   9.97
   3713    64 | 52.5   20.1   12.8   11.9   13.1   15.4   20.5   29.9
   8961   128 | 158    58.2   29.6   22.6   20.9   23.0   27.9   40.6
  21249   256 | 473    192    66.6   44.9   67.6   32.2   36.7   94.0
  49665   512 | 1426   396    156    95.4   62.0   48.6   48.2   102.2
 114689  1024 | 4112   125    85.9   68.5

adaptively refined sparse grids, solution of a convection-diffusion equation, adaptive resolution of boundary layers

computing times for the example above, on Parnass2

  nodes   1/h |  p=1    p=2    p=4    p=8    p=16   p=32
     81     4 | 0.03   0.04   0.05   0.07   0.11
    201     8 | 0.07   0.05   0.05   0.07   0.08   0.09
    411    16 | 0.21   0.13   0.12   0.13   0.17   0.20
    711    32 | 0.78   0.48   0.38   0.36   0.41   0.51
   1143    64 | 2.60   1.49   1.06   0.93   0.92   1.14
   1921   128 | 8.69   5.99   3.70   2.88   2.70   2.83
   3299   256 | 39.3   20.7   13.8   9.62   7.79   7.32
   6041   512 | 177    91.0   56.8   39.5   28.6   22.0
  11787  1024 | 949    525    271    177    138    88.2
  22911  2048 | 1280   761    660    358

### Bibliography
• T. Schiekofer, G. Zumbusch, Software Concepts of a Sparse Grid Finite Difference Code, Proceedings of the 14th GAMM-Seminar Kiel on Concepts of Numerical Software, ed. W. Hackbusch, G. Wittum, Notes on Numerical Fluid Mechanics, Vieweg, 1998
• G. Zumbusch, A Sparse Grid PDE Solver, Advances in Software Tools for Scientific Computing (Proceedings SciTools '98), ed. H. P. Langtangen, A. M. Bruaset, E. Quak, Lecture Notes in Computational Science and Engineering 10, chapter 4, Springer, 2000, pp. 133-178
• M. Griebel, G. Zumbusch, Hash-Storage Techniques for Adaptive Multilevel Solvers and their Domain Decomposition Parallelization, Proceedings of Domain Decomposition Methods 10, ed. J. Mandel, C. Farhat, X.-C. Cai, Contemporary Mathematics 218, AMS, 1998, pp. 279-286
• M. Griebel, G. Zumbusch, Parallel multigrid in an adaptive PDE solver based on hashing, Proceedings of ParCo '97, ed. E. D'Hollander, G. R. Joubert, F. J. Peters, U. Trottenberg, Elsevier, 1998, pp. 589-599
### Related projects
• Finite-differences on sparse grids for the solution of pde's
• Finite Difference Schemes on Sparse Grids for Time Dependent Problems
• Hierarchical and adaptive methods for the compression and visualization of large data sets
• Adaptive Parallel Multigrid Methods with Space-Filling Curves and Hash Storage
• Parallel multi-level-particle methods combined with space-filling curves
• Parnass2: A Cluster of PCs
• In cooperation with SFB 256 "Nonlinear Partial Differential Equations"