Research Group of Prof. Dr. M. Griebel
Institute for Numerical Simulation
[1] J. Garcke, M. Griebel, and M. Thess. Data mining with sparse grids. Computing, 67(3):225-253, 2001. also as SFB 256 Preprint 675, Institut für Angewandte Mathematik, Universität Bonn, 2000.
bib | .pdf (link) | http | .ps.gz 1 ]
We present a new approach to the classification problem arising in data mining. It is based on the regularization network approach but, in contrast to the other methods which employ ansatz functions associated to data points, we use a grid in the usually high-dimensional feature space for the minimization process. To cope with the curse of dimensionality, we employ sparse grids. Thus, only O(hn-1 nd-1) instead of O(hn-d) grid points and unknowns are involved. Here d denotes the dimension of the feature space and hn = 2-n gives the mesh size. To be precise, we suggest to use the sparse grid combination technique where the classification problem is discretized and solved on a certain sequence of conventional grids w ith uniform mesh sizes in each coordinate direction. The sparse grid solution is then obtained from the solutions on these different grids by linear combination . In contrast to other sparse grid techniques, the combination method is simpler to use and can be parallelized in a natural an d straightforward way. We describe the sparse grid combination technique for the classification problem in terms of the regularization network approach. We then give implementational d etails and discuss the complexity of the algorithm. It turns out that the metho d scales linear with the number of instances, i.e. the amount of data to be clas sified. Finally we report on the quality of the classifier build by our new method. Here we consider standard test problems from the UCI repository and problems wit h huge synthetical data sets in up to 9 dimensions. It turns out that our new method achieves correctness rates which are competitive to that of the best existing methods.