Researchers at MIT's Computer Science & Artificial Intelligence Lab (CSAIL) have open-sourced Multiply-ADDitioN-lESS (MADDNESS), an algorithm that speeds up machine learning using approximate matrix multiplication (AMM). MADDNESS requires zero multiply-add operations and runs 10x faster than other approximate methods and 100x faster than exact multiplication.

The team described MADDNESS and a set of experiments in a paper presented at the recent International Conference on Machine Learning (ICML). Unlike many other AMM algorithms, MADDNESS uses no multiply-add operations; instead, it uses a set of efficient learned hash functions that achieve encoding rates of 100 GB/second using only one CPU thread. The hash functions map input data to an index into a lookup table that contains pre-computed dot-products.

Matrix multiplication is a fundamental operation in machine learning, and one of the most time-consuming, due to its extensive use of multiply-add instructions. Because GPU and TPU chips can execute many multiply-adds in parallel, they perform matrix multiplication faster than CPUs, making them attractive for ML applications; however, they may be too expensive or simply unavailable for researchers on a limited budget, or in resource-constrained applications such as IoT. Thus, many researchers have investigated AMM algorithms, which trade off the accuracy of matrix multiplication for speed. Although MADDNESS does introduce some output error, the authors show the algorithm has a theoretical upper bound on the error, which can be traded off against speed. In a set of experiments comparing the performance of MADDNESS against other algorithms in an image classifier, the researchers found that MADDNESS has a better speed-accuracy tradeoff, achieving "virtually the same accuracy as exact multiplication" while being 10x faster.

The MADDNESS algorithm makes several assumptions about the matrices being multiplied: that they are "tall, relatively dense, and resident in a single machine’s memory," conditions that occur in a wide variety of machine learning applications. In particular, one matrix contains fixed values; this might represent the weights in an image classifier model. The other matrix represents the input data, for example the pixels of an image to be classified.

MADDNESS is based on a vector quantization algorithm called product quantization (PQ). In PQ, a large set of input vectors is analyzed to create a small number of prototype vectors. The products of each prototype vector with the fixed weight vector are pre-computed. Then, any new input vector is mapped to its most similar prototype using a hash function, which gives an index to a pre-computed product. The key innovation in MADDNESS is a pre-processing step that produces very fast hash functions which do not require multiply-add operations. The functions are based on binary regression trees; each tree has 16 leaves, which represent hash buckets. Mapping an input vector to a prototype requires only comparisons against the threshold values of the tree splits. MADDNESS also includes two additional performance improvements: a prototype optimization that chooses a set of prototypes minimizing the reconstruction error of the original matrix, and a fast 8-bit aggregation that combines multiple products using hardware-specific averaging instructions instead of addition.

The researchers compared MADDNESS to six other AMM algorithms, including principal component analysis (PCA) and Bolt, their own previous AMM algorithm, as well as exact matrix multiplication using BLAS. The AMM algorithms were used as part of an image classifier trained and evaluated on the CIFAR datasets. MADDNESS outperformed all other methods, achieving a "far better speed-accuracy tradeoff." The team also compared MADDNESS to the other algorithms using kernel classifiers trained on the UCR Time Series Archive datasets; in these experiments, MADDNESS was "significantly faster than alternatives at a given level of accuracy."

Lead author Davis Blalock joined a Reddit discussion to answer questions about the MADDNESS paper.
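To make the product-quantization idea concrete, here is a minimal NumPy sketch, not the MADDNESS implementation: a plain k-means loop stands in for the learned hash trees, each prototype's product with the fixed matrix is pre-computed into a table, and `A @ B` is then approximated by table lookups and sums. All function names and parameters here are illustrative.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    # Plain k-means stands in for MADDNESS's learned hashing here;
    # it finds k prototype vectors for one subspace of the input.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        ids = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(ids == j):
                centers[j] = X[ids == j].mean(axis=0)
    return centers

def pq_matmul(A, B, n_subspaces=4, k=16):
    """Approximate A @ B via product quantization: split A's columns into
    subspaces, snap each row-chunk to its nearest prototype, and sum the
    pre-computed prototype @ B products looked up from a table."""
    step = A.shape[1] // n_subspaces
    out = np.zeros((A.shape[0], B.shape[1]))
    for s in range(n_subspaces):
        cols = slice(s * step, (s + 1) * step)
        protos = kmeans(A[:, cols], k)   # prototype vectors for this subspace
        table = protos @ B[cols]         # pre-computed dot products
        ids = np.argmin(((A[:, cols][:, None, :] - protos[None]) ** 2).sum(-1), axis=1)
        out += table[ids]                # table lookup instead of fresh multiplies
    return out
```

On random data this typically recovers the exact product to within a modest relative error; using more subspaces or more prototypes reduces the error at the cost of larger tables and more lookups.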
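The tree-based hashing step can be sketched as well. The split dimensions and thresholds below are arbitrary placeholders (MADDNESS learns them from training data during its pre-processing step), but the control flow shows the key property: reaching one of the 16 leaf buckets takes four scalar comparisons and no multiply-adds on the input vector.

```python
import numpy as np

def tree_hash(x, split_dims, thresholds):
    """Map vector x to one of 2**len(split_dims) hash buckets.
    At each tree level, one element of x is compared against the threshold
    of the current node; with 4 levels this yields the 16 leaves described
    above. (Toy sketch: real MADDNESS learns split_dims and thresholds.)"""
    bucket = 0
    for level, dim in enumerate(split_dims):
        t = thresholds[level][bucket]          # one threshold per node at this level
        bucket = 2 * bucket + int(x[dim] > t)  # comparison only, no multiply-add on x
    return bucket

# Placeholder tree: split on dimensions 0..3, all thresholds at zero.
SPLIT_DIMS = [0, 1, 2, 3]
THRESHOLDS = [np.zeros(2 ** level) for level in range(4)]
```

For example, `tree_hash(np.array([1.0, -1.0, 1.0, -1.0]), SPLIT_DIMS, THRESHOLDS)` follows the comparison outcomes 1, 0, 1, 0 down the tree and lands in bucket 10, which would then index the pre-computed lookup table.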