Fast Kernel Summation via Slicing
Back to: Main Page
In the following, we give details on the implemented fast kernel summation. A theoretical background can be found here. A precise specification of all attributes and arguments in this library is given here. We aim to compute the kernel sums
\[s_m=\sum_{n=1}^N w_n K(x_n,y_m)\]for all $m=1,…,M$. The naive implementation has a computational complexity of $O(MN)$. The implementation of this library has complexity $O(M+N)$.
Supported Kernels
The implementation currently supports the following kernels:
- The
Gausskernel given by $K(x,y)=\exp(-\frac1{2\sigma^2}\|x-y\|^2)$. - The
Laplacekernel given by $K(x,y)=\exp(-\frac1{\sigma}\|x-y\|)$. In the literature, the parameter $\sigma$ is often replaced by $\frac{1}{\alpha}$. However, we stick to the above notation to handle the scale parameter similarly for all kernels. - The
Maternkernel given by $K(x,y)=\frac{2^{1-\nu}}{\Gamma(\nu)}(\tfrac{\sqrt{2\nu}}{\sigma}\|x-y\|)^\nu K_\nu(\tfrac{\sqrt{2\nu}}{\sigma}\|x-y\|)$, where $K_\nu$ is the modified Bessel function of second kind. The kernel depends on a smoothness parameter $\nu$ which determines its smoothness. The value of $\nu$ has to be passed within thekernel_paramsdictionary (kernel_params["nu"]=nu). For $\nu=\frac12$, we obtain the Laplace kernel and for $\nu\to\infty$, the Matern kernel converges towards the Gauss kernel. - The
energykernel given by $K(x,y)=-\frac{1}{\sigma}\|x-y\|$. - The
Rieszkernel is given by $K(x,y)=-\frac{\|x-y\|^r}{\sigma^r}$ for some exponent $r\in(-1,\infty)$. For $r=1$ we resemble the energy kernel. The exponent $r$ has to be passed within thekernel_paramsdictionary (kernel_params["r"]=r). For small values of $r$ (say $r<1$), this kernel becomes very non-smooth at $x=y$. In this case, more Fourier coefficients in the fast Fourier summation might be required (which can be adjusted via the keyword argumentn_ftin the constructor of theFastsumobject). - The
thin_platespline kernel is given by $K(x,y)=\frac{\|x-y\|^2}{\sigma^2}\log(\frac{\|x-y\|}{\sigma})$. - The
logarithmickernel is given by $K(x,y)=\log(\frac{\|x-y\|}{\sigma})$. This kernel is singular forx=y. Therefore, the fast Fourier summation requires significantly more Fourier coefficients then for the other kernels (e.g.,n_ft=65536for a relative error up to3e-3).
Note: A better treatment for kernels which are non-smooth or singular at $x=y$ is implemented in the NFFT3 library.
Usage and Example
To use the fast kernel summation, we first create a Fastsum object with fastsum=Fastsum(d, kernel="Gauss"). It takes the dimension and the kernel (as string from the above list) as input.
Afterwards, we can compute the vector $s=(s_1,…,s_M)$ by s=fastsum(x, y, w, xis_or_P) where x has the shape (...,N,d), y has the shape (...,M,d) and w has the shape (...,N). The argument xis_or_P either takes the number of considered slices as integer (higher number = higher accuracy) or the slices itself as a tensor of size (P,d). The fastsum method supports batching. That is, we can add arbitrary many dimensions in the beginning of each of the tensors x, y and w. The number of batching dimensions has to be the same for all inputs, but broadcasting is supported (i.e., expanding batching dimensions of size 1).
Other optional arguments for the constructor of the Fastsum object include (full list in the specification):
kernel_params: For the Matern kernel, we also have to specifykernel_params=dict(nu=nu_val), wherenu_valis a float specifying the smoothness parameter $\nu$.- Batch sizes: If the memory consumption is too high, the computation can be batched with two batch size parameters:
batch_size_P: batch size for the slicesbatch_size_nfft: batch size for the NFFT (should be smaller or equalbatch_size_P)
slicing_mode: By default the slices in the slicing algorithm are chosen iid. The performance can often be increased by using QMC rules. Currently the modes"iid","orthogonal","Sobol","distance"and"spherical_design"are implemented, see specification for a description. Additionally, the value"non-sliced"can be passed to apply the non-sliced fast Fourier summation. The choice"spherical_design"is only applicable for $d=3$ and $d=4$. The default value is"non-sliced"for $d\in\{1,2\}$"spherical_design"for $d\in\{3,4\}$,"distance"for $d \leq 100$ but $d\not\in\{3,4\}$ and"orthogonal"otherwise.
import torch
from simple_torch_NFFT import Fastsum
device = "cuda" if torch.cuda.is_available() else "cpu"
d = 10 # data dimension
kernel = "Gauss" # kernel type
fastsum = Fastsum(d, kernel=kernel, device=device) # fastsum object
scale = 1.0 # kernel parameter
P = 256 # number of projections for slicing
N, M = 10000, 10000 # Number of data points
# data generation
x = torch.randn((N, d), device=device, dtype=torch.float)
y = torch.randn((M, d), device=device, dtype=torch.float)
x_weights = torch.rand(x.shape[0]).to(x)
kernel_sum = fastsum(x, y, x_weights, scale, P) # compute kernel sum