Runtime Comparison

The test script test/ compares the runtime of this implementation with some other libraries. In all comparisons, we perform every precomputation that depends only on the input shape and dimension, but none that depends on the basis points x or on the function values f or f_hat. We fix M=100000 points, window cutoff m=2 and oversampling factor sigma=2. All execution times below were measured on an NVIDIA RTX 4090 GPU, averaged over 10 runs, with all methods running in single precision. Note that pyNFFT3 only has a CPU implementation (run here on a 24-core AMD Ryzen Threadripper 7960X) and always uses double precision, so the comparison with it is not like-for-like.
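The averaging over 10 runs can be sketched as a small timing helper. This is not the project's test script: the function name `average_runtime`, the warm-up call and the `sync` hook are assumptions made for illustration. On a GPU one would typically pass `torch.cuda.synchronize` as `sync`, because CUDA kernels are launched asynchronously and the clock would otherwise stop before the work has finished.

```python
import time
from statistics import mean


def average_runtime(fn, runs=10, warmup=1, sync=None):
    """Return the mean wall-clock time of fn() in seconds over `runs` calls.

    `warmup` and `sync` are illustrative assumptions, not settings taken
    from the benchmark described above.
    """
    # Untimed warm-up calls so one-off setup costs do not skew the mean.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for queued device work before stopping the clock
        timings.append(time.perf_counter() - start)
    return mean(timings)


# CPU stand-in workload; a real benchmark would call the NFFT here.
elapsed = average_runtime(lambda: sum(i * i for i in range(10_000)))
print(f"mean over 10 runs: {elapsed:.5f} s")
```

Keeping the precomputations outside `fn` matches the setup described above, where only the parts depending on x, f or f_hat are timed.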

We always use the input shape (batch_x,1,M,d) for x, (batch_x,batch_f,M) for f and (batch_x,batch_f,N_1,...,N_d) for f_hat.
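As a quick illustration of this shape convention, the following sketch computes the three shapes from the batch sizes, M and N. The helper `nfft_shapes` is hypothetical and only mirrors the convention stated above; it does not call any of the benchmarked libraries.

```python
def nfft_shapes(batch_x, batch_f, M, N):
    """Return the shapes of x, f and f_hat for the convention above.

    N may be a single int (one-dimensional case) or a tuple (N_1, ..., N_d).
    """
    N = (N,) if isinstance(N, int) else tuple(N)
    d = len(N)
    return {
        "x": (batch_x, 1, M, d),          # basis points, shared across batch_f
        "f": (batch_x, batch_f, M),       # function values at the points
        "f_hat": (batch_x, batch_f, *N),  # Fourier coefficients
    }


# Two-dimensional setting with batch_x=10 and batch_f=10.
for name, shape in nfft_shapes(10, 10, 100000, (256, 256)).items():
    print(name, shape)
```

The singleton second axis of x means one set of points per batch_x entry is reused for all batch_f sets of values.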

One-dimensional NFFT

We use N=4096. Then the execution times (seconds) for the forward NFFT were the following.

| batch sizes | pyNFFT3 (CPU) | TorchKbNufft | torch_nfft | simple_torch_NFFT |
|---|---|---|---|---|
| batch_x=1, batch_f=1 | 0.00226 | 0.00113 | 0.00140 | 0.00078 |
| batch_x=1, batch_f=10 | 0.00505 | 0.00108 | 0.00333 | 0.00096 |
| batch_x=10, batch_f=1 | 0.01340 | 0.00447 | 0.00346 | 0.00085 |
| batch_x=10, batch_f=10 | 0.06744 | 0.00500 | 0.02241 | 0.00288 |
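The timings are easier to compare as ratios. The sketch below takes the one-dimensional forward timings listed above and divides each library's time by that of simple_torch_NFFT; the helper `speedups` is a hypothetical name introduced for illustration. The same calculation applies to the other tables on this page.

```python
# One-dimensional forward NFFT timings in seconds, copied from the table above.
forward_1d = {
    "batch_x=1, batch_f=1": {
        "pyNFFT3": 0.00226, "TorchKbNufft": 0.00113,
        "torch_nfft": 0.00140, "simple_torch_NFFT": 0.00078,
    },
    "batch_x=1, batch_f=10": {
        "pyNFFT3": 0.00505, "TorchKbNufft": 0.00108,
        "torch_nfft": 0.00333, "simple_torch_NFFT": 0.00096,
    },
    "batch_x=10, batch_f=1": {
        "pyNFFT3": 0.01340, "TorchKbNufft": 0.00447,
        "torch_nfft": 0.00346, "simple_torch_NFFT": 0.00085,
    },
    "batch_x=10, batch_f=10": {
        "pyNFFT3": 0.06744, "TorchKbNufft": 0.00500,
        "torch_nfft": 0.02241, "simple_torch_NFFT": 0.00288,
    },
}


def speedups(row, reference="simple_torch_NFFT"):
    """Return how many times slower each library is than `reference`."""
    return {
        name: seconds / row[reference]
        for name, seconds in row.items()
        if name != reference
    }


for batch, row in forward_1d.items():
    ratios = ", ".join(f"{k}: {v:.1f}x" for k, v in speedups(row).items())
    print(f"{batch} -> {ratios}")
```

For batch_x=10, batch_f=10, for example, TorchKbNufft takes roughly 1.7 times as long as simple_torch_NFFT in this table.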

For the adjoint NFFT, we obtained the following execution times (seconds).

| batch sizes | pyNFFT3 (CPU) | TorchKbNufft | torch_nfft | simple_torch_NFFT |
|---|---|---|---|---|
| batch_x=1, batch_f=1 | 0.00289 | 0.00270 | 0.00156 | 0.00091 |
| batch_x=1, batch_f=10 | 0.01650 | 0.00289 | 0.00325 | 0.00183 |
| batch_x=10, batch_f=1 | 0.02491 | 0.00613 | 0.00310 | 0.00183 |
| batch_x=10, batch_f=10 | 0.15513 | 0.00804 | 0.02222 | 0.01483 |

Two-dimensional NFFT

We use N=(N_1,N_2)=(256,256). Then the execution times (seconds) for the forward NFFT were the following.

| batch sizes | pyNFFT3 (CPU) | TorchKbNufft | torch_nfft | simple_torch_NFFT |
|---|---|---|---|---|
| batch_x=1, batch_f=1 | 0.00489 | 0.00237 | 0.00257 | 0.00092 |
| batch_x=1, batch_f=10 | 0.02079 | 0.00263 | 0.00973 | 0.00168 |
| batch_x=10, batch_f=1 | 0.07189 | 0.01557 | 0.01020 | 0.00240 |
| batch_x=10, batch_f=10 | 0.21706 | 0.02359 | 0.06724 | 0.01905 |

For the adjoint NFFT, we obtained the following execution times (seconds).

| batch sizes | pyNFFT3 (CPU) | TorchKbNufft | torch_nfft | simple_torch_NFFT |
|---|---|---|---|---|
| batch_x=1, batch_f=1 | 0.00491 | 0.00502 | 0.00223 | 0.00117 |
| batch_x=1, batch_f=10 | 0.02924 | 0.00649 | 0.00893 | 0.00725 |
| batch_x=10, batch_f=1 | 0.07695 | 0.02437 | 0.00950 | 0.00791 |
| batch_x=10, batch_f=10 | 0.30029 | 0.03246 | 0.06537 | 0.07389 |

Three-dimensional NFFT

We use N=(N_1,N_2,N_3)=(128,128,128). For batch_x=batch_f=10, we ran into an out-of-memory error on the GPU, which is not surprising when trying to perform 100 three-dimensional NFFTs in parallel, so this configuration is omitted. The execution times (seconds) for the forward NFFT were the following.

| batch sizes | pyNFFT3 (CPU) | TorchKbNufft | torch_nfft | simple_torch_NFFT |
|---|---|---|---|---|
| batch_x=1, batch_f=1 | 0.03677 | 0.00888 | 0.01297 | 0.00458 |
| batch_x=1, batch_f=10 | 0.27084 | 0.02778 | 0.06780 | 0.04576 |
| batch_x=10, batch_f=1 | 0.30655 | 0.08501 | 0.06860 | 0.04744 |

For the adjoint NFFT, we obtained the following execution times (seconds).

| batch sizes | pyNFFT3 (CPU) | TorchKbNufft | torch_nfft | simple_torch_NFFT |
|---|---|---|---|---|
| batch_x=1, batch_f=1 | 0.04423 | 0.01856 | 0.00950 | 0.00675 |
| batch_x=1, batch_f=10 | 0.34766 | 0.04173 | 0.10051 | 0.06409 |
| batch_x=10, batch_f=1 | 0.38263 | 0.11934 | 0.10062 | 0.06685 |
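The out-of-memory failure for batch_x=batch_f=10 in the three-dimensional case can be made plausible with a rough estimate. The sketch below assumes that each transform holds one oversampled grid of sigma*N_i entries per dimension in single-precision complex numbers (8 bytes per entry). This is an assumption about a typical NFFT layout, not a statement about how any of the benchmarked libraries allocates memory, and the helper name `oversampled_grid_bytes` is hypothetical.

```python
from math import prod


def oversampled_grid_bytes(N, sigma=2, n_transforms=1, bytes_per_entry=8):
    """Estimate the bytes needed for the oversampled grids alone.

    Assumes one grid with sigma * N_i entries per dimension and one grid
    per transform; 8 bytes per entry corresponds to complex64.
    """
    grid_entries = prod(int(sigma * n) for n in N)
    return grid_entries * n_transforms * bytes_per_entry


single = oversampled_grid_bytes((128, 128, 128))
batch = oversampled_grid_bytes((128, 128, 128), n_transforms=100)

print(single / 2**20, "MiB for one transform")    # 128.0
print(batch / 2**30, "GiB for 100 transforms")    # 12.5
```

Under these assumptions a single copy of the grids for 100 transforms already takes 12.5 GiB, before FFT workspace and other intermediate buffers are counted, which is a large share of the 24 GB available on an RTX 4090.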