# Runtime Comparison

The test script `test/runtime_test.py` compares the runtime of this implementation with some other libraries. For the comparisons, we always perform all precomputations that depend only on the input shape and dimension, but no precomputations that depend on the basis points `x` or on the function values `f` or `f_hat`.
We fix `M=100000`, `m=2` and `sigma=2`. The execution times below were measured on an NVIDIA RTX 4090 GPU and averaged over 10 runs. All methods are run in single precision. Note that pyNFFT3 only has a CPU implementation (run on a 24-core AMD Ryzen Threadripper 7960X) and always uses double precision, so the comparison with it is not entirely fair.
We always use the input shape `(batch_x, 1, M, d)` for `x`, `(batch_x, batch_f, M)` for `f` and `(batch_x, batch_f, N_1, ..., N_d)` for `f_hat`.
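
To illustrate this setup, the following is a minimal sketch of how the timed objects can be constructed with `simple_torch_NFFT`. The keyword arguments `m` and `sigma` of the `NFFT` constructor and the exact tensor conventions are assumptions here; the package's usage documentation is authoritative. For concreteness, the sketch uses the one-dimensional grid size `N=4096` from the first benchmark below.

```python
import torch
from simple_torch_NFFT import NFFT

device = "cuda"

# benchmark parameters from the text
M = 100_000            # number of basis points
N = (4096,)            # regular grid size (one-dimensional case below)
d = len(N)             # dimension
batch_x, batch_f = 1, 1

# precomputation: depends only on the grid size and dimension,
# not on x, f or f_hat (passing m and sigma as keywords is an assumption)
nfft = NFFT(N, m=2, sigma=2, device=device)

# inputs with the shapes stated above, in single precision
x = torch.rand((batch_x, 1, M, d), device=device) - 0.5  # basis points in [-1/2, 1/2)
f_hat = torch.randn((batch_x, batch_f) + N, dtype=torch.complex64, device=device)  # Fourier coefficients
f = torch.randn((batch_x, batch_f, M), dtype=torch.complex64, device=device)       # function values
```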

## Two-dimensional NFFT below follow the same scheme.


| batch sizes | pyNFFT3 (CPU) | TorchKbNufft | torch_nfft | simple_torch_NFFT |
| --- | --- | --- | --- | --- |
| `batch_x=1`, `batch_f=1` | 0.00226 | 0.00113 | 0.00140 | 0.00078 |
| `batch_x=1`, `batch_f=10` | 0.00505 | 0.00108 | 0.00333 | 0.00096 |
| `batch_x=10`, `batch_f=1` | 0.01340 | 0.00447 | 0.00346 | 0.00085 |
| `batch_x=10`, `batch_f=10` | 0.06744 | 0.00500 | 0.02241 | 0.00288 |

For the adjoint NFFT, we obtained the following execution times:

| batch sizes | pyNFFT3 (CPU) | TorchKbNufft | torch_nfft | simple_torch_NFFT |
| --- | --- | --- | --- | --- |
| `batch_x=1`, `batch_f=1` | 0.00289 | 0.00270 | 0.00156 | 0.00091 |
| `batch_x=1`, `batch_f=10` | 0.01650 | 0.00289 | 0.00325 | 0.00183 |
| `batch_x=10`, `batch_f=1` | 0.02491 | 0.00613 | 0.00310 | 0.00183 |
| `batch_x=10`, `batch_f=10` | 0.15513 | 0.00804 | 0.02222 | 0.01483 |
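
As a rough sketch of how such averaged timings can be taken (continuing from the setup sketch above; this is not the exact code of `test/runtime_test.py`, and the calls `nfft(x, f_hat)` for the forward and `nfft.adjoint(x, f)` for the adjoint transform are assumptions about the API):

```python
import time
import torch

def time_gpu(fn, runs=10):
    # one warm-up call, then average `runs` timed executions
    # with explicit synchronization around the timed region
    fn()
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs

t_forward = time_gpu(lambda: nfft(x, f_hat))      # forward NFFT: f_hat -> f
t_adjoint = time_gpu(lambda: nfft.adjoint(x, f))  # adjoint NFFT: f -> f_hat
print(f"forward: {t_forward:.5f} s, adjoint: {t_adjoint:.5f} s")
```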

## Two-dimensional NFFT

We use `N=(N_1,N_2)=(256,256)`. The execution times (in seconds) for the forward NFFT were as follows.

| batch sizes | pyNFFT3 (CPU) | TorchKbNufft | torch_nfft | simple_torch_NFFT |
| --- | --- | --- | --- | --- |
| `batch_x=1`, `batch_f=1` | 0.00489 | 0.00237 | 0.00257 | 0.00092 |
| `batch_x=1`, `batch_f=10` | 0.02079 | 0.00263 | 0.00973 | 0.00168 |
| `batch_x=10`, `batch_f=1` | 0.07189 | 0.01557 | 0.01020 | 0.00240 |
| `batch_x=10`, `batch_f=10` | 0.21706 | 0.02359 | 0.06724 | 0.01905 |

For the adjoint NFFT, we obtained the following execution times:

| batch sizes | pyNFFT3 (CPU) | TorchKbNufft | torch_nfft | simple_torch_NFFT |
| --- | --- | --- | --- | --- |
| `batch_x=1`, `batch_f=1` | 0.00491 | 0.00502 | 0.00223 | 0.00117 |
| `batch_x=1`, `batch_f=10` | 0.02924 | 0.00649 | 0.00893 | 0.00725 |
| `batch_x=10`, `batch_f=1` | 0.07695 | 0.02437 | 0.00950 | 0.00791 |
| `batch_x=10`, `batch_f=10` | 0.30029 | 0.03246 | 0.06537 | 0.07389 |

## Three-dimensional NFFT

We use `N=(N_1,N_2,N_3)=(128,128,128)`. For `batch_x=batch_f=10`, we ran out of GPU memory, which is not surprising when trying to perform 100 three-dimensional NFFTs in parallel; this case is therefore omitted from the tables below.
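
A back-of-the-envelope estimate makes this plausible: with `sigma=2`, each three-dimensional transform works on an oversampled grid of size `256^3`, and keeping 100 such grids in complex single precision already takes about 12.5 GiB. This assumes the batched implementation holds all oversampled grids at once and ignores `f`, `f_hat`, window tensors and FFT workspace, so it is an estimate, not a measurement.

```python
# rough memory estimate for batch_x = batch_f = 10 in the 3D case (complex64 = 8 bytes per value)
batch_x, batch_f = 10, 10
sigma, N = 2, 128
oversampled_grid = (sigma * N) ** 3                    # 256**3 values per transform
grid_bytes = batch_x * batch_f * oversampled_grid * 8  # the oversampled grids alone
print(f"{grid_bytes / 2**30:.1f} GiB")                 # ~12.5 GiB on a 24 GB RTX 4090
```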

The execution times (in seconds) for the forward NFFT were as follows.

| batch sizes | pyNFFT3 (CPU) | TorchKbNufft | torch_nfft | simple_torch_NFFT |
| --- | --- | --- | --- | --- |
| `batch_x=1`, `batch_f=1` | 0.03677 | 0.00888 | 0.01297 | 0.00458 |
| `batch_x=1`, `batch_f=10` | 0.27084 | 0.02778 | 0.06780 | 0.04576 |
| `batch_x=10`, `batch_f=1` | 0.30655 | 0.08501 | 0.06860 | 0.04744 |

For the adjoint NFFT, we obtained the following execution times:

| batch sizes | pyNFFT3 (CPU) | TorchKbNufft | torch_nfft | simple_torch_NFFT |
| --- | --- | --- | --- | --- |
| `batch_x=1`, `batch_f=1` | 0.04423 | 0.01856 | 0.00950 | 0.00675 |
| `batch_x=1`, `batch_f=10` | 0.34766 | 0.04173 | 0.10051 | 0.06409 |
| `batch_x=10`, `batch_f=1` | 0.38263 | 0.11934 | 0.10062 | 0.06685 |