CUDA Acceleration
Uploading your tensors to the GPU can accelerate the contraction.
julia> using CUDA, OMEinsum
julia> code = ein"ij,jk,kl,li->" # the einsum notation
ij, jk, kl, li ->
julia> A, B, C, D = rand(1000, 1000), rand(1000, 300), rand(300, 800), rand(800, 1000);
julia> size_dict = OMEinsum.get_size_dict(getixsv(code), (A, B, C, D)) # get the size of the labels
Dict{Char, Int64} with 4 entries:
'j' => 1000
'i' => 1000
'k' => 300
'l' => 800
julia> optcode = optimize_code(code, size_dict, TreeSA()) # optimize the contraction order
SlicedEinsum{Char, DynamicNestedEinsum{Char}}(Char[], kl, kl ->
├─ ki, li -> kl
│ ├─ jk, ij -> ki
│ │ ├─ jk
│ │ └─ ij
│ └─ li
└─ kl
)
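As a sanity check on the optimized code, the sketch below compares it against a plain matrix expression on small tensors. It relies on the observation that `ij,jk,kl,li->` is the trace of the matrix product `A * B * C * D`. The small sizes are chosen only to keep the check cheap, and `contraction_complexity` is assumed to be exported by a recent OMEinsum release; if your version lacks it, drop that line.

```julia
using OMEinsum
using LinearAlgebra: tr

code = ein"ij,jk,kl,li->"

# Small, illustrative sizes (not the ones used in the benchmark).
A, B, C, D = rand(10, 10), rand(10, 3), rand(3, 8), rand(8, 10)
size_dict = OMEinsum.get_size_dict(getixsv(code), (A, B, C, D))
optcode = optimize_code(code, size_dict, TreeSA())

# Estimated time, space and read-write complexity of the optimized order
# (assumes `contraction_complexity` is available in your OMEinsum version).
println(contraction_complexity(optcode, size_dict))

# The result is a 0-dimensional array; `[]` extracts the scalar.
result = optcode(A, B, C, D)[]

# The contraction "ij,jk,kl,li->" equals tr(A * B * C * D).
@assert result ≈ tr(A * B * C * D)
```

The optimizer changes only the order of the pairwise contractions, so the result should match the reference up to floating-point rounding.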
The contraction order is optimized. Now, let's benchmark the contraction on the CPU.
julia> using BenchmarkTools
julia> @btime optcode($A, $B, $C, $D) # the contraction on CPU
6.053 ms (308 allocations: 20.16 MiB)
0-dimensional Array{Float64, 0}:
1.4984046443610943e10
The contraction on the CPU takes about 6 ms. Now, let's upload the data to the GPU and perform the contraction on the GPU.
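julia> cuA, cuB, cuC, cuD = CuArray(A), CuArray(B), CuArray(C), CuArray(D); # upload the data to the GPU
julia> @btime CUDA.@sync optcode($cuA, $cuB, $cuC, $cuD) # the contraction on GPU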
243.888 μs (763 allocations: 28.56 KiB)
0-dimensional CuArray{Float64, 0, CUDA.DeviceMemory}:
1.4984046443610939e10
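The CPU and GPU results above differ in the last digits, which is expected because the floating-point operations are summed in a different order. A minimal sketch of how one might check that the two agree is given below. It uses smaller, illustrative tensor sizes and guards the GPU part with `CUDA.functional()`, so the script still runs on a machine without a usable CUDA device.

```julia
using CUDA, OMEinsum

code = ein"ij,jk,kl,li->"

# Illustrative sizes, smaller than in the benchmark above.
A, B, C, D = rand(100, 100), rand(100, 30), rand(30, 80), rand(80, 100)
size_dict = OMEinsum.get_size_dict(getixsv(code), (A, B, C, D))
optcode = optimize_code(code, size_dict, TreeSA())

# Reference result on the CPU (0-dimensional array, `[]` extracts the scalar).
cpu_result = optcode(A, B, C, D)[]

if CUDA.functional()
    # Upload the tensors to the GPU.
    cuA, cuB, cuC, cuD = CuArray(A), CuArray(B), CuArray(C), CuArray(D)
    # Copy the 0-dimensional result back to the host before reading the
    # scalar, which avoids scalar indexing on the device.
    gpu_result = Array(optcode(cuA, cuB, cuC, cuD))[]
    @assert isapprox(cpu_result, gpu_result; rtol=1e-8)
else
    @info "No functional CUDA device found; skipping the GPU check."
end
```

The same `optcode` object is called on both `Array` and `CuArray` inputs, so only the storage type of the tensors decides where the contraction runs.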
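In the benchmark above, the contraction on the GPU takes about 0.24 ms, roughly 25 times faster than the 6 ms measured on the CPU.
To learn more about using the GPU and autodiff, please check out the following asciinema video.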