Same model, new song. Unfortunately the fretless bass confused the model a bit and it removed a few notes, but I am training a new version which won't make that mistake.
The model is just a deep residual U-Net. The bridge of the U-Net uses a stack of residual units, each with a convolutional multihead attention module placed before the identity connection is added, which computes attention over both channels and features. The convolutional multihead attention module is something I coded with inspiration from the convolutional block attention module (CBAM). It first computes channel attention, using 9x9 shared-kernel convolutions for the Q, K, and V tensors and a separable convolution with groups=channels for the out projection. That is immediately followed by the feature attention step, which uses 1x1 convolutions to project all features into a shared space and a 1x1 convolution for the out projection.
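For anyone curious, here is a rough PyTorch sketch of how a module like the one described above could be put together. It is not the actual training code, and several details are assumptions: the names `ConvMultiheadAttention`, `ResidualUnit`, and `attend` are hypothetical; "shared kernel" is read as a single one-channel 9x9 kernel applied to every channel; the depthwise out projection uses a 3x3 kernel; heads split the frequency axis in the channel stage and the channel axis in the feature stage; and the convolutions inside the residual unit are placeholders.

```python
import math
import torch
import torch.nn as nn


def attend(q, k, v):
    # Plain scaled dot-product attention; tokens sit on the second-to-last axis.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    return torch.softmax(scores, dim=-1) @ v


class ConvMultiheadAttention(nn.Module):
    """Hypothetical sketch: channel attention followed by feature attention."""

    def __init__(self, channels, heads=4, kernel_size=9):
        super().__init__()
        assert channels % heads == 0
        self.heads = heads
        pad = kernel_size // 2
        # Stage 1: one single-channel 9x9 kernel each for Q, K, V (assumed
        # meaning of "shared kernel": the same kernel is used on every channel).
        self.q1 = nn.Conv2d(1, 1, kernel_size, padding=pad)
        self.k1 = nn.Conv2d(1, 1, kernel_size, padding=pad)
        self.v1 = nn.Conv2d(1, 1, kernel_size, padding=pad)
        # Depthwise out projection (groups=channels); the 3x3 size is a guess.
        self.out1 = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        # Stage 2: 1x1 convolutions mix channels at every position.
        self.qkv2 = nn.Conv2d(channels, channels * 3, 1)
        self.out2 = nn.Conv2d(channels, channels, 1)

    def channel_attention(self, x):
        b, c, h, w = x.shape
        assert h % self.heads == 0
        # Fold channels into the batch axis so one kernel serves all of them.
        flat = x.reshape(b * c, 1, h, w)

        def split(t):
            # (b*c, 1, h, w) -> (b, heads, c, h/heads * w): channels are tokens.
            return t.reshape(b, c, self.heads, -1).transpose(1, 2)

        out = attend(split(self.q1(flat)), split(self.k1(flat)), split(self.v1(flat)))
        return self.out1(out.transpose(1, 2).reshape(b, c, h, w))

    def feature_attention(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv2(x).chunk(3, dim=1)

        def split(t):
            # (b, c, h, w) -> (b, heads, h*w, c/heads): positions are tokens.
            return t.reshape(b, self.heads, c // self.heads, h * w).transpose(-2, -1)

        out = attend(split(q), split(k), split(v))
        return self.out2(out.transpose(-2, -1).reshape(b, c, h, w))

    def forward(self, x):
        return self.feature_attention(self.channel_attention(x))


class ResidualUnit(nn.Module):
    """Placeholder residual unit with attention applied before the identity add."""

    def __init__(self, channels, heads=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.attn = ConvMultiheadAttention(channels, heads)

    def forward(self, x):
        return x + self.attn(self.conv(x))
```

Feeding it a tensor shaped (batch, channels, frequency bins, frames), for example `ResidualUnit(8)(torch.randn(2, 8, 16, 12))`, returns a tensor of the same shape. The feature stage attends over every position, so its cost grows with the square of bins times frames, which is only practical at the reduced resolution of the bridge.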
I will likely unlist this once First Fragment release their official instrumental album.