This folder shows an example of adapting Hyena to use FlashFFTConv. The original files are sourced from the [safari](https://github.com/HazyResearch/safari) repository.
Install model-specific requirements. See the safari repo for instructions.
This code depends on an old version of FlashAttention (0.2.8) for the MLP interface.
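If your environment does not have it yet, the pinned version is available on PyPI (exact build requirements depend on your CUDA/PyTorch setup):

```bash
pip install flash-attn==0.2.8
```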
We have sample configs for Hyena models of different sizes that you can benchmark:
```bash
python benchmark_fwd.py experiment=pile/hyena.yaml
python benchmark_fwd.py experiment=pile/hyena-flashfft.yaml
```
We describe the changes necessary to use FlashFFTConv in Hyena:
First, create an instance of `FlashFFTConv` in `LMBackbone`. In `src/models/sequence/long_conv_lm.py`, lines 193-197:
```python
if use_flashfftconv:
    # `layer` here is the layer config dict passed to LMBackbone (it is
    # shadowed by the loop variable below). The FFT size is 2 * l_max to
    # support a zero-padded causal convolution over sequences of length l_max.
    self.flashfftconv = FlashFFTConv(layer['l_max'] * 2, dtype=torch.float16)

    # Share a single FlashFFTConv instance across all layers.
    for layer in self.layers:
        layer.mixer.flashfftconv = self.flashfftconv
```
Then, we adapt Hyena to use the `flashfftconv` variable in `src/models/sequence/hyena-flashfft.py`.
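As a rough illustration, the mixer can route its FFT convolution through the shared instance whenever one has been attached. This is a minimal sketch rather than the repo's exact code; the function name and the fallback path are ours, but the calling convention follows FlashFFTConv's documented interface (activations `(B, H, L)` in fp16/bf16, kernel `(H, L)` in fp32):

```python
import torch

def fftconv(u, k, flashfftconv=None):
    # u: (B, H, L) activations; k: (H, L) long-convolution kernel.
    if flashfftconv is not None:
        # FlashFFTConv expects low-precision activations and an fp32 kernel.
        return flashfftconv(u.to(torch.float16), k.float())
    # Reference path: FFT-based causal convolution, zero-padded to 2L.
    L = u.shape[-1]
    u_f = torch.fft.rfft(u.float(), n=2 * L)
    k_f = torch.fft.rfft(k.float(), n=2 * L)
    return torch.fft.irfft(u_f * k_f, n=2 * L)[..., :L].to(u.dtype)
```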
We make a couple more optimizations:
- We use our fast depthwise kernel for Hyena's short convolution (see the first sketch after this list).
- We introduce an "inference mode" that simply loads the convolution kernel from weights, instead of recomputing it on every forward pass (see the second sketch after this list). An alternative is to use a fast kernel to generate the convolution kernel, as in the M2 repo.
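For the depthwise optimization, the FlashFFTConv package exposes a fast depthwise 1D convolution that can replace `nn.Conv1d` for Hyena's short filter. The sketch below follows the interface described in the FlashFFTConv README (`FlashDepthWiseConv1d`, initialized from an existing conv's weights); treat the exact argument names as assumptions to verify against your installed version:

```python
import torch
import torch.nn as nn
from flashfftconv import FlashDepthWiseConv1d  # name per the FlashFFTConv README

d, k = 768, 3  # model width, short-filter size

# Standard PyTorch depthwise conv, as used for Hyena's short convolution.
conv1d_torch = nn.Conv1d(
    in_channels=d, out_channels=d, kernel_size=k,
    groups=d, padding=k - 1, dtype=torch.float16,
).cuda()

# Drop-in replacement backed by the fast kernel, initialized from the same weights.
flash_conv1d = FlashDepthWiseConv1d(
    channels=d, kernel_size=k, padding=k - 1,
    weights=conv1d_torch.weight, bias=conv1d_torch.bias,
    dtype=torch.float16,
).cuda()

x = torch.randn(2, d, 1024, dtype=torch.float16, device='cuda')
out = flash_conv1d(x)  # should match conv1d_torch(x) up to numerical error
```

The inference-mode change is independent of FlashFFTConv. Here is a minimal sketch, with hypothetical names (`CachedFilter`, `filter_fn`), of caching the generated kernel as a buffer so it is loaded from the checkpoint instead of recomputed:

```python
import torch
import torch.nn as nn

class CachedFilter(nn.Module):
    """Hypothetical wrapper: generate the Hyena filter once, then serve it from weights."""

    def __init__(self, filter_fn, l_max, inference_mode=False):
        super().__init__()
        self.filter_fn = filter_fn          # implicit filter generator, e.g. an MLP over positions
        self.inference_mode = inference_mode
        if inference_mode:
            # Precompute the kernel and store it as a buffer so it is
            # saved/loaded with the checkpoint instead of regenerated.
            self.register_buffer("k_cached", filter_fn(l_max).detach())

    def forward(self, l):
        if self.inference_mode:
            return self.k_cached[..., :l]   # just load the kernel
        return self.filter_fn(l)            # recompute on every forward (training)
```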
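In practice, the cached-filter approach trades a small amount of checkpoint size for skipping the filter-generation MLP at every step, which matters most at short generation lengths where kernel recomputation dominates.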