Llama-Nemotron models seem to offer significant ahead-of-time optimization, and handling Nemotron may be useful in general. Took a peek at the config.json and it didn't look pretty in there. There's a PR on exllama, and llamacpp merged one that appears to be by the same author.