notfromhere@lemmy.one to LocalLLaMA@sh.itjust.works · English · 2 years ago
Vicuna-33B-1-3-SuperHOT-8K-GPTQ (huggingface.co)
simple@lemmy.mywire.xyz · 2 years ago
I tried with WizardLM uncensored, but 8K seems to be too much for a 4090; it runs out of VRAM and dies. I also tried with just 4K, but that doesn't seem to work either. When I run it with 2K, it doesn't crash, but the output is garbage.
notfromhere@lemmy.one (OP) · 2 years ago
I hope llama.cpp supports SuperHOT at some point. I never use GPTQ, but I may need to make an exception to try out the larger context sizes. Are you using exllama? Curious why you're getting garbage output.
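For what it's worth, SuperHOT merges extend the context by interpolating (compressing) the RoPE positional embeddings, so the loader needs to be told both the longer sequence length and the compression factor; if only the context length is changed, garbage output is a common symptom. Below is a minimal sketch of how that might look with exllama's Python classes. It assumes the exllama repo layout (model.py, tokenizer.py, generator.py), and the model paths and filenames are placeholders, so treat it as illustrative rather than a drop-in script.

```python
# Minimal sketch: loading a SuperHOT 8K GPTQ model with exllama.
# Paths and the 4x compression factor (8192 / 2048) are assumptions for illustration.
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

model_dir = "/models/Vicuna-33B-1-3-SuperHOT-8K-GPTQ"  # hypothetical local path

config = ExLlamaConfig(f"{model_dir}/config.json")
config.model_path = f"{model_dir}/model.safetensors"    # hypothetical filename
config.max_seq_len = 8192        # extended context the SuperHOT merge targets
config.compress_pos_emb = 4.0    # RoPE interpolation factor: 8192 / 2048

model = ExLlama(config)
tokenizer = ExLlamaTokenizer(f"{model_dir}/tokenizer.model")
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

print(generator.generate_simple("Hello, my name is", max_new_tokens=50))
```

Note that a 33B GPTQ model at the full 8K context may still not fit in 24 GB of VRAM; the compress_pos_emb setting mainly addresses the garbage output seen at shorter context lengths.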