notfromhere@lemmy.one to LocalLLaMA@sh.itjust.works · English · 2 years ago
Vicuna-33B-1-3-SuperHOT-8K-GPTQ (huggingface.co)
simple@lemmy.mywire.xyz · 2 years ago
I tried with WizardLM uncensored, but 8K seems to be too much for a 4090; it runs out of VRAM and dies. I also tried with just 4K, but that doesn't seem to work either. When I run it with 2K, it doesn't crash, but the output is garbage.
notfromhere@lemmy.one (OP) · 2 years ago
I hope llama.cpp supports SuperHOT at some point. I never use GPTQ, but I may need to make an exception to try out the larger context sizes. Are you using exllama? Curious why you're getting garbage output.
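For what it's worth, SuperHOT merges extend the context by interpolating (compressing) the RoPE positional embeddings, so the loader needs to be told both the longer sequence length and the compression factor; if only the context length is changed, garbage output is a common symptom. Below is a minimal sketch of how that might look with exllama's Python classes. It assumes the exllama repo layout (model.py, tokenizer.py, generator.py), and the model paths and filenames are placeholders, so treat it as illustrative rather than a drop-in script.

```python
# Minimal sketch: loading a SuperHOT 8K GPTQ model with exllama.
# Paths and the 4x compression factor (8192 / 2048) are assumptions for illustration.
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

model_dir = "/models/Vicuna-33B-1-3-SuperHOT-8K-GPTQ"  # hypothetical local path

config = ExLlamaConfig(f"{model_dir}/config.json")
config.model_path = f"{model_dir}/model.safetensors"    # hypothetical filename
config.max_seq_len = 8192        # extended context the SuperHOT merge targets
config.compress_pos_emb = 4.0    # RoPE interpolation factor: 8192 / 2048

model = ExLlama(config)
tokenizer = ExLlamaTokenizer(f"{model_dir}/tokenizer.model")
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

print(generator.generate_simple("Hello, my name is", max_new_tokens=50))
```

Note that a 33B GPTQ model at the full 8K context may still not fit in 24 GB of VRAM; the compress_pos_emb setting mainly addresses the garbage output seen at shorter context lengths.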