• simple@lemmy.mywire.xyz · 2 years ago

    I tried with WizardLM uncensored, but 8K context seems to be too much for a 4090; it runs out of VRAM and dies.

    I also tried with just 4K, but that doesn’t seem to work either.

    When I run it with 2K, it doesn’t crash, but the output is garbage.

    • notfromhere@lemmy.one (OP) · 2 years ago

      I hope llama.cpp supports SuperHOT at some point. I never use GPTQ, but I may need to make an exception to try out the larger context sizes. Are you using exllama? Curious why you’re getting garbage output.
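
      If it is exllama, my first guess would be the RoPE scaling factor: SuperHOT-8K checkpoints are fine-tuned with positions compressed by 4x (8192 / 2048), so if compress_pos_emb is left at 1 the model sees positions it was never trained on and emits gibberish, even in a 2K window. Here’s a minimal loading sketch along the lines of exllama’s bundled example scripts (run from inside the exllama checkout; the model directory name is a placeholder):

      import glob, os

      # exllama's modules sit at the repo root, so this assumes the script
      # runs from inside the exllama checkout
      from model import ExLlama, ExLlamaCache, ExLlamaConfig
      from tokenizer import ExLlamaTokenizer
      from generator import ExLlamaGenerator

      model_dir = "/models/WizardLM-Uncensored-SuperHOT-8K-GPTQ"  # placeholder path

      config = ExLlamaConfig(os.path.join(model_dir, "config.json"))
      config.model_path = glob.glob(os.path.join(model_dir, "*.safetensors"))[0]
      config.max_seq_len = 8192      # context window / KV cache size
      config.compress_pos_emb = 4.0  # 8192 / 2048; must match the fine-tune

      model = ExLlama(config)
      tokenizer = ExLlamaTokenizer(os.path.join(model_dir, "tokenizer.model"))
      cache = ExLlamaCache(model)
      generator = ExLlamaGenerator(model, tokenizer, cache)

      print(generator.generate_simple("Hello, my name is", max_new_tokens=32))

      The KV cache grows with max_seq_len, so if 8K overflows the 4090 it should be possible to drop max_seq_len to 4096 while keeping compress_pos_emb at 4; the scaling factor comes from the fine-tune, not from the window you actually run.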