• supert@lemmy.sdfeu.org
      1 year ago

      I can run 4-bit quantised Llama 70B on a pair of 3090s, or rent GPU server time. It’s expensive but not prohibitive.
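
      A minimal sketch of what that looks like with Hugging Face transformers + bitsandbytes, assuming a Llama-2-70B-class checkpoint you have access to and device_map="auto" to split the layers across the two 3090s (the model ID and prompt are just examples):

      ```python
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

      model_id = "meta-llama/Llama-2-70b-chat-hf"  # example checkpoint (gated access)

      bnb_config = BitsAndBytesConfig(
          load_in_4bit=True,                 # ~35 GB of weights instead of ~140 GB
          bnb_4bit_quant_type="nf4",
          bnb_4bit_compute_dtype=torch.float16,
      )

      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(
          model_id,
          quantization_config=bnb_config,
          device_map="auto",                 # shards layers across both GPUs
      )

      prompt = "Explain 4-bit quantisation in one sentence."
      inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
      out = model.generate(**inputs, max_new_tokens=64)
      print(tokenizer.decode(out[0], skip_special_tokens=True))
      ```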

      • anotherandrew@lemmy.mixdown.ca
        1 year ago

        I’m trying to get to the point where I can locally run a (slow) LLM that I’ve fed my huge ebook collection to, and ask it where to find info on $subject, getting title/page info back. The PDFs that are searchable aren’t too bad, but finding a way to OCR the older TIFF-scan PDFs, and getting it to “see” graphs/images, are the areas I’m stuck on.
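
        For the OCR part, a rough sketch assuming pdf2image + pytesseract (both need the poppler and tesseract binaries installed; the ebooks/ path and the indexing step are placeholders) that turns an image-only PDF into per-page text you could feed into a search index:

        ```python
        from pathlib import Path

        import pytesseract
        from pdf2image import convert_from_path

        def ocr_pdf(pdf_path: str) -> list[tuple[int, str]]:
            """Return (page_number, text) for each page of a scanned PDF."""
            pages = convert_from_path(pdf_path, dpi=300)  # render pages to images
            return [(i + 1, pytesseract.image_to_string(img)) for i, img in enumerate(pages)]

        for pdf in Path("ebooks/").glob("*.pdf"):
            for page_no, text in ocr_pdf(str(pdf)):
                # feed (filename, page_no, text) into whatever embedding/RAG index you use
                print(pdf.name, page_no, text[:80])
        ```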

    • Grimy@lemmy.world
      1 year ago

      I personally use RunPod. It doesn’t cost much, even for the high-end stuff. Tbh the OpenAI API is easier though, and gives mostly better results.
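
      For comparison, the OpenAI side is only a few lines with the openai>=1.0 Python client (the model name is just an example; the key is read from the OPENAI_API_KEY environment variable):

      ```python
      from openai import OpenAI

      client = OpenAI()  # reads OPENAI_API_KEY from the environment

      resp = client.chat.completions.create(
          model="gpt-4o-mini",  # example model; swap in whichever you actually use
          messages=[{"role": "user", "content": "Summarise retrieval-augmented generation in two sentences."}],
      )
      print(resp.choices[0].message.content)
      ```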

      • Communist@lemmy.ml
        1 year ago

        I specifically said “large context”. How many tokens can you get through before it goes insanely slow?