• 9 Posts
  • 77 Comments
Joined 2 years ago
Cake day: June 21st, 2023

  • Yeah, I tested with lower numbers and it works; I just wanted to offload the whole model, thinking it would fit, and 2 GB is a lot. With other models it prints about 250 MB when it fails, and if you sum that with the model size it's still well below the iGPU's free memory, so I don't get it… Anyway, I was thinking about upgrading the memory to 32 GB or maybe 64 GB, but I hesitate: with ~7 GB models on CPU only I get around 5 t/s, and with ~14 GB models 2-3 t/s, so with one around 30 GB I guess I'd get around 1 t/s? My supposition is that increasing RAM doesn't increase performance per se, it just lets you load bigger models into memory, so generation speed is roughly inversely proportional to model size… what do you think?
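
    A minimal sketch of that scaling, assuming token generation is memory-bandwidth bound (every token streams the whole model through RAM once); the 40 GB/s figure below is a guess for this laptop, not a measurement:

        # tokens/s ~ memory bandwidth / model size when generation is bandwidth-bound
        bw=40   # assumed effective memory bandwidth in GB/s (a guess, not measured)
        for size in 7 14 30; do
            echo "$size GB model -> ~$(echo "scale=1; $bw / $size" | bc) t/s"
        done

    With 40 GB/s that predicts ~5.7, ~2.8 and ~1.3 t/s, close to the numbers above, so the inverse-proportional guess looks about right.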


  • I get an error when offloading the whole model to the GPU:

    ./build/bin/llama-cli -m ~/software/ai/models/deepseek-math-7b-instruct.Q8_0.gguf -n 200 -t 10 -ngl 31 -if

    The relevant output is:

    llama_model_load_from_file_impl: using device Vulkan0 (Intel® Iris® Xe Graphics (RPL-U)) - 7759 MiB free

    print_info: file size = 6.84 GiB (8.50 BPW)

    load_tensors: loading model tensors, this can take a while… (mmap = true)
    load_tensors: offloading 30 repeating layers to GPU
    load_tensors: offloading output layer to GPU
    load_tensors: offloaded 31/31 layers to GPU
    load_tensors: Vulkan0 model buffer size = 6577.83 MiB
    load_tensors: CPU_Mapped model buffer size = 425.00 MiB

    ggml_vulkan: Device memory allocation of size 2013265920 failed
    ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
    llama_kv_cache_init: failed to allocate buffer for kv cache
    llama_init_from_model: llama_kv_cache_init() failed for self-attention cache
    common_init_from_params: failed to create context with model '~/software/ai/models/deepseek-math-7b-instruct.Q8_0.gguf'
    main: error: unable to load model

    It seems to me that there is enough room for the model, but I don't know what "Device memory allocation of size 2013265920" (about 1.9 GiB) refers to.
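
    A back-of-the-envelope check (assuming an f16 KV cache at llama.cpp's default 4096-token context, with this model's 30 layers and hidden size 4096; those are my guesses, not values read from the logs) reproduces that exact number:

        # f16 KV cache: K and V each hold ctx * hidden elements per layer, 2 bytes each
        echo $(( 2 * 30 * 4096 * 4096 * 2 ))             # 2013265920 bytes
        echo $(( 2 * 30 * 4096 * 4096 * 2 / 1048576 ))   # 1920 MiB

    If that's right, 6577.83 MiB of weights plus 1920 MiB of KV cache exceeds the 7759 MiB reported free, which would explain the ErrorOutOfDeviceMemory; a smaller context (-c) or fewer offloaded layers (-ngl) should make it fit.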

  • corvus@lemmy.ml (OP) to LocalLLaMA@sh.itjust.works · Models not loading into RAM · edited 26 days ago

    Yes, gpt4all runs it in CPU mode; the GPU option does not appear in the drop-down menu, which means the GPU is not supported or there is an error. I'm trying to run the models with the SYCL backend implemented in llama.cpp, which performs specific optimizations for CPU+GPU with the Intel DPC++/C++ Compiler and the oneAPI Toolkit.
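
    For reference, these are the build steps I'm following (from llama.cpp's SYCL build docs; the setvars.sh path assumes a default oneAPI install, so adjust as needed):

        # set up the oneAPI environment (default install location assumed)
        source /opt/intel/oneapi/setvars.sh

        # configure llama.cpp with the SYCL backend, using Intel's icx/icpx compilers
        cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx

        # build the binaries (llama-cli ends up in build/bin)
        cmake --build build --config Release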

    > Also try Deepseek 14b. It will be much faster.

    OK, I'll test it out.


  • corvus@lemmy.ml (OP) to LocalLLaMA@sh.itjust.works · Models not loading into RAM · edited 27 days ago

    I tried llama.cpp, but I was getting errors about some library not being found, so I tried gpt4all and it worked. I'll try to recompile it and test again. I have a ThinkBook with an Intel i5-1335U and integrated Xe graphics. I installed the Intel oneAPI toolkit so llama.cpp could take advantage of the SYCL backend for Intel GPUs, but I hit an execution error that I was unable to solve after many days. I also installed the Vulkan SDK needed to compile gpt4all, hoping to be able to use the GPU, but gpt4all-chat doesn't show the option to run on it, which from what I read means it's not supported. In any case, from some posts I've read I shouldn't expect a big performance boost from that GPU.

  • > This video has seemingly no sources for its claims.

    It’s just an introductory video. The references are in her book. I counted around 300.

    > Lyn Alden is part of "Ego Death Capital", a venture capital company around cryptocurrencies (https://egodeath.capital/team)

    Only Bitcoin. Bitcoin is not just any crypto or altcoin; I agree that most of those are scams.

    > Lyn Alden is the Board Director of Swan Bitcoin - a Bitcoin investment platform (https://www.swanbitcoin.com/alden/)
    > Lyn Alden is not an economist (https://www.lynalden.com/about-lyn-alden/)

    Who are you expecting to make a video about the failure of the current system? A banker?

    > Bitcoin cannot be diluted (~27:25) REALITY CHECK: Bitcoin is always being diluted until it reaches its hard limit.

    What she obviously means is that nobody can dilute it at will. The protocol creates new money at a mathematically determined rate.
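
    For what it's worth, the schedule is simple enough to check yourself: the block subsidy starts at 50 BTC (5,000,000,000 satoshis) and halves every 210,000 blocks, rounding down. A minimal sketch (the ~10-minute block interval and 2009 start year are approximations):

        sats=5000000000   # initial block subsidy in satoshis (50 BTC)
        total=0; eras=0
        while [ "$sats" -gt 0 ]; do
            total=$(( total + 210000 * sats ))   # blocks per era * subsidy
            sats=$(( sats / 2 ))                 # the halving, rounded down
            eras=$(( eras + 1 ))
        done
        echo "hard cap: $(echo "scale=4; $total / 100000000" | bc) BTC"                   # ~20999999.9769
        echo "last subsidized block: ~$(( 2009 + eras * 210000 * 10 / 60 / 24 / 365 ))"   # ~2140

    Nobody can change that rate without forking the network, which is the sense in which it can't be diluted.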

    > The value of Bitcoin has only increased over time (~27:50) REALITY CHECK: The log scale is playing tricks. A linear graph would show how volatile Bitcoin has truly been.

    She doesn't say that. She says higher highs and higher lows, which is true. That doesn't mean it always increases.

    > Bitcoin's hard limit is likely very dangerous for the network (~29:00): Once the hard limit is reached, it is unclear if people will keep pumping computing power at it. If the creation of new Bitcoin is no longer allowed, it is possible that transaction fees will need to be raised to compensate miners.

    Don't worry, that won't happen until around 2140 (see the sketch above).

    And of course it is Bitcoin propaganda, and more of this quality is needed.