I'm following the instructions on the post from the original owner of the repository involved here. I'm currently running the 65B model just fine. It is a rather surreal experience, a ghost in my shell indeed.

As an aside, I'm seeing an interesting behaviour with the `-t` threads flag. I originally expected it to work like make's `-j` flag, where it controls the number of parallel threads but the total computation done stays the same. What I'm actually seeing is that it seems to change the fidelity of the output. At `-t 8` it produces output fastest, presumably because that is the number of performance cores my M2 Max has.

I am running the 30B model on my M1 Mac Studio with 32 GB of RAM:

```
llama_model_load: loading model from './models/30B/ggml-model-q4_0.bin' - please wait.
llama_model_load: ggml ctx size = 20951.50 MB
llama_model_load: loading model part 1/4 from './models/30B/ggml-model-q4_0.bin'
llama_model_load: model size = 4850.14 MB / num tensors = 543
llama_model_load: loading model part 2/4 from './models/30B/ggml-model-q4_0.bin.1'
llama_model_load: loading model part 3/4 from './models/30B/ggml-model-q4_0.bin.2'
llama_model_load: loading model part 4/4 from './models/30B/ggml-model-q4_0.bin.3'

main: predict time = 34094.89 ms / 264.30 ms per token
```
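As a quick sanity check on the timing line in the log: the output does not report a token count, but dividing the total predict time by the per-token figure recovers it (a small arithmetic sketch using the values from the log above):

```python
# Values taken from the "main: predict time" log line above.
total_ms = 34094.89      # total predict time in milliseconds
ms_per_token = 264.30    # reported per-token latency

# The log omits the token count; dividing recovers it.
tokens = total_ms / ms_per_token
print(f"~{tokens:.0f} tokens generated")  # prints "~129 tokens generated"
```

So the run generated roughly 129 tokens in about 34 seconds, consistent with the ~264 ms/token figure.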
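For context, runs like those described above use an invocation along these lines. This is a sketch, not a command from the original post: the model path, prompt, and token count are placeholders, and the `main` binary with `-m`/`-t`/`-n`/`-p` flags is assumed to match early llama.cpp builds.

```shell
# Sketch of a llama.cpp run; adjust paths and values to your setup.
# -t sets the thread count discussed above; -n limits tokens generated.
./main -m ./models/30B/ggml-model-q4_0.bin -t 8 -n 128 -p "Hello, world"
```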