refactor: handle GGML_VK_VISIBLE_DEVICES at the Python level#2179
Conversation
else:
    inputs.vulkan_info = "".encode("UTF-8")
if "GGML_VK_VISIBLE_DEVICES" not in os.environ:
    if args.usevulkan: # is an empty array if using vulkan without defined gpu
To make it work the same way as before for text models, this would have [0] as default instead of empty (which incidentally would be the wrong choice on my system, since the iGPU usually appears on Vulkan0).
All C++ handling code currently:
- build a comma-separated list from the info_vulkan array
- if GGML_VK_VISIBLE_DEVICES isn't set
  - set GGML_VK_VISIBLE_DEVICES to the list

Once set, GGML_VK_VISIBLE_DEVICES affects the whole process. So this can be done in the same way at the Python level, before all loading functions.

Caveat: load_model had the default `inputs.vulkan_info = "0"`, so the default GPU would be "0" only when loading a text model.
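The steps above could be sketched at the Python level roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `set_vulkan_visible_devices` and its `environ` parameter are hypothetical, and it assumes the device selection arrives as a list of indices (empty or `None` when no device was chosen), as `args.usevulkan` appears to.

```python
import os

def set_vulkan_visible_devices(device_ids, environ=os.environ):
    """Hypothetical helper: export the selected Vulkan devices once,
    before any model-loading function runs."""
    if "GGML_VK_VISIBLE_DEVICES" in environ:
        # The user already set it; do not override.
        return environ["GGML_VK_VISIBLE_DEVICES"]
    if not device_ids:
        # No explicit devices (None or empty list): leave the variable unset.
        return None
    # Build the comma-separated list, preserving the order given.
    value = ",".join(str(d) for d in device_ids)
    environ["GGML_VK_VISIBLE_DEVICES"] = value
    return value
```

Passing a plain `dict` as `environ` makes it easy to try the logic without touching the real process environment. Note that this sketch keeps the "empty means unset" behaviour; the old `"0"` default for text models mentioned in the caveat is deliberately not reproduced.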
062c9e0 to 32b549a
|
Seems to be working fine, so marking as ready. Feel free to add back that "force only Vulkan device 0" default afterwards; I don't mind keeping my workarounds, but I also don't want to spend time testing it to be sure it's doing exactly the wrong choice for me 🙂 |
|
@wbruna unfortunately, upon proper testing this appears to fail completely: the environment variable is never propagated to the C++ side and does not get set at all, causing Vulkan to always use the default GPU. Does that happen for you too? |
|
seems like once the DLL is loaded via but i can probably still de-duplicate the code to only do this once via a helper function. |
|
hmm nope, still not working. we might have to revert after all. i'll check in again tomorrow. |
|
That's very weird, because it appears to work perfectly fine here:
git diff
diff --git a/otherarch/sdcpp/sdtype_adapter.cpp b/otherarch/sdcpp/sdtype_adapter.cpp
index 41aaa82f2..ef48c74f2 100644
--- a/otherarch/sdcpp/sdtype_adapter.cpp
+++ b/otherarch/sdcpp/sdtype_adapter.cpp
@@ -317,6 +317,15 @@ std::string load_umt5_tokenizer_json()
}
bool sdtype_load_model(const sd_load_model_inputs inputs) {
+
+
+ const char * var = getenv("GGML_VK_VISIBLE_DEVICES");
+ if (var) {
+ std::cerr << "****************** GGML_VK_VISIBLE_DEVICES = " << var << std::endl;
+ } else {
+ std::cerr << "****************** GGML_VK_VISIBLE_DEVICES is NULL " << std::endl;
+ }
+
sd_is_quiet = inputs.quiet;
set_sd_quiet(sd_is_quiet);
 executable_path = inputs.executable_path;
$ python koboldcpp.py --config config.kcpps | grep GGML_VK
****************** GGML_VK_VISIBLE_DEVICES is NULL
^C
$ python koboldcpp.py --config config.kcpps --usevulkan 1 | grep GGML_VK
****************** GGML_VK_VISIBLE_DEVICES = 1
^C
$ python koboldcpp.py --config config.kcpps --usevulkan 1 0 | grep GGML_VK
****************** GGML_VK_VISIBLE_DEVICES = 1,0
^C
$ python koboldcpp.py --config config.kcpps --usevulkan 0 1 | grep GGML_VK
****************** GGML_VK_VISIBLE_DEVICES = 0,1
^C
But I'm doing it at the same point that
And I'm sure ggml is seeing it:
$ python koboldcpp.py --config config.kcpps --usevulkan 4 | grep GGML_VK
****************** GGML_VK_VISIBLE_DEVICES = 4
ggml_vulkan: Invalid device index 4 in GGML_VK_VISIBLE_DEVICES.
terminate called after throwing an instance of 'std::runtime_error'
 what(): Invalid Vulkan device index
With a similar patch to
$ python koboldcpp.py --usevulkan 4 --model ./llama-2-7b.Q4_0.gguf | grep GGML_VK
*** load_model: GGML_VK_VISIBLE_DEVICES = 4
ggml_vulkan: Invalid device index 4 in GGML_VK_VISIBLE_DEVICES. |
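For a quick check that does not require patching the C++ sources, the same `getenv` lookup can be done from Python through `ctypes`. This is only a sketch under assumptions: `c_getenv` is a hypothetical helper, and `ctypes.CDLL(None)` works on POSIX systems only, so it says nothing about the Windows case, where a DLL linked against a different C runtime may hold its own copy of the environment.

```python
import ctypes
import os

def c_getenv(name):
    """Read an environment variable through the C runtime's getenv (POSIX only)."""
    libc = ctypes.CDLL(None)  # symbols already loaded into this process
    libc.getenv.restype = ctypes.c_char_p
    libc.getenv.argtypes = [ctypes.c_char_p]
    value = libc.getenv(name.encode("utf-8"))
    return None if value is None else value.decode("utf-8")

# Set the variable from Python, then read it back the way C code would.
os.environ["GGML_VK_VISIBLE_DEVICES"] = "1,0"
print(c_getenv("GGML_VK_VISIBLE_DEVICES"))
```

If the value printed here matches what was assigned, the assignment reached the C-level environment of the process, which is what the debug prints in the patch above confirm on the C++ side.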
Why not?
How exactly are you testing it? I could imagine e.g. msvc being weird, caching the env var right after library initialization, and somehow invalidating the cache on the |
|
I'm simply loading it normally in windows, i was puzzled because now i couldn't choose my vulkan device anymore. |
|
koboldcpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp, line 5929 in b31877e
this line flags as nullptr now |
|
it works if i move |
|
Doing a full build to test it, but should be fine: |
|
Yeah, too bad Windows doesn't have it. But this C++ macro approach should hopefully be clean enough. |
|
Working fine now; thanks. |