Vulkan for Edge AI: Expanding the Hardware Frontier with llama.cpp
Date: May 6 | Time: 11:35 - 12:00 | Location: Central Room
Agentic AI on the edge requires accessible, low-latency inference, yet hardware fragmentation limits deployment. While CUDA dominates GPU acceleration, its vendor lock-in restricts where local intelligence can run. This talk examines Vulkan as a vendor-neutral alternative and shows how it broadened hardware compatibility and reduced deployment complexity in llama.cpp across Intel, AMD, and Nvidia GPUs.
However, Vulkan is not a silver bullet. I will outline the engineering roadblocks, from driver inconsistencies to compute limitations. Looking ahead, I will explore VK_NV_cooperative_matrix2 as a blueprint for offloading hardware-specific optimizations to the driver. This approach lets vendors deliver peak performance through their own optimizations, while generic shader fallbacks preserve broad hardware support, unifying the edge AI ecosystem.
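As a rough illustration of the "driver-optimized path with generic fallback" idea, the C sketch below picks a matrix-multiply shader variant from a device's advertised extension list. The enum names, the three-tier ordering, and the helper functions are hypothetical and not taken from llama.cpp; only the extension name strings refer to real Vulkan extensions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical shader variants a backend might compile; names are illustrative only. */
typedef enum {
    MATMUL_GENERIC = 0,  /* portable compute shader, usable on any Vulkan device */
    MATMUL_COOPMAT_KHR,  /* cross-vendor cooperative matrix path (VK_KHR_cooperative_matrix) */
    MATMUL_COOPMAT2_NV   /* driver-optimized path (VK_NV_cooperative_matrix2) */
} matmul_path;

/* Returns true if `name` appears in the extension list. In a real backend the
 * list would be filled from vkEnumerateDeviceExtensionProperties. */
static bool has_extension(const char *const *exts, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(exts[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* Picks the most specialized path the device advertises (assumed ordering),
 * falling back to the generic shader so every device still gets a kernel. */
static matmul_path select_matmul_path(const char *const *exts, size_t n) {
    if (has_extension(exts, n, "VK_NV_cooperative_matrix2")) {
        return MATMUL_COOPMAT2_NV;
    }
    if (has_extension(exts, n, "VK_KHR_cooperative_matrix")) {
        return MATMUL_COOPMAT_KHR;
    }
    return MATMUL_GENERIC;
}
```

An advertised extension is only a first check: a real backend would also query the corresponding feature structures and enable the extension at device creation before dispatching the driver-optimized shaders. The design point is that the generic branch always exists, so unsupported hardware degrades in speed rather than failing outright.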