Agentic AI on Edge

Vulkan for Edge AI: Expanding the Hardware Frontier with llama.cpp

Date: May 6
Time: 11:35 - 12:00
Location: Central Room

Agentic AI on the edge requires accessible, low-latency inference, yet hardware fragmentation limits deployment. While CUDA dominates GPU acceleration, it runs only on Nvidia hardware, and that lock-in constrains local inference everywhere else. This talk examines Vulkan as a vendor-neutral alternative, showing how the Vulkan backend in llama.cpp expanded hardware compatibility and reduced deployment complexity across Intel, AMD, and Nvidia GPUs.

However, Vulkan is not a silver bullet. I will outline the engineering roadblocks, from driver inconsistencies to compute limitations. Looking ahead, I will explore VK_NV_cooperative_matrix2 as a blueprint for offloading hardware-specific optimizations to the driver. This approach enables peak performance through vendor optimizations while still allowing broad support through generic shader fallbacks, helping to unify the edge AI ecosystem.

Speakers

Ruben Ortlam, Senior Machine Learning Engineer, Red Hat