OminiX: Fully Automated Native C++ Deployment for Diverse Large-Scale Learning Models
Date: May 6
Time: 11:10 - 11:35
Location: Central Room
Running deep learning inference in native C++ enables efficient edge deployment, eliminates Python/PyTorch dependencies, and allows fast, accurate quantization. However, converting a PyTorch
model to native C++ requires weeks of labor-intensive development, and only a narrow range of LLMs have been manually ported. We propose OminiX cpp, an automated pipeline where an AI agent
with structured procedural skills converts arbitrary PyTorch models into optimized C++ inference code targeting the GGML runtime. OminiX cpp generalizes beyond LLMs to support diverse model
families, including image and video generation, speech recognition, text-to-speech, world models, and Vision-Language-Action (VLA) models. As a case study, we report results on OpenVLA, a 7B-parameter VLA model, where the generated C++ implementation achieves a near-lossless task success rate, up to 63% memory reduction, and up to 1.52× speedup.