OpenSeek-10B: Scaling Open-Source LLMs with Less Compute
Date: May 6 | Time: 14:30 - 15:00 | Location: Open Stage
Pretraining large language models from scratch is expensive, but does it have to be? In this talk, we present OpenSeek-10B, a fully open-source 10B-parameter language model that outperforms Qwen3-14B-Base and other open-source models of comparable size while reducing pretraining FLOPs by roughly 20x.
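For orientation, dense-transformer training compute is commonly approximated as C ≈ 6ND, where N is the parameter count and D is the number of training tokens. The abstract does not give OpenSeek-10B's token budget or say exactly which runs the roughly 20x figure compares. The expression below is therefore only a sketch of how such a ratio could be formed. It assumes the 4B seed model's own pretraining is counted; if it is not, drop the first term in the denominator. Here D_grow denotes the tokens the 10B model trains on after expansion.

```latex
\frac{C_{\text{baseline}}}{C_{\text{OpenSeek}}}
  \approx \frac{6\,N_{\text{base}}\,D_{\text{base}}}
               {6\,N_{4\text{B}}\,D_{4\text{B}} + 6\,N_{10\text{B}}\,D_{\text{grow}}}
  \approx 20
```

Under this decomposition, most of the savings would come from the 10B model needing far fewer tokens after growth than a from-scratch run of similar size.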
We share two key techniques that compound to dramatically improve pretraining efficiency. The first is small-model initialization, which grows a well-trained 4B model to 10B scale through hybrid width-depth expansion, so the larger model inherits the smaller one's learned capabilities instead of starting from scratch. The second is the Muon optimizer, which converges faster than the widely used AdamW and further accelerates training. Combined with a carefully designed multi-stage data strategy built entirely on open-source datasets, these techniques let OpenSeek-10B surpass Qwen3-14B-Base and comparably sized open-source models on mainstream benchmarks.
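Both techniques can be illustrated on a toy model. The abstract does not describe OpenSeek's actual expansion operators or Muon settings, so what follows is a minimal NumPy sketch under stated assumptions. Width growth uses Net2Net-style unit duplication, a known technique swapped in for illustration. Depth growth inserts residual blocks whose output projection is zero-initialized. The Muon step follows the publicly described recipe: momentum, then a quintic Newton-Schulz iteration that approximately orthogonalizes the update. The names `widen_block`, `deepen`, `grow`, `newton_schulz`, and `muon_step` are hypothetical, as are the shapes, learning rate, and momentum. The `np.allclose` check shows the property that makes small-model initialization useful: immediately after growth, the larger network computes the same function as the smaller one.

```python
import numpy as np

# ---------- Part 1: function-preserving growth of a toy residual MLP ----------
# Hypothetical toy model: each block computes x + W2 @ relu(W1 @ x),
# with W1 of shape (hidden, d) and W2 of shape (d, hidden).

def relu(z):
    return np.maximum(z, 0.0)

def forward(blocks, x):
    for w1, w2 in blocks:
        x = x + w2 @ relu(w1 @ x)
    return x

def widen_block(w1, w2, new_hidden, rng):
    """Net2Net-style widening of the MLP hidden layer (assumed operator)."""
    hidden = w1.shape[0]
    # Keep every original unit, then map each extra unit to a random existing one.
    mapping = np.concatenate(
        [np.arange(hidden), rng.integers(0, hidden, new_hidden - hidden)]
    )
    counts = np.bincount(mapping, minlength=hidden)
    new_w1 = w1[mapping]                       # copied rows -> copied activations
    new_w2 = w2[:, mapping] / counts[mapping]  # split outgoing weights so sums match
    return new_w1, new_w2

def deepen(blocks, every, rng):
    """Insert a new block after every `every` blocks (assumed operator).

    The new block's W2 is zero, so it adds nothing to the residual stream
    and the grown network computes the same function at initialization.
    """
    grown = []
    for i, (w1, w2) in enumerate(blocks):
        grown.append((w1, w2))
        if (i + 1) % every == 0:
            new_w1 = rng.normal(0.0, 0.02, size=w1.shape)
            grown.append((new_w1, np.zeros_like(w2)))
    return grown

def grow(blocks, new_hidden, every, rng):
    """Hybrid growth: widen every block, then interleave identity blocks."""
    widened = [widen_block(w1, w2, new_hidden, rng) for w1, w2 in blocks]
    return deepen(widened, every, rng)

# ---------- Part 2: a simplified Muon-style update for one 2D weight matrix ----------

def newton_schulz(g, steps=5, eps=1e-7):
    """Approximately orthogonalize g by pushing its singular values toward ~1."""
    a, b, c = 3.4445, -4.7750, 2.0315  # quintic coefficients from the public Muon reference code
    x = g / (np.linalg.norm(g) + eps)  # Frobenius scaling keeps singular values <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:  # iterate on the wide orientation so x @ x.T is the smaller Gram matrix
        x = x.T
    for _ in range(steps):
        m = x @ x.T
        x = a * x + (b * m + c * (m @ m)) @ x
    return x.T if transposed else x

def muon_step(w, grad, buf, lr=0.02, momentum=0.95):
    """One simplified Muon step: Nesterov momentum, then orthogonalize the update."""
    buf = momentum * buf + grad
    update = newton_schulz(grad + momentum * buf)
    return w - lr * update, buf

# ---------- Usage ----------
rng = np.random.default_rng(0)
d, hidden = 16, 32
small = [(rng.normal(0, 0.2, (hidden, d)), rng.normal(0, 0.2, (d, hidden)))
         for _ in range(4)]
big = grow(small, new_hidden=48, every=2, rng=rng)  # 4 blocks -> 6, hidden 32 -> 48
x = rng.normal(size=d)
print(len(small), len(big))                              # 4 6
print(np.allclose(forward(small, x), forward(big, x)))   # True

w = rng.normal(size=(64, 32))
grad = rng.normal(size=(64, 32))
w_new, buf = muon_step(w, grad, np.zeros_like(w))
s = np.linalg.svd(newton_schulz(grad), compute_uv=False)
print(s.min(), s.max())
```

Muon is usually applied only to 2D hidden-layer weight matrices. Embeddings, the output head, and 1D parameters such as norms and biases are typically still optimized with AdamW. The five-step quintic iteration deliberately trades exact orthogonality for speed, so the resulting singular values cluster around 1 rather than equaling it.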
We will walk through the full recipe (model scaling, optimizer selection, data curation, and the lessons we learned along the way), providing a practical, reproducible blueprint for the community to pretrain competitive models at lower cost. We will also show how FlagOS, BAAI's open-source unified software stack validated on chips from six vendors, lets this recipe run efficiently beyond a single hardware ecosystem.