Workshop: Own Your Data

Synthetic Data for the Commons: Building Open SOTA LLMs with Synthetic Environments

Date: May 5 · Time: 14:40 - 15:00 · Location: Scène Ouverte
Training state-of-the-art language models typically demands vast proprietary datasets and closed pipelines. At Pleias, we take a different path — building open, high-performing LLMs using synthetic data environments designed for the commons. This talk presents our approach to constructing synthetic data pipelines that generate diverse, high-quality training corpora without relying on proprietary sources. We cover the technical architecture behind our synthetic environments, the training strategies that enable competitive performance on standard benchmarks, and why we believe open synthetic data is a critical piece of the puzzle for democratizing access to frontier AI capabilities. We share lessons learned, benchmark results, and a roadmap for community-driven improvements.