Agents continuously produce large volumes of artifacts: code, text, telemetry, and more. Yet most teams are stuck with static, stale datasets. In 2026, with agents able to write data pipelines as code, fresh and reliable datasets are within reach of any Python developer. We'll share how we built our internal agent evaluation platform with AI and open source Python tools (dlt, LanceDB, Pydantic, Ibis, HuggingFace, and more).