This document surveys the most advanced large-scale foundation models as of mid-2025 and details their known or inferred training datasets, drawing on public disclosures, research papers, and leaked information. It covers the data sources behind AI systems from OpenAI, Anthropic, Google DeepMind, Meta, Mistral AI, Cohere, xAI, and Chinese AI labs, along with notable controversies over data usage.