Technique

Training Data

Training data is the massive dataset used to teach LLMs language understanding and generation. It includes web pages, books, and articles, all of which shape a model's knowledge and biases.

What is Training Data?

Training data is the collection of text used to teach LLMs to understand and generate language.

Major Sources

  • Common Crawl (web archives)
  • Books and literature
  • Wikipedia
  • GitHub (code)
  • Scientific papers

Implications

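The content of the training data determines what a model knows. Because a model learns from a fixed snapshot of text, outdated information can persist in its answers, which makes the quality and recency of the data important.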
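
As a rough illustration of why recency and quality matter, the sketch below filters a toy corpus before it would be used for training. Everything here is hypothetical: the Document fields, the cutoff date, and the word-count threshold are assumptions made for the example, and word count is only a crude stand-in for quality. Real data pipelines are far more involved.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    # Hypothetical record for one piece of training text.
    source: str
    text: str
    last_updated: date


def filter_corpus(docs, cutoff, min_words=5):
    """Keep documents updated on or after `cutoff` that have at least `min_words` words."""
    return [
        d for d in docs
        if d.last_updated >= cutoff and len(d.text.split()) >= min_words
    ]


docs = [
    Document("web", "The library API changed in the latest release, see the migration notes.", date(2024, 3, 1)),
    Document("web", "Old pricing page with figures that no longer apply to the product.", date(2015, 6, 1)),
    Document("web", "Click here", date(2024, 5, 1)),
]

kept = filter_corpus(docs, cutoff=date(2020, 1, 1))
print(len(kept))  # 1: the stale page and the two-word page are dropped
```

A filter like this trades coverage for freshness: a stricter cutoff removes more stale text but also discards older material that is still accurate.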
