Unity ML-Agents | Pretrain an LLM from Scratch with Sentence Transformers | Part 18
*Welcome back to our Tau LLM series!*
In this episode, we're excited to showcase the latest advancements in our project. Here's what we've been up to:
**Oproof Python Package Completion**: In the last episode, we completed the oproof Python package. The package validates prompt-response pairs using Ollama and Python, which helps keep the training data accurate.
**Terminal Command Implementation**: We'll be integrating the oproof package into Tau's kernel as the `data oproof {filename}` terminal command. This command will load a data file of training messages and validate each prompt-response pair, checking for domain accuracy in basic math, grammar, and spelling.
**Error Handling and Output**: Any invalid messages will be removed from the input training data and saved into a `*_oproof_error.json` file, similar to our ophrase terminal command.
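The load, validate, and split flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual oproof or Tau kernel code: the function names, the JSON layout (a list of objects with `prompt` and `response` keys), and the choice to rewrite the input file in place are all hypothetical, and the `is_valid` callable stands in for the Ollama-backed check.

```python
import json
from pathlib import Path


def partition_messages(messages, is_valid):
    """Split messages into (valid, invalid) lists using the supplied check."""
    valid, invalid = [], []
    for msg in messages:
        (valid if is_valid(msg) else invalid).append(msg)
    return valid, invalid


def error_path_for(data_path):
    """Map e.g. train.json to train_oproof_error.json in the same folder."""
    return data_path.with_name(f"{data_path.stem}_oproof_error.json")


def filter_training_file(filename, is_valid):
    """Keep valid pairs in the data file and move invalid ones to an error file.

    Assumption: the file holds a JSON list of {"prompt": ..., "response": ...}
    objects, and `is_valid` is a placeholder for the real validator.
    """
    data_path = Path(filename)
    messages = json.loads(data_path.read_text(encoding="utf-8"))
    valid, invalid = partition_messages(messages, is_valid)

    # Assumption: the input file is overwritten with only the valid pairs.
    data_path.write_text(json.dumps(valid, indent=2), encoding="utf-8")
    if invalid:
        error_path_for(data_path).write_text(
            json.dumps(invalid, indent=2), encoding="utf-8"
        )
    return len(valid), len(invalid)
```

Passing the validator in as a plain callable keeps the file handling testable without a running Ollama instance; a toy check such as `lambda m: m["response"] == "4"` is enough to exercise the split.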
Join us as we implement and test these exciting new features. Whether you're a beginner or an experienced developer, this episode offers valuable insights into developing, testing, and enhancing an LLM using custom tools and techniques.
Stay tuned and let's get started!