Datasets

A dataset is a collection of samples, each a (prompt, response) pair, used to train a model to perform a specific task. A dataset can contain any number of versions, each with its own model.

Create a new dataset when either of the following is true:

The problem the model should solve is different from the problem addressed by any existing dataset.
The model will be evaluated differently.

Datasets are designed to help you iterate on a single problem with a series of ever-improving models. Do not create datasets that mix different problems, even if the problems are very similar.

For example, a tweet classifier and a report classifier are both classifiers, but they should be in separate datasets.
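As a rough illustration of how these pieces relate, the sketch below models one dataset per problem, each holding several versions with their own model. Every name here (`Sample`, `Version`, `Dataset`, `model_id`, and the example values) is hypothetical and chosen only for illustration; it is not the product's API. Storing samples on each version is also an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    """One training example: a (prompt, response) pair."""
    prompt: str
    response: str


@dataclass
class Version:
    """One iteration of a dataset, paired with its own model."""
    name: str
    model_id: str  # hypothetical identifier for the model trained on this version
    samples: List[Sample] = field(default_factory=list)  # assumption: samples live on the version


@dataclass
class Dataset:
    """All versions that target one problem."""
    problem: str
    versions: List[Version] = field(default_factory=list)


# One dataset per problem: tweets and reports are kept apart,
# even though both are classification tasks.
tweet_dataset = Dataset(problem="classify tweets")
report_dataset = Dataset(problem="classify reports")

# Iterating on the tweet problem adds versions to the same dataset.
tweet_dataset.versions.append(
    Version(
        name="v1",
        model_id="tweet-model-v1",
        samples=[Sample("Loving the new update!", "positive")],
    )
)
tweet_dataset.versions.append(
    Version(
        name="v2",
        model_id="tweet-model-v2",
        samples=[
            Sample("Loving the new update!", "positive"),
            Sample("The app keeps crashing.", "negative"),
        ],
    )
)

# The report problem gets its own dataset and its own version history.
report_dataset.versions.append(
    Version(
        name="v1",
        model_id="report-model-v1",
        samples=[Sample("Quarterly revenue summary for the board.", "finance")],
    )
)
```

Keeping the two classifiers in separate `Dataset` objects means each version history tracks progress on exactly one problem, which is the separation the text recommends.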

