Datasets

A dataset is a collection of samples (prompt, response) pairs that is used to train a model to perform a specific task. Within a dataset you can have any number of versions (each with their own model).

You should make new datasets for the following reasons:

  • The problem the model should solve is different from any other dataset.

  • The model will be evaluated differently.

Datasets are designed to help you iterate on solving a single problem with a set of ever improving models .You should not create datasets that mix different problems, even if the they are very similar types of problems.

  • e.g a tweet classifier and a report classifier are both classifiers but should be in separate datasets.

Last updated