Running a model

Now that you have a trained model, it's time to run it.

Before downloading a model, you can use the built-in playground to try out a few prompts on the trained model.

Downloading a model

There are two ways to download a trained model:

  • Through the dashboard

  • Using the API

You can download the model from the dashboard using the "Download model" button on a dataset's overview page.

To download a model using the API:

  • Get an API key from the dataset page.

  • Call the model download endpoint of the API using this API key, as defined here. (You can get the model ID from the overview page.)

  • The response will contain a URL, which you can use to download the model with a tool like wget (see the sketch after this list).
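
As a rough illustration, the flow might look like the sketch below. The endpoint path, header name, and response field are assumptions made for the example only; substitute the actual values from the API reference.

import requests
import subprocess

API_KEY = "your-api-key"    # from the dataset page
MODEL_ID = "your-model-id"  # from the overview page

# Hypothetical endpoint and response shape -- check the API reference for the real ones.
resp = requests.get(
    f"https://api.example.com/v1/models/{MODEL_ID}/download",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
resp.raise_for_status()
download_url = resp.json()["url"]  # the response contains a download URL

# Fetch the zip with wget, as suggested above
subprocess.run(["wget", "-O", "model_run.zip", download_url], check=True)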

With either download method, you will end up with a zip file containing all the files needed to run the model.

To run the model, first unzip the file:

unzip <downloaded-model>.zip

The resulting directory will look like this:

/model_run
  ├── config.json
  ├── generation_config.json
  ├── model.safetensors
  ├── special_tokens_map.json
  ├── tokenizer_config.json
  └── tokenizer.model
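
These are the standard Hugging Face model files: config.json and generation_config.json hold the model and generation settings, model.safetensors holds the weights, and the remaining files define the tokenizer.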

The last thing needed to run the model is the transformers library from Hugging Face, which you can install with:

pip install transformers
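
Depending on your environment, you may also need torch (the model is loaded with PyTorch) and accelerate, which transformers relies on when device_map="auto" is passed to from_pretrained:

pip install torch accelerate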

Now we have everything needed to run the model. The following code actually runs it:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model from the unzipped directory
tokenizer = AutoTokenizer.from_pretrained("model_run")
model = AutoModelForCausalLM.from_pretrained("model_run", device_map="auto")

def fmt_prompt(text):  # this return format is for Mistral fine-tuned models
    return f"[INST] {text} [/INST]"

prompt = fmt_prompt("your prompt goes here")
input_tokens = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**input_tokens, do_sample=True, temperature=0.7, top_p=0.95, max_new_tokens=100)

print(tokenizer.decode(outputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=False))

The above code loads the model, runs inference and prints the output.

Note: The code above formats the prompt according to the Mistral prompt format. If you trained using Llama 3 as the base model, use the following prompt template instead:

def fmt_prompt(text):  # this return format is for Llama 3 fine-tuned models
    return f"<|start_header_id|>user<|end_header_id|>\n\n{text}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
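
As an alternative sketch, if the tokenizer shipped with your model includes a chat template in tokenizer_config.json, you can let transformers build the prompt instead of hard-coding it (not every exported tokenizer includes one, so check before relying on this):

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "your prompt goes here"}],
    tokenize=False,
    add_generation_prompt=True,
)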
