# Training GPT-2

Now we can proceed to train the model on your dataset. Run the following command:

```
$ python train.py --dataset lyric.npz
```

If you would like to see sample output more often, or more samples at a time, adjust the sampling options. For example, to output 3 samples every 50 steps, run the following command instead:

```
$ python train.py --dataset lyric.npz --sample_every 50 --sample_num 3
```

You can also change the **batch size** and the **learning rate**. Make sure that you have sufficient memory (RAM, or GPU memory when training on a GPU) to handle a larger batch size (the default is 1). The learning rate controls how large each update to the model weights is during fine-tuning.

```
$ python train.py --dataset lyric.npz --batch_size 2 --learning_rate 0.0001
```
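If you launch training from a script rather than typing the command by hand, the options above can be assembled programmatically. The following is a minimal sketch, not part of the GPT-2 repository: `build_train_command` is a hypothetical helper, and it only passes through whatever option names you give it (the flags shown on this page are `--sample_every`, `--sample_num`, `--batch_size`, and `--learning_rate`).

```python
import subprocess
import sys
from typing import List


def build_train_command(dataset: str, **options) -> List[str]:
    """Build the argument list for train.py (hypothetical helper).

    Each keyword argument becomes a "--name value" pair, so the names
    must match flags that train.py actually accepts.
    """
    # sys.executable is the Python interpreter running this script.
    command = [sys.executable, "train.py", "--dataset", dataset]
    for name, value in options.items():
        command += [f"--{name}", str(value)]
    return command


command = build_train_command("lyric.npz", batch_size=2, learning_rate=0.0001)
print(" ".join(command))

# Uncomment to start training from the folder that contains train.py:
# subprocess.run(command, check=True)
```

Building the command as a list avoids shell quoting problems if the dataset path contains spaces.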

### Training using Horovod

To distribute GPT-2 training across multiple GPUs, you can try the following command (enter it as a single line). This example launches 4 processes on the local machine:

```
$ mpirun -np 4 -H localhost:4 -bind-to none -map-by slot -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH -x PYTHONPATH=src -mca pml ob1 -mca btl ^openib train-horovod.py --dataset lyric.npz
```

### How to stop training?

You can stop training at any time by pressing Ctrl+C. By default, the model is saved once every 1000 steps and a sample is generated once every 100 steps. After you interrupt the process, you will find a **checkpoint** folder and a **samples** folder. Each of them contains a subfolder called **run1**.
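To make the default schedule concrete, here is a small illustrative sketch of which periodic actions fall on a given step. It assumes the training script simply checks whether the step counter is a multiple of each interval; `due_actions` is a hypothetical function written for this page, not code taken from `train.py`.

```python
from typing import List


def due_actions(step: int, sample_every: int = 100, save_every: int = 1000) -> List[str]:
    """List the periodic actions that fall on a given training step.

    Assumption: an action is due whenever the step number is an exact
    multiple of its interval.
    """
    actions = []
    if step % save_every == 0:
        actions.append("save")
    if step % sample_every == 0:
        actions.append("sample")
    return actions


for step in (100, 250, 1000):
    print(step, due_actions(step))
```

Under this assumption, step 100 produces a sample only, step 250 produces nothing, and step 1000 produces both a saved model and a sample.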

The **samples** folder contains example output from the model; you can open these files in any text editor to evaluate your model. The **checkpoint** folder contains the data needed to resume training later.
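As an alternative to opening the files by hand, the short sketch below prints the most recently written sample. It assumes the `samples/run1` layout described above and makes no assumption about file names: it simply picks the file with the newest modification time. `latest_file` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def latest_file(folder: str) -> Optional[Path]:
    """Return the most recently modified file in folder, or None if there is none."""
    path = Path(folder)
    if not path.is_dir():
        return None
    files = [p for p in path.iterdir() if p.is_file()]
    # default=None covers the case of an existing but empty folder.
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    # Assumed location, based on the folder names mentioned on this page.
    newest = latest_file("samples/run1")
    if newest is None:
        print("No samples found yet.")
    else:
        print(newest.read_text(encoding="utf-8", errors="replace"))
```

Run it from the same folder in which you started training, so that the relative path resolves correctly.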

