Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
84 changes: 34 additions & 50 deletions docs/customizer/about.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -147,7 +147,7 @@ Below are some examples of how you might format your dataset to perform a handfu

When testing models trained with prompt/completion datasets, use the `/v1/completions` endpoint instead of `/v1/chat/completions`.

For details, refer to the [Dataset Formatting tutorial](/documentation/fine-tune-models/tutorials/format-training-dataset#format-a-prompt-completion-dataset).
For details, refer to the [Dataset Formatting tutorial](/documentation/customizer-reference/tutorials/format-training-dataset#format-a-prompt-completion-dataset).

</Note>
#### Document Classification
Expand Down Expand Up @@ -197,31 +197,37 @@ completion: "<simple>"

Most of the models support Instruction Templates for training, the expected dataset conforms with the standard [OpenAI messages format](https://platform.openai.com/docs/guides/fine-tuning#multi-turn-chat-examples). Additionally, some models support tool calling which have additional optional parameters of `tools` at the top level of each entry and `tool_calls` per message.

For more information refer to our [in-depth instructions](/documentation/fine-tune-models/tutorials/format-training-dataset#format-a-conversation-dataset).
For more information refer to our [in-depth instructions](/documentation/customizer-reference/tutorials/format-training-dataset#format-a-conversation-dataset).

## Hyperparameters

Hyperparameters are configuration settings used to control the training process. You'll set these values before training begins to optimize how the model learns from your data. While the model automatically learns its internal parameters during training, these hyperparameters help guide that learning process. The right values depend on your specific use case, dataset size, and computational resources.

| Hyperparameter | Description | Default |
|----------------|-------------|---------|
| `epochs` | Number of complete passes through the training dataset | Model-dependent |
| `batch_size` | Number of samples processed before updating model weights | Model-dependent |
| `learning_rate` | Step size for weight updates during training | Model-dependent |
| `training.type` | Training type: `"sft"` for supervised fine-tuning | `"sft"` |
| `training.peft.type` | PEFT method: `"lora"` for Low-Rank Adaptation | — |
| `training.peft.rank` | LoRA rank (lower = fewer parameters, higher = more expressive) | 8 |
| `training.peft.alpha` | LoRA scaling factor | 32 |
Common hyperparameters you'll tune include:

| Hyperparameter | Description |
|----------------|-------------|
| Epochs | Number of complete passes through the training dataset |
| Batch size | Number of samples processed before updating model weights |
| Learning rate | Step size for weight updates during training |
| LoRA rank | Low-rank dimension of the adapter (lower = fewer parameters, higher = more expressive) |
| LoRA alpha | LoRA scaling factor |

<Note>

NeMo Customizer offers **two training backends** — Automodel (multi-GPU) and Unsloth (single-GPU, quantized) — and each accepts its own job configuration. The exact field names, defaults, and available knobs differ between them. For the full per-backend hyperparameter reference, see [Training Configuration](/documentation/customizer-reference/manage-customization-jobs/training-configuration).

</Note>

## Parallelism

NeMo Platform Customizer supports various distributed training parallelization methods, which can be mixed together.
The Automodel backend supports several distributed training parallelization methods, which can be mixed together. The Unsloth backend runs on a single GPU and does not use these settings.

### Tensor Parallelism

[Tensor Parallelism](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/features/parallelisms.html#tensor-parallelism) (TP) distributes the parameter tensor of an individual layer across GPUs. In addition to reducing model state memory usage, it also saves activation memory as the per-GPU tensor sizes shrink. The tradeoff is increased CPU overhead.

TP can be configured via `parallelism.tensor_parallel_size` in the [training configuration](/documentation/customizer-reference/manage-jobs/training-configuration).
TP can be configured via `parallelism.tensor_parallel_size` in the [training configuration](/documentation/customizer-reference/manage-customization-jobs/training-configuration).

<Note>

Expand All @@ -232,7 +238,7 @@ As of release 25.10.0, AutoModel engines including Phi-4, Qwen, and Gemma suppor

[Pipeline Parallelism](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/features/parallelisms.html#pipeline-parallelism) (PP) distributes the layers of a neural network across GPUs. The GPUs then process the different layers sequentially.

PP can be configured via `parallelism.pipeline_parallel_size` in the [training configuration](/documentation/customizer-reference/manage-jobs/training-configuration).
PP can be configured via `parallelism.pipeline_parallel_size` in the [training configuration](/documentation/customizer-reference/manage-customization-jobs/training-configuration).

#### Configuration

Expand All @@ -246,11 +252,11 @@ PP can be configured via `parallelism.pipeline_parallel_size` in the [training c
- Smaller TP values generally have less communication overhead.
- Larger TP values provide more memory savings but increase communication costs.

### Sequence Parallelism
### Context Parallelism

[Sequence Parallelism](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/features/parallelisms.html#sequence-parallelism) (SP) extends tensor-level model parallelism by distributing computing load and activation memory across multiple GPUs along the sequence dimension of transformer layers. This method is particularly useful when training on the datasets with longer sequences. It also benefits portions of the layer that have previously not been parallelized, enhancing overall model performance and efficiency.
[Context Parallelism](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/features/parallelisms.html#context-parallelism) (CP) distributes activation memory along the sequence dimension across GPUs, which is particularly useful when training on datasets with very long sequences.

Sequence Parallelism can be enabled/disabled using `parallelism.sequence_parallel` in the [training configuration](/documentation/customizer-reference/manage-jobs/training-configuration).
Context Parallelism can be configured via `parallelism.context_parallel_size` in the [training configuration](/documentation/customizer-reference/manage-customization-jobs/training-configuration).

## Sequence Packing

Expand All @@ -260,46 +266,24 @@ Sequence Parallelism can be enabled/disabled using `parallelism.sequence_paralle
- Maximize GPU compute efficiency
- Optimize GPU memory usage

When enabled, the `batch_size` and number of training steps update so that each gradient iteration sees, on average, the same number of tokens compared to running fine-tuning _without_ sequence packing.
When enabled, the effective batch size and number of training steps update so that each gradient iteration sees, on average, the same number of tokens compared to running fine-tuning _without_ sequence packing.

### Limitations
Sequence packing is enabled per backend:

- **Automodel**: set `batch.sequence_packing` to `true`.
- **Unsloth**: set `dataset.packing` to `true`.

- Sequence packing is an experimental feature only supprted by the following models:
- meta/llama-3.1-8b-instruct
- meta/llama-3.1-70b-instruct
- meta/llama3-70b-instruct
- meta/llama-3.2-3b-instruct
- meta/llama-3.2-1b
- meta/llama-3.2-1b-instruct
See [Training Configuration](/documentation/customizer-reference/manage-customization-jobs/training-configuration) for the full batch and dataset options.

### Limitations

- Sequence packing is an experimental feature whose support varies by model and backend.
- Chat prompt templates do not have support for sequence packing.

<Note>

If `training.sequence_packing` is enabled when using a model that does not support sequence packing, the fine-tuning will proceed _without_ sequence packing and a warning will be returned in the API response.
If sequence packing is enabled for a model that does not support it, fine-tuning proceeds _without_ sequence packing and a warning is returned in the API response.

</Note>
### Example of using in the API

Example of creating a customization job with sequence packing enabled:

```python
job = client.customization.jobs.create(
workspace="default",
name="my-packed-job",
spec={
"model": "default/llama-3.1-8b-instruct",
"dataset": "fileset://default/test-dataset",
"training": {
"type": "sft",
"peft": {"type": "lora", "rank": 16},
"sequence_packing": True,
"epochs": 10,
"batch_size": 16,
"learning_rate": 0.00001,
},
},
)
```

Learn how to create a LoRA customization job with sequence packing by following the [Optimizing for Tokens/GPU](tutorials/optimize-throughput.ipynb) tutorial.
Learn how to create a LoRA customization job with sequence packing by following the [Optimizing for Tokens/GPU](/documentation/customizer-reference/tutorials/optimize-throughput) tutorial.
21 changes: 7 additions & 14 deletions docs/customizer/index.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -11,8 +11,8 @@ Learn how to fine-tune models by making requests to NVIDIA NeMo Customizer throu
At a high level, the fine-tuning workflow consists of the following steps:

1. [Create a Model Entity](/documentation/customizer-reference/manage-model-entities/overview) pointing to your base model checkpoint (stored as a FileSet).
1. Format a compatible [dataset](/documentation/fine-tune-models/tutorials/format-training-dataset).
1. [Create a customization job](/documentation/fine-tune-models/manage-customization-jobs) referencing the Model Entity.
1. Format a compatible [dataset](/documentation/customizer-reference/tutorials/format-training-dataset).
1. [Create a customization job](/documentation/customizer-reference/manage-customization-jobs) referencing the Model Entity.
1. Monitor the job until it completes.
1. The customization job automatically creates either:
- **LoRA jobs**: An adapter attached to the original Model Entity
Expand Down Expand Up @@ -49,7 +49,7 @@ View the available Phi models from Microsoft, designed for strong reasoning capa
View the available GPT-OSS models supported for Full SFT customization.

</Card>
<Card title="Embedding Models" href="/documentation/fine-tune-models/models/embedding">
<Card title="Embedding Models" href="/documentation/customizer-reference/models/embedding">

View the available embedding models for question-answering and retrieval tasks.

Expand All @@ -63,7 +63,7 @@ Perform common fine-tuning tasks.

<Cards>

<Card title="Manage Customization Jobs" href="/documentation/fine-tune-models/manage-customization-jobs">
<Card title="Manage Customization Jobs" href="/documentation/customizer-reference/manage-customization-jobs">

Create, list, view, and cancel customization jobs.

Expand All @@ -89,7 +89,7 @@ Follow these tutorials to learn how to accomplish common fine-tuning tasks.

<Cards>

<Card title="Format Training Datasets" href="/documentation/fine-tune-models/tutorials/format-training-dataset">
<Card title="Format Training Datasets" href="/documentation/customizer-reference/tutorials/format-training-dataset">

Learn how to format datasets for different model types.

Expand All @@ -109,13 +109,6 @@ Learn how to start a SFT customization job using a custom dataset.

<small><span class="md-tag">nemo-customizer</span></small>

</Card>
<Card title="Align a Model with DPO" href="tutorials/dpo-customization-job.ipynb">

Learn how to align a model with DPO (Direct Preference Optimization) using preference data.

<small><span class="md-tag">nemo-customizer</span> <span class="md-tag">dpo</span></small>

</Card>
<Card title="Distill a Model with Knowledge Distillation" href="tutorials/distillation-customization-job.ipynb">

Expand All @@ -124,7 +117,7 @@ Learn how to compress a larger teacher model into a smaller student model.
<small><span class="md-tag">nemo-customizer</span> <span class="md-tag">knowledge-distillation</span></small>

</Card>
<Card title="Check Customization Job Metrics" href="/documentation/fine-tune-models/tutorials/metrics">
<Card title="Check Customization Job Metrics" href="/documentation/customizer-reference/tutorials/metrics">

Learn how to check job metrics using MLFlow or Weights & Biases.

Expand All @@ -147,7 +140,7 @@ Learn how to optimize the token-per-GPU throughput for a LoRA optimization job.

<Cards>

<Card title="Hyperparameters" href="/documentation/customizer-reference/manage-jobs/training-configuration">
<Card title="Hyperparameters" href="/documentation/customizer-reference/manage-customization-jobs/training-configuration">

View the available hyperparameters and their valid options that you can set when creating a customization job.

Expand Down
42 changes: 22 additions & 20 deletions docs/customizer/manage-customization-jobs/cancel-job.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -18,9 +18,9 @@ export NMP_BASE_URL="https://your-nmp-base-url"

## To Cancel a Customization Job

Running jobs may be cancelled. A cancelled job does not upload checkpoints. You need the job's name and workspace; you can get these from [List Active Jobs](/documentation/customizer-reference/manage-jobs/list-active-jobs).
Running jobs may be cancelled. A cancelled job does not upload checkpoints. Customization jobs run on the platform's Jobs service, so you cancel them through that service (the same way for both backends) using the job's name and workspace. You can get these from [List Active Jobs](/documentation/customizer-reference/manage-customization-jobs/list-active-jobs).

Use the SDK to cancel a customization job:
Use the SDK to cancel a job:

```python
import os
Expand All @@ -32,10 +32,10 @@ client = NeMoPlatform(
workspace="default",
)

# Cancel a customization job (use the job name and workspace from List Active Jobs)
job_name = "my-sft-job"
# Cancel a job (use the job name and workspace from List Active Jobs)
job_name = "automodel-a1b2c3d4e5f6"
workspace = "default"
cancelled_job = client.customization.jobs.cancel(name=job_name, workspace=workspace)
cancelled_job = client.jobs.cancel(name=job_name, workspace=workspace)

print(f"Job {cancelled_job.name} has been cancelled")
print(f"Current status: {cancelled_job.status}")
Expand All @@ -48,28 +48,30 @@ print(f"Updated at: {cancelled_job.updated_at}")

```json
{
"name": "my-sft-job",
"name": "automodel-a1b2c3d4e5f6",
"workspace": "default",
"id": "job-abc123def456",
"id": "platform-job-2k8i3i1HqJHHPVB5M6Bk9Z",
"source": "automodel",
"status": "cancelled",
"spec": {
"model": "default/llama-3-2-1b",
"dataset": "fileset://default/my-training-dataset",
"model": "default/qwen3-1.7b",
"dataset": { "training": "default/my-training-dataset" },
"training": {
"type": "sft",
"batch_size": 16,
"epochs": 3,
"learning_rate": 1e-05,
"max_seq_length": 4096,
"parallelism": {
"num_gpus_per_node": 2,
"tensor_parallel_size": 2
}
"training_type": "sft",
"finetuning_type": "all_weights",
"max_seq_length": 4096
},
"schedule": { "epochs": 3 },
"batch": { "global_batch_size": 16, "micro_batch_size": 1 },
"optimizer": { "learning_rate": 1e-05 },
"parallelism": {
"num_gpus_per_node": 2,
"tensor_parallel_size": 2
},
"output": {
"name": "my-finetuned-llama",
"name": "my-finetuned-qwen",
"type": "model",
"fileset": "my-finetuned-llama-a1b2c3d4e5f6"
"fileset": "my-finetuned-qwen-a1b2c3d4e5f6"
}
},
"created_at": "2026-02-09T10:30:00.000Z",
Expand Down
Loading
Loading