Directional Textual Inversion (DTI)

Directional Textual Inversion for Personalized Text-to-Image Generation

Kunhee Kim¹* · NaHyeon Park¹* · Kibeom Hong² · Hyunjung Shim¹

¹KAIST · ²Sookmyung Women's University

Overview

Textual Inversion (TI) is efficient but often suffers from embedding norm inflation: learned tokens drift to out-of-distribution magnitudes, which degrades prompt fidelity in pre-norm Transformers.
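
One way to see why an inflated norm can hurt in a pre-norm block is the toy calculation below. It is only a sketch under stated assumptions: `toy_sublayer` is a hypothetical stand-in for an attention or MLP sublayer, and the layer norm has no learned affine parameters. Because the sublayer only sees the normalized input, its output has the same size whatever the token's norm is, so the residual update becomes small relative to an inflated token.

```python
import math

def layer_norm(x, eps=1e-5):
    """Layer normalization without learned scale/shift (an assumption)."""
    mean = sum(x) / len(x)
    var = sum((a - mean) ** 2 for a in x) / len(x)
    return [(a - mean) / math.sqrt(var + eps) for a in x]

def l2(x):
    return math.sqrt(sum(a * a for a in x))

def toy_sublayer(h):
    # Hypothetical stand-in for attention/MLP: any fixed map of the normalized input.
    return [0.5 * a for a in h]

def relative_update(x, sublayer):
    """Size of the pre-norm residual update f(LN(x)) relative to the token x."""
    return l2(sublayer(layer_norm(x))) / l2(x)

direction = [0.5, -1.0, 2.0, 0.25]
in_dist = relative_update([1.0 * a for a in direction], toy_sublayer)
inflated = relative_update([20.0 * a for a in direction], toy_sublayer)

# The update is about 20x smaller relative to the token whose norm is 20x larger.
print(in_dist / inflated)
```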

Directional Textual Inversion (DTI) addresses this by:

- Decoupling magnitude and direction: Fixing embedding magnitude to an in-distribution scale
- Spherical optimization: Optimizing only the direction on the unit hypersphere via Riemannian SGD
- vMF prior: Using a von Mises-Fisher prior for semantic coherence
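
The three ingredients above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in `src/dti`: the function names, the toy task loss, the choice of the prior mean direction `mu`, and the values of `scale` and `kappa` are all assumptions made for the example (`kappa` is set much larger than the `1e-4` used in the training command so its effect is visible on a toy problem).

```python
import math

def norm(x):
    return math.sqrt(sum(a * a for a in x))

def normalize(x):
    n = norm(x)
    return [a / n for a in x]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def riemannian_sgd_step(v, euclid_grad, lr):
    """One SGD step on the unit sphere: project the Euclidean gradient onto
    the tangent space at v, take the step, then retract by renormalizing."""
    radial = dot(euclid_grad, v)
    tangent = [g - radial * a for g, a in zip(euclid_grad, v)]
    stepped = [a - lr * t for a, t in zip(v, tangent)]
    return normalize(stepped)

def vmf_prior_grad(mu, kappa):
    """Gradient w.r.t. v of the negative vMF log-density, -kappa * <mu, v>."""
    return [-kappa * m for m in mu]

scale = 0.4                       # hypothetical fixed, in-distribution magnitude
kappa = 0.5                       # toy prior strength (assumption)
mu = normalize([1.0, 0.0, 0.0])   # assumed prior mean direction
target = normalize([0.0, 1.0, 0.0])

v = list(mu)                      # start from the prior mean direction
for _ in range(200):
    # Toy task loss -<v, target>; its Euclidean gradient is -target.
    task_grad = [-t for t in target]
    grad = [a + b for a, b in zip(task_grad, vmf_prior_grad(mu, kappa))]
    v = riemannian_sgd_step(v, grad, lr=0.1)

# Only the direction was optimized; the magnitude is re-attached at the end.
embedding = [scale * a for a in v]
```

In this toy setting the direction settles between `target` and `mu`, pulled toward `mu` in proportion to `kappa`, while the embedding norm stays equal to `scale` throughout.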

DTI achieves superior text fidelity and subject preservation on Stable Diffusion XL (SDXL), SANA, and Wan2.1-T2V-1.3B (image-as-1-frame-video setup).

Our implementation is built on top of the HuggingFace diffusers library and is fully compatible with existing Textual Inversion (TI) pipelines.

Installation

Requirements: Python 3.9+ · PyTorch 2.0+ · CUDA GPU

git clone https://github.com/kunheek/dti
cd dti
pip install -e .

Using [`uv`](https://docs.astral.sh/uv/) (recommended for faster setup)

uv venv --python 3.12
source .venv/bin/activate
uv pip install -e .

Quick Start

Train DTI on SDXL

python exps/ours_sdxl.py -g 0

This runs DTI on all DreamBooth subjects. To train on a specific subject:

python exps/ours_sdxl.py -g 0 --instances dog

Full training command

accelerate launch --mixed_precision=bf16 --num_processes=1 \
  scripts/train_sdxl.py \
  --pretrained_model_name_or_path "stabilityai/stable-diffusion-xl-base-1.0" \
  --train_data_dir data/dreambooth/dog \
  --output_dir output/dti-sdxl/dog \
  --placeholder_token "<dog>" \
  --initializer_token dog \
  --resolution 768 \
  --train_batch_size 4 \
  --max_train_steps 400 \
  --learning_rate 0.01 \
  --token_scale mean \
  --kappa 1e-4 \
  --decompose_scale true

Train on SANA

python exps/ours_sana.py -g 0 -m sana1.5_1.6b

Note: Our paper reports SANA with learning rate 0.02 and 1000 steps, but later experiments showed better performance with learning rate 0.01 and 500 steps. We recommend using the 0.01 + 500 combination for SANA.

Train on Wan2.1-T2V-1.3B (image-as-video, 1 frame)

python exps/ours_wan.py -g 2 -m wan2.1_t2v_1.3b

Wan uses `Wan-AI/Wan2.1-T2V-1.3B-Diffusers`. If you see `ValueError: Unrecognized model ...`, upgrade `transformers` and `diffusers`:

uv pip install --python .venv/bin/python --upgrade "git+https://github.com/huggingface/diffusers.git@main" "transformers>=4.48" "huggingface-hub>=0.34,<1.0"

Evaluate

python scripts/evaluate.py -e output/dti-sdxl

Training Options

| Parameter | Description | Default |
| --- | --- | --- |
| `-g` | GPU ID | `0` |
| `--instances` | Subject names | all |
| `--max_train_steps` | Training iterations | `400` |
| `--kappa` | vMF prior strength | `1e-4` |
| `--learning_rate` | Learning rate | `0.01` |
| `--token_scale` | Magnitude scale (`mean`, `max`, or float) | `mean` |
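
As a rough illustration of what the three `--token_scale` forms could mean, the sketch below resolves the option to a single magnitude. It assumes that `mean` and `max` refer to the mean and maximum L2 norm of the existing vocabulary embeddings; that reading, and the helper name `resolve_token_scale`, are assumptions for this example rather than the repository's implementation.

```python
import math

def resolve_token_scale(token_scale, vocab_embeddings):
    """Turn a --token_scale value into a fixed embedding magnitude.

    Assumption: "mean"/"max" are statistics of the vocabulary embedding norms;
    anything else is parsed as an explicit float.
    """
    norms = [math.sqrt(sum(x * x for x in e)) for e in vocab_embeddings]
    if token_scale == "mean":
        return sum(norms) / len(norms)
    if token_scale == "max":
        return max(norms)
    return float(token_scale)

# Tiny made-up vocabulary with norms 5 and 10.
vocab = [[3.0, 4.0], [6.0, 8.0]]
print(resolve_token_scale("mean", vocab))
print(resolve_token_scale("max", vocab))
print(resolve_token_scale("0.4", vocab))
```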

Baselines

# Standard Textual Inversion
python exps/ti_sdxl.py -g 0

# DCO + DTI
python exps/ours_dco_sdxl.py -g 0

Data Format

data/
├── dreambooth.json
└── dreambooth/
    └── subject_name/
        ├── 00.jpg
        ├── 01.jpg
        └── ...

JSON format:

{
  "subject_name": {
    "path": "data/dreambooth/subject_name",
    "class": "dog",
    "initialization": "dog"
  }
}
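
When adding custom subjects, it can help to check the config before launching a run. The snippet below is a small sketch that parses JSON in the format shown above and verifies that each entry has the three keys; `load_subjects` is a hypothetical helper, not part of this repository.

```python
import json

REQUIRED_KEYS = ("path", "class", "initialization")

def load_subjects(text):
    """Parse a subject config and check each entry for the expected keys."""
    subjects = json.loads(text)
    for name, entry in subjects.items():
        missing = [k for k in REQUIRED_KEYS if k not in entry]
        if missing:
            raise ValueError(f"{name}: missing keys {missing}")
    return subjects

example = """
{
  "subject_name": {
    "path": "data/dreambooth/subject_name",
    "class": "dog",
    "initialization": "dog"
  }
}
"""

subjects = load_subjects(example)
for name, entry in subjects.items():
    print(name, entry["path"], entry["class"], entry["initialization"])
```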

Download the DreamBooth dataset:

python scripts/download_datasets.py

Project Structure

dti/
├── src/dti/       # Core implementation
├── scripts/       # Training & evaluation scripts
├── exps/          # Experiment launchers
└── data/          # Datasets and configs

Citation

@inproceedings{kim2026directional,
  title={Directional Textual Inversion for Personalized Text-to-Image Generation},
  author={Kim, Kunhee and Park, NaHyeon and Hong, Kibeom and Shim, Hyunjung},
  booktitle={International Conference on Learning Representations},
  year={2026}
}

License

This project is licensed under the MIT License. See LICENSE for details.
