

@mollyheamazon
Contributor

Issue #, if available:
Currently, passing an S3 URI that points directly to a JSONL file, for example:

sft_trainer = SFTTrainer(
    ...
    training_dataset="s3://olympus-cft-dev-us-east-1/mm-cft/bedrock-input/nova-2-0/llava-cot-cleaned-300-reasoning_converse.jsonl"
)

throws the following error:

ClientError: Invalid input error: Input 'train' data file not found. Please verify that S3DataType is specified as Converse and the 'train' data channel was provided for the job, exit code: 0

Description of changes:
Changes s3_data_type from S3Prefix to Converse when building the input channels, since the _convert_input_data_to_channels function is only used for fine-tuning jobs.
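To illustrate the change, here is a minimal sketch of how a string S3 URI might be turned into a training channel with S3DataType set to Converse. It assumes each channel follows the InputDataConfig shape of the SageMaker CreateTrainingJob API. The function name convert_input_data_to_channels and its signature are hypothetical stand-ins, not the SDK's actual private _convert_input_data_to_channels helper.

```python
from typing import Dict, List, Union


# Hypothetical stand-in for the SDK's private helper; the real
# _convert_input_data_to_channels may differ in signature and behavior.
def convert_input_data_to_channels(
    training_dataset: Union[str, Dict[str, str]],
    s3_data_type: str = "Converse",  # previously "S3Prefix" in this sketch
) -> List[dict]:
    """Build CreateTrainingJob-style channel dicts from S3 URIs."""
    # A bare string is treated as the "train" channel.
    if isinstance(training_dataset, str):
        datasets = {"train": training_dataset}
    else:
        datasets = dict(training_dataset)

    channels = []
    for name, uri in datasets.items():
        if not uri.startswith("s3://"):
            raise ValueError(f"Expected an s3:// URI for channel {name!r}, got {uri!r}")
        channels.append(
            {
                "ChannelName": name,
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": s3_data_type,
                        "S3Uri": uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        )
    return channels
```

Usage (the bucket name is a placeholder): convert_input_data_to_channels("s3://amzn-s3-demo-bucket/train.jsonl") returns a single "train" channel whose S3DataType is "Converse". Passing s3_data_type="S3Prefix" reproduces the previous behavior.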

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@mollyheamazon
Contributor Author

This is a feature request rather than a bug fix; the design needs further discussion.

