Developing Custom Fine-Tuning Templates

Template Structure Overview

A custom fine-tuning template should include the essential configuration files and training scripts. For example, the YOLOv5 object detection fine-tuning template (finetune-object-detection) typically consists of the following components:

  • Core training script: Handles the model training logic.
  • Utility scripts: Provide helper functions to interact with the platform.
  • Configuration files: Specify the training environment and parameters.
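As an illustration, such a template directory might be laid out as follows. The three file names match the scripts discussed in this guide; any additional files your template needs would sit alongside them:

```
finetune-object-detection/
├── config.yaml   # training environment and parameter definitions
├── run.sh        # core training script
└── util.sh       # platform utility functions
```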

Core Responsibilities and Script Requirements

Your main responsibility is to implement a custom fine-tuning training script (usually named run.sh). To ensure your script integrates smoothly with the Alauda AI platform's sub-tasks, follow these three key requirements:

1. Import Platform Utility Scripts

At the beginning of your main training script (e.g., run.sh), include the following commands to load platform-provided utility functions:

#!/usr/bin/env bash

set -ex
SCRIPT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd -P)
source "${SCRIPT_DIR}/util.sh"

Purpose: The util.sh script provides standard platform functions such as parameter retrieval, path resolution, and logging. Refer to the provided examples to ensure your script uses the built-in parameters and control flow correctly.
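As a concrete, hypothetical use of these built-ins, a training script might wait for the data-download sub-task to finish before starting. SIGNAL_FILE_PREPARE_DONE is a built-in parameter listed in the reference table below; wait_for_prepare is our own helper name, not a platform function:

```shell
# Block until the data-download sub-task signals completion, or fail
# after a timeout (seconds). SIGNAL_FILE_PREPARE_DONE comes from util.sh.
wait_for_prepare() {
  local timeout=${1:-3600} elapsed=0
  until [ -f "${SIGNAL_FILE_PREPARE_DONE}" ]; do
    if [ "${elapsed}" -ge "${timeout}" ]; then
      echo "timed out waiting for data download" >&2
      return 1
    fi
    sleep 5
    elapsed=$((elapsed + 5))
  done
}
```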

2. Model Output Path Notification

Before the training function exits, you must execute the following command to pass the output path of the fine-tuned model to subsequent tasks (such as model upload):

echo "${MODEL_PATH}/${OUTPUT_DIR}" > "${TASK_META_OUTPUT_PATH_FILE}"

Purpose: This mechanism allows the platform to identify and collect the final training outputs. Ensure the path is constructed correctly (base model path + relative output directory).
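A minimal sketch of a training function that ends with this notification. The training command itself is elided, and notify_output_path is a helper name of our own choosing, not a platform function:

```shell
# Run training, then publish the output location for the upload sub-task.
launch_training() {
  # ... actual training goes here, writing results under
  # "${MODEL_PATH}/${OUTPUT_DIR}" ...
  notify_output_path
}

# Record the final model location: base model path + relative output dir.
notify_output_path() {
  echo "${MODEL_PATH}/${OUTPUT_DIR}" > "${TASK_META_OUTPUT_PATH_FILE}"
}
```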

3. Script Execution Permissions

Before uploading your fine-tuning template to the GitLab model repository, make sure all Bash scripts (especially run.sh and any dependent .sh files) have executable permissions. Action: run chmod +x *.sh, or set permissions on individual files.
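One way to apply the permissions in bulk, including scripts in subdirectories (make_scripts_executable is a hypothetical helper; plain chmod +x *.sh works just as well for a flat directory):

```shell
# Grant execute permission to every .sh file under a template directory.
make_scripts_executable() {
  local dir=${1:-.}
  find "$dir" -name '*.sh' -exec chmod +x {} +
}
```

Usage: make_scripts_executable ./finetune-object-detection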

Key Parameter Reference Table

When implementing your fine-tuning template, review the tables below to understand the core parameters in the template directory and scripts, along with their meanings. These parameters define how the base model, dataset, and platform environment are connected. Recommendation: before writing your own template, study the official sample templates to see how these parameters are used in real training workflows.

config.yaml – Template YAML File

  • image: Docker image required for fine-tuning training. Example: docker.io/alaudadockerhub/yolov5-runtime:v0.1.0
  • tool-image: Utility image for data download and upload. Example: docker.io/alaudadockerhub/git-tool:v0.1.0
  • sub-templates: Defines the fine-tuning sub-templates (e.g., lora for LoRA partial fine-tuning and full for full fine-tuning). Supports multi-language (zh/en) descriptions.
  • params: Parameter list. Under params, specify each sub-template name and define its parameters as a list under that name; the default key names the default sub-template (one of those declared in sub-templates). Each parameter entry contains:
      • name: parameter name (e.g., img)
      • env: corresponding environment variable
      • value: default value
      • display: tooltip text in the UI
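Putting these fields together, a config.yaml might look roughly like the sketch below. The sub-template name, description text, and parameter entry are illustrative only; consult the official sample templates for the authoritative schema:

```yaml
image: docker.io/alaudadockerhub/yolov5-runtime:v0.1.0
tool-image: docker.io/alaudadockerhub/git-tool:v0.1.0
sub-templates:
  - name: full                       # illustrative sub-template name
    description:
      zh: 全量微调
      en: Full fine-tuning
params:
  default: full                      # default sub-template, declared above
  full:
    - name: img                      # parameter name
      env: IMG                       # environment variable it maps to
      value: "640"                   # default value
      display: Training image size   # UI tooltip text
```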

util.sh – Utility Script

  • WORKSPACE_PATH: Fine-tuning workspace path. Example: /mnt/workspace. (Built-in task parameter)
  • FT_TASK_META_DIR: Directory for fine-tuning task metadata; stores metadata shared between sub-tasks. Example: /mnt/workspace/.task. (Built-in task parameter)
  • SIGNAL_FILE_PREPARE_DONE: Signal file generated after the data download sub-task completes. Example: /mnt/workspace/.task/prepare.done. (Built-in task parameter)
  • SIGNAL_FILE_EXPORT_DONE: Signal file generated after the training task completes. Example: /mnt/workspace/.task/export.done. (Built-in task parameter)
  • TASK_META_TEMPLATE_PATH_FILE: File where the data download sub-task saves the fine-tuning template path for later tasks. Example: /mnt/workspace/.task/meta_template_path. (Built-in task parameter)
  • BASE_MODEL_URL: Path to the base model in the model repository, domain excluded. Example: fy-c1/amlmodels/yolov5. (Environment variable)
  • MODEL_TAG: Model tag to download. Example: v0.1.0. (Environment variable; download by tag only)
  • MODEL_BRANCH: Model branch to download. Example: main. (Environment variable; required when specifying a commit)
  • MODEL_COMMIT: Model commit ID to download. Example: 6635e1b9. (Environment variable)
  • DATASET_URL: Path to the dataset in the model repository, domain excluded. Example: fy-c1/amldatasets/coco128. (Environment variable)
  • DATASET_TAG: Dataset tag to download. Example: v0.1.0. (Environment variable; download by tag only)
  • DATASET_BRANCH: Dataset branch to download. Example: main. (Environment variable; required when specifying a commit)
  • DATASET_COMMIT: Dataset commit ID to download. Example: 6635e1b9. (Environment variable)
  • DATASET_S3_URL: S3 URL for the dataset; must end with the bucket name. Example: http://minio-service.kubeflow.svc.cluster.local:9000/finetune. (Environment variable; if both DATASET_S3_URL and DATASET_URL are set, S3 is used)
  • DATASET_S3_PATH: Dataset storage path in S3, excluding the bucket. Example: coco128. (Environment variable)
  • DATASET_S3_ACCESSID: S3 access ID. (Environment variable)
  • DATASET_S3_ACCESSKEY: S3 access key. (Environment variable)
  • OUTPUT_MODEL_URL: Destination path in the model repository for the uploaded model. Example: fy-c1/amlmodels/yolov5_ft_coco128. (Environment variable)
  • GIT_BASE: GitLab base URL. Example: https://aml-gitlab.alaudatech.net. (Environment variable)
  • GIT_USER: GitLab user. (Environment variable)
  • GIT_TOKEN: GitLab access token. (Environment variable)
  • N_RANKS: Degree of parallelism for the distributed job. Example: 2. (Distributed job parameter)
  • RANK: Rank of the current task. Example: 0. (Distributed job parameter)
  • WORLD_SIZE: Total number of processes in training. Example: 2. (Distributed job parameter)
  • MASTER_ADDR: Master process address. (Distributed job parameter)
  • MASTER_PORT: Master process port. Default: 8888. (Distributed job parameter)
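The note on DATASET_S3_URL implies a precedence rule your script can encode directly. A minimal sketch of that decision (dataset_source is a hypothetical helper; the s3:/git: prefixes are only for illustration):

```shell
# Pick the dataset source: S3 wins when both S3 and git URLs are set.
dataset_source() {
  if [ -n "${DATASET_S3_URL:-}" ]; then
    echo "s3:${DATASET_S3_URL%/}/${DATASET_S3_PATH}"
  elif [ -n "${DATASET_URL:-}" ]; then
    echo "git:${GIT_BASE}/${DATASET_URL}"
  else
    echo "no dataset source configured" >&2
    return 1
  fi
}
```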

run.sh – Template Execution Script

  • set_extra_params: Sets parameters via environment variables.
  • set_finetune_params: Sets hyperparameters via environment variables.
  • launch_training: Main fine-tuning training function. For multiple training scenarios you can define scenario-specific functions (e.g., CPU execution, single-node multi-GPU, or multi-node multi-GPU); see the run.sh examples for details.
  • export_weights: Generates/exports the resulting model.
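The functions above suggest an overall control flow. A skeletal run.sh might sequence them like this; the stub bodies and the main wrapper are our reading of the structure, so consult the official run.sh examples for the real flow:

```shell
#!/usr/bin/env bash
set -ex
# Source util.sh here, as shown in requirement 1.

set_extra_params()    { :; }  # read platform parameters from env vars
set_finetune_params() { :; }  # read training hyperparameters from env vars
launch_training()     { :; }  # run the actual fine-tuning job

export_weights() {
  # Export the model, then publish its location for the upload sub-task.
  echo "${MODEL_PATH}/${OUTPUT_DIR}" > "${TASK_META_OUTPUT_PATH_FILE}"
}

main() {
  set_extra_params
  set_finetune_params
  launch_training
  export_weights
}
```

The real script would end with a call to main.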