Enable Fine-Tuning and Training Features
Install Plugins
- Ensure the Volcano cluster plugin is installed.
- Ensure the MLflow cluster plugin is installed (deploying it requires PostgreSQL).
Download the following plugin artifacts from https://cloud.alauda.cn or https://cloud.alauda.io and push these plugins to the ACP platform.
- MLflow: MLflow tracking server for monitoring training experiments. After installation, an "MLFlow" menu entry will appear in the AML navigation bar.
- Volcano: Schedules training jobs using various scheduler plugins, including Gang-Scheduling and Binpack.
1. Go to "Administrator - Marketplace - Upload Packages", switch to the "Cluster Plugins" tab, find the uploaded plugins, and verify that their versions are correctly synced.
2. Go to "Administrator - Marketplace - Cluster Plugins", locate these plugins, click the "..." button on the right, and select "Install". Complete the setup form if required, then click "Install" to add the plugin to the current cluster.
Enable Features
Navigate to "Administrator - Clusters - Resources", then enter amlcluster in the search box on the left side.
Click the "Correlated with Cluster" panel to find the AmlCluster resource.
Within the AmlCluster resource, set tuneModels and datasets to true under spec.values.experimentalFeatures.
- When set to true, the "Datasets" item appears in the left navigation menu.
- When set to true, the "Training" item appears in the left navigation menu.
- When set to true, the "Fine-Tuning" item appears in the left navigation menu.
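The edit above amounts to a small change inside the AmlCluster custom resource. A minimal sketch of the relevant fragment (only the `spec.values.experimentalFeatures` path and the `tuneModels` and `datasets` field names come from this guide; all surrounding fields are omitted):

```yaml
# Fragment of the AmlCluster resource (other fields omitted).
spec:
  values:
    experimentalFeatures:
      tuneModels: true   # enables the training/fine-tuning navigation items described above
      datasets: true     # enables the "Datasets" navigation item
```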
Task Templates
- Custom template upload: Ensure your custom fine-tuning template files are complete and uploaded to Task Template.
- Template authoring guide: For instructions on creating custom templates, refer to the Fine-tuning Template Developing Guide.
Download Templates:
Download the alaudadockerhub/training-templates image, then run the following command to extract example templates:
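The extraction command itself is not reproduced on this page. One common way to copy files out of a container image is via a temporary stopped container — a sketch, assuming the templates live under `/templates` inside the image (the in-image path and the local output directory are assumptions):

```shell
# Pull the templates image (import it into your registry first if the
# cluster cannot reach docker.io directly).
docker pull docker.io/alaudadockerhub/training-templates

# Create a stopped container from the image and copy the templates out.
# The /templates path inside the image is an assumption for this sketch.
docker create --name training-templates docker.io/alaudadockerhub/training-templates
docker cp training-templates:/templates ./example-templates
docker rm training-templates
```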
The runtime image is provided for download only. Please import it into the platform image registry before use.
Upload Templates:
Using finetune-object-detection as an example, follow these steps:
- Modify the configuration file: Locate the config.yaml file in the template directory.
- Update image references: In config.yaml, update the following fields:
  - image (training image): Replace the default training image with a YOLOv5 training image available in your AI platform image registry.
  - tool-image (tool image): Replace the default tool image with a data download/upload tool image available in your AI platform image registry.
- Upload the modified finetune-object-detection directory as a template to the AI platform template repository.
Ensure the updated image references point to images that the training environment can successfully pull.
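As a sketch, the two fields to edit in config.yaml might look like this (only the image and tool-image field names come from this guide; the surrounding layout and the registry placeholder are assumptions):

```yaml
# Fragment of the template's config.yaml (other keys omitted).
# Replace <your-registry> with your AI platform image registry address.
image: <your-registry>/yolov5-runtime:v0.1.0   # training image
tool-image: <your-registry>/git-tool:v0.1.0    # data download/upload tool image
```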
Runtime Container Images
The training and data operations rely on specific container images:
- Training image
  - Download the image used for training and upload it to your local image repository (some templates may require you to build the image yourself).
  - (Optional, for quick trials) For a fast start, pull and import the provided YOLOv5 runtime image: docker.io/alaudadockerhub/yolov5-runtime:v0.1.0
- Tool image (for auxiliary data download and upload)
  - Data download and upload operations in tasks are handled by the tool image.
  - Download and import the platform-provided general-purpose tool image: docker.io/alaudadockerhub/git-tool:v0.1.0
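Importing the two images above into your platform registry is typically a pull/tag/push sequence — a sketch, where registry.example.com is a placeholder for your registry address:

```shell
# Pull the platform-provided images from Docker Hub.
docker pull docker.io/alaudadockerhub/yolov5-runtime:v0.1.0
docker pull docker.io/alaudadockerhub/git-tool:v0.1.0

# Re-tag for your platform registry (registry.example.com is a placeholder).
docker tag docker.io/alaudadockerhub/yolov5-runtime:v0.1.0 registry.example.com/yolov5-runtime:v0.1.0
docker tag docker.io/alaudadockerhub/git-tool:v0.1.0 registry.example.com/git-tool:v0.1.0

# Push to the registry that the training environment can pull from.
docker push registry.example.com/yolov5-runtime:v0.1.0
docker push registry.example.com/git-tool:v0.1.0
```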
Add Topics to Task Templates:
To ensure a template displays correctly on the Alauda AI platform, create the following Topics for the template project:
- finetune or train
- v2
- object-detection (indicating the template type)