Migrate Existing Huawei DCS Clusters to Pool-Managed Persistent Disks
Use this guide when you upgrade an existing Huawei DCS cluster from the older template-disk layout to the current pool-managed persistent-disk model.
In DCS provider v1.0.16 or later, this migration is YAML-driven because DCSIpHostnamePool.spec.pool[].persistentDisk is not exposed in the web UI.
Version
Use this procedure when the cluster runs ACP v4.2.1 or later and the target DCS provider version is v1.0.16 or later.
This procedure currently assumes all of the following:
- The target environment uses the DCS controller implementation that supports pool-managed persistent disks.
- The DCS VM templates are version 4.2.1 or later.
- Guest tools (vmtools) are working inside the guest OS so safe shutdown and disk detach can complete.
TOC
- Overview
- Before You Start
- Inspect the Current Disk Layout
- Determine Which Disks Are Claimable
- Worked Example
- Update the DCSMachineTemplate
- Update the DCSIpHostnamePool
- Trigger the Rolling Upgrade
- Verify Claim, Detach, Conversion, and Reattach
- Limitations and Recovery Notes
- Related Topics

Overview
Older DCS clusters created reusable data disks through DCSMachineTemplate. That layout does not give the controller enough information to preserve disks safely during delete-recreate replacement.
The current model moves upgrade-preserved disks into DCSIpHostnamePool.spec.pool[].persistentDisk. Each disk is bound to an (ip, slot) identity. During rolling replacement, the controller:
- Claims the existing disk from the old VM.
- Safely stops the old VM.
- Detaches the disk.
- Converts stock volumes to independent shared volumes when needed.
- Deletes the old VM.
- Reattaches the disk to the replacement VM.
- Boots the replacement VM, which mounts the existing filesystem without reformatting it.
This is also the documented model for the platform-required /var/cpaas disk.
Before You Start
Verify all of the following before you begin:
- The cluster is healthy and currently stable.
- Because pool-managed persistent disks require one-by-one replacement, the relevant control plane and worker rollout strategies use maxSurge: 0.
- You can identify the current disk sequenceNum values on the old VMs from the DCS UI or by querying VM details through the DCS API.
- You know which disks must be preserved and which disks can still be recreated with the VM.
- The target DCSIpHostnamePool already exists and maps each node to a fixed IP slot.
Inspect the Current Disk Layout
First, identify the management-cluster objects and the DCS VM that backs each node:
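For example (the namespace and resource names below are placeholders; adjust them to your environment):

```shell
# List the Machines and the DCSMachines that back them (placeholder namespace).
kubectl get machines,dcsmachines -n <cluster-namespace>

# Inspect one DCSMachine to find the DCS VM that backs a specific node.
kubectl get dcsmachine <dcsmachine-name> -n <cluster-namespace> -o yaml
```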
For any DCSMachine you plan to migrate, inspect the current VM details and record the disk sequenceNum, size, datastore, and PCI type for each disk you want to preserve.
You can gather that information from:
- The DCS platform UI.
- Your existing operational tooling that wraps QueryVmInfo.
- Direct API inspection if your environment already exposes that workflow.
You need the following values for each preserved disk:
- Old sequenceNum
- quantityGB
- datastoreName or datastoreClusterName
- path
- format
- pciType
Determine Which Disks Are Claimable
Existing clusters can only claim disks that sit in the tail-contiguous region of the old VM disk layout.
Use the following formula:
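A minimal sketch of the relationship, assuming sequenceNum counts from 0 at the system disk (verify the numbering convention your DCS environment reports before relying on it):

```text
slot = oldSequenceNum - systemDiskCount - newTemplateDataDiskCount
```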
Use these constants when you apply the formula:
- systemDiskCount = 1
- newTemplateDataDiskCount = the number of non-system disks that remain in the new DCSMachineTemplate
The computed slot must:
- Be greater than or equal to 0
- Be unique within the same IP entry
If a disk is not in the tail-contiguous region, you must either:
- Move the disks between it and the old template tail into the pool-managed persistent-disk list as well, or
- Accept that the non-claimable disk will still be lost with the old VM
Worked Example
Assume the old template disk order is the system disk, then /var/lib/kubelet and /var/lib/containerd, followed by the disks you want to preserve.
If the new template keeps only system + /var/lib/kubelet + /var/lib/containerd, then newTemplateDataDiskCount = 2.
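As a concrete sketch, suppose the preserved disks are /var/cpaas and one extra data disk, and sequenceNum runs 0 through 4 across the five disks; these numbers and the slot expression above are illustrative assumptions, not values from your cluster:

```text
# systemDiskCount = 1, newTemplateDataDiskCount = 2 (illustrative values)
/var/cpaas disk, sequenceNum 3:  slot = 3 - 1 - 2 = 0
data disk,       sequenceNum 4:  slot = 4 - 1 - 2 = 1
```

Both disks sit in the tail-contiguous region, so both are claimable.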
Update the DCSMachineTemplate
Edit the currently referenced DCSMachineTemplate in place so it no longer declares the disks you want to preserve.
Export the current template:
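For example (the template and namespace names are placeholders):

```shell
# Export the DCSMachineTemplate currently referenced by the control plane or MachineDeployment.
kubectl get dcsmachinetemplate <template-name> -n <cluster-namespace> -o yaml > dcsmachinetemplate.yaml
```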
Update the exported manifest:
- Keep the system disk.
- Keep only the template-local disks that should still be recreated with the VM.
- Remove all disks you want to preserve through the IP pool.
- If a target disk is only claimable when trailing disks are moved as well, remove those trailing disks from the template too.
- Keep the original metadata.name, because this migration updates the currently referenced template in place.
- Remove transient metadata fields such as resourceVersion, uid, creationTimestamp, and managedFields.
Apply the updated template:
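For example, assuming the manifest edited in the previous step:

```shell
kubectl apply -f dcsmachinetemplate.yaml
```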
Update the DCSIpHostnamePool
Add persistentDisk entries to the matching IP slot for every preserved disk.
The spec interacts with the live disk attributes in three ways:
Strict claim match. A mismatch on any of these fields fails the claim and sets phase=Error with lastError. The controller retries on a slow loop until the spec is corrected:
- quantityGB — must match the live disk size exactly
- datastoreName or datastoreClusterName — must point to the same storage target as the live disk
- pciType — must match the live disk PCI type. If omitted, the provider uses the default VIRTIO; verify the live disk PCI type before omitting this field, because a non-VIRTIO live disk can fail the strict claim match
Filesystem (affects guest-side initialization, not the claim check):
- format — used only when initializing a fresh disk. If the live disk already has a filesystem, the existing format is preserved and mkfs is skipped.
Guest-side (applied on replacement VMs only, not part of the claim check):
- path — mount path inside the guest
- mountOptions — mount options
- options — mkfs options applied only on first format
For the platform-required /var/cpaas disk, move it into the pool-managed layout as part of this migration.
Set slot to the value you calculated in the previous section. Do not reuse a fixed example value across different disk layouts.
Example:
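A minimal sketch of a pool entry, assuming the fields outside persistentDisk (such as ip) follow your existing DCSIpHostnamePool definition; every concrete value is a placeholder for your environment:

```yaml
# Fragment of an existing DCSIpHostnamePool manifest; all values are placeholders.
spec:
  pool:
    - ip: 192.168.10.21              # fixed IP slot of the node being migrated
      persistentDisk:
        - slot: 0                    # value calculated in the previous section
          quantityGB: 500            # must match the live disk size exactly
          datastoreName: datastore-01  # or datastoreClusterName; same storage target as the live disk
          pciType: VIRTIO            # must match the live disk PCI type
          format: xfs                # used only if the disk has no filesystem yet
          path: /var/cpaas           # mount path inside the replacement guest
```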
Apply the pool update:
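For example (file and resource names are placeholders):

```shell
# Either apply an edited manifest or edit the live object directly.
kubectl apply -f dcsiphostnamepool.yaml
# or
kubectl edit dcsiphostnamepool <pool-name> -n <cluster-namespace>
```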
Trigger the Rolling Upgrade
Before you trigger replacement:
- Confirm KubeadmControlPlane.spec.rolloutStrategy.rollingUpdate.maxSurge = 0
- Confirm each MachineDeployment.spec.strategy.rollingUpdate.maxSurge = 0
These settings are prerequisites for the migration and for later upgrade-time reuse of pool-managed persistent disks.
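For example, to check both fields (object names are placeholders):

```shell
kubectl get kubeadmcontrolplane <kcp-name> -n <cluster-namespace> \
  -o jsonpath='{.spec.rolloutStrategy.rollingUpdate.maxSurge}{"\n"}'
kubectl get machinedeployment <md-name> -n <cluster-namespace> \
  -o jsonpath='{.spec.strategy.rollingUpdate.maxSurge}{"\n"}'
```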
Then trigger the rollout:
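One common way to trigger replacement in a Cluster API management cluster is a rollout restart with clusterctl; this assumes clusterctl is available, and your environment may document a different provider-specific trigger:

```shell
# Placeholder names; confirm the rollout trigger documented for your DCS provider version.
clusterctl alpha rollout restart kubeadmcontrolplane/<kcp-name> --namespace <cluster-namespace>
clusterctl alpha rollout restart machinedeployment/<md-name> --namespace <cluster-namespace>
```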
Verify Claim, Detach, Conversion, and Reattach
Watch the management-cluster resources during the rollout:
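For example (placeholder namespace):

```shell
kubectl get machines -n <cluster-namespace> -w
kubectl get dcsmachines -n <cluster-namespace> -w
```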
Inspect the pool status to confirm the controller has claimed and tracked the disks:
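For example (placeholder names):

```shell
kubectl get dcsiphostnamepool <pool-name> -n <cluster-namespace> -o yaml
```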
During the transition, each record appears under status.persistentDiskStatus. The stable phases to watch for are:
- phase: Attached while the old VM still owns the disk
- phase: Available after the disk is detached (and converted from a stock volume to an independent shared volume when needed)
- phase: Attached again after the replacement VM reattaches the disk
Transient phases (Attaching, Detaching) may briefly appear during the corresponding operations; Deleting appears when a disk is being permanently removed, for example during pool or cluster cleanup. The full phase set is Creating, Available, Attaching, Attached, Detaching, Deleting, Error.
If a disk enters phase: Error, inspect lastError before retrying.
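A quick way to surface phase and lastError for each record, assuming status.persistentDiskStatus is a list (names are placeholders):

```shell
kubectl get dcsiphostnamepool <pool-name> -n <cluster-namespace> \
  -o jsonpath='{range .status.persistentDiskStatus[*]}{.phase}{"\t"}{.lastError}{"\n"}{end}'
```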
Limitations and Recovery Notes
- Only tail-contiguous disks are claimable in the existing-cluster migration path.
- The controller only protects disks that are declared in persistentDisk. Any undeclared disk still follows VM lifecycle and may be deleted with the old VM.
- This migration changes the ownership model of preserved disks. Do not keep the same disk defined in both DCSMachineTemplate and DCSIpHostnamePool.
- If you need to preserve /var/cpaas, move it into the IP pool as part of this migration instead of leaving it in the template.
- This runbook applies to clusters on ACP v4.2.1 or later that are moving to DCS provider v1.0.16 or later.