Managing Nodes on Huawei DCS
This document explains how to manage worker nodes using Cluster API Machine resources.
Prerequisites
Important Prerequisites
- The control plane must be deployed before performing node operations. See Create Cluster for setup instructions.
- Ensure you have proper access to the DCS platform and required permissions.
Configuration Guidelines
When working with the configurations in this document:
- Only modify values enclosed in <> brackets
- Replace placeholder values with your environment-specific settings
- Preserve all other default configurations unless explicitly required
Overview
Worker nodes are managed through Cluster API Machine resources, providing declarative and automated node lifecycle management. The deployment process involves:
- IP-Hostname Pool Configuration - Network settings for worker nodes
- Machine Template Setup - VM specifications
- Bootstrap Configuration - Node initialization and join settings
- Machine Deployment - Orchestration of node creation and management
Worker Node Deployment
Step 1: Configure IP-Hostname Pool
The IP-Hostname Pool defines the network configuration for worker node virtual machines. You must plan and configure the IP addresses, hostnames, DNS servers, and other network parameters before deployment.
On Huawei DCS, the IP pool is also where you declare persistent disks that must survive VM replacement. Use persistentDisk for the platform-required /var/cpaas disk and for any other worker-node disk that must be preserved during delete-recreate operations. This workflow requires DCS provider v1.0.16 or later.
Pool Size Requirement
The pool must include at least as many entries as the number of worker nodes you plan to deploy. Insufficient entries will prevent node deployment.
Example:
Create a DCSIpHostnamePool named <worker-iphostname-pool-name>:
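The exact schema depends on the DCS provider version you run. The sketch below is illustrative only: the apiVersion, namespace placeholder, and several values are assumptions, while the fields shown (ip, hostname, machineName, mask, gateway, dns, persistentDisk, slot, quantityGB, format) are the ones referenced elsewhere in this document.

```yaml
# Illustrative sketch only -- apiVersion and field layout are assumptions;
# consult the DCSIpHostnamePool schema shipped with your provider version.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DCSIpHostnamePool
metadata:
  name: <worker-iphostname-pool-name>
  namespace: <cluster-namespace>
spec:
  pool:
    - ip: <worker-ip-1>
      hostname: <worker-hostname-1>
      machineName: <worker-machine-name-1>
      mask: <subnet-mask>
      gateway: <gateway-ip>
      dns:
        - <dns-server-ip>
      # Platform-required persistent disk for /var/cpaas (DCS provider v1.0.16+).
      # slot must be unique per IP entry; (ip, slot) identifies the disk.
      persistentDisk:
        - slot: 1
          quantityGB: 200
          format: ext4
          # options, pciType, datastore and the /var/cpaas mount point follow your provider's schema
    - ip: <worker-ip-2>
      hostname: <worker-hostname-2>
      machineName: <worker-machine-name-2>
      mask: <subnet-mask>
      gateway: <gateway-ip>
      dns:
        - <dns-server-ip>
```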
Key parameters:
Step 2: Configure Machine Template
The DCSMachineTemplate defines the specifications for worker node virtual machines, including VM templates, compute resources, storage configuration, and network settings.
Required Disk Configurations
The following disk mount points are mandatory. Do not remove them:
- System volume (systemVolume: true)
- /var/lib/kubelet - Kubelet data directory
- /var/lib/containerd - Container runtime data
Configure /var/cpaas in the IP pool as a persistent disk, not in DCSMachineTemplate.
You may add additional template disks, but these essential template disks must be preserved.
Example:
Create a DCSMachineTemplate named <worker-dcs-machine-template-name>:
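As with the IP pool, the sketch below is illustrative only: the apiVersion and the compute and disk field names are assumptions, while the mandatory mount points come from the note above. Match the structure to the DCSMachineTemplate schema of your provider version.

```yaml
# Illustrative sketch only -- apiVersion, compute, and disk field names are assumptions.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DCSMachineTemplate
metadata:
  name: <worker-dcs-machine-template-name>
  namespace: <cluster-namespace>
spec:
  template:
    spec:
      template: <dcs-vm-template-name>    # VM template to clone on the DCS platform
      numCPUs: 8
      memoryMiB: 16384
      network:
        devices:
          - networkName: <dcs-network-name>
      disks:
        # Mandatory template disks -- do not remove:
        - systemVolume: true              # system volume
          quantityGB: 100
        - mountPath: /var/lib/kubelet     # kubelet data directory
          quantityGB: 100
        - mountPath: /var/lib/containerd  # container runtime data
          quantityGB: 100
        # /var/cpaas is declared in the DCSIpHostnamePool as a persistentDisk, not here.
```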
Key parameters:
*Required when parent object is specified
Persistent Disks Managed by the IP Pool
Declare any upgrade-preserved disk in the matching DCSIpHostnamePool.spec.pool[].persistentDisk entry (DCS provider v1.0.16 or later).
- Use this for /var/cpaas, which is required by the platform.
- Keep DCSMachineTemplate for the system disk and template-local disks that may be recreated with the VM.
- Choose a unique slot per IP entry. The controller uses (ip, slot) as the persistent-disk identity.
- On replacement nodes, the guest disk setup logic checks for an existing filesystem. If the disk is already formatted, it skips mkfs and mounts the disk directly.
- Persistent-disk workflows require one-by-one replacement, so keep MachineDeployment.spec.strategy.rollingUpdate.maxSurge = 0.
- You can append new persistentDisk entries, but deleting existing entries is not supported. The controller attaches the newly added disk to the running VM on the DCS side, but it does not format or mount the disk inside the guest OS. Guest formatting and mounting take effect only after the VM is replaced and the replacement VM runs the generated disk setup during bootstrap.
- Treat format, options, and pciType as immutable after creation.
- Treat quantityGB and datastore changes as rollout-sensitive changes. The webhook performs best-effort validation against the DCS platform when it has enough cluster context.
To inspect the runtime state of persistent disks during node operations, check status.persistentDiskStatus on the pool:
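For example (the resource name passed to kubectl here is simply the lowercase kind and is an assumption; adjust it if your provider registers a different name):

```shell
kubectl get dcsiphostnamepool <worker-iphostname-pool-name> -n <cluster-namespace> \
  -o jsonpath='{.status.persistentDiskStatus}'
```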
Step 3: Configure Bootstrap Template
The KubeadmConfigTemplate defines the bootstrap configuration for worker nodes, including user accounts, SSH keys, system files, and kubeadm join settings.
Template Optimization
The template includes pre-optimized configurations for security and performance. Modify only the parameters that require customization for your environment.
Example:
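A minimal sketch of a worker KubeadmConfigTemplate is shown below. It uses the standard Cluster API bootstrap API group, but the user name, SSH key, and any platform-specific files or join settings are placeholders you should take from your existing template rather than from this sketch.

```yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: <worker-kubeadm-config-template-name>
  namespace: <cluster-namespace>
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs: {}    # keep the platform defaults from your existing template
      users:
        - name: <node-username>
          sudo: ALL=(ALL) NOPASSWD:ALL
          sshAuthorizedKeys:
            - <ssh-public-key>
```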
Step 4: Configure Machine Deployment
The MachineDeployment orchestrates the creation and management of worker nodes by referencing the previously configured DCSMachineTemplate and KubeadmConfigTemplate resources. It manages the desired number of nodes and handles rolling updates.
Example:
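The sketch below wires the two templates together. The MachineDeployment fields are standard Cluster API; the DCSMachineTemplate apiVersion is an assumption to be matched against your provider, and maxSurge is set to 0 per the persistent-disk guidance above.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: <machine-deployment-name>
  namespace: <cluster-namespace>
spec:
  clusterName: <cluster-name>
  replicas: 3
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: <cluster-name>
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0            # required when pool-managed persistent disks are used
      maxUnavailable: 1
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: <cluster-name>
    spec:
      clusterName: <cluster-name>
      version: <kubernetes-version>
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: <worker-kubeadm-config-template-name>
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1   # assumption -- match your provider
        kind: DCSMachineTemplate
        name: <worker-dcs-machine-template-name>
```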
Key parameters:
Node Management Operations
This section covers common operational tasks for managing worker nodes, including scaling, updates, upgrades, and template modifications.
Cluster API Framework
Node management operations are based on the Cluster API framework. For detailed information, refer to the official Cluster API documentation.
Scaling Worker Nodes
Worker node scaling allows you to adjust cluster capacity based on workload demands. The Cluster API manages the node lifecycle automatically through the MachineDeployment resource.
Adding Worker Nodes
Increase the number of worker nodes to handle increased workload or add new capacity.
Use Case: Scale up the cluster to add more compute resources
Prerequisites:
- Verify the IP pool has sufficient available IP addresses for new nodes
- Ensure the DCS platform has adequate resources to provision new VMs
Procedure:
1. Check Current Node Status
View the current machines in the cluster:
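For example, run this against the cluster that hosts the Cluster API resources (namespace placeholder assumed):

```shell
kubectl get machines -n <cluster-namespace>
```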
2. Extend IP Pool
Before scaling up, add new IP configurations to the pool for the additional nodes.
IP Pool Expansion
The IP pool must contain at least as many entries as the desired replica count. Add new IP entries for each additional worker node you plan to deploy.
Add IP entries to the pool:
First, export the current pool configuration to preserve existing entries:
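For example (resource name assumed, as above):

```shell
kubectl get dcsiphostnamepool <worker-iphostname-pool-name> -n <cluster-namespace> -o yaml > pool.yaml
```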
Then use the following command to add new IP configurations. The pool array must include all existing entries plus the new entries.
Important Notes
- The pool array must include all existing entries plus the new entries you want to add
- Copy the existing entries from the exported YAML to avoid data loss
- kubectl patch --type='merge' replaces the entire spec.pool array, so copy every existing persistentDisk block unchanged unless you are intentionally adding new disks
- Ensure each new entry has unique ip, hostname, and machineName values
- If new worker nodes also need the platform-required /var/cpaas disk, declare it in each new entry's persistentDisk
- Network parameters (mask, gateway, dns) typically match existing entries
Example: Adding 2 new nodes to an existing pool of 3 nodes
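A sketch of that merge patch (resource name and placeholders assumed; the commented block stands in for the patch file you assemble from pool.yaml plus the new entries):

```shell
# pool-patch.yaml must contain the FULL spec.pool array: all existing entries
# copied unchanged from pool.yaml, followed by the new entries, for example:
#
#   spec:
#     pool:
#       - ip: <existing-ip-1>          # existing entries 1-3 copied verbatim,
#         ...                          # including any persistentDisk blocks
#       - ip: <new-ip-4>
#         hostname: <new-hostname-4>
#         machineName: <new-machine-name-4>
#         mask: <subnet-mask>
#         gateway: <gateway-ip>
#         dns: [<dns-server-ip>]
#       - ip: <new-ip-5>
#         hostname: <new-hostname-5>
#         machineName: <new-machine-name-5>
#         mask: <subnet-mask>
#         gateway: <gateway-ip>
#         dns: [<dns-server-ip>]
kubectl patch dcsiphostnamepool <worker-iphostname-pool-name> -n <cluster-namespace> \
  --type='merge' --patch-file=pool-patch.yaml
```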
3. Verify IP Pool Capacity
After extending the IP pool, verify it has sufficient entries for the desired replica count:
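One way to do this (resource name assumed; jq is used here only to count entries):

```shell
kubectl get dcsiphostnamepool <worker-iphostname-pool-name> -n <cluster-namespace> -o json \
  | jq '.spec.pool | length'
```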
Check that the pool contains at least as many entries as the desired replica count.
4. Scale Up the MachineDeployment
Update the replicas field to the desired number of nodes.
Example: Scale from 3 to 5 nodes
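For example, using the scale subresource (you can equally edit the MachineDeployment and change spec.replicas):

```shell
kubectl scale machinedeployment <machine-deployment-name> -n <cluster-namespace> --replicas=5
```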
5. Monitor the Scaling Progress
Watch the machine creation process:
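For example (namespace placeholder assumed):

```shell
kubectl get machines -n <cluster-namespace> -w
```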
The Cluster API controller will automatically create new machines based on the MachineDeployment template.
6. Verify Nodes Joined the Cluster
Switch to the target cluster context and verify the new nodes:
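For example, assuming you have a kubeconfig for the target (workload) cluster:

```shell
kubectl get nodes --kubeconfig <target-cluster-kubeconfig>
```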
The new nodes should appear in the list and transition to Ready status.
Rolling Update Behavior
When scaling up, new nodes are created immediately without affecting existing nodes. This ensures zero-downtime scaling.
Removing Worker Nodes
Decrease the number of worker nodes to reduce cluster capacity or remove underutilized resources. The Cluster API supports two removal strategies:
- Random removal: Reduce replicas; the platform randomly selects and deletes machines
- Targeted removal: Mark specific machines for deletion, then reduce replicas (recommended for IP recovery)
IP Recovery Scenario
When you need to recycle specific machine IPs (e.g., for reassignment or IP pool management), use the targeted removal method. The deletion annotation ensures the platform deletes the marked machines, not random ones.
Data Loss Warning
Scaling down removes nodes and their associated VMs. Ensure:
- Workloads can tolerate node loss through proper replication
- No critical data is stored only on the nodes being removed
- Applications are designed for horizontal scaling
Declared persistent disks in DCSIpHostnamePool.spec.pool[].persistentDisk are not deleted just because a Machine is replaced. They remain available for reuse while the corresponding IP slot stays in the pool. Removing the IP slot from the pool, deleting the pool, or deleting the cluster can trigger persistent-volume cleanup.
Random Removal
Use Case: Scale down the cluster where any node can be removed (no specific IP requirements)
Procedure:
1. Identify Current Machine Status
View the current machines in the MachineDeployment:
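For example, filtering by the standard Cluster API ownership label (placeholders assumed):

```shell
kubectl get machines -n <cluster-namespace> -l cluster.x-k8s.io/deployment-name=<machine-deployment-name>
```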
2. Scale Down the MachineDeployment
Update the replicas field to reduce the node count.
Example: Scale from 5 to 3 nodes
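For example:

```shell
kubectl scale machinedeployment <machine-deployment-name> -n <cluster-namespace> --replicas=3
```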
The Cluster API controller will randomly select and delete machines to match the desired replica count.
3. Monitor the Removal Progress
Watch the machine deletion process:
The Cluster API controller will:
- Drain the selected nodes (evict pods if possible)
- Delete the underlying VMs from the DCS platform
- Remove the machine resources
4. Verify Nodes Removed
Switch to the target cluster context:
The removed nodes should no longer appear in the list.
Targeted Removal
Use Case: Remove specific machines (e.g., for IP recovery, replace unhealthy nodes)
Procedure:
1. Identify Machines to Remove
View the current machines:
Note the <machine-name> of the machines you want to remove.
2. Annotate Machines for Deletion
Mark the specific machines for deletion:
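For example, using the standard Cluster API delete-machine annotation (the annotation value can be any non-empty string):

```shell
kubectl annotate machine <machine-name> -n <cluster-namespace> cluster.x-k8s.io/delete-machine="yes"
```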
Repeat for each machine you want to remove.
Example: Remove two specific machines
3. Scale Down the MachineDeployment
After annotating the machines, reduce the replica count:
Replica Count Must Match Annotated Machines
Reduce replicas by exactly the number of annotated machines.
- If you reduce by fewer, not all annotated machines will be removed
- If you reduce by more, additional machines will be randomly selected for deletion
Example: If you annotated 2 machines, reduce replicas by exactly 2 (e.g., from 5 to 3)
The platform will delete the annotated machines, not randomly selected ones.
4. Monitor the Removal Progress
Watch the machine deletion process:
5. Verify Nodes Removed
Switch to the target cluster context:
The removed nodes should no longer appear in the list.
Upgrading Machine Infrastructure
To upgrade worker machine specifications (CPU, memory, disk, VM template), follow these steps:
1. Create New Machine Template
- Copy the existing DCSMachineTemplate referenced by your MachineDeployment
- Modify the required values (CPU, memory, disk, VM template, etc.)
- Give the new template a unique name
- Apply the new DCSMachineTemplate to the cluster
2. Update Machine Deployment
- Modify the MachineDeployment resource
- Update the spec.template.spec.infrastructureRef.name field to reference the new template (see the patch sketch after this list)
- Apply the changes
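A sketch of that update as a JSON patch (names are placeholders):

```shell
kubectl patch machinedeployment <machine-deployment-name> -n <cluster-namespace> --type='json' \
  -p '[{"op": "replace", "path": "/spec/template/spec/infrastructureRef/name", "value": "<new-dcs-machine-template-name>"}]'
```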
3. Rolling Update
- The system will automatically trigger a rolling update
- Worker nodes will be replaced with the new specifications
- Any disks declared in DCSIpHostnamePool.spec.pool[].persistentDisk are detached from the old VM and reattached to the replacement VM
- Monitor the update progress through the MachineDeployment status
If you are migrating an existing cluster from the old template-disk layout to pool-managed persistent disks, follow Migrate Existing Huawei DCS Clusters to Pool-Managed Persistent Disks before you rely on upgrade-time data preservation.
Updating Bootstrap Templates
Bootstrap templates (KubeadmConfigTemplate) are used by MachineDeployment and MachineSet resources. Changes to existing templates do not automatically trigger rollouts of existing machines; only new machines use the updated template.
Update Process:
1. Export Existing Template
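For example (template and file names are placeholders):

```shell
kubectl get kubeadmconfigtemplate <existing-bootstrap-template-name> -n <cluster-namespace> -o yaml > new-bootstrap-template.yaml
```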
2. Modify Configuration
- Update the desired fields in the exported YAML
- Change the metadata.name to a new unique name
- Remove extraneous metadata fields (resourceVersion, uid, creationTimestamp, etc.)
3. Create New Template
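For example, assuming the file name from the export step above:

```shell
kubectl apply -f new-bootstrap-template.yaml
```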
4. Update MachineDeployment
- Modify the MachineDeployment resource
- Update spec.template.spec.bootstrap.configRef.name to reference the new template
- Apply the changes to trigger a rolling update
Template Rollout Behavior
Existing machines continue using the old bootstrap configuration. Only newly created machines (during scaling or rolling updates) will use the updated template.
Upgrading Kubernetes Version
For Kubernetes upgrades on Huawei DCS, see Upgrading Kubernetes on Huawei DCS. That guide covers the required upgrade order, the YAML workflow for MachineDeployment resources, and the web UI workflow for Node Pool upgrades.
Managing Node Pools Using the Web UI
Node pools provide a declarative way to manage groups of nodes with identical configurations. You can view, add, and delete worker node pools through the web UI.
Version requirement: This workflow requires Fleet Essentials and Alauda Container Platform DCS Infrastructure Provider 1.0.13 or later. If the provider version is earlier than 1.0.13, use the YAML-based node pool workflows in this document. If the node-pool workflow relies on pool-managed persistent disks, use DCS provider v1.0.16 or later. In v1.0.16, the persistentDisk declaration on DCSIpHostnamePool remains YAML-only and is not exposed in the node-pool UI.
If the node pool relies on pool-managed persistent disks, prepare or update the corresponding DCSIpHostnamePool entry with YAML before you use the web UI workflow here.
Navigation: Clusters → Clusters → Select cluster → Node Pools Tab
Viewing Node Pools
The Node Pools Tab displays all node pools in the cluster:
Control Plane Node Pool:
- Fixed at 3 replicas for high availability
- Displays Kubernetes version with upgrade indicator if available
- Shows Conditions link for detailed status
Worker Node Pools:
- Customizable replica counts
- Individual Kubernetes version management
- Scale and upgrade operations
Node Pool Card Information:
Adding a Worker Node Pool
Navigation: Node Pools Tab → Click Add Worker Node Pool
Form Fields:
Validation:
- Pool name must be unique within the cluster
- IP Pool must have sufficient available IP addresses (≥ Replicas)
- maxSurge/maxUnavailable constraints must be satisfied
- If the node pool relies on persistent disks, keep maxSurge = 0 so Machines are replaced one by one
Tip: Prefix the pool name with the cluster name followed by a hyphen (e.g., mycluster-worker-1) to avoid naming conflicts.
After creation, new nodes appear in the Nodes Tab. The number of nodes equals the configured Replicas value.
Deleting a Worker Node Pool
Steps:
- Click the delete icon on the Worker Node Pool card
- Confirm deletion in the dialog
Deleting a worker node pool permanently removes all associated nodes and machines. Ensure workloads can tolerate the loss of these nodes through proper replication.
Viewing Conditions (Control Plane Only)
Click the Conditions link on the Control Plane Node Pool card to view detailed status information.
Conditions List: