Mastering Workload-Aware Scheduling in Kubernetes v1.36: A Step-by-Step Guide
Introduction
Kubernetes v1.36 introduces groundbreaking improvements for scheduling AI/ML and batch workloads, which require more than simple per-Pod decision-making. This release separates concerns between a static Workload template and a runtime PodGroup object, enabling atomic scheduling, topology awareness, and advanced preemption. In this guide, you will learn how to set up and use these new features to manage complex workloads efficiently.
What You Need
- A Kubernetes cluster running v1.36 (or later) with the scheduling.k8s.io/v1alpha2 API enabled.
- kubectl configured to access your cluster.
- Basic understanding of Kubernetes Pods, controllers (e.g., Job), and scheduling concepts.
- Optional: A working knowledge of Dynamic Resource Allocation (DRA) if using resource claims.
Step-by-Step Instructions
Step 1: Understand the API Separation
In v1.36, the Workload API acts only as a static template for Pod groups. The runtime state—such as scheduling policy and individual Pod conditions—moves to the new PodGroup API (scheduling.k8s.io/v1alpha2). This clean split improves scalability by allowing per-replica sharding of status updates and simplifies the scheduler’s logic. Familiarize yourself with these two resources before proceeding.
Step 2: Define a Workload Template
Create a YAML file for your Workload object. Inside the spec.podGroupTemplates section, define one or more templates—each representing a group of Pods that must be scheduled together. For example, a gang scheduling policy requires a minimum number of Pods (minCount) before the group becomes schedulable.
apiVersion: scheduling.k8s.io/v1alpha2
kind: Workload
metadata:
  name: training-job-workload
  namespace: some-ns
spec:
  podGroupTemplates:
  - name: workers
    schedulingPolicy:
      gang:
        minCount: 4

The Workload does not contain runtime state; it only describes the desired group structure.
Step 3: Create the Workload Object
Apply the YAML to your cluster:
kubectl apply -f workload.yaml

This registers the template. A controller (such as the built-in Job controller or a custom one) will later generate PodGroup instances based on these templates.
Step 4: Let the Controller Generate PodGroup Instances
In v1.36, the Workload controller automatically stamps out runtime PodGroup objects from the templates you defined. Each PodGroup carries the effective scheduling policy and a reference to its parent template. It also includes a status.conditions array that mirrors the scheduling states of individual Pods. To inspect a PodGroup:
kubectl get podgroup -n some-ns

The scheduler now reads only the PodGroup, not the Workload, for faster decision-making.
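The PodGroup schema is alpha and subject to change, so treat the following as an illustrative sketch only: the field names (workloadRef, podGroupTemplateName) and condition types are assumptions derived from the description above, not the published v1alpha2 schema.

```yaml
# Illustrative sketch only: workloadRef, podGroupTemplateName, and the
# condition type/reason below are assumed names, not the published schema.
apiVersion: scheduling.k8s.io/v1alpha2
kind: PodGroup
metadata:
  name: training-job-workload-workers-0
  namespace: some-ns
spec:
  workloadRef:
    name: training-job-workload   # reference to the parent Workload
  podGroupTemplateName: workers   # template this group was stamped from
  schedulingPolicy:
    gang:
      minCount: 4                 # effective policy copied from the template
status:
  conditions:
  - type: PodGroupScheduled
    status: "False"
    reason: WaitingForMinCount
```

Compare the live object against this sketch with kubectl get podgroup -n some-ns -o yaml to see the actual field names in your cluster.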
Step 5: Configure the Scheduler for PodGroup Scheduling
Kube-scheduler in v1.36 includes a new PodGroup scheduling cycle. No additional configuration is required if you are using the default scheduler—it automatically recognizes PodGroups. However, if you have a custom scheduler, ensure it implements the podgroup scheduling plugin. This cycle enables atomic workload processing: all Pods in a group are considered together, which is essential for gang scheduling and batch jobs.
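For a custom scheduler, the plugin would typically be enabled through a KubeSchedulerConfiguration profile. The sketch below uses the real kubescheduler.config.k8s.io/v1 format, but the plugin name "PodGroup" is a hypothetical placeholder; check the v1.36 scheduler documentation for the actual registered name.

```yaml
# Sketch of a custom scheduler profile; the plugin name "PodGroup" is a
# hypothetical placeholder, not a confirmed registered plugin name.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: my-batch-scheduler
  plugins:
    multiPoint:
      enabled:
      - name: PodGroup
```

Pods that set spec.schedulerName: my-batch-scheduler would then be handled by this profile.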
Step 6: Leverage Topology-Aware Scheduling and Preemption
v1.36 introduces initial support for topology-aware scheduling for PodGroups, which tries to place the entire group within a defined topology domain (e.g., same rack or GPU node). Also, workload-aware preemption improves fairness by considering PodGroups when preempting lower-priority Pods. To enable these, set appropriate topology keys and priority classes in your PodGroup templates or via the Workload’s spec.podGroupTemplates[].schedulingPolicy. For example:
schedulingPolicy:
  topology:
    topologyKey: kubernetes.io/hostname

Step 7: Use ResourceClaim for Dynamic Resource Allocation (DRA)
If your workload requires specialized hardware (e.g., GPUs, FPGAs), attach a ResourceClaim to the Workload or PodGroup. This unlocks Dynamic Resource Allocation, allowing Pods within a group to share resources efficiently. Define a ResourceClaim template under the Workload’s spec.podGroupTemplates[].resourceClaims or use an existing claim. Example:
resourceClaims:
- name: gpu-claim
  source:
    resourceClaimTemplateName: gpu-claim-template

Then reference the claim in the Pod template (usually via a container’s resources.claims). The scheduler will account for these resources when making group-level decisions.
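In a standard Pod spec, a claim declared under spec.resourceClaims is referenced by name from a container. A minimal sketch, assuming the flattened DRA pod fields from recent releases (the image and claim-template names are placeholders):

```yaml
# Minimal DRA usage sketch; image and claim-template names are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: trainer
spec:
  resourceClaims:
  - name: gpu-claim
    resourceClaimTemplateName: gpu-claim-template  # placeholder template name
  containers:
  - name: trainer
    image: registry.example.com/trainer:latest     # placeholder image
    resources:
      claims:
      - name: gpu-claim                            # matches spec.resourceClaims[].name
```

The resources.claims entry ties the container to the pod-level claim, so the scheduler can account for the allocated device when placing the group.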
Step 8: Integrate with the Job Controller (First Phase)
v1.36 ships the first phase of integration between the Job controller and the new Workload/PodGroup API. When you create a Job using the standard batch/v1 API, the Job controller can automatically generate a Workload object with appropriate pod group templates. To use this, enable the JobWorkload feature gate (if not already on by default) and set the annotation scheduling.k8s.io/workload-name on the Job’s Pod template. The controller will then create a PodGroup per replica set or per index, depending on your Job configuration.
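Assuming the JobWorkload feature gate is enabled, a Job opting into Workload generation might look like the following. The annotation key is the one named above; the rest is a generic Indexed Job with a placeholder image:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job
  namespace: some-ns
spec:
  completions: 4
  parallelism: 4
  completionMode: Indexed
  template:
    metadata:
      annotations:
        scheduling.k8s.io/workload-name: training-job-workload
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: registry.example.com/trainer:latest  # placeholder image
```

With completionMode: Indexed, the controller can map PodGroups per index, matching the behavior described above.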
Tips for Success
- Start small: Test with a simple gang of 2-4 Pods before scaling to large batch jobs.
- Monitor PodGroup status: Use kubectl describe podgroup to see conditions like PodGroupScheduled or PodGroupFailed.
- Combine with priority classes: Set different priority levels for PodGroups to control preemption behavior.
- Use namespaces: Isolate Workload and PodGroup resources in separate namespaces to avoid conflicts.
- Keep security in mind: RBAC roles should allow creation of workloads and podgroups under scheduling.k8s.io.
- Performance considerations: Because PodGroup status shards updates per replica, large groups scale better than in v1.35. However, keep the number of templates manageable.
- Stay updated: Future releases will expand topology-aware features and Job integration—check the changelog regularly.
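The RBAC tip above can be sketched as a namespaced Role; the role name is a placeholder, and you should scope it down further for production (for example, with resourceNames or read-only verbs where possible):

```yaml
# Minimal RBAC sketch for managing Workload and PodGroup objects;
# the role name is a placeholder.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: workload-manager
  namespace: some-ns
rules:
- apiGroups: ["scheduling.k8s.io"]
  resources: ["workloads", "podgroups"]
  verbs: ["create", "get", "list", "watch"]
```

Bind it to the controller's ServiceAccount with a matching RoleBinding in the same namespace.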