Background#
Ray is a unified framework for distributed computing, useful when a single server cannot handle heavy computational workloads. In this article, we take an in-depth look at two important features of Ray: multitenancy and auto-scaling. Multitenancy allows distinct resources, such as CPU and memory, to be allocated to different users, while auto-scaling adds or removes instances based on traffic and utilization levels.
Why we chose Kubernetes#
Ray offers an official auto-scaling solution that scales the Ray cluster according to resource requests, but our scenario requires auto-scaling based on memory usage. We considered two solutions. The first uses the Ray resource monitor to track the resource usage of each task and actor; because we allocate a different quota to each user, we would have to aggregate the usage of every task and actor per user and build our own mechanism for deciding when to trigger auto-scaling based on memory usage. The second uses the Kubernetes Horizontal Pod Autoscaler to automatically adjust the number of replicas based on memory usage. We chose the second solution.
Set up the Ray Cluster#
Assuming you have already set up your Kubernetes and Harbor environment, the next step is to install the Ray cluster. Note that you will need to modify the image to match your own configuration. Create a file named ray-dev.yaml on your server with the following contents:
# Ray head node service, allowing worker pods to discover the head node to perform the bidirectional communication.
# More contexts can be found at [the Ports configurations doc](https://docs.ray.io/en/latest/ray-core/configure.html#ports-configurations).
apiVersion: v1
kind: Service
metadata:
  name: service-ray-cluster
  labels:
    app: ray-cluster-head
spec:
  clusterIP: None
  ports:
  - name: client
    protocol: TCP
    port: 10001
    targetPort: 10001
  - name: dashboard
    protocol: TCP
    port: 8265
    targetPort: 8265
  - name: gcs-server
    protocol: TCP
    port: 6380
    targetPort: 6380
  selector:
    app: ray-cluster-head
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-ray-head
  labels:
    app: ray-cluster-head
spec:
  # Do not change this - Ray currently only supports one head node per cluster.
  replicas: 1
  selector:
    matchLabels:
      component: ray-head
      type: ray
      app: ray-cluster-head
  template:
    metadata:
      labels:
        component: ray-head
        type: ray
        app: ray-cluster-head
    spec:
      # If the head node goes down, the entire cluster (including all worker
      # nodes) will go down as well. If you want Kubernetes to bring up a new
      # head node in this case, set this to "Always"; otherwise set it to "Never".
      restartPolicy: Always
      # This volume allocates shared memory for Ray to use for its plasma
      # object store. If you do not provide this, Ray will fall back to
      # /tmp, which causes slowdowns if it is not a shared memory volume.
      # volumes:
      # - name: dshm
      #   emptyDir:
      #     medium: Memory
      containers:
      - name: ray-head
        image: rayproject/ray:2.3.0
        imagePullPolicy: Always
        command: [ "/bin/bash", "-c", "--" ]
        # If there is no password for Redis, set --redis-password=''
        args:
          - "ray start --head --port=6380 --num-cpus=$MY_CPU_REQUEST --dashboard-host=0.0.0.0 --object-manager-port=8076 --node-manager-port=8077 --dashboard-agent-grpc-port=8078 --dashboard-agent-listen-port=52365 --redis-password='123456' --block"
        ports:
        - containerPort: 6380 # GCS server
        - containerPort: 10001 # Used by Ray Client
        - containerPort: 8265 # Used by Ray Dashboard
        # This volume allocates shared memory for Ray to use for its plasma
        # object store. If you do not provide this, Ray will fall back to
        # /tmp, which causes slowdowns if it is not a shared memory volume.
        # volumeMounts:
        # - mountPath: /dev/shm
        #   name: dshm
        env:
          # RAY_REDIS_ADDRESS lets Ray use an external Redis for fault tolerance.
          - name: RAY_REDIS_ADDRESS
            value: redis:6379 # address of the external Redis, which is "redis:6379" in this example
          # This is used in the ray start command so that Ray can spawn the
          # correct number of processes. Omitting this may lead to degraded
          # performance.
          - name: MY_CPU_REQUEST
            valueFrom:
              resourceFieldRef:
                resource: requests.cpu
        resources:
          limits:
            cpu: "2"
            memory: "5G"
          requests:
            # For production use cases, we recommend specifying integer CPU requests and limits.
            # We also recommend setting requests equal to limits for both CPU and memory.
            # For this example, we use a 500m CPU request to accommodate resource-constrained local
            # Kubernetes testing environments such as KinD and minikube.
            cpu: "500m"
            # The rest-state memory usage of the Ray head node is around 1Gb. We do not
            # recommend allocating less than 2Gb memory for the Ray head pod.
            # For production use cases, we recommend allocating at least 8Gb memory for each Ray container.
            memory: "2G"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-ray-worker
  labels:
    app: ray-cluster-worker
spec:
  # Change this to scale the number of worker nodes started in the Ray cluster.
  replicas: 1
  selector:
    matchLabels:
      component: ray-worker
      type: ray
      app: ray-cluster-worker
  template:
    metadata:
      labels:
        component: ray-worker
        type: ray
        app: ray-cluster-worker
    spec:
      restartPolicy: Always
      # volumes:
      # - name: dshm
      #   emptyDir:
      #     medium: Memory
      containers:
      - name: ray-worker
        image: rayproject/ray:2.3.0
        imagePullPolicy: Always
        command: ["/bin/bash", "-c", "--"]
        args:
          - "ray start --num-cpus=$MY_CPU_REQUEST --address=service-ray-cluster:6380 --object-manager-port=8076 --node-manager-port=8077 --dashboard-agent-grpc-port=8078 --dashboard-agent-listen-port=52365 --block"
        # This volume allocates shared memory for Ray to use for its plasma
        # object store. If you do not provide this, Ray will fall back to
        # /tmp, which causes slowdowns if it is not a shared memory volume.
        # volumeMounts:
        # - mountPath: /dev/shm
        #   name: dshm
        env:
          # This is used in the ray start command so that Ray can spawn the
          # correct number of processes. Omitting this may lead to degraded
          # performance.
          - name: MY_CPU_REQUEST
            valueFrom:
              resourceFieldRef:
                resource: requests.cpu
        # The resource requests and limits in this config are too small for production!
        # It is better to use a few large Ray pods than many small ones.
        # For production, it is ideal to size each Ray pod to take up the
        # entire Kubernetes node on which it is scheduled.
        resources:
          limits:
            cpu: "1"
            memory: "1G"
          # For production use cases, we recommend specifying integer CPU requests and limits.
          # We also recommend setting requests equal to limits for both CPU and memory.
          # For this example, we use a 500m CPU request to accommodate resource-constrained local
          # Kubernetes testing environments such as KinD and minikube.
          requests:
            cpu: "500m"
            memory: "1G"
Once you have created the ray-dev.yaml file on your server, you can proceed to install the Ray head and worker. If you want to install a Ray cluster in the ns namespace, you can do so by executing the following command:
kubectl apply -f ray-dev.yaml -n ns
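Before moving on, you can confirm that the head and worker pods are running; the deployment names below come from the manifests above:
kubectl get deployments -n ns
kubectl get pods -n ns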
After installing the Ray head and worker, the next step is to enable Horizontal Pod Autoscaling (HPA) for the worker. To do this, create another file called ray-autoscaler.yaml on the server. This file contains the configuration for the Kubernetes HPA object, which automatically scales the number of worker pods based on CPU utilization. Here's an example ray-autoscaler.yaml file:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: ray-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deployment-ray-worker
  minReplicas: 1
  maxReplicas: 2
  targetCPUUtilizationPercentage: 60
This HPA automatically adjusts the number of replicas of the target deployment based on its CPU utilization, with a minimum of 1 replica and a maximum of 2 replicas, targeting 60% CPU utilization. Note that resource-based HPAs rely on the Kubernetes metrics server (or an equivalent metrics API) being installed in the cluster.
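Since the motivation above is scaling on memory rather than CPU, the same target can also be expressed with the autoscaling/v2 API (stable since Kubernetes 1.23), which supports memory as a resource metric. The sketch below is an assumed variant you could place in ray-autoscaler.yaml instead of the CPU-based definition; the 60% average memory utilization is an illustrative value, not a recommendation.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ray-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deployment-ray-worker
  minReplicas: 1
  maxReplicas: 2
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 60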
To apply the HPA in the ns namespace, execute the following command:
kubectl apply -f ray-autoscaler.yaml -n ns
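Once the HPA is applied, you can watch its observed utilization and current replica count with:
kubectl get hpa ray-autoscaler -n ns --watch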
Monitoring and Observability#
The next step is to enable cluster monitoring. Ray auto-generates a Prometheus service discovery file named prom_metrics_service_discovery.json on the head node, which facilitates service discovery for metrics agents. To enable monitoring for your Ray cluster using the default Prometheus for Kubernetes, you need to expose this file and add it to the Prometheus configuration.
We can update the ray-dev.yaml configuration file to use a persistent volume claim (PVC) to map out the /tmp/ray directory (please refer to the doc for more details). This involves adding the following volumes and volumeMounts configuration to the head pod spec in the YAML file. Finally, we associate this PVC with our Prometheus instance.
spec:
  volumes:
  - name: ray-prom
    persistentVolumeClaim:
      claimName: ray-prom-pvc
  containers:
  - name: ray-head
    volumeMounts:
    - name: ray-prom
      mountPath: /tmp/ray
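On the Prometheus side, assuming the same PVC is mounted into the Prometheus pod at /tmp/ray, a file-based service discovery entry pointing at the auto-generated file lets Prometheus pick up the Ray metrics endpoints. A minimal sketch of the scrape configuration follows (the job name is arbitrary); note that sharing one PVC between the Ray head pod and Prometheus generally requires a storage class that supports ReadWriteMany access.
scrape_configs:
- job_name: 'ray'
  file_sd_configs:
  - files:
    - '/tmp/ray/prom_metrics_service_discovery.json'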