Background#
Ray is a unified framework for distributed computing, useful when a single server cannot handle heavy computational workloads. In this article, we take an in-depth look at two important features of Ray: multitenancy and auto-scaling. Multitenancy allows distinct resources, such as CPU and memory, to be allocated to different users, while auto-scaling adds or removes instances based on traffic and utilization levels.
Why we chose Kubernetes#
Ray offers an official auto-scaling solution that scales the Ray cluster according to resource requests, but our scenario requires auto-scaling based on memory usage. We considered two solutions. The first uses the Ray resource monitor to track the resource usage of each task and actor; because we allocate a different quota to each user, we would have to aggregate the usage of every task and actor per user and build our own mechanism for deciding when to trigger auto-scaling based on memory usage. The second uses the Kubernetes Horizontal Pod Autoscaler to automatically adjust the number of replicas based on memory usage. We chose the second solution.
Set up the Ray Cluster#
Assuming you have already set up your Kubernetes and Harbor environment, the next step is to install the Ray cluster. Note that you will need to modify the image to match your own configuration. Create a file named ray-dev.yaml on your server with the following contents:
# Ray head node service, allowing worker pods to discover the head node to perform the bidirectional communication.
# More contexts can be found at [the Ports configurations doc](https://docs.ray.io/en/latest/ray-core/configure.html#ports-configurations).
apiVersion: v1
kind: Service
metadata:
  name: service-ray-cluster
  labels:
    app: ray-cluster-head
spec:
  clusterIP: None
  ports:
  - name: client
    protocol: TCP
    port: 10001
    targetPort: 10001
  - name: dashboard
    protocol: TCP
    port: 8265
    targetPort: 8265
  - name: gcs-server
    protocol: TCP
    port: 6380
    targetPort: 6380
  selector:
    app: ray-cluster-head
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-ray-head
  labels:
    app: ray-cluster-head
spec:
  # Do not change this - Ray currently only supports one head node per cluster.
  replicas: 1
  selector:
    matchLabels:
      component: ray-head
      type: ray
      app: ray-cluster-head
  template:
    metadata:
      labels:
        component: ray-head
        type: ray
        app: ray-cluster-head
    spec:
      # If the head node goes down, the entire cluster (including all worker
      # nodes) will go down as well. If you want Kubernetes to bring up a new
      # head node in this case, set this to "Always"; otherwise set it to "Never".
      restartPolicy: Always
      # This volume allocates shared memory for Ray to use for its plasma
      # object store. If you do not provide this, Ray will fall back to
      # /tmp, which causes slowdowns if it is not a shared memory volume.
      # volumes:
      # - name: dshm
      #   emptyDir:
      #     medium: Memory
      containers:
      - name: ray-head
        image: rayproject/ray:2.3.0
        imagePullPolicy: Always
        command: [ "/bin/bash", "-c", "--" ]
        # If there is no password for Redis, set --redis-password=''
        args:
          - "ray start --head --port=6380 --num-cpus=$MY_CPU_REQUEST --dashboard-host=0.0.0.0 --object-manager-port=8076 --node-manager-port=8077 --dashboard-agent-grpc-port=8078 --dashboard-agent-listen-port=52365 --redis-password='123456' --block"
        ports:
        - containerPort: 6380 # GCS server
        - containerPort: 10001 # Used by Ray Client
        - containerPort: 8265 # Used by Ray Dashboard
        # This volume allocates shared memory for Ray to use for its plasma
        # object store. If you do not provide this, Ray will fall back to
        # /tmp, which causes slowdowns if it is not a shared memory volume.
        # volumeMounts:
        # - mountPath: /dev/shm
        #   name: dshm
        env:
          # RAY_REDIS_ADDRESS lets Ray use an external Redis for fault tolerance.
          - name: RAY_REDIS_ADDRESS
            value: redis:6379 # address of the external Redis, which is "redis:6379" in this example
          # This is used in the ray start command so that Ray can spawn the
          # correct number of processes. Omitting this may lead to degraded
          # performance.
          - name: MY_CPU_REQUEST
            valueFrom:
              resourceFieldRef:
                resource: requests.cpu
        resources:
          limits:
            cpu: "2"
            memory: "5G"
          requests:
            # For production use cases, we recommend specifying integer CPU requests and limits.
            # We also recommend setting requests equal to limits for both CPU and memory.
            # For this example, we use a 500m CPU request to accommodate resource-constrained local
            # Kubernetes testing environments such as KinD and minikube.
            cpu: "500m"
            # The rest-state memory usage of the Ray head node is around 1Gb. We do not
            # recommend allocating less than 2Gb memory for the Ray head pod.
            # For production use cases, we recommend allocating at least 8Gb memory for each Ray container.
            memory: "2G"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-ray-worker
  labels:
    app: ray-cluster-worker
spec:
  # Change this to scale the number of worker nodes started in the Ray cluster.
  replicas: 1
  selector:
    matchLabels:
      component: ray-worker
      type: ray
      app: ray-cluster-worker
  template:
    metadata:
      labels:
        component: ray-worker
        type: ray
        app: ray-cluster-worker
    spec:
      restartPolicy: Always
      # volumes:
      # - name: dshm
      #   emptyDir:
      #     medium: Memory
      containers:
      - name: ray-worker
        image: rayproject/ray:2.3.0
        imagePullPolicy: Always
        command: ["/bin/bash", "-c", "--"]
        args:
          - "ray start --num-cpus=$MY_CPU_REQUEST --address=service-ray-cluster:6380 --object-manager-port=8076 --node-manager-port=8077 --dashboard-agent-grpc-port=8078 --dashboard-agent-listen-port=52365 --block"
        # This volume allocates shared memory for Ray to use for its plasma
        # object store. If you do not provide this, Ray will fall back to
        # /tmp, which causes slowdowns if it is not a shared memory volume.
        # volumeMounts:
        # - mountPath: /dev/shm
        #   name: dshm
        env:
          # This is used in the ray start command so that Ray can spawn the
          # correct number of processes. Omitting this may lead to degraded
          # performance.
          - name: MY_CPU_REQUEST
            valueFrom:
              resourceFieldRef:
                resource: requests.cpu
        # The resource requests and limits in this config are too small for production!
        # It is better to use a few large Ray pods than many small ones.
        # For production, it is ideal to size each Ray pod to take up the
        # entire Kubernetes node on which it is scheduled.
        resources:
          limits:
            cpu: "1"
            memory: "1G"
          # For production use cases, we recommend specifying integer CPU requests and limits.
          # We also recommend setting requests equal to limits for both CPU and memory.
          # For this example, we use a 500m CPU request to accommodate resource-constrained local
          # Kubernetes testing environments such as KinD and minikube.
          requests:
            cpu: "500m"
            memory: "1G"
Once you have created the ray-dev.yaml file on your server, you can proceed to install the Ray head and worker. If you want to install a Ray cluster in the ns namespace, you can do so by executing the following command:
kubectl apply -f ray-dev.yaml -n ns
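Before moving on, you can confirm that the head and worker pods are running; the deployment names below come from the manifests above:
kubectl get deployments -n ns
kubectl get pods -n ns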
After installing the Ray head and worker, the next step is to enable Horizontal Pod Autoscaling (HPA) for the worker. To do this, create another file called ray-autoscaler.yaml on the server. This file contains the configuration for the Kubernetes HPA object, which automatically scales the number of worker pods based on CPU utilization. Here's an example ray-autoscaler.yaml file:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: ray-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deployment-ray-worker
  minReplicas: 1
  maxReplicas: 2
  targetCPUUtilizationPercentage: 60
This HPA automatically adjusts the number of replicas of the target deployment based on its CPU utilization, with a minimum of 1 replica and a maximum of 2 replicas, targeting 60% CPU utilization. Note that resource-based HPAs rely on the Kubernetes metrics server (or an equivalent metrics API) being installed in the cluster.
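Since the motivation above is scaling on memory rather than CPU, the same target can also be expressed with the autoscaling/v2 API (stable since Kubernetes 1.23), which supports memory as a resource metric. The sketch below is an assumed variant you could place in ray-autoscaler.yaml instead of the CPU-based definition; the 60% average memory utilization is an illustrative value, not a recommendation.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ray-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deployment-ray-worker
  minReplicas: 1
  maxReplicas: 2
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 60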
To apply the HPA in the ns namespace, execute the following command:
kubectl apply -f ray-autoscaler.yaml -n ns
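Once the HPA is applied, you can watch its observed utilization and current replica count with:
kubectl get hpa ray-autoscaler -n ns --watch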
Monitoring and Observability#
The next step is to enable cluster monitoring. Ray auto-generates a Prometheus service discovery file named prom_metrics_service_discovery.json on the head node, which facilitates service discovery for metrics agents. To enable monitoring for your Ray cluster using the default Prometheus for Kubernetes, you need to expose this file and add it to the Prometheus configuration.
We can update the ray-dev.yaml configuration file to use a persistent volume claim (PVC) to map out the /tmp/ray directory (please refer to the doc for more details). This involves adding the following volumes and volumeMounts configuration to the head pod spec in the YAML file. Finally, we associate this PVC with our Prometheus instance.
spec:
  volumes:
  - name: ray-prom
    persistentVolumeClaim:
      claimName: ray-prom-pvc
  containers:
  - name: ray-head
    volumeMounts:
    - name: ray-prom
      mountPath: /tmp/ray
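On the Prometheus side, assuming the same PVC is mounted into the Prometheus pod at /tmp/ray, a file-based service discovery entry pointing at the auto-generated file lets Prometheus pick up the Ray metrics endpoints. A minimal sketch of the scrape configuration follows (the job name is arbitrary); note that sharing one PVC between the Ray head pod and Prometheus generally requires a storage class that supports ReadWriteMany access.
scrape_configs:
- job_name: 'ray'
  file_sd_configs:
  - files:
    - '/tmp/ray/prom_metrics_service_discovery.json'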