feat(e2e-test): Add e2e-tests for zfs-localpv (#298)

Signed-off-by: w3aman <aman.gupta@mayadata.io>
Aman Gupta 2021-06-09 21:21:39 +05:30 committed by GitHub
parent 53f872fcf1
commit 4e73638b5a
137 changed files with 8745 additions and 0 deletions


e2e-tests/experiments/functional/zfs-controller-high-availability/README.md
@@ -0,0 +1,63 @@
## About this experiment
This functional experiment scales up the zfs-controller statefulset replicas to run the controller in high-availability mode, and then verifies the zfs-localpv behaviour when one of the replicas goes down. The experiment checks the initial replica count of the zfs-controller statefulset and scales it up by one, provided a free node is available to schedule the new pod. The default replica count for the zfs-controller statefulset is one.
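For reference, the controller's current replica count can be checked with the same command the test playbook uses (this assumes the driver components run in the `kube-system` namespace, as they do in this test):
```
kubectl get sts openebs-zfs-controller -n kube-system -o jsonpath='{.status.replicas}'
```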
## Supported platforms:
- K8s: 1.18+
- OS: Ubuntu, CentOS
- ZFS: 0.7, 0.8
## Entry-Criteria
- K8s cluster should be in a healthy state, with all desired worker nodes in Ready state.
- zfs-localpv driver should be deployed, and the zfs-controller and CSI node-agent daemonset pods should be in Running state.
- One spare schedulable node should be present in the cluster so that, after scaling up the zfs-controller statefulset by one replica, the new replica gets scheduled on that node. The replicas follow pod anti-affinity rules, so replica pods are placed on different nodes only (a sketch of such a rule is shown after this list).
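A minimal sketch of the kind of `podAntiAffinity` rule that keeps controller replicas on separate nodes is shown below; the label selector and topology key here are illustrative (check the statefulset spec shipped with the driver for the exact values):
```
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: openebs-zfs-controller   # illustrative selector
        topologyKey: kubernetes.io/hostname   # at most one replica per node
```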
## Exit-Criteria
- zfs-controller statefulset should be scaled up by one replica.
- All the replicas should be in Running state.
- zfs-localpv volumes should be healthy, and data should not be impacted by scaling up the controller.
- This experiment brings down one of the zfs-controller statefulset replicas; as a result, the replica that was active (master) before the experiment is replaced by one of the remaining replicas once the experiment completes. This happens because of the lease mechanism, which decides which replica serves as master; only one replica is master at a time (the command after this list shows how to check the current lease holder).
- Volume provisioning/deprovisioning should not be impacted if any one replica goes down.
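Which replica currently holds the lease (and is therefore acting as master) can be checked the same way the test playbook does, assuming the driver runs in the `kube-system` namespace:
```
kubectl get lease zfs-csi-openebs-io -n kube-system -o jsonpath='{.spec.holderIdentity}'
```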
## Steps performed
- Get the replica count of the zfs-controller statefulset.
- Scale down the controller replicas to zero, wait until the controller pods get terminated successfully, and then try to provision a volume for use by a busybox application.
- Because there are zero active zfs-controller replicas, the PVC should remain in Pending state.
- If the number of schedulable nodes is greater than or equal to the previous replica count + 1, the zfs-controller is scaled up by one replica. This lets the PVC get Bound and the application pod come into Running state.
- Now taint all the nodes with `NoSchedule` so that, when we delete the master replica of zfs-controller, it does not come back to Running state; at that point the lease is handed over to another replica, which then acts as master.
- Now deprovision the application. This time deprovisioning is done by the replica that is currently active as master, which validates that provisioning and deprovisioning were successfully handled by two different replicas of zfs-controller. The taints are removed before exiting the test execution, and then the Running status of all the replicas and CSI node-agent pods is checked.
- If enough schedulable nodes are not available to schedule the increased number of replicas, the test fails at the task of scaling up the replicas and skips the subsequent tasks. Before exiting, it scales the controller back up to the same replica count that was present at the start of the experiment, which lets the PVC get Bound and the application pod come into Running state. The test execution then ends after deleting that PVC and application pod. (The core kubectl operations driven by the playbook are sketched after this list.)
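The playbook drives these steps with plain kubectl commands; the core ones, taken from the test tasks (node name and replica count shown as placeholders), look like this:
```
# scale the zfs-controller statefulset down and back up
kubectl scale sts openebs-zfs-controller -n kube-system --replicas=0
kubectl scale sts openebs-zfs-controller -n kube-system --replicas=<previous-count + 1>

# taint a node so the deleted master replica cannot be rescheduled on it,
# and remove the taint again before the test exits
kubectl taint node <node-name> key=value:NoSchedule
kubectl taint node <node-name> key-
```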
## How to run
- This experiment accepts its parameters in the form of Kubernetes job environment variables.
- To run this zfs-localpv controller high-availability experiment, clone the [openebs/zfs-localpv](https://github.com/openebs/zfs-localpv) repo and first apply the RBAC and CRDs for the e2e framework.
```
kubectl apply -f zfs-localpv/e2e-tests/hack/rbac.yaml
kubectl apply -f zfs-localpv/e2e-tests/hack/crds.yaml
```
Then update the needed test-specific values in the run_e2e_test.yml file and create the Kubernetes job.
```
kubectl create -f run_e2e_test.yml
```
Descriptions of all the env variables are provided as comments in the same file.
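For example, the namespace in which the zfs driver creates its resources is passed through the `ZFS_OPERATOR_NAMESPACE` env variable, which appears in run_e2e_test.yml as shown below; change the value if the driver was deployed in a non-default namespace:
```
env:
  # namespace where the zfs driver creates all its resources (default: openebs)
  - name: ZFS_OPERATOR_NAMESPACE
    value: 'openebs'
```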
After creating the Kubernetes job, once the job's pod is instantiated, we can follow the logs of the pod that executes the test case.
```
kubectl get pods -n e2e
kubectl logs -f <zfs-controller-high-availability-xxxxx-xxxxx> -n e2e
```
To get the test-case result, get the corresponding e2e custom resource `e2eresult` (short name: `e2er`) and check its phase (Running or Completed) and result (Pass or Fail).
```
kubectl get e2er
kubectl get e2er zfs-controller-high-availability -n e2e --no-headers -o custom-columns=:.spec.testStatus.phase
kubectl get e2er zfs-controller-high-availability -n e2e --no-headers -o custom-columns=:.spec.testStatus.result
```


e2e-tests/experiments/functional/zfs-controller-high-availability/busybox_app.yml
@@ -0,0 +1,46 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-busybox-ha
  labels:
    app: test_ha
spec:
  selector:
    matchLabels:
      app: test_ha
  template:
    metadata:
      labels:
        app: test_ha
    spec:
      tolerations:
        - key: "key"
          operator: "Equal"
          value: "value"
          effect: "NoSchedule"
      containers:
        - name: app-busybox
          imagePullPolicy: IfNotPresent
          image: gcr.io/google-containers/busybox
          command: ["/bin/sh"]
          args: ["-c", "while true; do sleep 10;done"]
          env:
          volumeMounts:
            - name: data-vol
              mountPath: /busybox
      volumes:
        - name: data-vol
          persistentVolumeClaim:
            claimName: pvcha
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvcha
spec:
  storageClassName: zfspv-sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi


e2e-tests/experiments/functional/zfs-controller-high-availability/run_e2e_test.yml
@@ -0,0 +1,30 @@
apiVersion: batch/v1
kind: Job
metadata:
  generateName: zfs-controller-high-availability-
  namespace: e2e
spec:
  template:
    metadata:
      labels:
        test: zfs-controller-high-availability
    spec:
      serviceAccountName: e2e
      restartPolicy: Never
      containers:
        - name: ansibletest
          image: openebs/zfs-localpv-e2e:ci
          imagePullPolicy: IfNotPresent
          env:
            - name: ANSIBLE_STDOUT_CALLBACK
              value: default

            # This is the namespace where the zfs driver creates all its resources, including zvols.
            # By default it is the openebs namespace. If it was changed at the time of zfs-driver
            # provisioning, give that namespace name here as the value of this env.
            - name: ZFS_OPERATOR_NAMESPACE
              value: 'openebs'

          command: ["/bin/bash"]
          args: ["-c", "ansible-playbook ./e2e-tests/experiments/functional/zfs-controller-high-availability/test.yml -i /etc/ansible/hosts -vv; exit 0"]


e2e-tests/experiments/functional/zfs-controller-high-availability/test.yml
@@ -0,0 +1,315 @@
- hosts: localhost
  connection: local
  gather_facts: False
  vars_files:
    - test_vars.yml

  tasks:
    - block:

        ## Generating the testname for zfs localpv controller high-availability test
        - include_tasks: /e2e-tests/hack/create_testname.yml

        ## Record SOT (start of test) in e2e result e2e-cr (e2e-custom-resource)
        - include_tasks: /e2e-tests/hack/update_e2e_result_resource.yml
          vars:
            status: 'SOT'

        - name: Get the no of replicas in zfs-controller statefulset
          shell: >
            kubectl get sts openebs-zfs-controller -n kube-system -o jsonpath='{.status.replicas}'
          args:
            executable: /bin/bash
          register: controller_rep_count

        - name: Record the replica count of zfs-controller
          set_fact:
            zfs_ctrl_replicas: "{{ controller_rep_count.stdout }}"

        - name: Get the list of names of all the nodes in cluster
          shell: >
            kubectl get nodes --no-headers -o custom-columns=:.metadata.name
          args:
            executable: /bin/bash
          register: node_list

        - name: Get the count of the schedulable nodes, which don't have `NoSchedule` taints
          shell: >
            kubectl get nodes --no-headers -o custom-columns=:.spec.taints
            | grep -v NoSchedule | wc -l
          args:
            executable: /bin/bash
          register: schedulable_nodes_count

        - name: Record the number of schedulable nodes in cluster
          set_fact:
            no_of_schedulable_nodes: "{{ schedulable_nodes_count.stdout }}"

        - name: Scale down the replicas of zfs-controller statefulset to zero
          shell: >
            kubectl scale sts openebs-zfs-controller -n kube-system --replicas=0
          args:
            executable: /bin/bash
          register: status
          failed_when: "status.rc != 0"

        - name: Check that zfs-controller pods have been terminated successfully
          shell: >
            kubectl get pods -n kube-system -l app=openebs-zfs-controller
          args:
            executable: /bin/bash
          register: ctrl_pods
          until: "'No resources found' in ctrl_pods.stderr"
          delay: 3
          retries: 40

        - name: Provision a test volume when zfs-controller is not active
          shell: >
            kubectl apply -f busybox_app.yml
          args:
            executable: /bin/bash

        - name: Check the pvc status, it should be in Pending state
          shell: >
            kubectl get pvc pvcha -n e2e -o jsonpath='{.status.phase}'
          args:
            executable: /bin/bash
          register: pvc_status
          failed_when: "'Pending' not in pvc_status.stdout"

        - name: Manual wait for 15 seconds, pvc should not get bound in this time
          shell: sleep 15

        - name: Again check the pvc status
          shell: >
            kubectl get pvc pvcha -n e2e -o jsonpath='{.status.phase}'
          args:
            executable: /bin/bash
          register: pvc_status
          failed_when: "'Pending' not in pvc_status.stdout"

        - block:

            - name: Scale up the zfs-controller statefulset with +1 no of replica count
              shell: >
                kubectl scale sts openebs-zfs-controller -n kube-system
                --replicas="{{ zfs_ctrl_replicas|int + 1 }}"
              args:
                executable: /bin/bash

            - name: Check that zfs-controller statefulset replicas are up and running
              shell: >
                kubectl get pods -n kube-system -l app=openebs-zfs-controller --no-headers
                -o custom-columns=:.status.phase | grep Running | wc -l
              args:
                executable: /bin/bash
              register: ready_replicas
              until: ready_replicas.stdout|int == zfs_ctrl_replicas|int + 1
              delay: 3
              retries: 50

            - name: Check the pvc status after zfs-controller is up and running
              shell: >
                kubectl get pvc pvcha -n e2e -o jsonpath='{.status.phase}'
              args:
                executable: /bin/bash
              register: pvc_status
              until: "'Bound' in pvc_status.stdout"
              delay: 3
              retries: 40

            - name: Get the application pod name
              shell: >
                kubectl get pods -n e2e -o jsonpath='{.items[?(@.metadata.labels.app=="test_ha")].metadata.name}'
              args:
                executable: /bin/bash
              register: app_pod_name

            - name: Check if the application pod is in Running state
              shell: >
                kubectl get pods -n e2e -o jsonpath='{.items[?(@.metadata.labels.app=="test_ha")].status.phase}'
              register: pod_status
              until: "'Running' in pod_status.stdout"
              delay: 3
              retries: 40

            - name: Get the name of the controller pod replica which is active as master at present
              shell: >
                kubectl get lease zfs-csi-openebs-io -n kube-system -o jsonpath='{.spec.holderIdentity}'
              args:
                executable: /bin/bash
              register: master_replica

            - name: Taint all nodes with `NoSchedule` to keep replica {{ master_replica.stdout }} out of action
              shell: >
                kubectl taint node {{ item }} key=value:NoSchedule
              args:
                executable: /bin/bash
              register: taint_status
              until: "'tainted' in taint_status.stdout"
              with_items: "{{ node_list.stdout_lines }}"

            - name: Delete the {{ master_replica.stdout }} replica pod
              shell: >
                kubectl delete pod {{ master_replica.stdout }} -n kube-system
              args:
                executable: /bin/bash
              register: status
              failed_when: "status.rc != 0"

            - name: Get the new replica name which is in action as master for zfs-controller
              shell: >
                kubectl get lease zfs-csi-openebs-io -n kube-system -o jsonpath='{.spec.holderIdentity}'
              args:
                executable: /bin/bash
              register: new_master_replica
              retries: 40
              delay: 3
              until: master_replica.stdout != new_master_replica.stdout

            - name: Get the zfs-volume name from the pvc name
              shell: >
                kubectl get pvc pvcha -n e2e -o jsonpath='{.spec.volumeName}'
              args:
                executable: /bin/bash
              register: zfsvol_name

            - name: Deprovision the application
              shell: >
                kubectl delete -f busybox_app.yml
              args:
                executable: /bin/bash

            - name: Verify that application pods have been deleted successfully
              shell: >
                kubectl get pods -n e2e
              args:
                executable: /bin/bash
              register: app_pod
              until: "'{{ app_pod_name.stdout }}' not in app_pod.stdout"
              delay: 3
              retries: 40

            - name: Verify that pvc has been deleted successfully
              shell: >
                kubectl get pvc -n e2e
              args:
                executable: /bin/bash
              register: pvc_status
              until: "'pvcha' not in pvc_status.stdout"
              delay: 3
              retries: 40

            - name: Verify that zfsvol has been deleted successfully
              shell: >
                kubectl get zv -n {{ zfs_operator_ns }}
              args:
                executable: /bin/bash
              register: zfsvol_status
              until: "zfsvol_name.stdout not in zfsvol_status.stdout"
              delay: 3
              retries: 40

          when: zfs_ctrl_replicas|int + 1 <= no_of_schedulable_nodes|int

        - set_fact:
            flag: "Pass"

      rescue:
        - set_fact:
            flag: "Fail"

      always:

        - name: Remove the taint from the nodes
          shell: >
            kubectl taint node {{ item }} key-
          args:
            executable: /bin/bash
          register: status
          failed_when: "status.rc != 0"
          with_items: "{{ node_list.stdout_lines }}"
          ignore_errors: true

        - block:

            - name: Scale up the zfs-controller with same no of replica count
              shell: >
                kubectl scale sts openebs-zfs-controller -n kube-system --replicas={{ zfs_ctrl_replicas }}
              args:
                executable: /bin/bash
              register: status
              failed_when: "status.rc != 0"

            - name: Verify that the zfs-controller pod and zfs-node daemonset pods are running
              shell: >
                kubectl get pods -n kube-system -l role=openebs-zfs
                --no-headers -o custom-columns=:status.phase | sort | uniq
              args:
                executable: /bin/bash
              register: zfs_driver_components
              until: "zfs_driver_components.stdout == 'Running'"
              delay: 3
              retries: 50

            - name: Get the zfs-volume name from the pvc name
              shell: >
                kubectl get pvc pvcha -n e2e -o jsonpath='{.spec.volumeName}'
              args:
                executable: /bin/bash
              register: zfsvol_name

            - name: Deprovision the application
              shell: >
                kubectl delete -f busybox_app.yml
              args:
                executable: /bin/bash

            - name: Verify that application pods have been deleted successfully
              shell: >
                kubectl get pods -n e2e -l app=test_ha
              args:
                executable: /bin/bash
              register: app_pod
              until: "'No resources found' in app_pod.stderr"
              delay: 3
              retries: 40

            - name: Verify that pvc has been deleted successfully
              shell: >
                kubectl get pvc -n e2e
              args:
                executable: /bin/bash
              register: pvc_status
              until: "'pvcha' not in pvc_status.stdout"
              delay: 3
              retries: 40

            - name: Verify that zfsvol has been deleted successfully
              shell: >
                kubectl get zv -n {{ zfs_operator_ns }}
              args:
                executable: /bin/bash
              register: zfsvol_status
              until: "zfsvol_name.stdout not in zfsvol_status.stdout"
              delay: 3
              retries: 40

          when: zfs_ctrl_replicas|int + 1 > no_of_schedulable_nodes|int

        - name: Verify that the zfs-controller pod and zfs-node daemonset pods are running
          shell: >
            kubectl get pods -n kube-system -l role=openebs-zfs
            --no-headers -o custom-columns=:status.phase | sort | uniq
          args:
            executable: /bin/bash
          register: zfs_driver_components
          until: "zfs_driver_components.stdout == 'Running'"
          delay: 3
          retries: 50

        ## RECORD END-OF-TEST IN e2e RESULT CR
        - include_tasks: /e2e-tests/hack/update_e2e_result_resource.yml
          vars:
            status: 'EOT'


e2e-tests/experiments/functional/zfs-controller-high-availability/test_vars.yml
@@ -0,0 +1,3 @@
test_name: zfs-controller-high-availability
zfs_operator_ns: "{{ lookup('env','ZFS_OPERATOR_NAMESPACE') }}"