feat(e2e-test): Add e2e-tests for zfs-localpv (#298)

Signed-off-by: w3aman <aman.gupta@mayadata.io>
Aman Gupta 2021-06-09 21:21:39 +05:30 committed by GitHub
parent 53f872fcf1
commit 4e73638b5a
137 changed files with 8745 additions and 0 deletions


e2e-tests/experiments/functional/zfs-controller-high-availability/README.md
@@ -0,0 +1,63 @@
## About this experiment
This functional experiment scales up the zfs-controller statefulset replicas to run the controller in high-availability mode, and then verifies the zfs-localpv behaviour when one of the replicas goes down. The experiment checks the initial replica count of the zfs-controller statefulset and scales it up by one, provided a free node is available to schedule the new pod. The default replica count for the zfs-controller statefulset is one.
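For reference, the controller's current replica count can be checked with the same command the test playbook uses (this assumes the driver components run in the `kube-system` namespace, as they do in this test):
```
kubectl get sts openebs-zfs-controller -n kube-system -o jsonpath='{.status.replicas}'
```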
## Supported platforms:
- K8s: 1.18+
- OS: Ubuntu, CentOS
- ZFS: 0.7, 0.8
## Entry-Criteria
- K8s cluster should be in a healthy state, with all desired worker nodes in Ready state.
- zfs-localpv driver should be deployed, and the zfs-controller and CSI node-agent daemonset pods should be in Running state.
- One spare schedulable node should be present in the cluster so that, after scaling up the zfs-controller statefulset by one replica, the new replica gets scheduled on that node. The replicas follow pod anti-affinity rules, so replica pods are placed on different nodes only (a sketch of such a rule is shown after this list).
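A minimal sketch of the kind of `podAntiAffinity` rule that keeps controller replicas on separate nodes is shown below; the label selector and topology key here are illustrative (check the statefulset spec shipped with the driver for the exact values):
```
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: openebs-zfs-controller   # illustrative selector
        topologyKey: kubernetes.io/hostname   # at most one replica per node
```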
## Exit-Criteria
- zfs-controller statefulset should be scaled up by one replica.
- All the replicas should be in Running state.
- zfs-localpv volumes should be healthy, and data should not be impacted by scaling up the controller.
- This experiment brings down one of the zfs-controller statefulset replicas; as a result, the replica that was active (master) before the experiment is replaced by one of the remaining replicas once the experiment completes. This happens because of the lease mechanism, which decides which replica serves as master; only one replica is master at a time (the command after this list shows how to check the current lease holder).
- Volume provisioning/deprovisioning should not be impacted if any one replica goes down.
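Which replica currently holds the lease (and is therefore acting as master) can be checked the same way the test playbook does, assuming the driver runs in the `kube-system` namespace:
```
kubectl get lease zfs-csi-openebs-io -n kube-system -o jsonpath='{.spec.holderIdentity}'
```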
## Steps performed
- Get the replica count of the zfs-controller statefulset.
- Scale down the controller replicas to zero, wait until the controller pods get terminated successfully, and then try to provision a volume for use by a busybox application.
- Because there are zero active zfs-controller replicas, the PVC should remain in Pending state.
- If the number of schedulable nodes is greater than or equal to the previous replica count + 1, the zfs-controller is scaled up by one replica. This lets the PVC get Bound and the application pod come into Running state.
- Now taint all the nodes with `NoSchedule` so that, when we delete the master replica of zfs-controller, it does not come back to Running state; at that point the lease is handed over to another replica, which then acts as master.
- Now deprovision the application. This time deprovisioning is done by the replica that is currently active as master, which validates that provisioning and deprovisioning were successfully handled by two different replicas of zfs-controller. The taints are removed before exiting the test execution, and then the Running status of all the replicas and CSI node-agent pods is checked.
- If enough schedulable nodes are not available to schedule the increased number of replicas, the test fails at the task of scaling up the replicas and skips the subsequent tasks. Before exiting, it scales the controller back up to the same replica count that was present at the start of the experiment, which lets the PVC get Bound and the application pod come into Running state. The test execution then ends after deleting that PVC and application pod. (The core kubectl operations driven by the playbook are sketched after this list.)
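The playbook drives these steps with plain kubectl commands; the core ones, taken from the test tasks (node name and replica count shown as placeholders), look like this:
```
# scale the zfs-controller statefulset down and back up
kubectl scale sts openebs-zfs-controller -n kube-system --replicas=0
kubectl scale sts openebs-zfs-controller -n kube-system --replicas=<previous-count + 1>

# taint a node so the deleted master replica cannot be rescheduled on it,
# and remove the taint again before the test exits
kubectl taint node <node-name> key=value:NoSchedule
kubectl taint node <node-name> key-
```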
## How to run
- This experiment accepts its parameters in the form of Kubernetes job environment variables.
- To run this zfs-localpv controller high-availability experiment, clone the [openebs/zfs-localpv](https://github.com/openebs/zfs-localpv) repo and first apply the RBAC and CRDs for the e2e framework.
```
kubectl apply -f zfs-localpv/e2e-tests/hack/rbac.yaml
kubectl apply -f zfs-localpv/e2e-tests/hack/crds.yaml
```
Then update the needed test-specific values in the run_e2e_test.yml file and create the Kubernetes job.
```
kubectl create -f run_e2e_test.yml
```
Descriptions of all the env variables are provided as comments in the same file.
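For example, the namespace in which the zfs driver creates its resources is passed through the `ZFS_OPERATOR_NAMESPACE` env variable, which appears in run_e2e_test.yml as shown below; change the value if the driver was deployed in a non-default namespace:
```
env:
  # namespace where the zfs driver creates all its resources (default: openebs)
  - name: ZFS_OPERATOR_NAMESPACE
    value: 'openebs'
```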
After creating the Kubernetes job, once the job's pod is instantiated, we can follow the logs of the pod that executes the test case.
```
kubectl get pods -n e2e
kubectl logs -f <zfs-controller-high-availability-xxxxx-xxxxx> -n e2e
```
To get the test-case result, get the corresponding e2e custom resource `e2eresult` (short name: `e2er`) and check its phase (Running or Completed) and result (Pass or Fail).
```
kubectl get e2er
kubectl get e2er zfs-controller-high-availability -n e2e --no-headers -o custom-columns=:.spec.testStatus.phase
kubectl get e2er zfs-controller-high-availability -n e2e --no-headers -o custom-columns=:.spec.testStatus.result
```


e2e-tests/experiments/functional/zfs-controller-high-availability/busybox_app.yml
@@ -0,0 +1,46 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-busybox-ha
  labels:
    app: test_ha
spec:
  selector:
    matchLabels:
      app: test_ha
  template:
    metadata:
      labels:
        app: test_ha
    spec:
      tolerations:
        - key: "key"
          operator: "Equal"
          value: "value"
          effect: "NoSchedule"
      containers:
        - name: app-busybox
          imagePullPolicy: IfNotPresent
          image: gcr.io/google-containers/busybox
          command: ["/bin/sh"]
          args: ["-c", "while true; do sleep 10;done"]
          env:
          volumeMounts:
            - name: data-vol
              mountPath: /busybox
      volumes:
        - name: data-vol
          persistentVolumeClaim:
            claimName: pvcha
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvcha
spec:
  storageClassName: zfspv-sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi


e2e-tests/experiments/functional/zfs-controller-high-availability/run_e2e_test.yml
@@ -0,0 +1,30 @@
apiVersion: batch/v1
kind: Job
metadata:
  generateName: zfs-controller-high-availability-
  namespace: e2e
spec:
  template:
    metadata:
      labels:
        test: zfs-controller-high-availability
    spec:
      serviceAccountName: e2e
      restartPolicy: Never
      containers:
        - name: ansibletest
          image: openebs/zfs-localpv-e2e:ci
          imagePullPolicy: IfNotPresent
          env:
            - name: ANSIBLE_STDOUT_CALLBACK
              value: default

            # This is the namespace where the zfs driver creates all its resources, including zvols.
            # By default it is the openebs namespace. If it was changed at the time of zfs-driver
            # provisioning, give that namespace name here as the value of this env.
            - name: ZFS_OPERATOR_NAMESPACE
              value: 'openebs'

          command: ["/bin/bash"]
          args: ["-c", "ansible-playbook ./e2e-tests/experiments/functional/zfs-controller-high-availability/test.yml -i /etc/ansible/hosts -vv; exit 0"]


e2e-tests/experiments/functional/zfs-controller-high-availability/test.yml
@@ -0,0 +1,315 @@
- hosts: localhost
  connection: local
  gather_facts: False
  vars_files:
    - test_vars.yml

  tasks:
    - block:

        ## Generating the testname for zfs localpv controller high-availability test
        - include_tasks: /e2e-tests/hack/create_testname.yml

        ## Record SOT (start of test) in e2e result e2e-cr (e2e-custom-resource)
        - include_tasks: /e2e-tests/hack/update_e2e_result_resource.yml
          vars:
            status: 'SOT'

        - name: Get the no of replicas in zfs-controller statefulset
          shell: >
            kubectl get sts openebs-zfs-controller -n kube-system -o jsonpath='{.status.replicas}'
          args:
            executable: /bin/bash
          register: controller_rep_count

        - name: Record the replica count of zfs-controller
          set_fact:
            zfs_ctrl_replicas: "{{ controller_rep_count.stdout }}"

        - name: Get the list of names of all the nodes in cluster
          shell: >
            kubectl get nodes --no-headers -o custom-columns=:.metadata.name
          args:
            executable: /bin/bash
          register: node_list

        - name: Get the count of the schedulable nodes, which don't have `NoSchedule` taints
          shell: >
            kubectl get nodes --no-headers -o custom-columns=:.spec.taints
            | grep -v NoSchedule | wc -l
          args:
            executable: /bin/bash
          register: schedulable_nodes_count

        - name: Record the number of schedulable nodes in cluster
          set_fact:
            no_of_schedulable_nodes: "{{ schedulable_nodes_count.stdout }}"

        - name: Scale down the replicas of zfs-controller statefulset to zero
          shell: >
            kubectl scale sts openebs-zfs-controller -n kube-system --replicas=0
          args:
            executable: /bin/bash
          register: status
          failed_when: "status.rc != 0"

        - name: Check that zfs-controller pods have been terminated successfully
          shell: >
            kubectl get pods -n kube-system -l app=openebs-zfs-controller
          args:
            executable: /bin/bash
          register: ctrl_pods
          until: "'No resources found' in ctrl_pods.stderr"
          delay: 3
          retries: 40

        - name: Provision a test volume when zfs-controller is not active
          shell: >
            kubectl apply -f busybox_app.yml
          args:
            executable: /bin/bash

        - name: Check the pvc status, it should be in Pending state
          shell: >
            kubectl get pvc pvcha -n e2e -o jsonpath='{.status.phase}'
          args:
            executable: /bin/bash
          register: pvc_status
          failed_when: "'Pending' not in pvc_status.stdout"

        - name: Manual wait for 15 seconds, pvc should not get bound in this time
          shell: sleep 15

        - name: Again check the pvc status
          shell: >
            kubectl get pvc pvcha -n e2e -o jsonpath='{.status.phase}'
          args:
            executable: /bin/bash
          register: pvc_status
          failed_when: "'Pending' not in pvc_status.stdout"

        - block:

            - name: Scale up the zfs-controller statefulset with +1 no of replica count
              shell: >
                kubectl scale sts openebs-zfs-controller -n kube-system
                --replicas="{{ zfs_ctrl_replicas|int + 1 }}"
              args:
                executable: /bin/bash

            - name: Check that zfs-controller statefulset replicas are up and running
              shell: >
                kubectl get pods -n kube-system -l app=openebs-zfs-controller --no-headers
                -o custom-columns=:.status.phase | grep Running | wc -l
              args:
                executable: /bin/bash
              register: ready_replicas
              until: ready_replicas.stdout|int == zfs_ctrl_replicas|int + 1
              delay: 3
              retries: 50

            - name: Check the pvc status after zfs-controller is up and running
              shell: >
                kubectl get pvc pvcha -n e2e -o jsonpath='{.status.phase}'
              args:
                executable: /bin/bash
              register: pvc_status
              until: "'Bound' in pvc_status.stdout"
              delay: 3
              retries: 40

            - name: Get the application pod name
              shell: >
                kubectl get pods -n e2e -o jsonpath='{.items[?(@.metadata.labels.app=="test_ha")].metadata.name}'
              args:
                executable: /bin/bash
              register: app_pod_name

            - name: Check if the application pod is in Running state
              shell: >
                kubectl get pods -n e2e -o jsonpath='{.items[?(@.metadata.labels.app=="test_ha")].status.phase}'
              register: pod_status
              until: "'Running' in pod_status.stdout"
              delay: 3
              retries: 40

            - name: Get the name of the controller pod replica which is active as master at present
              shell: >
                kubectl get lease zfs-csi-openebs-io -n kube-system -o jsonpath='{.spec.holderIdentity}'
              args:
                executable: /bin/bash
              register: master_replica

            - name: Taint all nodes with `NoSchedule` to keep replica {{ master_replica.stdout }} out of action
              shell: >
                kubectl taint node {{ item }} key=value:NoSchedule
              args:
                executable: /bin/bash
              register: taint_status
              until: "'tainted' in taint_status.stdout"
              with_items: "{{ node_list.stdout_lines }}"

            - name: Delete the {{ master_replica.stdout }} replica pod
              shell: >
                kubectl delete pod {{ master_replica.stdout }} -n kube-system
              args:
                executable: /bin/bash
              register: status
              failed_when: "status.rc != 0"

            - name: Get the new replica name which is in action as master for zfs-controller
              shell: >
                kubectl get lease zfs-csi-openebs-io -n kube-system -o jsonpath='{.spec.holderIdentity}'
              args:
                executable: /bin/bash
              register: new_master_replica
              retries: 40
              delay: 3
              until: master_replica.stdout != new_master_replica.stdout

            - name: Get the zfs-volume name from the pvc name
              shell: >
                kubectl get pvc pvcha -n e2e -o jsonpath='{.spec.volumeName}'
              args:
                executable: /bin/bash
              register: zfsvol_name

            - name: Deprovision the application
              shell: >
                kubectl delete -f busybox_app.yml
              args:
                executable: /bin/bash

            - name: Verify that application pods have been deleted successfully
              shell: >
                kubectl get pods -n e2e
              args:
                executable: /bin/bash
              register: app_pod
              until: "'{{ app_pod_name.stdout }}' not in app_pod.stdout"
              delay: 3
              retries: 40

            - name: Verify that pvc has been deleted successfully
              shell: >
                kubectl get pvc -n e2e
              args:
                executable: /bin/bash
              register: pvc_status
              until: "'pvcha' not in pvc_status.stdout"
              delay: 3
              retries: 40

            - name: Verify that zfsvol has been deleted successfully
              shell: >
                kubectl get zv -n {{ zfs_operator_ns }}
              args:
                executable: /bin/bash
              register: zfsvol_status
              until: "zfsvol_name.stdout not in zfsvol_status.stdout"
              delay: 3
              retries: 40

          when: zfs_ctrl_replicas|int + 1 <= no_of_schedulable_nodes|int

        - set_fact:
            flag: "Pass"

      rescue:
        - set_fact:
            flag: "Fail"

      always:

        - name: Remove the taint from the nodes
          shell: >
            kubectl taint node {{ item }} key-
          args:
            executable: /bin/bash
          register: status
          failed_when: "status.rc != 0"
          with_items: "{{ node_list.stdout_lines }}"
          ignore_errors: true

        - block:

            - name: Scale up the zfs-controller with same no of replica count
              shell: >
                kubectl scale sts openebs-zfs-controller -n kube-system --replicas={{ zfs_ctrl_replicas }}
              args:
                executable: /bin/bash
              register: status
              failed_when: "status.rc != 0"

            - name: Verify that the zfs-controller pod and zfs-node daemonset pods are running
              shell: >
                kubectl get pods -n kube-system -l role=openebs-zfs
                --no-headers -o custom-columns=:status.phase | sort | uniq
              args:
                executable: /bin/bash
              register: zfs_driver_components
              until: "zfs_driver_components.stdout == 'Running'"
              delay: 3
              retries: 50

            - name: Get the zfs-volume name from the pvc name
              shell: >
                kubectl get pvc pvcha -n e2e -o jsonpath='{.spec.volumeName}'
              args:
                executable: /bin/bash
              register: zfsvol_name

            - name: Deprovision the application
              shell: >
                kubectl delete -f busybox_app.yml
              args:
                executable: /bin/bash

            - name: Verify that application pods have been deleted successfully
              shell: >
                kubectl get pods -n e2e -l app=test_ha
              args:
                executable: /bin/bash
              register: app_pod
              until: "'No resources found' in app_pod.stderr"
              delay: 3
              retries: 40

            - name: Verify that pvc has been deleted successfully
              shell: >
                kubectl get pvc -n e2e
              args:
                executable: /bin/bash
              register: pvc_status
              until: "'pvcha' not in pvc_status.stdout"
              delay: 3
              retries: 40

            - name: Verify that zfsvol has been deleted successfully
              shell: >
                kubectl get zv -n {{ zfs_operator_ns }}
              args:
                executable: /bin/bash
              register: zfsvol_status
              until: "zfsvol_name.stdout not in zfsvol_status.stdout"
              delay: 3
              retries: 40

          when: zfs_ctrl_replicas|int + 1 > no_of_schedulable_nodes|int

        - name: Verify that the zfs-controller pod and zfs-node daemonset pods are running
          shell: >
            kubectl get pods -n kube-system -l role=openebs-zfs
            --no-headers -o custom-columns=:status.phase | sort | uniq
          args:
            executable: /bin/bash
          register: zfs_driver_components
          until: "zfs_driver_components.stdout == 'Running'"
          delay: 3
          retries: 50

        ## RECORD END-OF-TEST IN e2e RESULT CR
        - include_tasks: /e2e-tests/hack/update_e2e_result_resource.yml
          vars:
            status: 'EOT'


e2e-tests/experiments/functional/zfs-controller-high-availability/test_vars.yml
@@ -0,0 +1,3 @@
test_name: zfs-controller-high-availability
zfs_operator_ns: "{{ lookup('env','ZFS_OPERATOR_NAMESPACE') }}"