zfs-localpv/e2e-tests/experiments/infra-chaos/node_failure/README.md
Aman Gupta 4e73638b5a
feat(e2e-test): Add e2e-tests for zfs-localpv (#298)
Signed-off-by: w3aman <aman.gupta@mayadata.io>
2021-06-09 21:21:39 +05:30

Experiment Metadata

| Type | Description | Storage | K8s Platform |
| ---- | ----------- | ------- | ------------ |
| Chaos | Power off the node where application pod is hosted and observe application behavior | OpenEBS | on-premise-VMware |

Entry-Criteria

  • Application services are accessible & pods are healthy
  • Application writes are successful

Exit-Criteria

  • Application pod should be evicted and rescheduled on another node.
  • Data written prior to chaos is successfully retrieved/read
  • Database consistency is maintained as per db integrity check utils
  • Storage target pods are healthy

Notes

  • Typically used as a disruptive test to cause loss of access to the storage target by killing the node where the application pod is scheduled.
  • The container should be created again and it should be healthy.

Associated Utils

  • vm_power_operations.yml, mysql_data_persistence.yml, busybox_data_persistence.yml

Procedure

This scenario validates the behaviour of the application and OpenEBS persistent volumes amid chaos induced on the node where the application pod is scheduled. The chaos is induced by shutting down the node (a virtual machine) created on the VMware hypervisor. Once the pod eviction timeout (5 minutes by default) elapses, the application pod is expected to be scheduled on another available node. Because of the abrupt shutdown, the old application pod remains in the Unknown state. As a result, the volume mount in the newly scheduled pod fails with a multi-attach error. As a workaround, the node CR is deleted, which kills the old pod. The application pod is then expected to run successfully after 5 minutes.
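
The recovery sequence described above could be sketched as Ansible tasks. This is a minimal sketch under stated assumptions, not the playbook shipped with the experiment: the variables `app_ns`, `app_label`, and `failed_node` are hypothetical, `kubectl` is assumed to be available on the runner, and the retry counts are only illustrative.

```yaml
# Sketch only: app_ns, app_label and failed_node are hypothetical variables.
- name: Wait until a replacement pod is scheduled on a node other than the failed one
  ansible.builtin.shell: >-
    kubectl get pods -n {{ app_ns }} -l {{ app_label }}
    -o jsonpath='{.items[*].spec.nodeName}'
  register: pod_nodes
  # Retry for roughly 10 minutes to cover the 5-minute eviction timeout.
  until: (pod_nodes.stdout.split() | difference([failed_node]) | length) > 0
  retries: 40
  delay: 15

- name: Delete the node object so the stale pod on the powered-off node is removed
  ansible.builtin.shell: kubectl delete node {{ failed_node }}

- name: Wait until every remaining application pod reports the Running phase
  ansible.builtin.shell: >-
    kubectl get pods -n {{ app_ns }} -l {{ app_label }}
    -o jsonpath='{.items[*].status.phase}'
  register: pod_phases
  until: (pod_phases.stdout.split() | unique) == ['Running']
  retries: 40
  delay: 15
```

Deleting the node object only after a replacement pod has been scheduled mirrors the order of events in the description: eviction first, then the workaround for the multi-attach error.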

Based on the value of the env DATA_PERSISTENCE, the corresponding data consistency util is executed. At present, only busybox and percona-mysql are supported. Along with specifying the env in the litmus experiment, the user needs to pass a name for the configmap and supply the data-consistency-specific parameters via the configmap in the following format:

    parameters.yml: |
      blocksize: 4k
      blockcount: 1024
      testfile: difiletest

It is recommended to use the test name as the configmap name and to mount the corresponding configmap as a volume in the litmus pod. The above snippet holds the parameters required for validating data consistency in the busybox application.
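
For illustration, the busybox parameters could be wrapped in a complete ConfigMap manifest as sketched here. The `metadata.name` and `metadata.namespace` values are hypothetical and should be replaced with the test name and the namespace in which the litmus pod runs.

```yaml
# Hypothetical ConfigMap; the name and namespace are assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: node-failure        # hypothetical: named after the test
  namespace: litmus         # hypothetical namespace of the litmus pod
data:
  # The key becomes a file name when the ConfigMap is mounted as a volume.
  parameters.yml: |
    blocksize: 4k
    blockcount: 1024
    testfile: difiletest
```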

For percona-mysql, the following parameters are to be injected into the configmap.

    parameters.yml: |
      dbuser: root
      dbpassword: k8sDem0
      dbname: tdb

The litmus experiment uses the configmap data as its variables while executing the scenario.

Based on the data provided, litmus checks whether the data is consistent after recovering from the induced chaos.
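
Mounting the configmap as a volume in the litmus pod might look like the following pod spec fragment. It is a sketch only: the container name, the volume name, the mount path, and the ConfigMap name `node-failure` are all hypothetical, and the rest of the pod spec (image, command, env) is omitted.

```yaml
# Fragment of a pod template; every name and path here is hypothetical.
spec:
  containers:
    - name: litmus-runner          # hypothetical container name
      volumeMounts:
        - name: parameters
          mountPath: /mnt          # parameters.yml appears as /mnt/parameters.yml
  volumes:
    - name: parameters
      configMap:
        name: node-failure         # hypothetical ConfigMap holding parameters.yml
```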

The ESX password has to be updated through the k8s secret that is created. The litmus runner can retrieve the password from the secret as an environment variable and use it to perform admin operations on the server.
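
A minimal sketch of this arrangement uses a standard Kubernetes Secret and a `secretKeyRef` in the container's `env` list. The Secret name `esx-credentials`, its namespace, and the key `password` are hypothetical, and the password value is a placeholder.

```yaml
# Hypothetical Secret; name, namespace and key are assumptions.
apiVersion: v1
kind: Secret
metadata:
  name: esx-credentials
  namespace: litmus
type: Opaque
stringData:
  password: "<esx-password>"   # placeholder, replace with the real value
---
# Fragment of the litmus container spec that exposes the secret as an env variable.
env:
  - name: ESX_PASSWORD
    valueFrom:
      secretKeyRef:
        name: esx-credentials
        key: password
```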

Note: To perform admin operations on VMware, the VM display name in the hypervisor should match its hostname.

Litmus experiment Environment Variables

Application

| Parameter | Description |
| --------- | ----------- |
| APP_NAMESPACE | Namespace in which application pods are deployed |
| APP_LABEL | Unique Labels in key=value format of application deployment |
| APP_PVC | Name of persistent volume claim used for app's volume mounts |
| TARGET_NAMESPACE | Namespace where OpenEBS is installed |
| DATA_PERSISTENCE | Specify the application name against which data consistency has to be ensured. Example: busybox |
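
As an illustration, the application variables could be set in the `env` list of the litmus container as follows. All values shown are hypothetical examples, not defaults.

```yaml
# Fragment of the litmus container spec; every value is a hypothetical example.
env:
  - name: APP_NAMESPACE
    value: "app-busybox-ns"
  - name: APP_LABEL
    value: "app=busybox"       # key=value format
  - name: APP_PVC
    value: "busybox-pvc"
  - name: TARGET_NAMESPACE
    value: "openebs"
  - name: DATA_PERSISTENCE
    value: "busybox"           # busybox or percona-mysql
```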

Chaos

| Parameter | Description |
| --------- | ----------- |
| PLATFORM | The platform where k8s cluster is created. Currently, only 'vmware' is supported. |
| ESX_HOST_IP | The IP address of ESX server where the virtual machines are hosted. |
| ESX_PASSWORD | To be passed as configmap data. |
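
The chaos variables could likewise be set in the container's `env` list, as in this sketch. The IP address is a documentation-range placeholder, and ESX_PASSWORD is deliberately left out so that no literal password appears in the manifest.

```yaml
# Fragment of the litmus container spec; the IP address is a placeholder.
env:
  - name: PLATFORM
    value: "vmware"            # only 'vmware' is currently supported
  - name: ESX_HOST_IP
    value: "192.0.2.10"        # placeholder from the documentation address range
  # ESX_PASSWORD is omitted here; it should not be written as a literal value.
```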