Usecase: A node in the Kubernetes cluster is replaced with a new node. The
new node gets a different `kubernetes.io/hostname`. The storage devices
that were attached to the old node are re-attached to the new node.
Fix: Instead of using the default `kubenetes.io/hostname` as the node affinity
label, this commit changes to use `openebs.io/nodeid`. The ZFS LocalPV driver
will pick the value from the nodes and set the affinity.
Once the old node is removed from the cluster, the K8s scheduler will continue
to schedule applications on the old node only.
User can now modify the value of `openebs.io/nodeid` on the new node to the same
value that was available on the old node. This will make sure the pods/volumes are
scheduled to the node now.
Note: Now to migrate the PV to the other node, we have to move the disks to the other node
and remove the old node from the cluster and set the same label on the new node using
the same key, which will let k8s scheduler to schedule the pods to that node.
Other updates:
* adding faq doc
* renaming the config variable to nodename
Signed-off-by: Pawan <pawan@mayadata.io>
Co-authored-by: Akhil Mohan <akhilerm@gmail.com>
* Update docs/faq.md
Co-authored-by: Akhil Mohan <akhilerm@gmail.com>
We set the Finalizer to nil while handling the delete event, instead,
we should try to destroy the volume when there are no user finalizers
set. User might have added his own finalizers and we should not try to destroy
the volumes until those user finalizers are removed.
Signed-off-by: Pawan <pawan@mayadata.io>
Currently controller picks one node and the node agent keeps on trying to
create the volume on that node. There might not be enough space available
on that node to create the volume.
The controller can try on all the nodes sequentially and fail
the request if volume creation fails on all the nodes which satisfies the
topology contraints.
Signed-off-by: Pawan <pawan@mayadata.io>
Encrypted pool does not allow the volume to be pre created for the
restore purpose. Here changing the design to do the restore first
and then create the ZFSVolume object which will bind the volume
already created while doing restore.
Signed-off-by: Pawan <pawan@mayadata.io>
The ZFS Driver will use capacity scheduler to pick a node
which has less capacity occupied by the volumes. Making this
as default scheduler as it is better than the volume count based
scheduling. We can use below storageclass to specify the scheduler
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: openebs-zfspv
allowVolumeExpansion: true
parameters:
scheduler: "CapacityWeighted"
poolname: "zfspv-pool"
provisioner: zfs.csi.openebs.io
```
Please Note that after the upgrade, there will be a change in the behavior.
If we are not using `scheduler` parameter in the storage class then after
the upgrade ZFS Driver will pick the node bases on volume capacity weight
instead of the count.
Signed-off-by: Pawan <pawan@mayadata.io>
- Adding typecasting to make compilation work under MAC build environment
- Using go env variable instead of uname for determining platform
Signed-off-by: praveengt <praveen.gt@flipkart.com>
This PR adds the capability to create the Clone from pvc directly
```
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: pvc-clone
spec:
storageClassName: openebs-snap
dataSource:
name: pvc-snap
kind: PersistentVolumeClaim
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 4Gi
```
The ZFS_LocalPV driver will create one internal snapshot of the name
same as the new volume name and will create a clone out of it. Also,
while destroying the volume the driver will take care of deleting
the created snapshot for the clone.
Signed-off-by: Pawan <pawan@mayadata.io>
Added a schema validation for backup and restore CR. Also validating
the server address in the backup/restore controller.
Validating the server address as :
^([0-9]+.[0-9]+.[0-9]+.[0-9]+:[0-9]+)$
which is :
<any number>.<any number>.<any number>.<any number>:<any number>
Here we are validating just the format of the IP, not validating that IP should be
correct which will be little more complex. In any case if IP is not correct,
the zfs send will fail, so no need to do complex validation to validate the
correct IP and port.
Signed-off-by: Pawan <pawan@mayadata.io>
This commit adds support for Backup and Restore controller, which will be watching for
the events. The velero plugin will create a Backup CR to create a backup
with the remote location information, the controller will send the data
to that remote location.
In the same way, the velero plugin will create a Restore CR to restore the
volume from the the remote location and the restore controller will restore
the data.
Steps to use velero plugin for ZFS-LocalPV are :
1. install velero
2. add openebs plugin
velero plugin add openebs/velero-plugin:latest
3. Create the volumesnapshot location :
for full backup :-
```yaml
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
name: default
namespace: velero
spec:
provider: openebs.io/zfspv-blockstore
config:
bucket: velero
prefix: zfs
namespace: openebs
provider: aws
region: minio
s3ForcePathStyle: "true"
s3Url: http://minio.velero.svc:9000
```
for incremental backup :-
```yaml
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
name: default
namespace: velero
spec:
provider: openebs.io/zfspv-blockstore
config:
bucket: velero
prefix: zfs
backup: incremental
namespace: openebs
provider: aws
region: minio
s3ForcePathStyle: "true"
s3Url: http://minio.velero.svc:9000
```
4. Create backup
velero backup create my-backup --snapshot-volumes --include-namespaces=velero-ns --volume-snapshot-locations=aws-cloud-default --storage-location=default
5. Create Schedule
velero create schedule newschedule --schedule="*/1 * * * *" --snapshot-volumes --include-namespaces=velero-ns --volume-snapshot-locations=aws-local-default --storage-location=default
6. Restore from backup
velero restore create --from-backup my-backup --restore-volumes=true --namespace-mappings velero-ns:ns1
Signed-off-by: Pawan <pawan@mayadata.io>
few customers are using old version of the driver where
Status field is not present. So mount will fail after the
upgrade to the 0.9 or later version.
Reverting back to the checking if finalizer is set to check if
volume is ready to be mounted.
Signed-off-by: Pawan <pawan@mayadata.io>
ZFS does not create the zvol if volume size is not multiple of
the volblocksize. There are use cases where customer will create
a PVC with size as 5G, which will be 5 * 1000 * 1000 * 1000 bytes
and this is not the multiple of default volblocksize 8k.
In ZFS, volblocksize and recordsize must be power of 2 from 512B to 1M,
so keeping the size in the form of Gi or Mi should be
sufficient to make volsize multiple of volblocksize/recordsize.
Signed-off-by: Pawan <pawan@mayadata.io>
This issue is specific to xfs only, when we create a clone volume and system is taking time in creating the device.
When we create a clone volume from a xfs filesystem, ZFS-LocalPV will go ahead and generate a new UUID for the clone volumes as we need a new UUID to mount the new clone filesystem. To generate a new UUID for the clone volume, ZFS-LocalPV first replays the xfs log by mounting the device to a tmp localtion.
Here, what is happening is since device creation is slow, so we went ahead and created the tmp location to mount the clone volume but since device has not created yet, the mount failed. In the next try since the tmp location is present, it will keep failing there only at every reconciliation time.
Signed-off-by: Pawan <pawan@mayadata.io>
Instead of checking for the finalizer, checking for the
volume state to be ready is more intuitive before mounting it.
Also removed duplicate if statement for btrfs which was added while resolveing
the merge conflict in https://github.com/openebs/zfs-localpv/pull/175.
Signed-off-by: Pawan <pawan@mayadata.io>
btrfs, like xfs, needs to generate a new UUID for the
cloned volumes. All the devices with the same UUID will be treated
same for btrfs, so here generating the new UUID for the cloned volumes
using btrfstune command.
Signed-off-by: Pawan <pawan@mayadata.io>
Applications who want to share a volume can use below storageclass
to make their volumes shared by multiple pods
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: openebs-zfspv
parameters:
shared: "yes"
fstype: "zfs"
poolname: "zfspv-pool"
provisioner: zfs.csi.openebs.io
```
Now the provisioned volume using this storageclass can be used by multiple pods.
Here pods have to make sure of the data consistency and have to have locking mechanism.
One thing to note here is pods will be scheduled to the node where volume is present
so that all the pods can use the same volume as they can access it locally only.
This was we can avoid the NFS overhead and can get the optimal performance also.
Also fixed the log formatting in the GRPC log.
Signed-off-by: Pawan <pawan@mayadata.io>
We can not mount the datasets to more than one path via zfs mount command,
shifting to the legacy way of handling ZFS volumes where we can mount/umount
the datasets via legacy mount and umount commands.
This will also add a building block for SINGLE-NODE-MULTI-WRITER Capability.
Signed-off-by: Pawan <pawan@mayadata.io>
PVC will not bound if there are wrong parameters/poolname in the storageclass,
the ZFSVolume CR will be still created and will remain in Pending State,
deletion of the PVC will delete PVC and since PVC is not bound, ZFS-LocalPV
driver will not get the delete call and will leave the ZFSVolume CR hanging there.
Reverting the behavior introduced in https://github.com/openebs/zfs-localpv/pull/121,
Now PVC will be bound but still ZFSVolume will be in Pending state until the volume is created.
Signed-off-by: Pawan <pawan@mayadata.io>
More specifically,
- introduce helper function to get maps with all keys set to lowercase,
- introduce lookup helper based on such maps and
- change lookups for CreateVolumeRequest()s and CreateVolume()s so that
parameter keys are processed as lowercase irrespective of actual
spelling.
Signed-off-by: Christopher J. Ruwe <cjr@cruwe.de>
Readonly flag does not come as mount option, it has
separate field to mention readonly flag. ZFS-LocalPV
driver should check that field and add "ro" as mountoption.
Signed-off-by: Pawan <pawan@mayadata.io>
The controller does not check whether the volume has been created or not
and return successful. Which in turn binds the pvc to the pv.
The PVC should not bound until corresponding zfs volume has been created.
Now controller will check the ZFSVolume CR state to be "Ready" before returning
successful. The CSI will retry the CreateVolume request when it will get
a error reply and when the ZFS node agent creates the ZFS volume and sets the
ZFSVolume CR state to be "Ready", the controller will return success for the
CreateVolume Request and then PVC will be bound.
Signed-off-by: Pawan <pawan@mayadata.io>