What happened?
The A/D controller tries to detach volumes that are still mounted after 6 minutes. The condition that triggers the detach is:
kubernetes/pkg/volume/util/util.go, line 387 at commit 9a75e7b:

```go
return podStatus.Phase == v1.PodFailed || podStatus.Phase == v1.PodSucceeded ||
	(pod.DeletionTimestamp != nil && notRunning(podStatus.InitContainerStatuses) &&
		notRunning(podStatus.ContainerStatuses) && notRunning(podStatus.EphemeralContainerStatuses))
```
I.e. all containers have finished. But that does not guarantee that the volumes are unmounted. When a CSI driver can't unmount a volume for 6 minutes, the A/D controller force-detaches the volume, corrupting it and leaving the volume mounts on the node in a bad state. Note that the node is perfectly healthy; it just can't unmount a volume, for whatever reason.
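For reference, the notRunning helper used in the condition above (same file) boils down to checking that no container is still running; a simplified sketch:

```go
import v1 "k8s.io/api/core/v1"

// Simplified sketch of the notRunning helper in pkg/volume/util/util.go:
// a container list counts as "not running" only when no container is still
// in the Running state (each one is either Waiting or Terminated).
func notRunning(statuses []v1.ContainerStatus) bool {
	for _, status := range statuses {
		if status.State.Terminated == nil && status.State.Waiting == nil {
			return false
		}
	}
	return true
}
```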
What did you expect to happen?
The A/D controller detaches volumes only after the Pod is deleted from the API server, i.e. after the volumes are unmounted or someone force-deletes the pod.
How can we reproduce it (as minimally and precisely as possible)?
Very artificial reproducer:
- Run a Pod with an attachable CSI volume.
- On the node, make the volume mount busy, e.g.:
  ```
  cd /var/lib/kubelet/pods/f377d5c5-1463-4d55-ad80-51a3e12e2251/volumes/kubernetes.io~csi/pvc-2443e93b-380f-4015-bda9-3a71ddc6e737/mount
  ```
  Every unmount of the volume fails and kubelet / the CSI driver periodically retry.
- Delete the pod.
- After 6 minutes, see the volume is detached without unmounting first.
KCM logs:
```
W1129 10:46:55.629375    9230 reconciler.go:224] attacherDetacher.DetachVolume started for volume "pvc-2443e93b-380f-4015-bda9-3a71ddc6e737" (UniqueName: "kubernetes.io/csi/ebs.csi.aws.com^vol-0e6b91392bdce66f3") on node "ip-172-18-5-140.ec2.internal" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching
```
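For context, the force detach comes from the timeout branch in the A/D controller's reconciler (pkg/controller/volume/attachdetach/reconciler). The following is a simplified sketch of that decision; the function and parameter names are illustrative stand-ins, not the exact upstream API:

```go
import "time"

// Illustrative sketch of the reconciler's detach decision; names here are
// stand-ins, the real logic lives in the attachdetach reconciler.
// Returns whether to detach and whether the detach is forced.
func shouldDetach(mountedByNode bool, detachRequested time.Time, maxWaitForUnmountDuration time.Duration) (detach, forced bool) {
	if !mountedByNode {
		// The node has reported the volume as unmounted: safe to detach.
		return true, false
	}
	if time.Since(detachRequested) > maxWaitForUnmountDuration { // 6m0s by default
		// Still mounted, but the timer expired: force detach, which
		// produces the warning logged above.
		return true, true
	}
	// Still mounted and within the grace period: keep waiting.
	return false, false
}
```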
Note that the Pod is still in the API server:
```
NAME      READY   STATUS        RESTARTS   AGE
testpod   0/1     Terminating   0          9m26s
```
And the node is healthy:
```
NAME                           STATUS   ROLES    AGE   VERSION
ip-172-18-5-140.ec2.internal   Ready    <none>   10m   v1.24.0-alpha.0.9+9a75e7b0fd1b56-dirty
```
Anything else we need to know?
Note that some CSI drivers (e.g. the cloud ones) do not allow detaching a volume while it is still mounted. Not all CSI drivers can enforce that, though, and they rely on Kubernetes not calling ControllerUnpublish before NodeUnpublish / NodeUnstage succeeds.
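For illustration, this is the teardown ordering such drivers rely on, sketched with the CSI Go bindings; the clients, IDs, and paths (nodeClient, ctrlClient, volID, podDir, stagingDir, nodeID) are assumed to be set up elsewhere, and error handling is elided:

```go
import (
	"context"

	csi "github.com/container-storage-interface/spec/lib/go/csi"
)

func teardown(ctx context.Context, nodeClient csi.NodeClient, ctrlClient csi.ControllerClient,
	volID, podDir, stagingDir, nodeID string) {
	// 1. Unmount the volume from the pod's target path on the node.
	nodeClient.NodeUnpublishVolume(ctx, &csi.NodeUnpublishVolumeRequest{
		VolumeId: volID, TargetPath: podDir,
	})
	// 2. Unmount the volume from the global staging path on the node.
	nodeClient.NodeUnstageVolume(ctx, &csi.NodeUnstageVolumeRequest{
		VolumeId: volID, StagingTargetPath: stagingDir,
	})
	// 3. Only after both node-side calls succeed, detach from the node.
	ctrlClient.ControllerUnpublishVolume(ctx, &csi.ControllerUnpublishVolumeRequest{
		VolumeId: volID, NodeId: nodeID,
	})
}
```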