Fix cert-manager CA rotation race in TLS cert rotation KUTTL test #1964
```yaml
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
timeout: 900
commands:
  - script: |
      echo "Waiting for cert-manager to complete re-issuance of all service certificates..."
      CERTS="keystone-public-route keystone-public-svc keystone-internal-svc neutron-internal-svc glance-default-internal-svc cinder-internal-svc placement-internal-svc"
      for cert in $CERTS; do
        echo "Waiting for Certificate $cert to be re-issued..."
        oc wait certificate/$cert -n $NAMESPACE --for=condition=Ready --timeout=300s
      done
      echo "Waiting for control plane to stabilize after cert re-issuance..."
      oc wait openstackcontrolplane -n $NAMESPACE --for=condition=Ready --timeout=600s -l core.openstack.org/openstackcontrolplane
```
Do we need this, given that the ctlplane CR assert above already expects the condition to be true?
Tested without it and it passed (ran it several times to make sure). Removed now.
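The re-issuance loop in the assert waits on each Certificate's `Ready` condition. `oc wait --for=condition=Ready` returns as soon as the condition is already true, so if the loop runs before cert-manager has picked up the duration change, it may pass on the previous issuance. If a stricter signal were wanted, one option is to record cert-manager's `.status.revision` for each Certificate before patching and wait for it to increase. This is a sketch under that assumption, not code from the PR; `wait_for_revision_bump` and `oc_cert_revision` are hypothetical names, and the getter is passed in by name so the polling loop can be exercised without a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): wait until a cert-manager Certificate's
# .status.revision moves past a value recorded before the duration patch.

# wait_for_revision_bump GETTER NAME OLD_REV TIMEOUT_S INTERVAL_S
# Calls "GETTER NAME" every INTERVAL_S seconds. Returns 0 once it prints a
# revision greater than OLD_REV, or 1 after roughly TIMEOUT_S seconds.
wait_for_revision_bump() {
  local getter=$1 name=$2 old_rev=$3 timeout=$4 interval=$5
  local waited=0 rev
  while (( waited < timeout )); do
    # A failed lookup or a never-issued Certificate prints nothing; treat as 0.
    rev=$("$getter" "$name" 2>/dev/null)
    if (( ${rev:-0} > old_rev )); then
      echo "Certificate $name re-issued (revision $old_rev -> $rev)"
      return 0
    fi
    sleep "$interval"
    (( waited += interval ))
  done
  echo "Timed out waiting for Certificate $name to be re-issued" >&2
  return 1
}

# Hypothetical real getter for use inside a KUTTL script
# (assumes oc is logged in and KUTTL has set $NAMESPACE).
oc_cert_revision() {
  oc get "certificate/$1" -n "$NAMESPACE" -o jsonpath='{.status.revision}'
}
```

Usage, under the same assumptions: record `before=$(oc_cert_revision keystone-internal-svc)` before applying the patch, then run `wait_for_revision_bump oc_cert_revision keystone-internal-svc "$before" 300 5`. Polling the revision counter avoids depending on whether `Ready` ever turns false while cert-manager re-issues.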
Force-pushed from 3928575 to 1b1771f.
Unrelated to PR changes:
/test openstack-operator-build-deploy-kuttl-4-18
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: abays, dprince.
Merged commit 925884c into openstack-k8s-operators:main.
/cherry-pick 18.0-fr6
@abays: new pull request created: #1967
The `ctlplane-tls-cert-rotation` KUTTL test fails intermittently because the `custom_duration` patch changes both the CA and the leaf certificate durations at the same time. cert-manager reconciles Certificate resources in parallel, so a leaf certificate can be re-issued before the CA that signs it has been re-issued, leaving some certificates signed by the old CA and others by the new one. This causes cross-service SSL verification failures (e.g. neutron cannot connect to the OVN NB database because of the CA mismatch).
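The mixed state described above shows up as leaf certificates from the same issuer carrying different CA certificates. A minimal sketch for spotting it, assuming the CA fingerprint of each leaf's secret has already been extracted: the hypothetical `check_single_ca` function reads `<certificate> <fingerprint>` pairs on stdin and fails when more than one distinct CA appears. Certificates from different issuers (for example the public route and the internal services) would have to be checked as separate groups.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic (not from the PR): fail if certificates that should
# share one issuer report more than one distinct CA fingerprint.
# Input: lines of "<certificate-name> <ca-fingerprint>" on stdin.
check_single_ca() {
  awk '
    NF >= 2 {
      # Group certificate names by CA fingerprint, keeping first-seen order.
      if (!($2 in names)) { order[++n] = $2; names[$2] = $1 }
      else { names[$2] = names[$2] " " $1 }
    }
    END {
      if (n > 1) {
        # List which certificates each CA signed, then report the mismatch.
        for (i = 1; i <= n; i++)
          print "CA " order[i] " signed: " names[order[i]] > "/dev/stderr"
        print "MISMATCH: " n " distinct CAs"
        exit 1
      }
      print "OK: all certificates share one CA"
    }'
}

# Possible feed inside a KUTTL script, ASSUMING each secret carries a ca.crt
# key populated by cert-manager's CA issuer (confirm in the target setup):
#   for cert in $CERTS; do
#     secret=$(oc get certificate/$cert -n $NAMESPACE -o jsonpath='{.spec.secretName}')
#     fp=$(oc get secret/$secret -n $NAMESPACE -o jsonpath='{.data.ca\.crt}' \
#            | base64 -d | openssl x509 -noout -fingerprint -sha256)
#     echo "$cert ${fp#*=}"
#   done | check_single_ca
```

Grouping in `awk` keeps the check portable to shells without associative arrays and prints the groups in a stable order, which makes it easier to see which services ended up on the old CA.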
Fix this by removing the CA duration changes from the patch so that only leaf certificate durations change, which prevents the CA key from rotating. Also add waits for cert-manager re-issuance and control plane stability checks in step 03, and add retry logic to the non-API service cert check in step 04.
Ref: https://redhat.atlassian.net/browse/OSPRH-32142
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
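The step 04 retry logic itself isn't shown on this page. As a sketch of one possible shape, assuming the non-API service cert check can be expressed as a single command that exits non-zero on failure, a bounded retry wrapper could look like this; `retry_check` is a hypothetical name, not taken from the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): bounded retry for a flaky check.

# retry_check ATTEMPTS DELAY_S CMD [ARGS...]
# Runs CMD up to ATTEMPTS times, sleeping DELAY_S seconds between failed
# attempts. Returns 0 on the first success, otherwise CMD's last exit status.
retry_check() {
  local attempts=$1 delay=$2 i rc=0
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    "$@" && return 0
    rc=$?   # exit status of the failed CMD
    echo "Attempt $i/$attempts failed (exit $rc)" >&2
    # No sleep after the final attempt.
    (( i < attempts )) && sleep "$delay"
  done
  return "$rc"
}
```

For example, `retry_check 10 30 ./check-non-api-certs.sh` would make up to 10 attempts with 30 seconds between them; the script name is a placeholder. Returning the command's last exit status, rather than a fixed `1`, keeps the underlying failure code visible in the KUTTL output.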