Fix cert-manager CA rotation race in TLS cert rotation KUTTL test#1964

Merged
openshift-merge-bot[bot] merged 1 commit into
openstack-k8s-operators:mainfrom
abays:fix_tls_cert_rotate_kuttl
Jul 2, 2026

@abays abays commented Jul 1, 2026

Contributor

The ctlplane-tls-cert-rotation KUTTL test fails intermittently because the custom_duration patch changes both CA and leaf cert durations simultaneously. cert-manager processes Certificate resources in parallel, so leaf certs can be re-issued before the CA itself is re-issued, resulting in some certs signed by the old CA and others by the new CA. This causes cross-service SSL verification failures (e.g. neutron cannot connect to OVN NB due to CA mismatch).
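
One way to spot the mixed-CA state described above is to compare the issuer of each leaf cert against the expected CA. The following is a minimal sketch, not part of this PR: `find_issuer_mismatches` and `issuer_hash_of_secret` are hypothetical helper names, and the second helper assumes `oc` and `openssl` are on PATH, that `$NAMESPACE` is set, and that the signed cert is stored under `tls.crt` in the Secret.

```shell
# Sketch only: report leaf certs whose issuer differs from the expected CA.
# Input on stdin: one "<cert-name> <issuer-id>" pair per line.
# The issuer id can be any stable identifier, for example an issuer hash.
find_issuer_mismatches() {
  local expected="$1"
  local name issuer
  while read -r name issuer; do
    # Skip blank lines.
    [ -z "$name" ] && continue
    if [ "$issuer" != "$expected" ]; then
      echo "$name"
    fi
  done
}

# Hypothetical helper: print the issuer hash of the tls.crt in a Secret.
# Assumes the Secret name is known; not taken from the PR.
issuer_hash_of_secret() {
  local secret="$1"
  oc get secret "$secret" -n "$NAMESPACE" -o jsonpath='{.data.tls\.crt}' \
    | base64 -d | openssl x509 -noout -issuer_hash
}
```

Feeding it `printf 'cert-a abc\ncert-b def\n' | find_issuer_mismatches abc` prints only `cert-b`, the cert signed by a different issuer.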

Fix by removing CA duration changes from the patch so only leaf cert durations change, preventing the CA key from rotating. Also add cert-manager re-issuance waits and control plane stability checks in step 03, and add retry logic to the non-API service cert check in step 04.
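
For illustration, retry logic of the kind mentioned for step 04 could look like the sketch below. This is an assumption about the general shape, not the code in this PR: `retry` is a hypothetical helper, and the attempt count and delay are arbitrary.

```shell
# Sketch only: run a command up to <attempts> times, sleeping <delay>
# seconds between failed attempts. Returns 0 on the first success,
# 1 if every attempt fails.
retry() {
  local attempts="$1"
  local delay="$2"
  local i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed: $*" >&2
    # Do not sleep after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

A call such as `retry 5 10 oc wait certificate/"$cert" -n "$NAMESPACE" --for=condition=Ready --timeout=60s` would then tolerate a cert that is briefly not Ready while cert-manager re-issues it.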

Ref: https://redhat.atlassian.net/browse/OSPRH-32142

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@abays abays requested review from Deydra71 and stuggi July 1, 2026 09:37
@openshift-ci openshift-ci Bot requested review from dprince and fultonj July 1, 2026 09:37
@openshift-ci openshift-ci Bot added the approved label Jul 1, 2026
github-actions Bot commented Jul 1, 2026

OpenStackControlPlane CRD Size Report

| Metric | Value |
| --- | --- |
| CRD JSON size | 350002 bytes (342KB) |
| Base branch size | 350002 bytes |
| Change | +0.00% |
| Status | yellow (growing) |

Threshold reference

| Color | Range | Meaning |
| --- | --- | --- |
| 🟢 green | < 300KB | Comfortable |
| 🟡 yellow | 300–400KB | Growing |
| 🟠 orange | 400–750KB | Concerning |
| 🔴 red | > 750KB | Approaching 1.5MB etcd limit (cut in half to allow space for update) |

Comment on lines +363 to +375
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
timeout: 900
commands:
  - script: |
      echo "Waiting for cert-manager to complete re-issuance of all service certificates..."
      CERTS="keystone-public-route keystone-public-svc keystone-internal-svc neutron-internal-svc glance-default-internal-svc cinder-internal-svc placement-internal-svc"
      for cert in $CERTS; do
        echo "Waiting for Certificate $cert to be re-issued..."
        oc wait certificate/$cert -n $NAMESPACE --for=condition=Ready --timeout=300s
      done
      echo "Waiting for control plane to stabilize after cert re-issuance..."
      oc wait openstackcontrolplane -n $NAMESPACE --for=condition=Ready --timeout=600s -l core.openstack.org/openstackcontrolplane

Contributor

Do we need this, given that the ctlplane CR assert above already expects true?

Contributor Author

Tested without it and it worked (tried it several times over to try to make sure). Removed now.

@abays abays force-pushed the fix_tls_cert_rotate_kuttl branch from 3928575 to 1b1771f Compare July 1, 2026 19:18
abays commented Jul 2, 2026

Contributor Author

@abays: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/openstack-operator-build-deploy-kuttl-4-18 | 1b1771f | link | true | /test openstack-operator-build-deploy-kuttl-4-18 |

Unrelated to PR changes:

{  "openstack-operator-build-deploy-kuttl-4-18" pod "openstack-operator-build-deploy-kuttl-4-18-openstack-k8s-operators-gather" failed: could not watch pod: the pod ci-op-qhdzhm2r/openstack-operator-build-deploy-kuttl-4-18-openstack-k8s-operators-gather failed after 11m54s (failed containers: test): ContainerFailed one or more containers exited

/test openstack-operator-build-deploy-kuttl-4-18

openshift-ci Bot commented Jul 2, 2026

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abays, dprince

@openshift-merge-bot openshift-merge-bot Bot merged commit 925884c into openstack-k8s-operators:main Jul 2, 2026
9 checks passed
abays commented Jul 2, 2026

Contributor Author

/cherry-pick 18.0-fr6

@openshift-cherrypick-robot

@abays: new pull request created: #1967
