Wednesday, February 24, 2021

OpenShift nodes in NotReady state

Redeploy node certificates in OpenShift 3.10 and 3.11

Environment

  • OpenShift Enterprise Container Platform
    • 3.10
    • 3.11

Issue

  • I redeployed a new CA and the nodes are no longer in the Ready state.
  • How do I manually force new certificates to be created?
  • Nodes are failing to renew their certificates with the following error:
atomic-openshift-node[3715]: I0313 11:40:48.864375    3715 bootstrap.go:56] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
atomic-openshift-node[3715]: I0313 11:40:48.865525    3715 bootstrap.go:86] No valid private key and/or certificate found, reusing existing private key or creating a new one
atomic-openshift-node[3715]: F0313 11:40:48.893737    3715 server.go:262] failed to run Kubelet: cannot create certificate signing request: Unauthorized
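
To confirm that a node is hitting this specific failure, the node service log can be searched for the fatal message. The following is a minimal sketch; the function name has_csr_unauthorized is hypothetical and it simply matches the error text quoted above on standard input.

```shell
# Hypothetical helper: succeeds (exit 0) when the log text on stdin contains
# the kubelet CSR failure quoted in the log excerpt above.
has_csr_unauthorized() {
    grep -q 'cannot create certificate signing request: Unauthorized'
}

# Example usage on a node (assumes the unit name used later in this article):
#   journalctl -u atomic-openshift-node --no-pager | has_csr_unauthorized \
#       && echo "node cannot create its CSR: Unauthorized"
```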

Resolution

To regenerate only the node client/server certificates, skip steps 1-3. Steps 1-3 are needed only if the OpenShift CA changed or if the node has TLS trust issues.

  1. Create a new bootstrap.kubeconfig for the nodes (master nodes copy admin.kubeconfig instead, see step 2). Run on any master:

    # oc serviceaccounts create-kubeconfig node-bootstrapper -n openshift-infra --config /etc/origin/master/admin.kubeconfig > ~/bootstrap.kubeconfig
    
  2. On the masters only, copy admin.kubeconfig to /etc/origin/node/bootstrap.kubeconfig. Run on each master:

    # cp /etc/origin/master/admin.kubeconfig /etc/origin/node/bootstrap.kubeconfig
    
  3. Distribute ~/bootstrap.kubeconfig from step 1 to the infra and compute nodes, replacing /etc/origin/node/bootstrap.kubeconfig. Also copy it to /etc/origin/master/bootstrap.kubeconfig on all masters (note that this path is under the master directory, not node).

  4. Move node.kubeconfig and client-ca.crt out of the way. They are recreated when the node service is restarted. Run on every node (master, worker, infra):

    # mv /etc/origin/node/client-ca.crt{,.old}
    # mv /etc/origin/node/node.kubeconfig{,.old}
    
  5. Remove the /etc/origin/node/certificates directory and its contents. Run on every node (master, worker, infra):

    # rm -rf /etc/origin/node/certificates
    
  6. Restart the node service, then go straight to the next step to approve the CSRs:

    # systemctl restart atomic-openshift-node.service
    
  7. Approve the CSRs. Two should be approved for each node (master, worker, infra):

    # oc get csr -o name | xargs oc adm certificate approve
    

    If there are many CSRs, the following command approves only those in Pending status:

    # for i in $(oc get csr | grep -i Pending | awk '{ print $1 }'); do oc adm certificate approve "$i"; done
    
  8. Check that the nodes are Ready:

    # oc get node
    # for i in $(oc get nodes -o jsonpath=$'{range .items[*]}{.metadata.name}\n{end}'); do oc get --raw "/api/v1/nodes/$i/proxy/healthz"; echo -e "\t$i"; done
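
Steps 7 and 8 can also be sketched as two small text filters that make it easier to see what is still outstanding. This is only an illustration: the function names pending_csrs and not_ready_nodes are hypothetical, and the sketch assumes the default `oc get` table layout, where the CSR condition is the last column and the node status is the second column.

```shell
# Hypothetical helper: read `oc get csr` table output on stdin and print the
# names of CSRs whose last column (CONDITION) is exactly "Pending".
pending_csrs() {
    awk '$NF == "Pending" { print $1 }'
}

# Hypothetical helper: read `oc get nodes` table output on stdin and print the
# names of nodes whose STATUS does not start with "Ready"
# (so "Ready,SchedulingDisabled" counts as ready, "NotReady" does not).
# The header row, if present, is skipped.
not_ready_nodes() {
    awk '$1 != "NAME" && $2 !~ /^Ready/ { print $1 }'
}

# Example usage from a master (not executed here):
#   oc get csr --no-headers | pending_csrs | xargs -r oc adm certificate approve
#   oc get nodes --no-headers | not_ready_nodes
```

Because the node requests its client certificate first and its serving certificate only after that, the approval command may need to be run more than once before not_ready_nodes prints nothing.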
