

OpenShift w/ Kubernetes Setup: Installing using the UPI Method

This guide builds an OpenShift Kubernetes cluster.  The method used here is the UPI (User-Provisioned Infrastructure) installation method.  Start off by loading the official page from Red Hat:

https://i0.wp.com/www.microdevsys.com/WordPressIMages/KubernetesAndOpenShift.PNG?ssl=1

Before you begin, ensure the following files are downloaded from the Red Hat OpenShift pages (see links in the above document):

/root/openshift # ls -altri
total 439680
201572861 -rw-r--r--.  1 root        root              706 Apr 25 04:15 README.md
201572704 -rwxr-xr-x.  1 root        root        360710144 Apr 25 04:15 openshift-install
201572859 -rw-rw-r--.  1 tom@mds.xyz tom@mds.xyz      2775 May  8 22:53 pull-secret.txt
201572858 -rw-rw-r--.  1 tom@mds.xyz tom@mds.xyz  89491042 May  8 22:55 openshift-install-linux.tar.gz
201572850 drwxr-xr-x.  3 root        root             4096 May  8 23:58 .
201326721 dr-xr-x---. 12 root        root             4096 May  9 08:43 ..

Extract the .tar.gz using:

tar -zxf openshift-install-linux.tar.gz
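
Before going further, it's worth confirming the extracted binary runs and reports the release you expect (the exact output depends on the version downloaded):

./openshift-install version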

Build and use the following staging machine.  This staging machine can be any basic RHEL 7 node.  It is where the OpenShift (oc) and Kubernetes (kubectl) commands will be executed from:

oss01.unix.lab.com

Following are the hosts and IPs for the cluster:

rhbs01.osc01.unix.lab.com       10.0.0.5

rhcpm01.osc01.unix.lab.com      10.0.0.6

rhcpm02.osc01.unix.lab.com      10.0.0.7

rhcpm03.osc01.unix.lab.com      10.0.0.8

hk01.osc01.unix.lab.com         192.168.0.196

hk02.osc01.unix.lab.com         192.168.0.232

rhwn01.osc01.unix.lab.com       10.0.0.9

rhwn02.osc01.unix.lab.com       10.0.0.10

rhwn03.osc01.unix.lab.com       10.0.0.11

VIPs:

api.osc01.unix.lab.com         192.168.0.70

api-int.osc01.unix.lab.com     192.168.0.70

HAProxy / Keepalived

The two hosts above, hk01 and hk02, are the HAProxy and Keepalived servers for the installation.  More on that and load balancing below.

Installation Instructions

To create the above, FreeIPA was used to create the subdomain.  Below are the images highlighting how this was done; a CLI sketch follows the screenshots.  If FreeIPA is not used, a manual DNS configuration similar to the one on the OpenShift page above will be required.

  • Create the Zone
  • Add the hosts to the new zone, including the PTR (reverse) entries

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-Configuring-FreeIPA-Zone.PNG?ssl=1

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-Configuring-FreeIPA-Forward-Zone-Details.PNG?ssl=1

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-Configuring-FreeIPA-Defining-A-Dedicated-Zone.PNG?ssl=1
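
If you prefer the command line over the web UI, a minimal sketch of the same records with the ipa CLI looks roughly like the following (assumes an authenticated admin session via kinit admin; hostnames and IPs come from the tables above, and the reverse zone name depends on your subnet).  Repeat the host records for the remaining masters and workers.  The wildcard *.apps record is an assumption here, but OpenShift expects it to resolve to the ingress load balancer:

kinit admin
ipa dnszone-add osc01.unix.lab.com
ipa dnszone-add 0.0.10.in-addr.arpa.
ipa dnsrecord-add osc01.unix.lab.com rhbs01 --a-rec=10.0.0.5 --a-create-reverse
ipa dnsrecord-add osc01.unix.lab.com rhcpm01 --a-rec=10.0.0.6 --a-create-reverse
ipa dnsrecord-add osc01.unix.lab.com api --a-rec=192.168.0.70
ipa dnsrecord-add osc01.unix.lab.com api-int --a-rec=192.168.0.70
ipa dnsrecord-add osc01.unix.lab.com "*.apps" --a-rec=192.168.0.70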

 

(Optional) Ansible inventory to speed up host deployment.  (Not strictly necessary since RHCOS machines are being used, but it can be added if central management is required.)

/ansible # tail -n 30 infra

[os-all:children]
os-bstrap
os-cpm
os-hk
os-wn

[os-bstrap]
rhbs01.osc01.unix.lab.com

[os-cpm]
rhcpm01.osc01.unix.lab.com
rhcpm02.osc01.unix.lab.com
rhcpm03.osc01.unix.lab.com

[os-hk]
hk01.osc01.unix.lab.com
hk02.osc01.unix.lab.com

[os-wn]
rhwn01.osc01.unix.lab.com
rhwn02.osc01.unix.lab.com
rhwn03.osc01.unix.lab.com

(Not Required) IPA

ipa-client-install --uninstall;

ipa-client-install --force-join -p autojoin -w "NotMyPass" --fixed-primary --server=idmipa01.unix.lab.com --server=idmipa02.unix.lab.com --domain=unix.lab.com --realm=unix.lab.com -U --hostname=$(hostname);

ipa-client-automount --location=UserHomeDir01 -U;

authconfig --enablesssd --enablesssdauth --enablemkhomedir --updateall --update;

 

(Not Required) krb5.conf

# cat /etc/krb5.conf
#File modified by ipa-client-install

includedir /etc/krb5.conf.d/
includedir /var/lib/sss/pubconf/krb5.include.d/

[libdefaults]
  default_realm = OSC01.unix.lab.com
  dns_lookup_realm = false
  dns_lookup_kdc = true
  rdns = false
  dns_canonicalize_hostname = true
  ticket_lifetime = 24h
  forwardable = true
  udp_preference_limit = 0
  default_ccache_name = KEYRING:persistent:%{uid}


[realms]

  OSC01.unix.lab.com = {
    kdc = idmipa01.unix.lab.com:88
    master_kdc = idmipa01.unix.lab.com:88
    admin_server = idmipa01.unix.lab.com:749
    kpasswd_server = idmipa01.unix.lab.com:464
    kdc = idmipa02.unix.lab.com:88
    master_kdc = idmipa02.unix.lab.com:88
    admin_server = idmipa02.unix.lab.com:749
    kpasswd_server = idmipa02.unix.lab.com:464
    default_domain = osc01.unix.lab.com
    pkinit_anchors = FILE:/var/lib/ipa-client/pki/kdc-ca-bundle.pem
    pkinit_pool = FILE:/var/lib/ipa-client/pki/ca-bundle.pem

  }

  unix.lab.com = {
    kdc = idmipa01.unix.lab.com:88
    master_kdc = idmipa01.unix.lab.com:88
    admin_server = idmipa01.unix.lab.com:749
    kpasswd_server = idmipa01.unix.lab.com:464
    kdc = idmipa02.unix.lab.com:88
    master_kdc = idmipa02.unix.lab.com:88
    admin_server = idmipa02.unix.lab.com:749
    kpasswd_server = idmipa02.unix.lab.com:464
    default_domain = unix.lab.com
    pkinit_anchors = FILE:/var/lib/ipa-client/pki/kdc-ca-bundle.pem
    pkinit_pool = FILE:/var/lib/ipa-client/pki/ca-bundle.pem

  }

  MDS.XYZ = {
    kdc = ad.lab.com
    default_domain = mds.xyz
  }

[domain_realm]
  .unix.lab.com = unix.lab.com
  unix.lab.com = unix.lab.com
  bs01.osc01.unix.lab.com = unix.lab.com
  .lab.com = MDS.XYZ
  mds.xyz = MDS.XYZ
  .osc01.unix.lab.com = OSC01.unix.lab.com
  osc01.unix.lab.com = OSC01.unix.lab.com

 

( Not Required ) sssd.conf


[root@bs01 home]#
[root@bs01 home]# cat /etc/sssd/sssd.conf
[domain/unix.lab.com]

cache_credentials = True
krb5_store_password_if_offline = True
ipa_domain = unix.lab.com
id_provider = ipa
auth_provider = ipa
access_provider = ipa
ldap_tls_cacert = /etc/ipa/ca.crt
ipa_hostname = bs01.osc01.unix.lab.com
chpass_provider = ipa
ipa_server = idmipa01.unix.lab.com, idmipa02.unix.lab.com
dns_discovery_domain = unix.lab.com
autofs_provider = ipa
ipa_automount_location = UserHomeDir01

dyndns_update = True
dyndns_update_ptr = True
ldap_schema = ad
ldap_id_mapping = True

override_homedir = /n/%d/%u
# fallback_homedir = /n/%d/%u
# ldap_user_home_directory = unixHomeDirectory

[nss]
homedir_substring = /home

[sssd]
services = nss, sudo, pam, autofs, ssh

domains = unix.lab.com

[pam]

[sudo]

[autofs]

[ssh]

[pac]

[ifp]

[secrets]

[session_recording]

 

Generate a key pair on the staging node, where oc, kubectl, and the installer will be run.

oss01.unix.lab.com

ssh-keygen -t ed25519 -N ''     -f /root/.ssh/id_rsa-osc01

eval "$(ssh-agent -s)"

ssh-add /root/.ssh/id_rsa-osc01
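
Optionally confirm the key was loaded into the agent:

ssh-add -l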

Create a configuration file:

install-config.yaml

# cat install-config.yaml
apiVersion: v1
baseDomain: unix.lab.com
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: osc01
platform:
  vsphere:
    vcenter: vcsa01.unix.lab.com
    username: openshift
    password: S3cretP@ssw0rdR#ally
    datacenter: mds.xyz
    defaultDatastore: mdsesxip05-d01
    folder: "/mds.xyz/vm/OpenShift"
fips: false

pullSecret: '{"auths":{"cloud.openshift.com":{"auth":"b3BlbnNoaWZ0LXJlbGVhc2UtZGV2K29jbV9hY2Nlc3NfMjI2ZjlkZDFiODg4NDdkOGI2NWFmOTNiNDg1ZDk5Mzg6MVRXUUwzQ1FWQkhUSlpaTURTQ0tWVlgyU0U4UU5VRDRDRUtXVlIwN01MRjczV041NktIR0Q0M0JaVzNBMkdHOA==","email":"somedude AT microdevsys DOT com"},"quay.io":{"auth":"b3BlbnNoaWZ0LXJlbGVhc2UtZGV2K29jbV9hY2Nlc3NfMjI2ZjlkZDFiODg4NDdkOGI2NWFmOTNiNDg1ZDk5Mzg6MVRXUUwzQ1FWQkhUSlpaTURTQ0tWVlgyU0U4UU5VRDRDRUtXVlIwN01MRjczV041NktIR0Q0M0JaVzNBMkdHOA==","email":"somedude AT microdevsys DOT com"},"registry.connect.redhat.com":{"auth":"fHVoYy1wb29sLWYyMGU3ODExLTY1NjctNDBlZC05MWExLTUwYjgxZGVhNDY4ZDpleUpoYkdjaU9pSlNVelV4TWlKOS5leUp6ZFdJaU9pSTNZV1UxWldFeU1EaGxORFEwTlRGa09HUTBaR1F5TTJJNE5HVmpZMlV6WkNKOS5SQXMta2g1bFZUTDdQT0pvNEZ3UUZKN3o4c1NCOVQ3WEhQV2VvMkoyakU2cnVMS3VEZHlMWWlGcEhoOVJSZVFyTzVaa0xodGt4aVRqRDBJV3pMYzdzR3dfMThfc0thejZYaTNrM3pmZ2RuWS1YbnlPbHU1RGhGdnEyUW5GcFRIWFBsUFB2SG85OVNndG00dnkwVk5OSXE3SjJ1TUVPNE84c2wzdXJ1X0JNSkNUX1FTeFIyUVViTTVFaFViWUM1blF6LVV2VEo3VlpnR2hqZDVvQ3Z2Y3FvWnc3bXJkUlFvQTNuUUl2MGRrb3hXN2lVZXh1cVl2RDZFdFRyVFFoUnNrRkVTVV9pZURGMDNhSWlsYnRsZFRqNTBGQXE1bzllbW1HZTdITHFyVGY2d2FJS3UxUnpHbkdCOUN1ZWpZaGowSU9GUmVsNFdES2ItMGJrbVZTdjRtXzJPSllkcEJYc0lVaFlKTHFpZFdxLXFaVlBMRzQ5Q1JaRTUwWnZGcDl2ckZrZU5yZnJKdzdtOUVTTUNIbms5UW9fQ1hXZlRmTlBvWWhNdDhmZUJBQi1GNlp6Mnl4Ni0wYzJOMjdob005ZlVuREdxWXpTbk1OZFRvY05vNkl1SWExZ0NmNnlaenNOWHdMLURlZVBOUnhzMHAtUld3UkZGME5xd0VsUEhycEhVWHg0MnVHSGh3Y0dHYUJsczk1eDBYeXFEM1JoYzdySjdaWUNkVko1OGhCNURoWDc0QjhrWjNxOTVfdmtPX1Jtd1Nvcy1sZ09ITTNLWFVlMUNvSWUzVzlJT2l4STNFLUVWd3hFTkNyRFFLck04QlB4NjhUVHlxN2JTeUxFUjZ5OGFxZERjT0ZSaE4xM1FDT1I3bmRGaUVyUGRkRWxaRmh4Tm1NU2NuYnhPMkdoRQ==","email":"somedude AT microdevsys DOT com"},"registry.redhat.io":{"auth":"fHVoYy1wb29sLWYzMGU3ODExLTY1NjctNDBlZC05MWExLTUwYjgxZGVhNDY4ZDpleUpoYkdjaU9pSlNVelV4TWlKOS5leUp6ZFdJaU9pSTNZV1UxWldFeU1EaGxORFEwTlRGa09HUTBaR1F5TTJJNE5HVmpZMlV6WkNKOS5SQXMta2g1bFZUTDdQT0pvNEZ3UUZKN3o4c1NCOVQ3WEhQV2VvMkoyakU2cnVMS3VEZHlMWWlGcEhoOVJSZVFyTzVaa0xodGt4aVRqRDBJV3pMYzdzR3dfMThfc0thejZYaTNrM3pmZ2RuWS1YbnlPbHU1RGhGdnEyUW5GcFRIWFBsUFB2SG85OVNndG00dnkwVk5OSXE3SjJ1TUVPNE84c2wzdXJ1X0JNSkNUX1FTeFIyUVViTTVFaFViWUM1clF6LVV2VEo3VlpnR2hqZDVvQ3Z2Y3FvWnc3bXJkUlFvQTNuUUl2MGRrb3hXN2lVZXh1cVl2RDZFdFRyVFFoUnNrRkVTVV9pZURGMDNhSWlsYnRsZFRqNTBGQXE1bzllbW1HZTdITHFyVGY2d2FJS3UxUnpHbkeCOUN1ZWpZaGowSU9GUmVsNFdES2ItMGJrbVZTdjRtXzJPSllkcEJYc0lVaFlKTHFpZFdxLXFaVlBMRzQ5Q1JaRTUwWnZGcDl2ckZrZU5yZnJKdzdtOUVTTUNIbms5UW9fQ1hXZlRmTlBvWWhNdDhmZUJBQi1GNlp6Mnl4Ni0wYzJOMjdob005ZlVuREdxWXpTbk1OZFRvY05vNkl1SWExZ0NmNnlaenNOWHdMLURlZVBOUnhzMHAtUld3UkZGME5xd0VrUEhycEhVWHg0MnVHSGh3Y0dHYUJsczk1eDBYeXFEM1JoYzdySjdaWUNkVko1OGhCNURoWDc0QjhrWjNxOTVfdmtPX1Jtd1Nvcy1sZ09ITTNLWFVlMUNvSWUzVzlJT2l4STNFLUVWd3hFTkNyRFFLck04QlB4NjhUVHlxN2JTeUxFUjZ5OGFxZERjT0ZSaE4xM1FDT1I3bmRGaUVyUGRkRWxaRmh4Tm1NU2NuYnhPMkdoRQ==","email":"somedude AT microdevsys DOT com"}}}'
sshKey: 'ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHr4AuezxQ/azAAfHLa9+HCqGZthewYf2yNNPQ6uwDhd root@awx01.unix.lab.com'

Execute the manifest creation (assuming /root/openshift/install/ will be the location of your installation configuration):

./openshift-install create manifests --dir=/root/openshift/install/

Ensure the following file has mastersSchedulable set to false:

cat manifests/cluster-scheduler-02-config.yml

apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  creationTimestamp: null
  name: cluster
spec:
  mastersSchedulable: false
  policy:
    name: ""
status: {}

And remove the following YAML files (explanation in the doc above):

rm -f openshift/99_openshift-cluster-api_master-machines-*.yaml openshift/99_openshift-cluster-api_worker-machineset-*.yaml

Next, generate the ignition configuration files:

./openshift-install create ignition-configs --dir=/root/openshift/install/
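
If the command succeeds, the install directory should now contain roughly the following; the auth/ directory holds the kubeconfig and kubeadmin password used later in this guide:

# ls /root/openshift/install/
auth  bootstrap.ign  master.ign  metadata.json  worker.ign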

Convert the .ign files to base64:

# history | grep base64
base64 -w0 master.ign > master.64
base64 -w0 worker.ign > worker.64
base64 -w0 bootstrap.ign > bootstrap.64
base64 -w0 https-bootstrap.ign > https-bootstrap.64

# cat https-bootstrap.ign
{
  "ignition": {
    "config": {
      "merge": [
        {
          "source": "http://192.168.0.142/bootstrap.ign"
        }
      ]
    },
    "version":"3.1.0"
  }
}
#

Note how https-bootstrap.ign refers to an HTTP server.  Because bootstrap.ign is too big to pass directly, it needs to be hosted on a separate HTTP server so it can be pulled down by the VMware configuration on startup.  The file hosted on the HTTP server:

# ls -altri /var/www/html/bootstrap.ign
571421 -rw-r--r--. 1 root root 291591 May  9 00:04 /var/www/html/bootstrap.ign
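
Any web server reachable from the VMs will do.  If Apache httpd isn't already serving /var/www/html, two quick options are sketched below (192.168.0.142 above is the web server used in this setup; the Python one-liner needs root for port 80 and Python 3.7+ for --directory):

yum -y install httpd && systemctl enable --now httpd
python3 -m http.server 80 --directory /var/www/html

Confirm the file is reachable before booting anything:

curl -sI http://192.168.0.142/bootstrap.ign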

Configure the HAProxy and Keepalived nodes.  Pay careful attention to the commented rhbs01 (bootstrap) lines below.  The bootstrap entries should be uncommented while the master nodes are being created, then commented out again once the Master / Control Plane nodes are up, since the workers are built from the master nodes.  The configuration files for both nodes follow.  The HAProxy files are identical; the Keepalived files are NOT identical:

hk01

# cat /etc/haproxy/haproxy.cfg
global
        log                     127.0.0.1:514                   local0  debug
        pidfile                 /var/run/haproxy.pid
        maxconn                 4000
        user                    haproxy
        group                   haproxy

        stats socket            /etc/haproxy/stats
        tune.ssl.default-dh-param 2048
        daemon
        debug
        maxconn 4096

defaults
        mode                    tcp
        log                     global
        option                  dontlognull
        option                  redispatch
        retries                 3
        timeout queue           1m
        timeout connect         10s
        timeout client          3m
        timeout server          3m
        timeout http-keep-alive 10s
        timeout check           10s
        maxconn                 30000


listen cm
        bind api-int:80
        mode    tcp
        redirect scheme https if !{ ssl_fc }

frontend osin
        bind    api-int:443                                      # ssl crt  /etc/haproxy/certs/api-int.osc01.unix.lab.com-haproxy.pem no-sslv3
        default_backend osback

backend osback
        mode tcp
        balance roundrobin

        server rhcpm01.osc01.unix.lab.com rhcpm01.osc01.unix.lab.com:443 check
        server rhcpm02.osc01.unix.lab.com rhcpm02.osc01.unix.lab.com:443 check
        server rhcpm03.osc01.unix.lab.com rhcpm03.osc01.unix.lab.com:443 check
        server rhwn01.osc01.unix.lab.com rhwn01.osc01.unix.lab.com:443 check
        server rhwn02.osc01.unix.lab.com rhwn02.osc01.unix.lab.com:443 check
        server rhwn03.osc01.unix.lab.com rhwn03.osc01.unix.lab.com:443 check


frontend bscpm6443in
        log                             127.0.0.1:514   local0          debug
        bind    api-int:6443
        default_backend bscpm6443back

backend bscpm6443back
        log                             127.0.0.1:514   local0          debug
        mode tcp
        balance source

#        server rhbs01.osc01.unix.lab.com rhbs01.osc01.unix.lab.com:6443 check
        server rhcpm01.osc01.unix.lab.com rhcpm01.osc01.unix.lab.com:6443 check
        server rhcpm02.osc01.unix.lab.com rhcpm02.osc01.unix.lab.com:6443 check
        server rhcpm03.osc01.unix.lab.com rhcpm03.osc01.unix.lab.com:6443 check


frontend bscpm22623in
        log                             127.0.0.1:514   local0          debug
        bind    api-int:22623
        default_backend bscpm22623back

backend bscpm22623back
        log                             127.0.0.1:514   local0          debug
        mode tcp
        balance source

#        server rhbs01.osc01.unix.lab.com rhbs01.osc01.unix.lab.com:22623 check
        server rhcpm01.osc01.unix.lab.com rhcpm01.osc01.unix.lab.com:22623 check
        server rhcpm02.osc01.unix.lab.com rhcpm02.osc01.unix.lab.com:22623 check
        server rhcpm03.osc01.unix.lab.com rhcpm03.osc01.unix.lab.com:22623 check


listen stats
        bind :9000
        mode http
        stats enable
        stats hide-version
        stats realm Haproxy\ Statistics
        stats uri /haproxy-stats
        stats auth admin:n0tmypass

 

hk02

# cat /etc/haproxy/haproxy.cfg
global
        log                     127.0.0.1:514                   local0  debug
        pidfile                 /var/run/haproxy.pid
        maxconn                 4000
        user                    haproxy
        group                   haproxy

        stats socket            /etc/haproxy/stats
        tune.ssl.default-dh-param 2048
        daemon
        debug
        maxconn 4096

defaults
        mode                    tcp
        log                     global
        option                  dontlognull
        option                  redispatch
        retries                 3
        timeout queue           1m
        timeout connect         10s
        timeout client          3m
        timeout server          3m
        timeout http-keep-alive 10s
        timeout check           10s
        maxconn                 30000


listen cm
        bind api-int:80
        mode    tcp
        redirect scheme https if !{ ssl_fc }

frontend osin
        bind    api-int:443                                                       # ssl crt  /etc/haproxy/certs/api-int.osc01.unix.lab.com-haproxy.pem no-sslv3
        default_backend osback

backend osback
        mode tcp
        balance roundrobin

        server rhcpm01.osc01.unix.lab.com rhcpm01.osc01.unix.lab.com:443 check
        server rhcpm02.osc01.unix.lab.com rhcpm02.osc01.unix.lab.com:443 check
        server rhcpm03.osc01.unix.lab.com rhcpm03.osc01.unix.lab.com:443 check
        server rhwn01.osc01.unix.lab.com rhwn01.osc01.unix.lab.com:443 check
        server rhwn02.osc01.unix.lab.com rhwn02.osc01.unix.lab.com:443 check
        server rhwn03.osc01.unix.lab.com rhwn03.osc01.unix.lab.com:443 check


frontend bscpm6443in
        log                             127.0.0.1:514   local0          debug
        bind    api-int:6443
        default_backend bscpm6443back

backend bscpm6443back
        log                             127.0.0.1:514   local0          debug
        mode tcp
        balance source

#        server rhbs01.osc01.unix.lab.com rhbs01.osc01.unix.lab.com:6443 check
        server rhcpm01.osc01.unix.lab.com rhcpm01.osc01.unix.lab.com:6443 check
        server rhcpm02.osc01.unix.lab.com rhcpm02.osc01.unix.lab.com:6443 check
        server rhcpm03.osc01.unix.lab.com rhcpm03.osc01.unix.lab.com:6443 check


frontend bscpm22623in
        log                             127.0.0.1:514   local0          debug
        bind    api-int:22623
        default_backend bscpm22623back

backend bscpm22623back
        log                             127.0.0.1:514   local0          debug
        mode tcp
        balance source

#        server rhbs01.osc01.unix.lab.com rhbs01.osc01.unix.lab.com:22623 check
        server rhcpm01.osc01.unix.lab.com rhcpm01.osc01.unix.lab.com:22623 check
        server rhcpm02.osc01.unix.lab.com rhcpm02.osc01.unix.lab.com:22623 check
        server rhcpm03.osc01.unix.lab.com rhcpm03.osc01.unix.lab.com:22623 check


listen stats
        bind :9000
        mode http
        stats enable
        stats hide-version
        stats realm Haproxy\ Statistics
        stats uri /haproxy-stats
        stats auth admin:n0tmypass
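
One way to toggle the bootstrap entries on both hk01 and hk02 is sketched below; validate the configuration and reload HAProxy afterwards.  (The second sed, shown commented, re-enables the entries if the bootstrap ever needs to be rebuilt.)

sed -i 's/^\( *server rhbs01\)/#\1/' /etc/haproxy/haproxy.cfg        # comment out the bootstrap backends
# sed -i 's/^#\( *server rhbs01\)/\1/' /etc/haproxy/haproxy.cfg      # uncomment them again if needed
haproxy -c -f /etc/haproxy/haproxy.cfg && systemctl reload haproxy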

Likewise, keepalived configuration for both nodes:

hk01 ( master )

# cat /etc/keepalived/keepalived.conf
vrrp_script chk_haproxy {
        script "killall -0 haproxy"             # check the haproxy process
        interval 2                              # every 2 seconds
        weight 2                                # add 2 points if OK
}

vrrp_track_file fail-70 {
        file /etc/keepalived/vrrp-70
}

vrrp_instance ins-70 {
        interface eth0                          # interface to monitor
        state MASTER                            # MASTER on haproxy1, BACKUP on haproxy2
        virtual_router_id 70                    # Set to the last octet of the cluster VIP.
        priority 110                            # 110 on haproxy1, 100 on haproxy2

        authentication {
                auth_type PASS
                auth_pass password70
        }

        virtual_ipaddress {
                delay_loop 12
                lb_algo wrr
                lb_kind DR
                protocol TCP
                192.168.0.70                    # virtual ip address
        }

        track_file {
                fail-70 weight 0
        }

        track_script {
                chk_haproxy
        }
}

 

hk02 ( slave )

# cat /etc/keepalived/keepalived.conf
vrrp_script chk_haproxy {
        script "killall -0 haproxy"             # check the haproxy process
        interval 2                              # every 2 seconds
        weight 2                                # add 2 points if OK
}

vrrp_track_file fail-70 {
        file /etc/keepalived/vrrp-70
}

vrrp_instance ins-70 {
        interface eth0                          # interface to monitor
        state BACKUP                            # MASTER on haproxy1, BACKUP on haproxy2
        virtual_router_id 70                    # Set to the last octet of the cluster VIP.
        priority 100                            # 110 on haproxy1, 100 on haproxy2

        authentication {
                auth_type PASS
                auth_pass password70
        }

        virtual_ipaddress {
                delay_loop 12
                lb_algo wrr
                lb_kind DR
                protocol TCP
                192.168.0.70                    # virtual ip address
        }

        track_file {
                fail-70 weight 0
        }

        track_script {
                chk_haproxy
        }
}
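
After starting both services, the VIP should be held by hk01.  A quick sanity check (the interface name eth0 matches the configuration above; adjust if yours differs):

systemctl enable --now haproxy keepalived
ip -4 addr show eth0 | grep 192.168.0.70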

 

VMware Configuration

Now that the dependent services infrastructure is configured, it's time to install the OVA in vSphere Client.  Before this is done, it's worthwhile to mention the high-level plan:

  1. The entire build must occur within 24 hours; the installation will fail if it is not completed before the installation certificates expire.  Please see below for more info on verifying this.
  2. Deploy the RedHat OpenShift Core OS OVA
  3. Adjust the parameters of the Core OS VM instance such as memory and CPU.
  4. Set the Advanced Configuration to add the following parameters: guestinfo.ignition.config.data.encoding, disk.EnableUUID, guestinfo.ignition.config.data, guestinfo.afterburn.initrd.network-kargs
  5. Clone the Core OS instance to build out the bootstrap node.  Verify it comes up.
  6. Clone and build out the Master / Control Plane nodes.  These will boot and pull their configuration from the bootstrap node, assuming everything went well with the bootstrap node creation.
  7. Comment out the bootstrap node from the HAProxy configuration.  The worker nodes will connect to and build out from the master nodes.
  8. Accept certificates that will allow the worker nodes to complete installing.
  9. Verify!

Configuration Table

guestinfo.ignition.config.data.encoding = base64
disk.EnableUUID = TRUE
guestinfo.ignition.config.data = <ONE OF THE BASE64 FILE CONTENTS>

guestinfo.afterburn.initrd.network-kargs = <IP SETTINGS FROM BELOW TABLE>

Host IP Settings
rhbs01.osc01.unix.lab.com ip=10.0.0.105::10.0.0.1:255.255.255.0:rhbs01.osc01.unix.lab.com::none nameserver=10.100.0.100 nameserver=10.100.0.101 nameserver=10.100.0.102
   
rhcpm01.osc01.unix.lab.com ip=10.0.0.106::10.0.0.1:255.255.255.0:rhcpm01.osc01.unix.lab.com::none nameserver=10.100.0.100 nameserver=10.100.0.101 nameserver=10.100.0.102
rhcpm02.osc01.unix.lab.com ip=10.0.0.107::10.0.0.1:255.255.255.0:rhcpm02.osc01.unix.lab.com::none nameserver=10.100.0.100 nameserver=10.100.0.101 nameserver=10.100.0.102
rhcpm03.osc01.unix.lab.com ip=10.0.0.108::10.0.0.1:255.255.255.0:rhcpm03.osc01.unix.lab.com::none nameserver=10.100.0.100 nameserver=10.100.0.101 nameserver=10.100.0.102
   
rhwn01.osc01.unix.lab.com ip=10.0.0.109::10.0.0.1:255.255.255.0:rhwn01.osc01.unix.lab.com::none nameserver=10.100.0.100 nameserver=10.100.0.101 nameserver=10.100.0.102
rhwn02.osc01.unix.lab.com ip=10.0.0.110::10.0.0.1:255.255.255.0:rhwn02.osc01.unix.lab.com::none nameserver=10.100.0.100 nameserver=10.100.0.101 nameserver=10.100.0.102
rhwn03.osc01.unix.lab.com ip=10.0.0.111::10.0.0.1:255.255.255.0:rhwn03.osc01.unix.lab.com::none nameserver=10.100.0.100 nameserver=10.100.0.101 nameserver=10.100.0.102
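
These parameters are set through the vSphere UI in the screenshots that follow, but they can also be applied from the command line with VMware's govc tool.  A rough sketch for one master, assuming govc is installed, GOVC_URL / GOVC_USERNAME / GOVC_PASSWORD point at vcsa01, and the VM path matches your clone:

govc vm.change -vm /mds.xyz/vm/OpenShift/rhcpm01 \
  -e "guestinfo.ignition.config.data.encoding=base64" \
  -e "disk.EnableUUID=TRUE" \
  -e "guestinfo.ignition.config.data=$(cat master.64)" \
  -e "guestinfo.afterburn.initrd.network-kargs=ip=10.0.0.106::10.0.0.1:255.255.255.0:rhcpm01.osc01.unix.lab.com::none nameserver=10.100.0.100 nameserver=10.100.0.101 nameserver=10.100.0.102"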

 

Install the RedHat OpenShift Core OS OVA

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-Deploy-RedHat-CoreOS-OVA.PNG?ssl=1

Using the CoreOS VM you just deployed, adjust the properties.  Note the minimum requirements; each VM needs at least that much in resources, if not more.

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-CoreOS-OVA-Machine-Settings.PNG?ssl=1

Next, set some of the parameters that are common to all the VMs.

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-CoreOS-OVA-Advanced-Settings.PNG?ssl=1

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-CoreOS-OVA-Advanced-Configuration-Parameters.PNG?ssl=1

Save the image above.  Next, clone the Core OS to build out the bootstrap node.  

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-Clone-To-Bootstrap-Node.PNG?ssl=1

Recall from the bootstrap configuration section above that https-bootstrap.64 (i.e. the base64-encoded https-bootstrap.ign) will be used to pull the full configuration down from the HTTP web server.

Adding a serial console log option to the VM can also help in troubleshooting issues.  This can be done as follows:

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-Machine-Config-Serial-Port-For-Logging.PNG?ssl=1

Start up the bootstrap node and monitor its installation.  To do so, SSH in and use the command highlighted below to view the messages:

# ssh -i /root/.ssh/id_rsa-osc01 core@rhbs01.osc01.unix.lab.com
Red Hat Enterprise Linux CoreOS 47.83.202103251640-0
  Part of OpenShift 4.7, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.7/architecture/architecture-rhcos.html


This is the bootstrap node; it will be destroyed when the master is fully up.

The primary services are release-image.service followed by bootkube.service. To watch their status, run e.g.

  journalctl -b -f -u release-image.service -u bootkube.service
Last login: Sun May  9 12:35:19 2021 from 192.168.0.242
[core@rhbs01 ~]$

Verify that two key ports are open on the bootstrap node before proceeding.  (Additional logging is available under /var/log/containers.)

[root@rhbs01 log]# netstat -pnltu|grep -Ei machine-config
tcp6       0      0 :::22623                :::*                    LISTEN      2804/machine-config
tcp6       0      0 :::22624                :::*                    LISTEN      2804/machine-config
[root@rhbs01 log]#

IMPORTANT: Verify the certificate expiration time.  This is the window allowed for the cluster install.

# echo | openssl s_client -connect rhcpm02.osc01.unix.lab.com:6443 | openssl x509 -noout -text 2>&1 | grep -Ei "Not Before|Not After"
depth=1 OU = openshift, CN = kube-apiserver-service-network-signer
verify error:num=19:self signed certificate in certificate chain
DONE
            Not Before: May  9 13:24:20 2021 GMT
            Not After : Jun  8 13:24:21 2021 GMT

#

More details on this can be found here:  https://github.com/openshift/installer/issues/1792

Next, boot up the master / control plane nodes, initially one at a time.  When installing for the first time, this gives you a chance to troubleshoot any issues before kicking off the rest.

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-Master-Control-Plane-Setup.PNG?ssl=1

SSH to each machine to monitor the progress.  Clone the CoreOS image to two more masters.  Once all the masters are complete, edit the HAProxy configuration on the nodes listed above to remove the bootstrap node, then restart HAProxy on each node.  Verify the master nodes are all ready:

# oc get nodes
NAME                        STATUS   ROLES    AGE   VERSION
rhcpm01.osc01.unix.lab.com   Ready    master   11h   v1.20.0+7d0a2b2
rhcpm02.osc01.unix.lab.com   Ready    master   10h   v1.20.0+7d0a2b2
rhcpm03.osc01.unix.lab.com   Ready    master   10h   v1.20.0+7d0a2b2
#

Next, configure and boot up the worker nodes.  The exact same sequence applies as for the master nodes, except that the workers boot off of and pull their configuration from the master nodes behind the HAProxy configuration.  For the worker nodes there is an additional step to check and accept the certificates:

# oc get csr
NAME        AGE     SIGNERNAME                                    REQUESTOR                               CONDITION
csr-26944   117m    kubernetes.io/kubelet-serving                 system:node:rhwn01.osc01.unix.lab.com    Pending
csr-2shv7   148m    kubernetes.io/kubelet-serving                 system:node:rhwn01.osc01.unix.lab.com    Pending
csr-4fxhf   9m35s   kubernetes.io/kubelet-serving                 system:node:rhwn01.osc01.unix.lab.com    Pending
csr-4w29l   8h      kubernetes.io/kubelet-serving                 system:node:rhwn01.osc01.unix.lab.com    Pending

Accept any pending certificates in the process.  Example using a for loop over a large number of certificates:

# for cert in $( cat file.txt ); do oc adm certificate approve $cert; done
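
file.txt above is assumed to hold one pending CSR name per line; one way to produce it:

# oc get csr --no-headers | awk '$NF == "Pending" {print $1}' > file.txt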

Alternately, run the following:  

# oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve

Once complete, all workers should be ready:  

# oc get nodes
NAME                        STATUS   ROLES    AGE     VERSION
rhcpm01.osc01.unix.lab.com   Ready    master   13h     v1.20.0+7d0a2b2
rhcpm02.osc01.unix.lab.com   Ready    master   12h     v1.20.0+7d0a2b2
rhcpm03.osc01.unix.lab.com   Ready    master   12h     v1.20.0+7d0a2b2
rhwn01.osc01.unix.lab.com    Ready    worker   10h     v1.20.0+7d0a2b2
rhwn02.osc01.unix.lab.com    Ready    worker   23m     v1.20.0+7d0a2b2
rhwn03.osc01.unix.lab.com    Ready    worker   7m15s   v1.20.0+7d0a2b2

Check status of cluster and components:

[root@rhbs01 ~]# bootupctl status
Component EFI
  Installed: grub2-efi-x64-1:2.02-90.el8_3.1.x86_64,shim-x64-15-16.el8.x86_64
  Update: At latest version
No components are adoptable.
CoreOS aleph image ID: rhcos-47.83.202103251640-0-qemu.x86_64.qcow2
Boot method: BIOS
[root@rhbs01 ~]#

Confirm cluster operators:  

# oc get clusteroperators
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                       False       True          True       13h
baremetal                                  4.7.9     True        False         False      13h
cloud-credential                           4.7.9     True        False         False      22h
cluster-autoscaler                         4.7.9     True        False         False      13h
config-operator                            4.7.9     True        False         False      13h
console                                    4.7.9     False       True          True       11h
csi-snapshot-controller                    4.7.9     True        False         False      13h
dns                                        4.7.9     True        False         False      13h
etcd                                       4.7.9     True        False         False      13h
image-registry                             4.7.9     True        False         False      13h
ingress                                    4.7.9     True        False         True       11h
insights                                   4.7.9     True        False         False      13h
kube-apiserver                             4.7.9     True        False         False      13h
kube-controller-manager                    4.7.9     True        False         False      13h
kube-scheduler                             4.7.9     True        False         False      13h
kube-storage-version-migrator              4.7.9     True        False         False      11h
machine-api                                4.7.9     True        False         False      13h
machine-approver                           4.7.9     True        False         False      13h
machine-config                             4.7.9     True        False         False      13h
marketplace                                4.7.9     True        False         False      13h
monitoring                                 4.7.9     True        False         False      11h
network                                    4.7.9     True        False         False      13h
node-tuning                                4.7.9     True        False         False      13h
openshift-apiserver                        4.7.9     True        False         False      13h
openshift-controller-manager               4.7.9     True        False         False      137m
openshift-samples                          4.7.9     True        False         False      13h
operator-lifecycle-manager                 4.7.9     True        False         False      13h
operator-lifecycle-manager-catalog         4.7.9     True        False         False      13h
operator-lifecycle-manager-packageserver   4.7.9     True        False         False      13h
service-ca                                 4.7.9     True        False         False      13h
storage                                    4.7.9     True        False         True       13h

Edit and configure the parameters for each operator above.  For example:

# oc edit console.config.openshift.io cluster
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: config.openshift.io/v1
kind: Console
metadata:
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    release.openshift.io/create-only: "true"
  creationTimestamp: "2021-05-09T05:13:36Z"
  generation: 1
  name: cluster
  resourceVersion: "108939"
  selfLink: /apis/config.openshift.io/v1/consoles/cluster
  uid: 3ea99342-706d-4149-91e3-a2107fe75f65
spec: {}
status:
  consoleURL: https://console-openshift-console.apps.osc01.nix.mds.xyz

The above also provides the console link that can be used to access the OpenShift UI:

https://console-openshift-console.apps.osc01.nix.mds.xyz

However, per the above, the console was not yet ready.  It turns out there was a misconfiguration in the HAProxy file:

listen cm
        bind api-int:80
        mode    http
        redirect scheme https if !{ ssl_fc }

frontend osin
        bind    api-int:443 # ssl crt  /etc/haproxy/certs/api-int.osc01.nix.mds.xyz-haproxy.pem no-sslv3
        default_backend osback

backend osback
        mode http
        balance roundrobin

It should be:

listen cm
        bind api-int:80
        mode   tcp
        redirect scheme https if !{ ssl_fc }

frontend osin
        bind    api-int:443                                     # ssl crt  /etc/haproxy/certs/api-int.osc01.nix.mds.xyz-haproxy.pem no-sslv3
        default_backend osback

backend osback
        mode tcp
        balance roundrobin

Once that was modified, the system reconfigured and the OpenShift console became available.  Verifying again:

# oc get clusteroperators
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
console                                    4.7.9     True        False         False      25s

Confirm if the cluster installation is complete:

./openshift-install --dir=/root/openshift/install/ wait-for bootstrap-complete --log-level=info
INFO Waiting up to 20m0s for the Kubernetes API at https://api.osc01.nix.mds.xyz:6443…
INFO API v1.20.0+7d0a2b2 up
INFO Waiting up to 30m0s for bootstrapping to complete…
INFO It is now safe to remove the bootstrap resources
INFO Time elapsed: 0s
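
Once the workers and cluster operators settle, the overall installation can be confirmed the same way with the install-complete target, which also prints the console URL and kubeadmin credentials:

./openshift-install --dir=/root/openshift/install/ wait-for install-complete --log-level=info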

Confirm cluster login works:

# export KUBECONFIG=/root/openshift/install/auth/kubeconfig
# oc whoami
system:admin

Get the password for the UI login:

# vi /root/openshift/install/.openshift_install_state.json   
"*password.KubeadminPassword": {
        "Password": "<SECRET PASS>",
        "PasswordHash": "JDJhUDEwJElxdb9BRnZ1TzxhWVp6VmlHenB1Qk9mOUhlnkF2Sk1NWEZsUW6OdGRTZHd5UeNRdlJuRml5",
        "File": {
            "Filename": "auth/kubeadmin-password",
            "Data": "Mkp4OXOgNFdTbUQtW5R1SkztR2cFYNI="
        }
    },
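
The same password is also written in plain text to the auth directory referenced by the Filename field above:

# cat /root/openshift/install/auth/kubeadmin-password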

Verify the console login:

console-openshift-console.apps.osc01.nix.mds.xyz

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-OpenShift-UI-Console.PNG?ssl=1

Let's deploy a sample application, Hashicorp Vault:

# helm repo add hashicorp https://helm.releases.hashicorp.com
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/openshift/install/auth/kubeconfig
"hashicorp" has been added to your repositories
# ls -altri /root/openshift/install/auth/kubeconfig
201328533 -rw-r-----. 1 root root 18261 May 10 00:59 /root/openshift/install/auth/kubeconfig
# ls -altri /root/openshift/install/auth
total 28
201328537 -rw-r-----. 1 root root    23 May  8 23:57 kubeadmin-password
201328531 drwxr-x---. 2 root root    48 May  8 23:57 .
201328533 -rw-r-----. 1 root root 18261 May 10 00:59 kubeconfig
134369806 drwxr-xr-x. 3 root root  4096 May 10 01:06 ..
# chmod 600 /root/openshift/install/auth/kubeconfig
# helm repo add hashicorp https://helm.releases.hashicorp.com 
"hashicorp" already exists with the same configuration, skipping
# helm install vault hashicorp/vault
NAME: vault
LAST DEPLOYED: Mon May 10 01:18:27 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing HashiCorp Vault!

Now that you have deployed Vault, you should look over the docs on using
Vault with Kubernetes available here:

https://www.vaultproject.io/docs/


Your release is named vault. To learn more about the release, try:

  $ helm status vault
  $ helm get manifest vault
# helm status vault
NAME: vault
LAST DEPLOYED: Mon May 10 01:18:27 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing HashiCorp Vault!

Now that you have deployed Vault, you should look over the docs on using
Vault with Kubernetes available here:

https://www.vaultproject.io/docs/


Your release is named vault. To learn more about the release, try:

  $ helm status vault
  $ helm get manifest vault
# helm get manifest vault 

...

Fix the storage issues preventing OpenShift from deploying the HashiCorp Vault pod:

# kubectl describe pod standalone-vault-0 
….
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  3m41s  default-scheduler  0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling  3m41s  default-scheduler  0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
# kubectl get pod standalone-vault-0 
NAME                 READY   STATUS    RESTARTS   AGE
standalone-vault-0   0/1     Pending   0          5m6s

Check in the UI:

"Failed to provision volume with StorageClass "thin": ServerFaultCode: Cannot complete login due to an incorrect user name or password."

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-OpenShift-Storage-Error.PNG?ssl=1
Change the vSphere credentials stored in OpenShift to resolve the above issue:

https://access.redhat.com/solutions/4618011

Detailed steps in our case:

# echo -n "openshift@mds.xyz" | base64 -w0
# oc get secret vsphere-creds -o yaml -n kube-system > creds_backup.yaml
# oc get cm cloud-provider-config -o yaml -n openshift-config > cloud.yaml
# cp creds_backup.yaml creds.yaml
# vi creds.yaml
# oc replace -f creds.yaml

secret/vsphere-creds replaced
# grep -Ei "vcsa01.nix.mds.xyz" creds.yaml
  vcsa01.nix.mds.xyz.password: <BASE64>
  vcsa01.nix.mds.xyz.username: <BASE64>
        f:vcsa01.nix.mds.xyz.password: {}
        f:vcsa01.nix.mds.xyz.username: {}
# oc patch kubecontrollermanager cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge
kubecontrollermanager.operator.openshift.io/cluster patched
#
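
As an alternative to editing creds.yaml by hand, the same data fields can be patched directly.  The key names below come from the grep output above and the values are the base64 strings generated earlier; this is a sketch rather than the documented Red Hat procedure:

# oc -n kube-system patch secret vsphere-creds --type=merge \
    -p '{"data":{"vcsa01.nix.mds.xyz.username":"<BASE64>","vcsa01.nix.mds.xyz.password":"<BASE64>"}}'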

Confirm volume is provisioned:

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-OpenShift-Storage-Successfull-Provisioning-After-Pass-Change.PNG?ssl=1

Confirm Hashicorp Vault is now provisioning:

# kubectl get pod standalone-vault-0
NAME                 READY   STATUS    RESTARTS   AGE
standalone-vault-0   0/1     Running   0          37m

# helm list
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART           APP VERSION
standalone      default         1               2021-05-10 01:38:25.127295916 -0400 EDT deployed        vault-0.8.1-ocp 1.5.4

Complete Hashicorp Vault configuration:


# kubectl get pod standalone-vault-0
NAME                 READY   STATUS    RESTARTS   AGE
standalone-vault-0   0/1     Running   0          15s
# kubectl get pod standalone-vault-0
NAME                 READY   STATUS    RESTARTS   AGE
standalone-vault-0   0/1     Running   0          17s
# POD=$(oc get pods -lapp.kubernetes.io/name=vault --no-headers -o custom-columns=NAME:.metadata.name)
# oc rsh $POD

/ # vault operator init --tls-skip-verify -key-shares=1 -key-threshold=1
Unseal Key 1: <SECRET KEY>

Initial Root Token: <ROOT TOKEN>

Vault initialized with 1 key shares and a key threshold of 1. Please securely
distribute the key shares printed above. When the Vault is re-sealed,
restarted, or stopped, you must supply at least 1 of these keys to unseal it
before it can start servicing requests.

Vault does not store the generated master key. Without at least 1 key to
reconstruct the master key, Vault will remain permanently sealed!

It is possible to generate new unseal keys, provided you have a quorum of
existing unseal keys shares. See "vault operator rekey" for more information.
/ # ls -altri /vault/data/
total 32
     11 drwx——    2 root     root         16384 May 10 06:12 lost+found
645923864 drwxr-xr-x    1 vault    vault           18 May 10 06:52 ..
 131073 drwx——    4 vault    vault         4096 May 10 06:53 sys
 524289 drwx——    3 vault    vault         4096 May 10 06:53 logical
 393217 drwxr-xr-x    5 vault    vault         4096 May 10 06:53 core
      2 drwxr-xr-x    6 vault    vault         4096 May 10 06:53 .
/ # export KEYS=<SECRET KEY>
/ # export ROOT_TOKEN=<ROOT TOKEN>
/ # echo $KEYS
<SECRET KEY>
/ # echo $ROOT_TOKEN
<ROOT TOKEN>
/ # export VAULT_TOKEN=$ROOT_TOKEN
/ # vault operator unseal --tls-skip-verify $KEYS
Key             Value
---             -----
Seal Type       shamir
Initialized     true
Sealed          false
Total Shares    1
Threshold       1
Version         1.5.4
Cluster Name    vault-cluster-45ab6d46
Cluster ID      96282353-4975-fd66-438b-4ce65f3f7146
HA Enabled      false
/ #
/ #
/ #

Access the HashiCorp Vault application:

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-Hashicorp-Vault-Successful-Verification.PNG?ssl=1

Enjoy the new cluster!

 

Mailing List Support and Troubleshooting

This section deals with some troubleshooting en route to creating a fully functional OpenShift + Kubernetes Cluster. One helpful resource was the mailing lists available for OpenShift:

Re: OpenShift and "export IPCFG="ip=<ip>::<gateway>:<netmask>:<hostname>:<iface>:none nameserver=srv1 [nameserver=srv2 [nameserver=srv3 […]]]""

Suggestions worked.  Thanks once more.

For reference, here's what I did, in case it helps others as well.

1) Add a Serial Port to the VM under Virtual Hardware.  Type in the name of the output file where the logs should be saved.

2) Download the log files from the datastore.  Review and fix any errors.  Example below:

[   11.156393] systemd[1]: Startup finished in 6.800s (kernel) + 0 (initrd) + 4.352s (userspace) = 11.153s.
——
Ignition has failed. Please ensure your config is valid. Note that only
Ignition spec v3.0.0+ configs are accepted.

A CLI validation tool to check this called ignition-validate can be
downloaded from GitHub:
    https://github.com/coreos/ignition/releases
——

Displaying logs from failed units: ignition-fetch-offline.service
— Logs begin at Sun 2021-03-21 03:07:51 UTC, end at Sun 2021-03-21 03:07:54 UTC. —
Mar 21 03:07:54 ignition[749]: no config URL provided
Mar 21 03:07:54 ignition[749]: reading system config file "/usr/lib/ignition/user.ign"
Mar 21 03:07:54 ignition[749]: no config at "/usr/lib/ignition/user.ign"
Mar 21 03:07:54 ignition[749]: config successfully fetched
Mar 21 03:07:54 ignition[749]: parsing config with SHA512: b71f59139d6c3101031fd0cee073e0503f233c47129db8597462687a608ae0a4b594bf9c170ce55dbd289d4be2638f68e4d39c9b2f50c81f956d5bca24955959
Mar 21 03:07:54 systemd[1]: ignition-fetch-offline.service: Triggering OnFailure= dependencies.
Mar 21 03:07:54 ignition[749]: error at line 7 col 5: invalid character ']' after object key:value pair
Mar 21 03:07:54 ignition[749]: failed to fetch config: config is not valid
Mar 21 03:07:54 ignition[749]: failed to acquire config: config is not valid
Mar 21 03:07:54 ignition[749]: Ignition failed: config is not valid
Press Enter for emergency shell or wait 5 minutes for reboot.                
Press Enter for emergency shell or wait 4 minutes 45 seconds for reboot.     

Once fixed and booted, resolve any SSH host key issues:

# ssh core@192.168.0.105
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:+PaPjXcO/gOaen9+fHfI1q7s7XQgaczHXUWm6Gtf56E.
Please contact your system administrator.
Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Offending ED25519 key in /var/lib/sss/pubconf/known_hosts:18
ECDSA host key for 192.168.0.105 has changed and you have requested strict checking.
Host key verification failed.

# ssh-keyscan -t ecdsa 192.168.0.105 >> ~/.ssh/known_hosts
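
Alternatively, remove the stale entry with ssh-keygen before reconnecting; since the offending key in this case lived in the sssd-managed file, point -f at whichever file the warning names:

# ssh-keygen -R 192.168.0.105
# ssh-keygen -R 192.168.0.105 -f /var/lib/sss/pubconf/known_hosts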

And log in using the previously generated SSH key:

# ssh -i ../../.ssh/id_rsa-os01  core@192.168.0.105
Red Hat Enterprise Linux CoreOS 47.83.202102090044-0
  Part of OpenShift 4.7, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.7/architecture/architecture-rhcos.html


This is the bootstrap node; it will be destroyed when the master is fully up.

The primary services are release-image.service followed by bootkube.service. To watch their status, run e.g.

  journalctl -b -f -u release-image.service -u bootkube.service
[core@bootstrap01 ~]$

 

TUVM!

 

Installation Path

HFIOS!

If this error is seen when running something as simple as oc whoami:

Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get users.user.openshift.io ~)

It is likely that the bootstrap server was booted last, or at least was not the first server to be started up.  The bootstrap server needs to be started first and allowed to fully configure itself, since it is meant to connect to the rest of the nodes and configure them.  Without those nodes it doesn't do much, and the error above appears because the cluster was never configured.

You can confirm that the bootstrap server started and configured itself correctly when you see the following output:

# nc -v rhbs01.osc01.unix.lab.com 6443
Ncat: Version 6.40 ( http://nmap.org/ncat )
Ncat: Connected to 10.0.0.105:6443.

And the following message is visible:

./openshift-install --dir=/root/openshift/install wait-for bootstrap-complete --log-level=info
INFO Waiting up to 20m0s for the Kubernetes API at https://api.osc01.unix.lab.com:6443…
INFO API v1.20.0+5fbfd19 up
INFO Waiting up to 30m0s for bootstrapping to complete…

Once all the machines are bootstrapped, you should see the following message:

INFO It is now safe to remove the bootstrap resources

The above was ultimately due to expired installation certificates.  The master nodes need to be built out before the installation certificate fully expires, which is typically 24 hours. 

References

REF: https://docs.openshift.com/container-platform/4.7/installing/installing_vsphere/installing-vsphere.html#installing-vsphere  

REF: https://www.youtube.com/watch?v=6TvyHBdHhes

REF: https://github.com/openshift/machine-config-operator/blob/master/pkg/server/bootstrap_server.go

REF: https://github.com/openshift/machine-config-operator/issues/2562

Thanks,
