

OpenShift w/ Kubernetes Setup: Installing using the UPI Method

This guide builds an OpenShift Kubernetes cluster.  The method used here is the UPI (User-Provisioned Infrastructure) installation method.  Start off by loading the official page from Red Hat:

https://i0.wp.com/www.microdevsys.com/WordPressIMages/KubernetesAndOpenShift.PNG?ssl=1

Before you begin, ensure the following files are downloaded from the Red Hat OpenShift pages (see links in the above document):

/root/openshift # ls -altri
total 439680
201572861 -rw-r--r--.  1 root        root              706 Apr 25 04:15 README.md
201572704 -rwxr-xr-x.  1 root        root        360710144 Apr 25 04:15 openshift-install
201572859 -rw-rw-r--.  1 tom@mds.xyz tom@mds.xyz      2775 May  8 22:53 pull-secret.txt
201572858 -rw-rw-r--.  1 tom@mds.xyz tom@mds.xyz  89491042 May  8 22:55 openshift-install-linux.tar.gz
201572850 drwxr-xr-x.  3 root        root             4096 May  8 23:58 .
201326721 dr-xr-x---. 12 root        root             4096 May  9 08:43 ..

Extract the .tar.gz using:

tar -zxf openshift-install-linux.tar.gz
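
Before going further, it's worth confirming the extracted binary runs and reports the release you expect (the exact output depends on the version downloaded):

./openshift-install version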

Build and use the following staging machine.  This staging machine can be any basic RHEL 7 node.  It is where the OpenShift (oc) and Kubernetes (kubectl) commands will be executed from:

oss01.unix.lab.com

Following are the hosts and IPs for the cluster:

rhbs01.osc01.unix.lab.com       10.0.0.5

rhcpm01.osc01.unix.lab.com      10.0.0.6

rhcpm02.osc01.unix.lab.com      10.0.0.7

rhcpm03.osc01.unix.lab.com      10.0.0.8

hk01.osc01.unix.lab.com         192.168.0.196

hk02.osc01.unix.lab.com         192.168.0.232

rhwn01.osc01.unix.lab.com       10.0.0.9

rhwn02.osc01.unix.lab.com       10.0.0.10

rhwn03.osc01.unix.lab.com       10.0.0.11

VIPs:

api.osc01.unix.lab.com         192.168.0.70

api-int.osc01.unix.lab.com     192.168.0.70

HAProxy / Keepalived

The two hosts above, hk01 and hk02, are the HAProxy and Keepalived servers for the installation.  More on that and load balancing below.

Installation Instructions

To create the above, FreeIPA was used to create the subdomain.  Below are the images highlighting how this was done; a CLI sketch follows the screenshots.  If FreeIPA is not used, a manual DNS configuration similar to the one on the OpenShift page above will be required.

  • Create the Zone
  • Add the hosts to the new zone, including the PTR (reverse) entries

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-Configuring-FreeIPA-Zone.PNG?ssl=1

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-Configuring-FreeIPA-Forward-Zone-Details.PNG?ssl=1

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-Configuring-FreeIPA-Defining-A-Dedicated-Zone.PNG?ssl=1
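
If you prefer the command line over the web UI, a minimal sketch of the same records with the ipa CLI looks roughly like the following (assumes an authenticated admin session via kinit admin; hostnames and IPs come from the tables above, and the reverse zone name depends on your subnet).  Repeat the host records for the remaining masters and workers.  The wildcard *.apps record is an assumption here, but OpenShift expects it to resolve to the ingress load balancer:

kinit admin
ipa dnszone-add osc01.unix.lab.com
ipa dnszone-add 0.0.10.in-addr.arpa.
ipa dnsrecord-add osc01.unix.lab.com rhbs01 --a-rec=10.0.0.5 --a-create-reverse
ipa dnsrecord-add osc01.unix.lab.com rhcpm01 --a-rec=10.0.0.6 --a-create-reverse
ipa dnsrecord-add osc01.unix.lab.com api --a-rec=192.168.0.70
ipa dnsrecord-add osc01.unix.lab.com api-int --a-rec=192.168.0.70
ipa dnsrecord-add osc01.unix.lab.com "*.apps" --a-rec=192.168.0.70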

 

(Optional) Ansible inventory to speed up host deployment.  (Not strictly necessary since RHCOS machines are being used, but it can be added if central management is required.)

/ansible # tail -n 30 infra

[os-all:children]
os-bstrap
os-cpm
os-hk
os-wn

[os-bstrap]
rhbs01.osc01.unix.lab.com

[os-cpm]
rhcpm01.osc01.unix.lab.com
rhcpm02.osc01.unix.lab.com
rhcpm03.osc01.unix.lab.com

[os-hk]
hk01.osc01.unix.lab.com
hk02.osc01.unix.lab.com

[os-wn]
rhwn01.osc01.unix.lab.com
rhwn02.osc01.unix.lab.com
rhwn03.osc01.unix.lab.com

(Not Required) IPA

ipa-client-install --uninstall;

ipa-client-install --force-join -p autojoin -w "NotMyPass" --fixed-primary --server=idmipa01.unix.lab.com --server=idmipa02.unix.lab.com --domain=unix.lab.com --realm=unix.lab.com -U --hostname=$(hostname);

ipa-client-automount --location=UserHomeDir01 -U;

authconfig --enablesssd --enablesssdauth --enablemkhomedir --updateall --update;

 

(Not Required) krb5.conf

# cat /etc/krb5.conf
#File modified by ipa-client-install

includedir /etc/krb5.conf.d/
includedir /var/lib/sss/pubconf/krb5.include.d/

[libdefaults]
  default_realm = OSC01.unix.lab.com
  dns_lookup_realm = false
  dns_lookup_kdc = true
  rdns = false
  dns_canonicalize_hostname = true
  ticket_lifetime = 24h
  forwardable = true
  udp_preference_limit = 0
  default_ccache_name = KEYRING:persistent:%{uid}


[realms]

  OSC01.unix.lab.com = {
    kdc = idmipa01.unix.lab.com:88
    master_kdc = idmipa01.unix.lab.com:88
    admin_server = idmipa01.unix.lab.com:749
    kpasswd_server = idmipa01.unix.lab.com:464
    kdc = idmipa02.unix.lab.com:88
    master_kdc = idmipa02.unix.lab.com:88
    admin_server = idmipa02.unix.lab.com:749
    kpasswd_server = idmipa02.unix.lab.com:464
    default_domain = osc01.unix.lab.com
    pkinit_anchors = FILE:/var/lib/ipa-client/pki/kdc-ca-bundle.pem
    pkinit_pool = FILE:/var/lib/ipa-client/pki/ca-bundle.pem

  }

  unix.lab.com = {
    kdc = idmipa01.unix.lab.com:88
    master_kdc = idmipa01.unix.lab.com:88
    admin_server = idmipa01.unix.lab.com:749
    kpasswd_server = idmipa01.unix.lab.com:464
    kdc = idmipa02.unix.lab.com:88
    master_kdc = idmipa02.unix.lab.com:88
    admin_server = idmipa02.unix.lab.com:749
    kpasswd_server = idmipa02.unix.lab.com:464
    default_domain = unix.lab.com
    pkinit_anchors = FILE:/var/lib/ipa-client/pki/kdc-ca-bundle.pem
    pkinit_pool = FILE:/var/lib/ipa-client/pki/ca-bundle.pem

  }

  MDS.XYZ = {
    kdc = ad.lab.com
    default_domain = mds.xyz
  }

[domain_realm]
  .unix.lab.com = unix.lab.com
  unix.lab.com = unix.lab.com
  bs01.osc01.unix.lab.com = unix.lab.com
  .lab.com = MDS.XYZ
  mds.xyz = MDS.XYZ
  .osc01.unix.lab.com = OSC01.unix.lab.com
  osc01.unix.lab.com = OSC01.unix.lab.com

 

( Not Required ) sssd.conf


[root@bs01 home]#
[root@bs01 home]# cat /etc/sssd/sssd.conf
[domain/unix.lab.com]

cache_credentials = True
krb5_store_password_if_offline = True
ipa_domain = unix.lab.com
id_provider = ipa
auth_provider = ipa
access_provider = ipa
ldap_tls_cacert = /etc/ipa/ca.crt
ipa_hostname = bs01.osc01.unix.lab.com
chpass_provider = ipa
ipa_server = idmipa01.unix.lab.com, idmipa02.unix.lab.com
dns_discovery_domain = unix.lab.com
autofs_provider = ipa
ipa_automount_location = UserHomeDir01

dyndns_update = True
dyndns_update_ptr = True
ldap_schema = ad
ldap_id_mapping = True

override_homedir = /n/%d/%u
# fallback_homedir = /n/%d/%u
# ldap_user_home_directory = unixHomeDirectory

[nss]
homedir_substring = /home

[sssd]
services = nss, sudo, pam, autofs, ssh

domains = unix.lab.com

[pam]

[sudo]

[autofs]

[ssh]

[pac]

[ifp]

[secrets]

[session_recording]

 

Generate a key pair on the staging node, where oc, kubectl, and the installer will be run.

oss01.unix.lab.com

ssh-keygen -t ed25519 -N ''     -f /root/.ssh/id_rsa-osc01

eval "$(ssh-agent -s)"

ssh-add /root/.ssh/id_rsa-osc01
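
Optionally confirm the key was loaded into the agent:

ssh-add -l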

Create a configuration file:

install-config.yaml

# cat install-config.yaml
apiVersion: v1
baseDomain: unix.lab.com
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: osc01
platform:
  vsphere:
    vcenter: vcsa01.unix.lab.com
    username: openshift
    password: S3cretP@ssw0rdR#ally
    datacenter: mds.xyz
    defaultDatastore: mdsesxip05-d01
    folder: "/mds.xyz/vm/OpenShift"
fips: false

pullSecret: '{"auths":{"cloud.openshift.com":{"auth":"b3BlbnNoaWZ0LXJlbGVhc2UtZGV2K29jbV9hY2Nlc3NfMjI2ZjlkZDFiODg4NDdkOGI2NWFmOTNiNDg1ZDk5Mzg6MVRXUUwzQ1FWQkhUSlpaTURTQ0tWVlgyU0U4UU5VRDRDRUtXVlIwN01MRjczV041NktIR0Q0M0JaVzNBMkdHOA==","email":"somedude AT microdevsys DOT com"},"quay.io":{"auth":"b3BlbnNoaWZ0LXJlbGVhc2UtZGV2K29jbV9hY2Nlc3NfMjI2ZjlkZDFiODg4NDdkOGI2NWFmOTNiNDg1ZDk5Mzg6MVRXUUwzQ1FWQkhUSlpaTURTQ0tWVlgyU0U4UU5VRDRDRUtXVlIwN01MRjczV041NktIR0Q0M0JaVzNBMkdHOA==","email":"somedude AT microdevsys DOT com"},"registry.connect.redhat.com":{"auth":"fHVoYy1wb29sLWYyMGU3ODExLTY1NjctNDBlZC05MWExLTUwYjgxZGVhNDY4ZDpleUpoYkdjaU9pSlNVelV4TWlKOS5leUp6ZFdJaU9pSTNZV1UxWldFeU1EaGxORFEwTlRGa09HUTBaR1F5TTJJNE5HVmpZMlV6WkNKOS5SQXMta2g1bFZUTDdQT0pvNEZ3UUZKN3o4c1NCOVQ3WEhQV2VvMkoyakU2cnVMS3VEZHlMWWlGcEhoOVJSZVFyTzVaa0xodGt4aVRqRDBJV3pMYzdzR3dfMThfc0thejZYaTNrM3pmZ2RuWS1YbnlPbHU1RGhGdnEyUW5GcFRIWFBsUFB2SG85OVNndG00dnkwVk5OSXE3SjJ1TUVPNE84c2wzdXJ1X0JNSkNUX1FTeFIyUVViTTVFaFViWUM1blF6LVV2VEo3VlpnR2hqZDVvQ3Z2Y3FvWnc3bXJkUlFvQTNuUUl2MGRrb3hXN2lVZXh1cVl2RDZFdFRyVFFoUnNrRkVTVV9pZURGMDNhSWlsYnRsZFRqNTBGQXE1bzllbW1HZTdITHFyVGY2d2FJS3UxUnpHbkdCOUN1ZWpZaGowSU9GUmVsNFdES2ItMGJrbVZTdjRtXzJPSllkcEJYc0lVaFlKTHFpZFdxLXFaVlBMRzQ5Q1JaRTUwWnZGcDl2ckZrZU5yZnJKdzdtOUVTTUNIbms5UW9fQ1hXZlRmTlBvWWhNdDhmZUJBQi1GNlp6Mnl4Ni0wYzJOMjdob005ZlVuREdxWXpTbk1OZFRvY05vNkl1SWExZ0NmNnlaenNOWHdMLURlZVBOUnhzMHAtUld3UkZGME5xd0VsUEhycEhVWHg0MnVHSGh3Y0dHYUJsczk1eDBYeXFEM1JoYzdySjdaWUNkVko1OGhCNURoWDc0QjhrWjNxOTVfdmtPX1Jtd1Nvcy1sZ09ITTNLWFVlMUNvSWUzVzlJT2l4STNFLUVWd3hFTkNyRFFLck04QlB4NjhUVHlxN2JTeUxFUjZ5OGFxZERjT0ZSaE4xM1FDT1I3bmRGaUVyUGRkRWxaRmh4Tm1NU2NuYnhPMkdoRQ==","email":"somedude AT microdevsys DOT com"},"registry.redhat.io":{"auth":"fHVoYy1wb29sLWYzMGU3ODExLTY1NjctNDBlZC05MWExLTUwYjgxZGVhNDY4ZDpleUpoYkdjaU9pSlNVelV4TWlKOS5leUp6ZFdJaU9pSTNZV1UxWldFeU1EaGxORFEwTlRGa09HUTBaR1F5TTJJNE5HVmpZMlV6WkNKOS5SQXMta2g1bFZUTDdQT0pvNEZ3UUZKN3o4c1NCOVQ3WEhQV2VvMkoyakU2cnVMS3VEZHlMWWlGcEhoOVJSZVFyTzVaa0xodGt4aVRqRDBJV3pMYzdzR3dfMThfc0thejZYaTNrM3pmZ2RuWS1YbnlPbHU1RGhGdnEyUW5GcFRIWFBsUFB2SG85OVNndG00dnkwVk5OSXE3SjJ1TUVPNE84c2wzdXJ1X0JNSkNUX1FTeFIyUVViTTVFaFViWUM1clF6LVV2VEo3VlpnR2hqZDVvQ3Z2Y3FvWnc3bXJkUlFvQTNuUUl2MGRrb3hXN2lVZXh1cVl2RDZFdFRyVFFoUnNrRkVTVV9pZURGMDNhSWlsYnRsZFRqNTBGQXE1bzllbW1HZTdITHFyVGY2d2FJS3UxUnpHbkeCOUN1ZWpZaGowSU9GUmVsNFdES2ItMGJrbVZTdjRtXzJPSllkcEJYc0lVaFlKTHFpZFdxLXFaVlBMRzQ5Q1JaRTUwWnZGcDl2ckZrZU5yZnJKdzdtOUVTTUNIbms5UW9fQ1hXZlRmTlBvWWhNdDhmZUJBQi1GNlp6Mnl4Ni0wYzJOMjdob005ZlVuREdxWXpTbk1OZFRvY05vNkl1SWExZ0NmNnlaenNOWHdMLURlZVBOUnhzMHAtUld3UkZGME5xd0VrUEhycEhVWHg0MnVHSGh3Y0dHYUJsczk1eDBYeXFEM1JoYzdySjdaWUNkVko1OGhCNURoWDc0QjhrWjNxOTVfdmtPX1Jtd1Nvcy1sZ09ITTNLWFVlMUNvSWUzVzlJT2l4STNFLUVWd3hFTkNyRFFLck04QlB4NjhUVHlxN2JTeUxFUjZ5OGFxZERjT0ZSaE4xM1FDT1I3bmRGaUVyUGRkRWxaRmh4Tm1NU2NuYnhPMkdoRQ==","email":"somedude AT microdevsys DOT com"}}}'
sshKey: 'ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHr4AuezxQ/azAAfHLa9+HCqGZthewYf2yNNPQ6uwDhd root@awx01.unix.lab.com'

Execute the manifest creation (assuming /root/openshift/install/ will be the location of your installation configuration):

./openshift-install create manifests --dir=/root/openshift/install/

Ensure the following file has mastersSchedulable set to false:

cat manifests/cluster-scheduler-02-config.yml

apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  creationTimestamp: null
  name: cluster
spec:
  mastersSchedulable: false
  policy:
    name: ""
status: {}

And remove the following YAML files (explanation in the doc above):

rm -f openshift/99_openshift-cluster-api_master-machines-*.yaml openshift/99_openshift-cluster-api_worker-machineset-*.yaml

Next, generate the ignition configuration files:

./openshift-install create ignition-configs --dir=/root/openshift/install/
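
If the command succeeds, the install directory should now contain roughly the following; the auth/ directory holds the kubeconfig and kubeadmin password used later in this guide:

# ls /root/openshift/install/
auth  bootstrap.ign  master.ign  metadata.json  worker.ign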

Convert the .ign files to base64:

# history | grep base64
base64 -w0 master.ign > master.64
base64 -w0 worker.ign > worker.64
base64 -w0 bootstrap.ign > bootstrap.64
base64 -w0 https-bootstrap.ign > https-bootstrap.64

# cat https-bootstrap.ign
{
  "ignition": {
    "config": {
      "merge": [
        {
          "source": "http://192.168.0.142/bootstrap.ign"
        }
      ]
    },
    "version":"3.1.0"
  }
}
#

Note how https-bootstrap.ign refers to an HTTP server.  Because bootstrap.ign is too big to pass directly, it needs to be hosted on a separate HTTP server so it can be pulled down by the VMware configuration on startup.  The file hosted on the HTTP server:

# ls -altri /var/www/html/bootstrap.ign
571421 -rw-r--r--. 1 root root 291591 May  9 00:04 /var/www/html/bootstrap.ign
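
Any web server reachable from the VMs will do.  If Apache httpd isn't already serving /var/www/html, two quick options are sketched below (192.168.0.142 above is the web server used in this setup; the Python one-liner needs root for port 80 and Python 3.7+ for --directory):

yum -y install httpd && systemctl enable --now httpd
python3 -m http.server 80 --directory /var/www/html

Confirm the file is reachable before booting anything:

curl -sI http://192.168.0.142/bootstrap.ign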

Configure the HAProxy and Keepalived nodes.  Pay careful attention to the commented rhbs01 (bootstrap) lines below.  The bootstrap entries should be uncommented while the master nodes are being created, then commented out again once the Master / Control Plane nodes are up, since the workers are built from the master nodes.  The configuration files for both nodes follow.  The HAProxy files are identical; the Keepalived files are NOT identical:

hk01

# cat /etc/haproxy/haproxy.cfg
global
        log                     127.0.0.1:514                   local0  debug
        pidfile                 /var/run/haproxy.pid
        maxconn                 4000
        user                    haproxy
        group                   haproxy

        stats socket            /etc/haproxy/stats
        tune.ssl.default-dh-param 2048
        daemon
        debug
        maxconn 4096

defaults
        mode                    tcp
        log                     global
        option                  dontlognull
        option                  redispatch
        retries                 3
        timeout queue           1m
        timeout connect         10s
        timeout client          3m
        timeout server          3m
        timeout http-keep-alive 10s
        timeout check           10s
        maxconn                 30000


listen cm
        bind api-int:80
        mode    tcp
        redirect scheme https if !{ ssl_fc }

frontend osin
        bind    api-int:443                                      # ssl crt  /etc/haproxy/certs/api-int.osc01.unix.lab.com-haproxy.pem no-sslv3
        default_backend osback

backend osback
        mode tcp
        balance roundrobin

        server rhcpm01.osc01.unix.lab.com rhcpm01.osc01.unix.lab.com:443 check
        server rhcpm02.osc01.unix.lab.com rhcpm02.osc01.unix.lab.com:443 check
        server rhcpm03.osc01.unix.lab.com rhcpm03.osc01.unix.lab.com:443 check
        server rhwn01.osc01.unix.lab.com rhwn01.osc01.unix.lab.com:443 check
        server rhwn02.osc01.unix.lab.com rhwn02.osc01.unix.lab.com:443 check
        server rhwn03.osc01.unix.lab.com rhwn03.osc01.unix.lab.com:443 check


frontend bscpm6443in
        log                             127.0.0.1:514   local0          debug
        bind    api-int:6443
        default_backend bscpm6443back

backend bscpm6443back
        log                             127.0.0.1:514   local0          debug
        mode tcp
        balance source

#        server rhbs01.osc01.unix.lab.com rhbs01.osc01.unix.lab.com:6443 check
        server rhcpm01.osc01.unix.lab.com rhcpm01.osc01.unix.lab.com:6443 check
        server rhcpm02.osc01.unix.lab.com rhcpm02.osc01.unix.lab.com:6443 check
        server rhcpm03.osc01.unix.lab.com rhcpm03.osc01.unix.lab.com:6443 check


frontend bscpm22623in
        log                             127.0.0.1:514   local0          debug
        bind    api-int:22623
        default_backend bscpm22623back

backend bscpm22623back
        log                             127.0.0.1:514   local0          debug
        mode tcp
        balance source

#        server rhbs01.osc01.unix.lab.com rhbs01.osc01.unix.lab.com:22623 check
        server rhcpm01.osc01.unix.lab.com rhcpm01.osc01.unix.lab.com:22623 check
        server rhcpm02.osc01.unix.lab.com rhcpm02.osc01.unix.lab.com:22623 check
        server rhcpm03.osc01.unix.lab.com rhcpm03.osc01.unix.lab.com:22623 check


listen stats
        bind :9000
        mode http
        stats enable
        stats hide-version
        stats realm Haproxy\ Statistics
        stats uri /haproxy-stats
        stats auth admin:n0tmypass

 

hk02

# cat /etc/haproxy/haproxy.cfg
global
        log                     127.0.0.1:514                   local0  debug
        pidfile                 /var/run/haproxy.pid
        maxconn                 4000
        user                    haproxy
        group                   haproxy

        stats socket            /etc/haproxy/stats
        tune.ssl.default-dh-param 2048
        daemon
        debug
        maxconn 4096

defaults
        mode                    tcp
        log                     global
        option                  dontlognull
        option                  redispatch
        retries                 3
        timeout queue           1m
        timeout connect         10s
        timeout client          3m
        timeout server          3m
        timeout http-keep-alive 10s
        timeout check           10s
        maxconn                 30000


listen cm
        bind api-int:80
        mode    tcp
        redirect scheme https if !{ ssl_fc }

frontend osin
        bind    api-int:443                                                       # ssl crt  /etc/haproxy/certs/api-int.osc01.unix.lab.com-haproxy.pem no-sslv3
        default_backend osback

backend osback
        mode tcp
        balance roundrobin

        server rhcpm01.osc01.unix.lab.com rhcpm01.osc01.unix.lab.com:443 check
        server rhcpm02.osc01.unix.lab.com rhcpm02.osc01.unix.lab.com:443 check
        server rhcpm03.osc01.unix.lab.com rhcpm03.osc01.unix.lab.com:443 check
        server rhwn01.osc01.unix.lab.com rhwn01.osc01.unix.lab.com:443 check
        server rhwn02.osc01.unix.lab.com rhwn02.osc01.unix.lab.com:443 check
        server rhwn03.osc01.unix.lab.com rhwn03.osc01.unix.lab.com:443 check


frontend bscpm6443in
        log                             127.0.0.1:514   local0          debug
        bind    api-int:6443
        default_backend bscpm6443back

backend bscpm6443back
        log                             127.0.0.1:514   local0          debug
        mode tcp
        balance source

#        server rhbs01.osc01.unix.lab.com rhbs01.osc01.unix.lab.com:6443 check
        server rhcpm01.osc01.unix.lab.com rhcpm01.osc01.unix.lab.com:6443 check
        server rhcpm02.osc01.unix.lab.com rhcpm02.osc01.unix.lab.com:6443 check
        server rhcpm03.osc01.unix.lab.com rhcpm03.osc01.unix.lab.com:6443 check


frontend bscpm22623in
        log                             127.0.0.1:514   local0          debug
        bind    api-int:22623
        default_backend bscpm22623back

backend bscpm22623back
        log                             127.0.0.1:514   local0          debug
        mode tcp
        balance source

#        server rhbs01.osc01.unix.lab.com rhbs01.osc01.unix.lab.com:22623 check
        server rhcpm01.osc01.unix.lab.com rhcpm01.osc01.unix.lab.com:22623 check
        server rhcpm02.osc01.unix.lab.com rhcpm02.osc01.unix.lab.com:22623 check
        server rhcpm03.osc01.unix.lab.com rhcpm03.osc01.unix.lab.com:22623 check


listen stats
        bind :9000
        mode http
        stats enable
        stats hide-version
        stats realm Haproxy\ Statistics
        stats uri /haproxy-stats
        stats auth admin:n0tmypass
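
One way to toggle the bootstrap entries on both hk01 and hk02 is sketched below; validate the configuration and reload HAProxy afterwards.  (The second sed, shown commented, re-enables the entries if the bootstrap ever needs to be rebuilt.)

sed -i 's/^\( *server rhbs01\)/#\1/' /etc/haproxy/haproxy.cfg        # comment out the bootstrap backends
# sed -i 's/^#\( *server rhbs01\)/\1/' /etc/haproxy/haproxy.cfg      # uncomment them again if needed
haproxy -c -f /etc/haproxy/haproxy.cfg && systemctl reload haproxy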

Likewise, keepalived configuration for both nodes:

hk01 ( master )

# cat /etc/keepalived/keepalived.conf
vrrp_script chk_haproxy {
        script "killall -0 haproxy"             # check the haproxy process
        interval 2                              # every 2 seconds
        weight 2                                # add 2 points if OK
}

vrrp_track_file fail-70 {
        file /etc/keepalived/vrrp-70
}

vrrp_instance ins-70 {
        interface eth0                          # interface to monitor
        state MASTER                            # MASTER on haproxy1, BACKUP on haproxy2
        virtual_router_id 70                    # Set to the last octet of the cluster VIP.
        priority 110                            # 110 on haproxy1, 100 on haproxy2

        authentication {
                auth_type PASS
                auth_pass password70
        }

        virtual_ipaddress {
                delay_loop 12
                lb_algo wrr
                lb_kind DR
                protocol TCP
                192.168.0.70                    # virtual ip address
        }

        track_file {
                fail-70 weight 0
        }

        track_script {
                chk_haproxy
        }
}

 

hk02 ( slave )

# cat /etc/keepalived/keepalived.conf
vrrp_script chk_haproxy {
        script "killall -0 haproxy"             # check the haproxy process
        interval 2                              # every 2 seconds
        weight 2                                # add 2 points if OK
}

vrrp_track_file fail-70 {
        file /etc/keepalived/vrrp-70
}

vrrp_instance ins-70 {
        interface eth0                          # interface to monitor
        state BACKUP                            # MASTER on haproxy1, BACKUP on haproxy2
        virtual_router_id 70                    # Set to the last octet of the cluster VIP.
        priority 100                            # 110 on haproxy1, 100 on haproxy2

        authentication {
                auth_type PASS
                auth_pass password70
        }

        virtual_ipaddress {
                delay_loop 12
                lb_algo wrr
                lb_kind DR
                protocol TCP
                192.168.0.70                    # virtual ip address
        }

        track_file {
                fail-70 weight 0
        }

        track_script {
                chk_haproxy
        }
}
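
After starting both services, the VIP should be held by hk01.  A quick sanity check (the interface name eth0 matches the configuration above; adjust if yours differs):

systemctl enable --now haproxy keepalived
ip -4 addr show eth0 | grep 192.168.0.70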

 

VMware Configuration

Now that the dependent services infrastructure is configured, it's time to install the OVA in vSphere Client.  Before this is done, it's worthwhile to mention the high-level plan:

  1. The entire build must occur within 24 hours; the installation will fail if it is not completed before the installation certificates expire.  Please see below for more info on verifying this.
  2. Deploy the RedHat OpenShift Core OS OVA
  3. Adjust the parameters of the Core OS VM instance such as memory and CPU.
  4. Set the Advanced Configuration to add the following parameters: guestinfo.ignition.config.data.encoding, disk.EnableUUID, guestinfo.ignition.config.data, guestinfo.afterburn.initrd.network-kargs
  5. Clone the Core OS instance to build out the bootstrap node.  Verify it comes up.
  6. Clone and build out the Master / Control Plane nodes.  These will boot and pull their configuration from the bootstrap node, assuming everything went well with the bootstrap node creation.
  7. Comment out the bootstrap node from the HAProxy configuration.  The worker nodes will connect to and build out from the master nodes.
  8. Accept certificates that will allow the worker nodes to complete installing.
  9. Verify!

Configuration Table

guestinfo.ignition.config.data.encoding = base64
disk.EnableUUID = TRUE
guestinfo.ignition.config.data = <ONE OF THE BASE64 FILE CONTENTS>

guestinfo.afterburn.initrd.network-kargs = <IP SETTINGS FROM BELOW TABLE>

Host IP Settings
rhbs01.osc01.unix.lab.com ip=10.0.0.105::10.0.0.1:255.255.255.0:rhbs01.osc01.unix.lab.com::none nameserver=10.100.0.100 nameserver=10.100.0.101 nameserver=10.100.0.102
   
rhcpm01.osc01.unix.lab.com ip=10.0.0.106::10.0.0.1:255.255.255.0:rhcpm01.osc01.unix.lab.com::none nameserver=10.100.0.100 nameserver=10.100.0.101 nameserver=10.100.0.102
rhcpm02.osc01.unix.lab.com ip=10.0.0.107::10.0.0.1:255.255.255.0:rhcpm02.osc01.unix.lab.com::none nameserver=10.100.0.100 nameserver=10.100.0.101 nameserver=10.100.0.102
rhcpm03.osc01.unix.lab.com ip=10.0.0.108::10.0.0.1:255.255.255.0:rhcpm03.osc01.unix.lab.com::none nameserver=10.100.0.100 nameserver=10.100.0.101 nameserver=10.100.0.102
   
rhwn01.osc01.unix.lab.com ip=10.0.0.109::10.0.0.1:255.255.255.0:rhwn01.osc01.unix.lab.com::none nameserver=10.100.0.100 nameserver=10.100.0.101 nameserver=10.100.0.102
rhwn02.osc01.unix.lab.com ip=10.0.0.110::10.0.0.1:255.255.255.0:rhwn02.osc01.unix.lab.com::none nameserver=10.100.0.100 nameserver=10.100.0.101 nameserver=10.100.0.102
rhwn03.osc01.unix.lab.com ip=10.0.0.111::10.0.0.1:255.255.255.0:rhwn03.osc01.unix.lab.com::none nameserver=10.100.0.100 nameserver=10.100.0.101 nameserver=10.100.0.102
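
These parameters are set through the vSphere UI in the screenshots that follow, but they can also be applied from the command line with VMware's govc tool.  A rough sketch for one master, assuming govc is installed, GOVC_URL / GOVC_USERNAME / GOVC_PASSWORD point at vcsa01, and the VM path matches your clone:

govc vm.change -vm /mds.xyz/vm/OpenShift/rhcpm01 \
  -e "guestinfo.ignition.config.data.encoding=base64" \
  -e "disk.EnableUUID=TRUE" \
  -e "guestinfo.ignition.config.data=$(cat master.64)" \
  -e "guestinfo.afterburn.initrd.network-kargs=ip=10.0.0.106::10.0.0.1:255.255.255.0:rhcpm01.osc01.unix.lab.com::none nameserver=10.100.0.100 nameserver=10.100.0.101 nameserver=10.100.0.102"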

 

Install the RedHat OpenShift Core OS OVA

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-Deploy-RedHat-CoreOS-OVA.PNG?ssl=1

Using the CoreOS VM you just deployed, adjust the properties.  Note the minimum requirements; each VM needs at least that much in resources, if not more.

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-CoreOS-OVA-Machine-Settings.PNG?ssl=1

Next, set some of the parameters that are common to all the VMs.

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-CoreOS-OVA-Advanced-Settings.PNG?ssl=1

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-CoreOS-OVA-Advanced-Configuration-Parameters.PNG?ssl=1

Save the image above.  Next, clone the Core OS to build out the bootstrap node.  

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-Clone-To-Bootstrap-Node.PNG?ssl=1

Recall from the bootstrap configuration section above that https-bootstrap.64 (i.e. the base64-encoded https-bootstrap.ign) will be used to pull the full configuration down from the HTTP web server.

Adding a serial console log option to the VM can also help in troubleshooting issues.  This can be done as follows:

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-Machine-Config-Serial-Port-For-Logging.PNG?ssl=1

Start up the bootstrap node and monitor its installation.  To do so, SSH in and use the command highlighted below to view the messages:

# ssh -i /root/.ssh/id_rsa-osc01 core@rhbs01.osc01.unix.lab.com
Red Hat Enterprise Linux CoreOS 47.83.202103251640-0
  Part of OpenShift 4.7, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.7/architecture/architecture-rhcos.html


This is the bootstrap node; it will be destroyed when the master is fully up.

The primary services are release-image.service followed by bootkube.service. To watch their status, run e.g.

  journalctl -b -f -u release-image.service -u bootkube.service
Last login: Sun May  9 12:35:19 2021 from 192.168.0.242
[core@rhbs01 ~]$

Verify that two key ports are open on the bootstrap node before proceeding.  (Additional logging is available under /var/log/containers.)

[root@rhbs01 log]# netstat -pnltu|grep -Ei machine-config
tcp6       0      0 :::22623                :::*                    LISTEN      2804/machine-config
tcp6       0      0 :::22624                :::*                    LISTEN      2804/machine-config
[root@rhbs01 log]#

IMPORTANT: Verify the certificate expiration time.  This is the window allowed for the cluster install.

# echo | openssl s_client -connect rhcpm02.osc01.unix.lab.com:6443 | openssl x509 -noout -text 2>&1 | grep -Ei "Not Before|Not After"
depth=1 OU = openshift, CN = kube-apiserver-service-network-signer
verify error:num=19:self signed certificate in certificate chain
DONE
            Not Before: May  9 13:24:20 2021 GMT
            Not After : Jun  8 13:24:21 2021 GMT

#

More details on this can be found here:  https://github.com/openshift/installer/issues/1792

Next, boot up the master / control plane nodes, initially one at a time.  When installing for the first time, this gives you a chance to troubleshoot any issues before kicking off the rest.

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-Master-Control-Plane-Setup.PNG?ssl=1

SSH to each machine to monitor the progress.  Clone the CoreOS image to two more masters.  Once all the masters are complete, edit the HAProxy configuration on the nodes listed above to remove the bootstrap node, then restart HAProxy on each node.  Verify the master nodes are all ready:

# oc get nodes
NAME                        STATUS   ROLES    AGE   VERSION
rhcpm01.osc01.unix.lab.com   Ready    master   11h   v1.20.0+7d0a2b2
rhcpm02.osc01.unix.lab.com   Ready    master   10h   v1.20.0+7d0a2b2
rhcpm03.osc01.unix.lab.com   Ready    master   10h   v1.20.0+7d0a2b2
#

Next, configure and boot up the worker nodes.  The exact same sequence applies as for the master nodes, except that the workers boot off of and pull their configuration from the master nodes behind the HAProxy configuration.  For the worker nodes there is an additional step to check and accept the certificates:

# oc get csr
NAME        AGE     SIGNERNAME                                    REQUESTOR                               CONDITION
csr-26944   117m    kubernetes.io/kubelet-serving                 system:node:rhwn01.osc01.unix.lab.com    Pending
csr-2shv7   148m    kubernetes.io/kubelet-serving                 system:node:rhwn01.osc01.unix.lab.com    Pending
csr-4fxhf   9m35s   kubernetes.io/kubelet-serving                 system:node:rhwn01.osc01.unix.lab.com    Pending
csr-4w29l   8h      kubernetes.io/kubelet-serving                 system:node:rhwn01.osc01.unix.lab.com    Pending

Accept any pending certificates in the process.  Example using a for loop over a large number of certificates:

# for cert in $( cat file.txt ); do oc adm certificate approve $cert; done
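
file.txt above is assumed to hold one pending CSR name per line; one way to produce it:

# oc get csr --no-headers | awk '$NF == "Pending" {print $1}' > file.txt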

Alternately, run the following:  

# oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve

Once complete, all workers should be ready:  

# oc get nodes
NAME                        STATUS   ROLES    AGE     VERSION
rhcpm01.osc01.unix.lab.com   Ready    master   13h     v1.20.0+7d0a2b2
rhcpm02.osc01.unix.lab.com   Ready    master   12h     v1.20.0+7d0a2b2
rhcpm03.osc01.unix.lab.com   Ready    master   12h     v1.20.0+7d0a2b2
rhwn01.osc01.unix.lab.com    Ready    worker   10h     v1.20.0+7d0a2b2
rhwn02.osc01.unix.lab.com    Ready    worker   23m     v1.20.0+7d0a2b2
rhwn03.osc01.unix.lab.com    Ready    worker   7m15s   v1.20.0+7d0a2b2

Check status of cluster and components:

[root@rhbs01 ~]# bootupctl status
Component EFI
  Installed: grub2-efi-x64-1:2.02-90.el8_3.1.x86_64,shim-x64-15-16.el8.x86_64
  Update: At latest version
No components are adoptable.
CoreOS aleph image ID: rhcos-47.83.202103251640-0-qemu.x86_64.qcow2
Boot method: BIOS
[root@rhbs01 ~]#

Confirm cluster operators:  

# oc get clusteroperators
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                       False       True          True       13h
baremetal                                  4.7.9     True        False         False      13h
cloud-credential                           4.7.9     True        False         False      22h
cluster-autoscaler                         4.7.9     True        False         False      13h
config-operator                            4.7.9     True        False         False      13h
console                                    4.7.9     False       True          True       11h
csi-snapshot-controller                    4.7.9     True        False         False      13h
dns                                        4.7.9     True        False         False      13h
etcd                                       4.7.9     True        False         False      13h
image-registry                             4.7.9     True        False         False      13h
ingress                                    4.7.9     True        False         True       11h
insights                                   4.7.9     True        False         False      13h
kube-apiserver                             4.7.9     True        False         False      13h
kube-controller-manager                    4.7.9     True        False         False      13h
kube-scheduler                             4.7.9     True        False         False      13h
kube-storage-version-migrator              4.7.9     True        False         False      11h
machine-api                                4.7.9     True        False         False      13h
machine-approver                           4.7.9     True        False         False      13h
machine-config                             4.7.9     True        False         False      13h
marketplace                                4.7.9     True        False         False      13h
monitoring                                 4.7.9     True        False         False      11h
network                                    4.7.9     True        False         False      13h
node-tuning                                4.7.9     True        False         False      13h
openshift-apiserver                        4.7.9     True        False         False      13h
openshift-controller-manager               4.7.9     True        False         False      137m
openshift-samples                          4.7.9     True        False         False      13h
operator-lifecycle-manager                 4.7.9     True        False         False      13h
operator-lifecycle-manager-catalog         4.7.9     True        False         False      13h
operator-lifecycle-manager-packageserver   4.7.9     True        False         False      13h
service-ca                                 4.7.9     True        False         False      13h
storage                                    4.7.9     True        False         True       13h

Edit and configure the parameters for each operator above.  For example:

# oc edit console.config.openshift.io cluster
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: config.openshift.io/v1
kind: Console
metadata:
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    release.openshift.io/create-only: "true"
  creationTimestamp: "2021-05-09T05:13:36Z"
  generation: 1
  name: cluster
  resourceVersion: "108939"
  selfLink: /apis/config.openshift.io/v1/consoles/cluster
  uid: 3ea99342-706d-4149-91e3-a2107fe75f65
spec: {}
status:
  consoleURL: https://console-openshift-console.apps.osc01.nix.mds.xyz

The above also provides the console link that can be used to access the OpenShift UI:

https://console-openshift-console.apps.osc01.nix.mds.xyz

However, per the above, the console was not yet ready.  It turns out there was a misconfiguration in the HAProxy file:

listen cm
        bind api-int:80
        mode    http
        redirect scheme https if !{ ssl_fc }

frontend osin
        bind    api-int:443 # ssl crt  /etc/haproxy/certs/api-int.osc01.nix.mds.xyz-haproxy.pem no-sslv3
        default_backend osback

backend osback
        mode http
        balance roundrobin

It should be:

listen cm
        bind api-int:80
        mode   tcp
        redirect scheme https if !{ ssl_fc }

frontend osin
        bind    api-int:443                                     # ssl crt  /etc/haproxy/certs/api-int.osc01.nix.mds.xyz-haproxy.pem no-sslv3
        default_backend osback

backend osback
        mode tcp
        balance roundrobin

Once that was modified, the system reconfigured and the OpenShift console became available.  Verifying again:

# oc get clusteroperators
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
console                                    4.7.9     True        False         False      25s

Confirm if the cluster installation is complete:

./openshift-install --dir=/root/openshift/install/ wait-for bootstrap-complete --log-level=info
INFO Waiting up to 20m0s for the Kubernetes API at https://api.osc01.nix.mds.xyz:6443…
INFO API v1.20.0+7d0a2b2 up
INFO Waiting up to 30m0s for bootstrapping to complete…
INFO It is now safe to remove the bootstrap resources
INFO Time elapsed: 0s
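
Once the workers and cluster operators settle, the overall installation can be confirmed the same way with the install-complete target, which also prints the console URL and kubeadmin credentials:

./openshift-install --dir=/root/openshift/install/ wait-for install-complete --log-level=info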

Confirm cluster login works:

# export KUBECONFIG=/root/openshift/install/auth/kubeconfig
# oc whoami
system:admin

Get the password for the UI login:

# vi /root/openshift/install/.openshift_install_state.json   
"*password.KubeadminPassword": {
        "Password": "<SECRET PASS>",
        "PasswordHash": "JDJhUDEwJElxdb9BRnZ1TzxhWVp6VmlHenB1Qk9mOUhlnkF2Sk1NWEZsUW6OdGRTZHd5UeNRdlJuRml5",
        "File": {
            "Filename": "auth/kubeadmin-password",
            "Data": "Mkp4OXOgNFdTbUQtW5R1SkztR2cFYNI="
        }
    },
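
The same password is also written in plain text to the auth directory referenced by the Filename field above:

# cat /root/openshift/install/auth/kubeadmin-password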

Verify the console login:

console-openshift-console.apps.osc01.nix.mds.xyz

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-OpenShift-UI-Console.PNG?ssl=1

Let's deploy a sample application, Hashicorp Vault:

# helm repo add hashicorp https://helm.releases.hashicorp.com
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/openshift/install/auth/kubeconfig
"hashicorp" has been added to your repositories
# ls -altri /root/openshift/install/auth/kubeconfig
201328533 -rw-r-----. 1 root root 18261 May 10 00:59 /root/openshift/install/auth/kubeconfig
# ls -altri /root/openshift/install/auth
total 28
201328537 -rw-r-----. 1 root root    23 May  8 23:57 kubeadmin-password
201328531 drwxr-x---. 2 root root    48 May  8 23:57 .
201328533 -rw-r-----. 1 root root 18261 May 10 00:59 kubeconfig
134369806 drwxr-xr-x. 3 root root  4096 May 10 01:06 ..
# chmod 600 /root/openshift/install/auth/kubeconfig
# helm repo add hashicorp https://helm.releases.hashicorp.com 
"hashicorp" already exists with the same configuration, skipping
# helm install vault hashicorp/vault
NAME: vault
LAST DEPLOYED: Mon May 10 01:18:27 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing HashiCorp Vault!

Now that you have deployed Vault, you should look over the docs on using
Vault with Kubernetes available here:

https://www.vaultproject.io/docs/


Your release is named vault. To learn more about the release, try:

  $ helm status vault
  $ helm get manifest vault
# helm status vault
NAME: vault
LAST DEPLOYED: Mon May 10 01:18:27 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing HashiCorp Vault!

Now that you have deployed Vault, you should look over the docs on using
Vault with Kubernetes available here:

https://www.vaultproject.io/docs/


Your release is named vault. To learn more about the release, try:

  $ helm status vault
  $ helm get manifest vault
# helm get manifest vault 

...

Fix the storage issues preventing OpenShift from deploying the HashiCorp Vault pod:

# kubectl describe pod standalone-vault-0 
….
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  3m41s  default-scheduler  0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling  3m41s  default-scheduler  0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
# kubectl get pod standalone-vault-0 
NAME                 READY   STATUS    RESTARTS   AGE
standalone-vault-0   0/1     Pending   0          5m6s

Check in the UI:

"Failed to provision volume with StorageClass "thin": ServerFaultCode: Cannot complete login due to an incorrect user name or password."

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-OpenShift-Storage-Error.PNG?ssl=1
Change the vSphere credentials stored in OpenShift to resolve the above issue:

https://access.redhat.com/solutions/4618011

Detailed steps in our case:

# echo -n "openshift@mds.xyz" | base64 -w0
# oc get secret vsphere-creds -o yaml -n kube-system > creds_backup.yaml
# oc get cm cloud-provider-config -o yaml -n openshift-config > cloud.yaml
# cp creds_backup.yaml creds.yaml
# vi creds.yaml
# oc replace -f creds.yaml

secret/vsphere-creds replaced
# grep -Ei "vcsa01.nix.mds.xyz" creds.yaml
  vcsa01.nix.mds.xyz.password: <BASE64>
  vcsa01.nix.mds.xyz.username: <BASE64>
        f:vcsa01.nix.mds.xyz.password: {}
        f:vcsa01.nix.mds.xyz.username: {}
# oc patch kubecontrollermanager cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge
kubecontrollermanager.operator.openshift.io/cluster patched
#
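
As an alternative to editing creds.yaml by hand, the same data fields can be patched directly.  The key names below come from the grep output above and the values are the base64 strings generated earlier; this is a sketch rather than the documented Red Hat procedure:

# oc -n kube-system patch secret vsphere-creds --type=merge \
    -p '{"data":{"vcsa01.nix.mds.xyz.username":"<BASE64>","vcsa01.nix.mds.xyz.password":"<BASE64>"}}'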

Confirm volume is provisioned:

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-OpenShift-Storage-Successfull-Provisioning-After-Pass-Change.PNG?ssl=1

Confirm Hashicorp Vault is now provisioning:

# kubectl get pod standalone-vault-0
NAME                 READY   STATUS    RESTARTS   AGE
standalone-vault-0   0/1     Running   0          37m

# helm list
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART           APP VERSION
standalone      default         1               2021-05-10 01:38:25.127295916 -0400 EDT deployed        vault-0.8.1-ocp 1.5.4

Complete Hashicorp Vault configuration:


# kubectl get pod standalone-vault-0
NAME                 READY   STATUS    RESTARTS   AGE
standalone-vault-0   0/1     Running   0          15s
# kubectl get pod standalone-vault-0
NAME                 READY   STATUS    RESTARTS   AGE
standalone-vault-0   0/1     Running   0          17s
# POD=$(oc get pods -lapp.kubernetes.io/name=vault --no-headers -o custom-columns=NAME:.metadata.name)
# oc rsh $POD

/ # vault operator init --tls-skip-verify -key-shares=1 -key-threshold=1
Unseal Key 1: <SECRET KEY>

Initial Root Token: <ROOT TOKEN>

Vault initialized with 1 key shares and a key threshold of 1. Please securely
distribute the key shares printed above. When the Vault is re-sealed,
restarted, or stopped, you must supply at least 1 of these keys to unseal it
before it can start servicing requests.

Vault does not store the generated master key. Without at least 1 key to
reconstruct the master key, Vault will remain permanently sealed!

It is possible to generate new unseal keys, provided you have a quorum of
existing unseal keys shares. See "vault operator rekey" for more information.
/ # ls -altri /vault/data/
total 32
     11 drwx——    2 root     root         16384 May 10 06:12 lost+found
645923864 drwxr-xr-x    1 vault    vault           18 May 10 06:52 ..
 131073 drwx——    4 vault    vault         4096 May 10 06:53 sys
 524289 drwx——    3 vault    vault         4096 May 10 06:53 logical
 393217 drwxr-xr-x    5 vault    vault         4096 May 10 06:53 core
      2 drwxr-xr-x    6 vault    vault         4096 May 10 06:53 .
/ # export KEYS=<SECRET KEY>
/ # export ROOT_TOKEN=<ROOT TOKEN>
/ # echo $KEYS
<SECRET KEY>
/ # echo $ROOT_TOKEN
<ROOT TOKEN>
/ # export VAULT_TOKEN=$ROOT_TOKEN
/ # vault operator unseal --tls-skip-verify $KEYS
Key             Value
---             -----
Seal Type       shamir
Initialized     true
Sealed          false
Total Shares    1
Threshold       1
Version         1.5.4
Cluster Name    vault-cluster-45ab6d46
Cluster ID      96282353-4975-fd66-438b-4ce65f3f7146
HA Enabled      false
/ #
/ #
/ #

Access the HashiCorp Vault application:

https://i0.wp.com/www.microdevsys.com/WordPressImages/KubernetesAndOpenShift-Hashicorp-Vault-Successful-Verification.PNG?ssl=1

Enjoy the new cluster!

 

Mailing List Support and Troubleshooting

This section deals with some troubleshooting en route to creating a fully functional OpenShift + Kubernetes Cluster. One helpful resource was the mailing lists available for OpenShift:

Re: OpenShift and "export IPCFG="ip=<ip>::<gateway>:<netmask>:<hostname>:<iface>:none nameserver=srv1 [nameserver=srv2 [nameserver=srv3 […]]]""

Suggestions worked.  Thanks once more.

For reference, here's what I did, in case it helps others as well.

1) Add a Serial Port to the VM under Virtual Hardware.  Type in the name of the output file where the logs should be saved.

2) Download the log files from the datastore.  Review and fix any errors.  Example below:

[   11.156393] systemd[1]: Startup finished in 6.800s (kernel) + 0 (initrd) + 4.352s (userspace) = 11.153s.
——
Ignition has failed. Please ensure your config is valid. Note that only
Ignition spec v3.0.0+ configs are accepted.

A CLI validation tool to check this called ignition-validate can be
downloaded from GitHub:
    https://github.com/coreos/ignition/releases
——

Displaying logs from failed units: ignition-fetch-offline.service
— Logs begin at Sun 2021-03-21 03:07:51 UTC, end at Sun 2021-03-21 03:07:54 UTC. —
Mar 21 03:07:54 ignition[749]: no config URL provided
Mar 21 03:07:54 ignition[749]: reading system config file "/usr/lib/ignition/user.ign"
Mar 21 03:07:54 ignition[749]: no config at "/usr/lib/ignition/user.ign"
Mar 21 03:07:54 ignition[749]: config successfully fetched
Mar 21 03:07:54 ignition[749]: parsing config with SHA512: b71f59139d6c3101031fd0cee073e0503f233c47129db8597462687a608ae0a4b594bf9c170ce55dbd289d4be2638f68e4d39c9b2f50c81f956d5bca24955959
Mar 21 03:07:54 systemd[1]: ignition-fetch-offline.service: Triggering OnFailure= dependencies.
Mar 21 03:07:54 ignition[749]: error at line 7 col 5: invalid character ']' after object key:value pair
Mar 21 03:07:54 ignition[749]: failed to fetch config: config is not valid
Mar 21 03:07:54 ignition[749]: failed to acquire config: config is not valid
Mar 21 03:07:54 ignition[749]: Ignition failed: config is not valid
Press Enter for emergency shell or wait 5 minutes for reboot.                
Press Enter for emergency shell or wait 4 minutes 45 seconds for reboot.     

Once fixed and booted, resolve any SSH host key issues:

# ssh core@192.168.0.105
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:+PaPjXcO/gOaen9+fHfI1q7s7XQgaczHXUWm6Gtf56E.
Please contact your system administrator.
Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Offending ED25519 key in /var/lib/sss/pubconf/known_hosts:18
ECDSA host key for 192.168.0.105 has changed and you have requested strict checking.
Host key verification failed.

# ssh-keyscan -t ecdsa 192.168.0.105 >> ~/.ssh/known_hosts
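
Alternatively, remove the stale entry with ssh-keygen before reconnecting; since the offending key in this case lived in the sssd-managed file, point -f at whichever file the warning names:

# ssh-keygen -R 192.168.0.105
# ssh-keygen -R 192.168.0.105 -f /var/lib/sss/pubconf/known_hosts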

And log in using the previously generated SSH key:

# ssh -i ../../.ssh/id_rsa-os01  core@192.168.0.105
Red Hat Enterprise Linux CoreOS 47.83.202102090044-0
  Part of OpenShift 4.7, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.7/architecture/architecture-rhcos.html


This is the bootstrap node; it will be destroyed when the master is fully up.

The primary services are release-image.service followed by bootkube.service. To watch their status, run e.g.

  journalctl -b -f -u release-image.service -u bootkube.service
[core@bootstrap01 ~]$

 

TUVM!

 

Installation Path

HFIOS!

If this error is seen when running something as simple as oc whoami:

Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get users.user.openshift.io ~)

It is likely that the bootstrap server was booted last, or at least was not the first server to be started up.  The bootstrap server needs to be started first and allowed to fully configure itself, since it is meant to connect to the rest of the nodes and configure them.  Without those nodes it doesn't do much, and the error above appears because the cluster was never configured.

You can confirm that the bootstrap server started and configured itself correctly when you see the following output:

# nc -v rhbs01.osc01.unix.lab.com 6443
Ncat: Version 6.40 ( http://nmap.org/ncat )
Ncat: Connected to 10.0.0.105:6443.

And the following message is visible:

./openshift-install --dir=/root/openshift/install wait-for bootstrap-complete --log-level=info
INFO Waiting up to 20m0s for the Kubernetes API at https://api.osc01.unix.lab.com:6443…
INFO API v1.20.0+5fbfd19 up
INFO Waiting up to 30m0s for bootstrapping to complete…

Once all the machines are bootstrapped, you should see the following message:

INFO It is now safe to remove the bootstrap resources

The above was ultimately due to expired installation certificates.  The master nodes need to be built out before the installation certificate fully expires, which is typically 24 hours. 

References

REF: https://docs.openshift.com/container-platform/4.7/installing/installing_vsphere/installing-vsphere.html#installing-vsphere  

REF: https://www.youtube.com/watch?v=6TvyHBdHhes

REF: https://github.com/openshift/machine-config-operator/blob/master/pkg/server/bootstrap_server.go

REF: https://github.com/openshift/machine-config-operator/issues/2562

Thanks,
