As part of a magnum cluster deploy, start by deploying a more persistent cluster to codfw1dev. Test that a cluster with a single control node will survive a hypervisor drain, which is described as a potential limitation in https://phabricator.wikimedia.org/T326257#8500032
root@cloudcontrol2001-dev:~# openstack project create --description 'paws-dev' paws-dev --domain default
+-------------+----------+
| Field       | Value    |
+-------------+----------+
| description | paws-dev |
| domain_id   | default  |
| enabled     | True     |
| id          | paws-dev |
| is_domain   | False    |
| name        | paws-dev |
| options     | {}       |
| parent_id   | default  |
| tags        | []       |
+-------------+----------+
root@cloudcontrol2001-dev:~# openstack role add --project paws-dev --user rook projectadmin
root@cloudcontrol2001-dev:~# openstack role add --project paws-dev --user rook user
openstack coe cluster template create paws-dev-k8s21 \
  --image Fedora-CoreOS-34 \
  --external-network wan-transport-codfw \
  --fixed-subnet cloud-instances2-b-codfw \
  --fixed-network lan-flat-cloudinstances2b \
  --dns-nameserver 8.8.8.8 \
  --network-driver flannel \
  --docker-storage-driver overlay2 \
  --docker-volume-size 30 \
  --master-flavor g2.cores1.ram2.disk20 \
  --flavor g2.cores1.ram2.disk20 \
  --coe kubernetes \
  --labels kube_tag=v1.21.8-rancher1-linux-amd64,hyperkube_prefix=docker.io/rancher/,cloud_provider_enabled=true
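To confirm the template registered with the expected labels (an optional check, not in the original steps):

openstack coe cluster template show paws-dev-k8s21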
openstack quota set --gigabytes 150 paws-dev
openstack coe cluster create paws-dev --cluster-template paws-dev-k8s21 --master-count 1 --node-count 3 --floating-ip-disabled --keypair rookskey
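Cluster creation takes a while; progress can be watched with something like:

openstack coe cluster list
openstack coe cluster show paws-dev -c status -c health_status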
Launch a VM to be used for NFS and haproxy. Edit its security groups to allow 443 from anywhere and anything from 172.16.0.0/12, and attach a floating IP.
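The security group and floating IP steps can also be done with the openstack CLI; a sketch, assuming the VM and its security group are both named haproxy-and-nfs (both names are examples, as is the IP):

openstack security group rule create --protocol tcp --dst-port 443 --remote-ip 0.0.0.0/0 haproxy-and-nfs
openstack security group rule create --remote-ip 172.16.0.0/12 haproxy-and-nfs
openstack floating ip create wan-transport-codfw
openstack server add floating ip haproxy-and-nfs 185.15.57.22

Then, from that node: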
apt install haproxy
Append the following to /etc/haproxy/haproxy.cfg:
frontend k8s-ingress-http
    bind 0.0.0.0:80
    mode http
    default_backend k8s-ingress

frontend k8s-ingress-https
    bind 0.0.0.0:443 ssl crt /etc/acmecerts/paws/live/ec-prime256v1.chained.crt.key
    mode http
    default_backend k8s-ingress

backend k8s-ingress
    mode http
    option httplog
    option tcp-check
    balance roundrobin
    timeout server 1h
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    server <worker ip> <worker ip>:30001 check
/etc/acmecerts/paws/live/ec-prime256v1.chained.crt.key contains the key from prod; perhaps we should generate one just for this.
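Before restarting, the merged config can optionally be syntax-checked:

haproxy -c -f /etc/haproxy/haproxy.cfg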
systemctl restart haproxy
For NFS:
apt-get install nfs-kernel-server -y
(press enter when prompted for which version to use)
mkdir -p /srv/misc/shared/paws/project/paws/userhomes/
mkdir -p /srv/dumps/xmldatadumps/public
chown -R nobody:nogroup /srv/
chmod 777 -R /srv/
echo '/srv 172.16.0.0/12(rw,sync,no_subtree_check)' >> /etc/exports
systemctl restart nfs-server
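A quick check that the export is live (not part of the original steps):

exportfs -v
showmount -e localhost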
From the labs bastion (bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org), after setting up the kube config (one approach is sketched below):
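The kubeconfig can be generated with Magnum's client, run wherever openstack credentials are available, then copied to the bastion (a sketch; the --dir value is arbitrary):

openstack coe cluster config paws-dev --dir .

This writes a config file and prints an export KUBECONFIG=... line. With that in place, install the ingress controller: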
helm upgrade --install ingress-nginx ingress-nginx \
  --version v4.4.0 \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  --set controller.service.type=NodePort \
  --set controller.service.enableHttps=false \
  --set controller.service.nodePorts.http=30001
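To confirm the controller is listening on the expected NodePort (the service name below is the chart's default):

kubectl get svc -n ingress-nginx ingress-nginx-controller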
These steps could surely be nicer... T326417. From a checkout of the paws repo:
find . -type f -exec sed -i -e 's/clouddumps100[21].wikimedia.org/haproxy-and-nfs.paws-dev.codfw1dev.wikimedia.cloud/g' {} \;
find . -type f -exec sed -i -e 's/nfs-tools-project.svc.eqiad.wmnet/haproxy-and-nfs.paws-dev.codfw1dev.wikimedia.cloud/g' {} \;
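To spot-check that the substitutions landed (optional):

grep -rl haproxy-and-nfs.paws-dev.codfw1dev.wikimedia.cloud .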
Edit the path names in values.yaml on lines 107 and 110 to be unique, as well as the mountPath on line 228.
Delete the db entry in paws/secrets.yaml
Change 2000Mi to 20Mi on line 40 in paws/templates/public.yaml
kubectl config set-context --current --namespace=codfw1dev
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm dep up paws/
kubectl create namespace codfw1dev
helm install paws --namespace codfw1dev ./paws -f paws/production.yaml -f paws/secrets.yaml --timeout=50m
kubectl apply -f manifests/psp.yaml
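To watch the deploy come up (optional):

kubectl get pods -n codfw1dev
helm status paws -n codfw1dev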
In the /etc/hosts of your machine (assuming the floating IP was 185.15.57.22):
185.15.57.22 hub.paws.wmcloud.org
185.15.57.22 paws.wmcloud.org
185.15.57.22 paws.wmflabs.org
185.15.57.22 public.paws.wmcloud.org
185.15.57.22 paws-public.wmflabs.org
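With those entries in place, a quick smoke test from the same machine (the path is just an example; -k skips certificate validation):

curl -kI https://hub.paws.wmcloud.org/hub/login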
This seemed to work, and the cluster survived a drain of the hypervisor hosting the control node.