Improve how we address outside k8s infrastructure from within charts (e.g. network policies)
Open, Medium, Public

Description

Until now we mostly relied on Kubernetes default network policies, especially when building charts. While this increases interoperability of charts, we leave features unused that could help us:

This task is meant to collect opinions/objections to moving further into the "calico space". One additional possible use case I can think of would be to "sync" commonly used network objects (groups of hosts, like kafka, restbase, ...) from external sources (Puppet/Netbox/...) to Kubernetes clusters representing them as objects which can then be referenced from network policies without having to copy IPs to charts and/or re-deploy services to kubernetes because of IP changes.

Outside k8s infrastructure abstraction

There are currently three recurring pain points where infrastructure outside of k8s needs to be synced to multiple k8s deployments:

This is done basically by:

  • Reading the data from hiera
  • Mangling it into a more or less consistent format
  • Writing it out to YAML files on deployment servers
  • Consuming said YAML files via helmfile
  • Using the values in the helm charts (either via modules, for network policies, or directly)

With this we try to solve one to two problems:

  • Provide data (usually IPv4/v6 and port pairs) to be used in network policies
  • Provide a list of FQDN/port pairs to be used in connection strings within the applications (currently in the works for zookeeper)

This has one big shortcoming:
The network policies generated this way reflect changes to the outside infrastructure only after the charts using them are re-deployed. That requires somebody to know about the change, and to know that a specific deployment uses the changing infrastructure (e.g. needs to be re-deployed).

Network policies

There is probably only puppet as a source of truth spanning all of the current (and maybe future) usages that knows about the network facts (IPs/ports) as well as about how to group them (into clusters etc.).

We could build a system that generates kubernetes Service and Endpoint objects for the external infrastructure which can then be referenced in calico network policies (kubernetes native network policies don't support referencing services, just pods) and the actual workload. For example:

kind: Service
apiVersion: v1
metadata:
  name: main-eqiad
  namespace: kafka
spec:
  ports:
    - name: kafka
      protocol: TCP
      port: 9092
      targetPort: 9092
  selector: {} # this makes the service not select any pods as endpoints
---
kind: Endpoints
apiVersion: v1
metadata:
  name: main-eqiad
  namespace: kafka
subsets: 
  - addresses:
    - ip: 10.64.0.200 # kafka-main1001.eqiad.wmnet
    - ip: 10.64.16.37 # kafka-main1002.eqiad.wmnet
    ports:
    - port: 9092
      name: kafka
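
A minimal sketch of how a generator could build the two objects above from a cluster name and a list of IPs. It is written in Python purely for illustration: the function name `build_external_service` and its parameters are hypothetical, and a real implementation would presumably take its input from the Puppet-generated data rather than from literals.

```python
from typing import Any


def build_external_service(
    name: str, namespace: str, port_name: str, port: int, ips: list[str]
) -> list[dict[str, Any]]:
    """Return a selector-less Service and the Endpoints object backing it."""
    service = {
        "kind": "Service",
        "apiVersion": "v1",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # No selector is set, so the Service selects no pods and the
            # Endpoints object below has to be maintained by us.
            "ports": [
                {"name": port_name, "protocol": "TCP", "port": port, "targetPort": port}
            ],
        },
    }
    endpoints = {
        "kind": "Endpoints",
        "apiVersion": "v1",
        # Same name and namespace as the Service, so the two are associated.
        "metadata": {"name": name, "namespace": namespace},
        "subsets": [
            {
                "addresses": [{"ip": ip} for ip in ips],
                "ports": [{"name": port_name, "port": port}],
            }
        ],
    }
    return [service, endpoints]


# Values taken from the example manifests above.
objects = build_external_service(
    "main-eqiad", "kafka", "kafka", 9092, ["10.64.0.200", "10.64.16.37"]
)
```

The returned dictionaries could be serialized to YAML or JSON and applied to each cluster, so an IP change only requires regenerating the Endpoints object, not re-deploying the services that reference it.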

A service deployment would then create a calico network policy like (not needing to know the actual kafka IPs):

apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-kafka-main-eqiad
  namespace: my-service
spec:
  selector: all()
  egress:
    - action: Allow
      destination:
        services:
          name: main-eqiad
          namespace: kafka

The workload could then also use main-eqiad.kafka.svc.cluster.local:9092 to connect to one of the brokers (round robin) using the kubernetes service.

Connection strings

It would be nice to leverage the above to provide central agnostic endpoints in all kubernetes clusters, leveraging the kubernetes service objects, load balancing and internal DNS. Unfortunately the actual applications have different requirements regarding their connection strings so this might not always be sufficient.

Zookeeper

The Connect String that is needed to connect to Apache ZooKeeper. This is a comma-separated list of hostname:port pairs. For example, localhost:2181,localhost:2182,localhost:2183. This should contain a list of all ZooKeeper instances in the ZooKeeper quorum.
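
As an illustration of the format described above, here is a small Python sketch that joins hostname/port pairs into such a connect string. The helper name `zookeeper_connect_string` is hypothetical; the input simply reuses the localhost example from the description.

```python
def zookeeper_connect_string(servers: list[tuple[str, int]]) -> str:
    """Join (hostname, port) pairs into a comma-separated connect string."""
    return ",".join(f"{host}:{port}" for host, port in servers)


connect = zookeeper_connect_string(
    [("localhost", 2181), ("localhost", 2182), ("localhost", 2183)]
)
print(connect)  # localhost:2181,localhost:2182,localhost:2183
```

Because every quorum member has to be listed, a single round-robin service name would not be a drop-in replacement here.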

Kafka

We usually provide a list of broker:port to connect to, but ultimately one is enough.

MariaDB

<section>-master.<DC>.wmnet:<section-port or 3306>
Where the FQDN is actually a CNAME pointing to some dbproxy host.

Outlook

There are other use cases for this as well. One I just thought of is synchronizing Prometheus nodes to kubernetes clusters this way, to restrict access to metrics ports to just them.

Details

Repo                          Branch      Lines +/-
operations/alerts             master      +45 -6
operations/alerts             master      +39 -0
operations/puppet             production  +1 -1
operations/puppet             production  +0 -13
operations/puppet             production  +29 -12
operations/puppet             production  +18 -6
operations/puppet             production  +0 -15
operations/puppet             production  +1 -1
operations/puppet             production  +85 -10
operations/deployment-charts  master      +2 -15
operations/puppet             production  +19 -3
operations/puppet             production  +13 -0
operations/puppet             production  +73 -0
operations/puppet             production  +28 -1
operations/deployment-charts  master      +77 -9
operations/deployment-charts  master      +3 -6
operations/deployment-charts  master      +3 -3
operations/deployment-charts  master      +3 -0
operations/deployment-charts  master      +7 -5
operations/deployment-charts  master      +1 -1
operations/deployment-charts  master      +1 -1
operations/puppet             production  +5 -5
operations/deployment-charts  master      +1 -1
operations/deployment-charts  master      +6 -0
operations/deployment-charts  master      +71 -1
operations/puppet             production  +1 -1
operations/deployment-charts  master      +28 -0
operations/deployment-charts  master      +54 -1
operations/deployment-charts  master      +221 -0
operations/puppet             production  +17 -0

Event Timeline

I hijacked the kubestaging mw-debug namespace for a quick test, to make sure our overall NetworkPolicy strategy works.

I created the following resources in the namespace:

apiVersion: v1
kind: Pod
metadata:
  name: test-brouberol
  namespace: mw-debug
  labels:
    app: test-brouberol
spec:
  containers:
  - args:
    - '10000'
    command:
    - sleep
    image: docker-registry.wikimedia.org/alpine:3.5
    imagePullPolicy: IfNotPresent
    name: test-brouberol
    resources:
      limits:
        cpu: "1"
        memory: 128Mi
      requests:
        cpu: "1"
        memory: 128Mi

---
apiVersion: v1
kind: Service
metadata:
  name: test-brouberol-kafka
  namespace: mw-debug
spec:
  clusterIP: "None"
  ports:
  - name: kafka-plaintext
    protocol: TCP
    port: 9092
    targetPort: 9092
  selector: {} # this makes the service not select any pods as endpoints
---
apiVersion: v1
kind: Endpoints
metadata:
  name: test-brouberol-kafka
  namespace: mw-debug
subsets:
  - addresses:
    - ip: 10.64.16.165
    ports:
    - name: kafka-plaintext
      port: 9092

and then ran

sudo nsenter --target $(sudo docker top $(sudo docker ps | grep k8s_test-brouberol_test-brouberol_mw-debug | awk '{ print $1 }') | grep sleep | awk '{ print $2 }')  --net telnet 10.64.16.165 9092
Trying 10.64.16.165...
^C

I then applied the following NetworkPolicy and re-ran the same command:

apiVersion: crd.projectcalico.org/v1
kind: NetworkPolicy
metadata:
  name: test-brouberol-allow-kafka
  namespace: mw-debug
spec:
  selector: app == 'test-brouberol'
  types:
  - Egress
  egress:
    - action: Allow
      destination:
        services:
          name: test-brouberol-kafka
          namespace: mw-debug

brouberol@kubestage1004:~$ sudo nsenter --target $(sudo docker top $(sudo docker ps | grep k8s_test-brouberol_test-brouberol_mw-debug | awk '{ print $1 }') | grep sleep | awk '{ print $2 }')  --net telnet 10.64.16.165 9092
Trying 10.64.16.165...
Connected to 10.64.16.165.
Escape character is '^]'.

The next step is to telnet to another kafka IP in the same cluster, check that it fails, and then add that IP to the Endpoints object. At that point, telnet should be able to reach the new IP.

brouberol@kubestage1004:~$ sudo nsenter --target $(sudo docker top $(sudo docker ps | grep k8s_test-brouberol_test-brouberol_mw-debug | awk '{ print $1 }') | grep sleep | awk '{ print $2 }')  --net telnet 10.64.16.164 9092
Trying 10.64.16.164...
^C

I then applied the following Endpoints spec:

---
apiVersion: v1
kind: Endpoints
metadata:
  name: test-brouberol-kafka
  namespace: mw-debug
subsets:
  - addresses:
    - ip: 10.64.16.165
    - ip: 10.64.16.164 # NEW IP
    ports:
    - name: kafka-plaintext
      port: 9092

brouberol@kubestage1004:~$ sudo nsenter --target $(sudo docker top $(sudo docker ps | grep k8s_test-brouberol_test-brouberol_mw-debug | awk '{ print $1 }') | grep sleep | awk '{ print $2 }')  --net telnet 10.64.16.164 9092
Trying 10.64.16.164...
Connected to 10.64.16.164.
Escape character is '^]'.

Finally, I tried to mix IPv4 and IPv6 in the same Endpoints:

root@deploy2002:~# kubectl describe service test-brouberol-kafka -n mw-debug
Name:              test-brouberol-kafka
Namespace:         mw-debug
Labels:            <none>
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP Family Policy:  RequireDualStack
IP Families:       IPv4,IPv6
IP:                None
IPs:               None
Port:              kafka-plaintext  9092/TCP
TargetPort:        9092/TCP
Endpoints:         10.64.16.165:9092,10.64.16.164:9092,[2620:0:861:102:10:64:16:164]:9092
Session Affinity:  None
Events:            <none>

brouberol@kubestage1004:~$ sudo nsenter --target $(sudo docker top $(sudo docker ps | grep k8s_test-brouberol_test-brouberol_mw-debug | awk '{ print $1 }') | grep sleep | awk '{ print $2 }')  --net telnet 2620:0:861:102:10:64:16:164 9092
Trying 2620:0:861:102:10:64:16:164...
Connected to 2620:0:861:102:10:64:16:164.
Escape character is '^]'.
^CConnection closed by foreign host.
brouberol@kubestage1004:~$ sudo nsenter --target $(sudo docker top $(sudo docker ps | grep k8s_test-brouberol_test-brouberol_mw-debug | awk '{ print $1 }') | grep sleep | awk '{ print $2 }')  --net telnet 10.64.16.164 9092
Trying 10.64.16.164...
Connected to 10.64.16.164.
Escape character is '^]'.

It works!

Note: I needed to omit the /32 and /128 suffixes from the Endpoints IPs.
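
A small sketch of that conversion, assuming the source data carries host addresses in CIDR notation (as the existing ipBlock rules do). It uses Python's standard `ipaddress` module; the helper name `endpoint_address` is hypothetical.

```python
import ipaddress


def endpoint_address(cidr: str) -> str:
    """Strip a /32 or /128 host prefix so the value can be used as an Endpoints IP."""
    interface = ipaddress.ip_interface(cidr)
    # Refuse anything wider than a single host: Endpoints addresses are plain IPs.
    if interface.network.prefixlen != interface.max_prefixlen:
        raise ValueError(f"{cidr} is not a single-host address")
    return str(interface.ip)


print(endpoint_address("10.64.48.31/32"))                  # 10.64.48.31
print(endpoint_address("2620:0:861:107:10:64:48:31/128"))  # 2620:0:861:107:10:64:48:31
```

Rejecting wider prefixes makes a mistake in the source data fail loudly instead of silently producing an Endpoints entry for a network address.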

I was thinking about the chart migration phase. We currently have values with stanzas such as

kafka:
  allowed_clusters:
    - main-eqiad

generating the following IP-based NetworkPolicy via the base/networkpolicy_xxx.tpl Helm template:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
...
spec:
  podSelector:
    matchLabels:
      app: spark-history
      release: analytics-test-hadoop
  policyTypes:
    - Egress
    - Ingress
  ingress:
  ...
  egress:
  ...
    - to:
      - ipBlock:
          cidr: 10.64.48.31/32
      ports:
      - protocol: TCP
        port: 9092
      - protocol: TCP
        port: 9093
    - to:
      - ipBlock:
          cidr: 2620:0:861:107:10:64:48:31/128
      ports:
      - protocol: TCP
        port: 9092
      - protocol: TCP
        port: 9093
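
For comparison with the service-based approach, the IP-based egress rules above can be thought of as the output of something like the following Python sketch. The real rendering is done by the Helm template; the function name `ip_block_egress_rules` and its signature are hypothetical.

```python
import ipaddress
from typing import Any


def ip_block_egress_rules(ips: list[str], ports: list[int]) -> list[dict[str, Any]]:
    """Build one ipBlock egress rule per host IP, for both IPv4 and IPv6."""
    rules = []
    for ip in ips:
        address = ipaddress.ip_address(ip)
        # /32 for IPv4 and /128 for IPv6: each rule matches exactly one host.
        cidr = f"{address}/{address.max_prefixlen}"
        rules.append(
            {
                "to": [{"ipBlock": {"cidr": cidr}}],
                "ports": [{"protocol": "TCP", "port": port} for port in ports],
            }
        )
    return rules


rules = ip_block_egress_rules(
    ["10.64.48.31", "2620:0:861:107:10:64:48:31"], [9092, 9093]
)
```

Every IP is baked into the rendered policy, which is why a change to the brokers only takes effect after the chart is re-deployed.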

To make it easy to migrate charts from IP-based Network Policies to Service-based ones, we could re-use the existing values, and publish a new base/networkpolicy template which would render 2 resources:

  • a networking.k8s.io/v1/NetworkPolicy resource, in charge of managing all custom egress/ingress rules
  • a crd.projectcalico.org/v1/NetworkPolicy resource in charge of managing all service-based egress rules

The latter would look like this:

apiVersion: crd.projectcalico.org/v1
kind: NetworkPolicy
metadata:
  name: {{ template "base.meta.name" . }}-egress-external-services
  namespace: {{ .Release.Namespace }}
spec:
  selector: name == '{{ template "base.meta.name" . }}'
  types:
  - Egress
  egress:
    - action: Allow
      destination:
        services:
          name: kafka-main-eqiad
          namespace: {{ .Release.Namespace }}

This way, we could migrate all existing charts by performing a sextant update without touching any of the values, thus keeping any documentation about egress to kafka/zk/etc. relevant and up-to-date.
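
To make the mapping from the existing values to the service-based policy concrete, here is a rough Python sketch of the logic such a template could implement. It assumes that service names follow a `<type>-<cluster>` pattern (inferred from `kafka` plus `main-eqiad` giving `kafka-main-eqiad` above); the function name, its parameters and the example release and namespace names are all hypothetical.

```python
from typing import Any


def service_egress_policy(
    release_name: str,
    namespace: str,
    services_namespace: str,
    values: dict[str, Any],
) -> dict[str, Any]:
    """Build a Calico NetworkPolicy allowing egress to the allowed external services."""
    egress = []
    for service_type in sorted(values):
        for cluster in values[service_type].get("allowed_clusters", []):
            egress.append(
                {
                    "action": "Allow",
                    "destination": {
                        "services": {
                            # Assumed naming scheme: <type>-<cluster>.
                            "name": f"{service_type}-{cluster}",
                            "namespace": services_namespace,
                        }
                    },
                }
            )
    return {
        "apiVersion": "crd.projectcalico.org/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": f"{release_name}-egress-external-services",
            "namespace": namespace,
        },
        "spec": {
            "selector": f"name == '{release_name}'",
            "types": ["Egress"],
            "egress": egress,
        },
    }


# Same values stanza as used today for the IP-based policy.
policy = service_egress_policy(
    "my-release", "my-namespace", "my-namespace",
    {"kafka": {"allowed_clusters": ["main-eqiad"]}},
)
```

The chart values stay exactly as they are; only the rendering changes from IP blocks to service references.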

Gehel triaged this task as Medium priority.Jan 22 2024, 2:27 PM

Change 987393 merged by Brouberol:

[operations/puppet@production] global_config: list IPs of hadoop master/workers and kerberos nodes

https://gerrit.wikimedia.org/r/987393

Change 1009279 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] Add template rendering external services egress NetworkPolicy resources

https://gerrit.wikimedia.org/r/1009279

I now think we might have misunderstood each other in T359334. I do think your proposal in https://phabricator.wikimedia.org/T359334#9606340 is the way to go, but what I initially meant was grouping the source data in global_config in a similar way. I now reply here because this relates to the external-services chart and how we generate data for it.

So instead of having

kafka_brokers:
  jumbo-eqiad:
  - 10.64.0.200
  ...
zookeeper_clusters:
  main-eqiad:
  - 10.64.0.207
  ...
cas:
  idp:
  - 208.80.153.108
 ...

in all the global helmfile values files on deployment hosts, move these below a specific key as well, like:

external_services_definitions:
  kafka_brokers:
    jumbo-eqiad:
    - 10.64.0.200
    ...
  zookeeper_clusters:
    main-eqiad:
    - 10.64.0.207
    ...
  cas:
    idp:
    - 208.80.153.108
   ...

With that, it's less likely that the definitions get accidentally overridden by some chart, and the external-services chart can range over the structure[1] without knowing its contents (e.g. no patch of the chart is required when adding a new external service).

While writing this I now realize that there is an additional companion data structure that comes with the external-services chart, configuring ports, service name and namespace. That feels a bit scattered to me, tbh, and I wonder if that should be part of the data generated by puppet as well, so that the external-services chart can be totally agnostic of the services it manages and we only have to make changes in one place if we add a new service. Like:

external_services_definitions:
  kafka:
    _meta:
      ports:
      - name: plaintext
        ...
    jumbo-eqiad:
    - 10.64.0.200
    ...
  zookeeper:
    _meta:
      ports:
      - name: client
        ...
    main-eqiad:
    - 10.64.0.207
    ...
  hadoop-master:
    _meta:
      ports:
      - name: namenode
        ...
    analytics_test:
    - 10.64.0.208
   ...

This could also remove the need for the complexity around service and namespace in [2] which would, as it is now, require yet another change to create one namespace per external service (IIUC).

[1] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/984819/20/charts/external-services/templates/service.yaml#10
[2] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/984819/20/charts/external-services/templates/service.yaml#20

@JMeybohm Thanks for the thorough write-up. I'm on board! I'll rework the CRs.

The only change I've made to the suggested structure was to add an instances field, holding the instances/clusters IPs:

external_services_definitions:
  kafka:
    _meta:
      ports:
      - name: plaintext
        ...
    instances:  # <---
      jumbo-eqiad:
      - 10.64.0.200

It's easier to handle within a chart, instead of having to loop over the fields within each section and ignore _meta.
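
A rough Python sketch of why the instances key helps: a consumer can range over the structure without special-casing _meta and without knowing which services exist. The port number below is a placeholder (the port fields are elided in the structure above), the `<type>-<instance>` name format is an assumption, and the function name `iter_external_services` is hypothetical.

```python
from typing import Any, Iterator

definitions: dict[str, Any] = {
    "kafka": {
        # Placeholder port value; the thread elides the actual port fields.
        "_meta": {"ports": [{"name": "plaintext", "port": 9092}]},
        "instances": {"jumbo-eqiad": ["10.64.0.200"]},
    },
}


def iter_external_services(
    definitions: dict[str, Any],
) -> Iterator[tuple[str, list[dict[str, Any]], list[str]]]:
    """Yield (service name, ports, IPs) for every instance of every service type."""
    for service_type in sorted(definitions):
        section = definitions[service_type]
        ports = section["_meta"]["ports"]
        # Only "instances" is iterated, so "_meta" never needs to be filtered out.
        for instance in sorted(section["instances"]):
            ips = sorted(section["instances"][instance])
            yield f"{service_type}-{instance}", ports, ips


services = list(iter_external_services(definitions))
```

Sorting the keys and IPs keeps the output stable between runs, which matters if the result is rendered into manifests and diffed.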

Change #984819 merged by Brouberol:

[operations/deployment-charts@master] external-services: define a chart referencing external services clusters

https://gerrit.wikimedia.org/r/984819

Change #1013512 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] external-services: define helmfile

https://gerrit.wikimedia.org/r/1013512

Change #1013527 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Add external_services_definitions to fixtures

https://gerrit.wikimedia.org/r/1013527

Change #1013527 merged by jenkins-bot:

[operations/deployment-charts@master] Add external_services_definitions to fixtures

https://gerrit.wikimedia.org/r/1013527

Change #1013512 merged by jenkins-bot:

[operations/deployment-charts@master] external-services: define helmfile

https://gerrit.wikimedia.org/r/1013512

Change #1013539 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] global_config: fix druid historical port

https://gerrit.wikimedia.org/r/1013539

Change #1013539 merged by Brouberol:

[operations/puppet@production] global_config: fix druid historical port

https://gerrit.wikimedia.org/r/1013539

Change #1009279 merged by Brouberol:

[operations/deployment-charts@master] Add template rendering external services egress NetworkPolicy resources

https://gerrit.wikimedia.org/r/1009279

@JMeybohm and myself have deployed the newly released external-services chart to both the staging-codfw and dse-k8s-eqiad K8s clusters.

This now provides us with service discovery for services external to Kubernetes.

brouberol@deploy1002:~$ host kerberos-kdc.external-services.svc.cluster.local 10.192.75.126
Using domain server:
Name: 10.192.75.126
Address: 10.192.75.126#53
Aliases:

kerberos-kdc.external-services.svc.cluster.local has address 10.192.48.190
kerberos-kdc.external-services.svc.cluster.local has address 10.64.0.112
kerberos-kdc.external-services.svc.cluster.local has IPv6 address 2620:0:861:101:10:64:0:112
kerberos-kdc.external-services.svc.cluster.local has IPv6 address 2620:0:860:104:10:192:48:190
brouberol@deploy1002:~$ host kafka-jumbo-eqiad.external-services.svc.cluster.local 10.192.75.126
Using domain server:
Name: 10.192.75.126
Address: 10.192.75.126#53
Aliases:

kafka-jumbo-eqiad.external-services.svc.cluster.local has address 10.64.136.11
kafka-jumbo-eqiad.external-services.svc.cluster.local has address 10.64.134.9
kafka-jumbo-eqiad.external-services.svc.cluster.local has address 10.64.32.106
kafka-jumbo-eqiad.external-services.svc.cluster.local has address 10.64.135.16
kafka-jumbo-eqiad.external-services.svc.cluster.local has address 10.64.130.10
kafka-jumbo-eqiad.external-services.svc.cluster.local has address 10.64.48.140
kafka-jumbo-eqiad.external-services.svc.cluster.local has address 10.64.131.16
kafka-jumbo-eqiad.external-services.svc.cluster.local has address 10.64.48.121
kafka-jumbo-eqiad.external-services.svc.cluster.local has address 10.64.132.21
kafka-jumbo-eqiad.external-services.svc.cluster.local has IPv6 address 2620:0:861:107:10:64:48:140
kafka-jumbo-eqiad.external-services.svc.cluster.local has IPv6 address 2620:0:861:107:10:64:48:121
kafka-jumbo-eqiad.external-services.svc.cluster.local has IPv6 address 2620:0:861:10e:10:64:135:16
kafka-jumbo-eqiad.external-services.svc.cluster.local has IPv6 address 2620:0:861:10f:10:64:136:11
kafka-jumbo-eqiad.external-services.svc.cluster.local has IPv6 address 2620:0:861:103:10:64:32:106
kafka-jumbo-eqiad.external-services.svc.cluster.local has IPv6 address 2620:0:861:109:10:64:130:10
kafka-jumbo-eqiad.external-services.svc.cluster.local has IPv6 address 2620:0:861:10d:10:64:134:9
kafka-jumbo-eqiad.external-services.svc.cluster.local has IPv6 address 2620:0:861:10a:10:64:131:16
kafka-jumbo-eqiad.external-services.svc.cluster.local has IPv6 address 2620:0:861:10b:10:64:132:21

These services can also be referenced by Calico NetworkPolicy resources to allow egress to these services, as defined in https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1009279?usp=dashboard

Change #1013950 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] rbac: allow deploy users to perform actions on Calico NetworkPolicies

https://gerrit.wikimedia.org/r/1013950

Change #1013950 merged by Brouberol:

[operations/deployment-charts@master] rbac: allow deploy users to perform actions on Calico NetworkPolicies

https://gerrit.wikimedia.org/r/1013950

Change #1013954 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] external_services: assume the feature is disabled by default

https://gerrit.wikimedia.org/r/1013954

Change #1013954 merged by jenkins-bot:

[operations/deployment-charts@master] external_services: assume the feature is disabled by default

https://gerrit.wikimedia.org/r/1013954

Change #1013964 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] global_config: fix druid and presto configuration

https://gerrit.wikimedia.org/r/1013964

Change #1013965 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] Fix pod label selector for external-services network policy

https://gerrit.wikimedia.org/r/1013965

Change #1013964 merged by Brouberol:

[operations/puppet@production] global_config: fix druid and presto configuration

https://gerrit.wikimedia.org/r/1013964

Change #1013965 merged by Brouberol:

[operations/deployment-charts@master] Fix pod label selector for external-services network policy

https://gerrit.wikimedia.org/r/1013965

Change #1013969 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] fix template/include in networkpolicy template scaffoloding

https://gerrit.wikimedia.org/r/1013969

Change #1013969 merged by jenkins-bot:

[operations/deployment-charts@master] fix template/include in networkpolicy template scaffoloding

https://gerrit.wikimedia.org/r/1013969

Change #1014024 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Enable external-services on all wikikube clusters

https://gerrit.wikimedia.org/r/1014024

Change #1014436 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] external-services: ensure rendering idempotence by sorting services and IPs

https://gerrit.wikimedia.org/r/1014436

Change #1014436 merged by Brouberol:

[operations/deployment-charts@master] external-services: ensure rendering idempotence by sorting services and IPs

https://gerrit.wikimedia.org/r/1014436

Change #1014024 merged by jenkins-bot:

[operations/deployment-charts@master] Enable external-services on all wikikube clusters

https://gerrit.wikimedia.org/r/1014024

Change #1014505 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] external-services: remove the service name from port names

https://gerrit.wikimedia.org/r/1014505

Change #1014505 merged by jenkins-bot:

[operations/deployment-charts@master] external-services: remove the service name from port names

https://gerrit.wikimedia.org/r/1014505

Change #1014518 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Add external-services namespace to all wikikube clusters

https://gerrit.wikimedia.org/r/1014518

Change #1014518 merged by jenkins-bot:

[operations/deployment-charts@master] Add external-services namespace to all wikikube clusters

https://gerrit.wikimedia.org/r/1014518

Deployed v0.0.3 of the chart incl. rdb to all wikikube and staging clusters as well as dse

Change #1014065 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Migrate datahub to use external-services for CAS IDP

https://gerrit.wikimedia.org/r/1014065

Change #1014065 merged by jenkins-bot:

[operations/deployment-charts@master] Migrate datahub to use external-services for CAS IDP

https://gerrit.wikimedia.org/r/1014065

Change #1024610 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] global_config: add analytics mariadb/postgresql instances

https://gerrit.wikimedia.org/r/1024610

Change #1024613 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] global_config: add elasticearch instances

https://gerrit.wikimedia.org/r/1024613

Change #1024610 merged by Brouberol:

[operations/puppet@production] global_config: add analytics mariadb/postgresql instances

https://gerrit.wikimedia.org/r/1024610

Change #1024613 merged by Brouberol:

[operations/puppet@production] global_config: add elasticearch instances

https://gerrit.wikimedia.org/r/1024613

Change #1032772 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] global_config: register IP/port for the datahubsearch opensearch cluster

https://gerrit.wikimedia.org/r/1032772

Change #1032772 merged by Brouberol:

[operations/puppet@production] global_config: register IP/port for the datahubsearch opensearch cluster

https://gerrit.wikimedia.org/r/1032772

Change #1040872 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] global_config: expose services for all mariadb hosts and masters

https://gerrit.wikimedia.org/r/1040872

Change #1040992 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] deployment_server: alert on admin-ng pending changes

https://gerrit.wikimedia.org/r/1040992

Change #1041142 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] superset: replace IP-based networkpolicy by its service counterpart

https://gerrit.wikimedia.org/r/1041142

Change #1040872 merged by Brouberol:

[operations/puppet@production] global_config: expose services for all analytics mariadb hosts and masters

https://gerrit.wikimedia.org/r/1040872

Change #1041142 merged by Brouberol:

[operations/deployment-charts@master] superset: replace IP-based networkpolicy by its service counterpart

https://gerrit.wikimedia.org/r/1041142

Change #1040992 merged by Brouberol:

[operations/puppet@production] deployment_server: alert on admin-ng pending changes

https://gerrit.wikimedia.org/r/1040992

Change #1042215 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] helmfile: fix typo in python script puppet path

https://gerrit.wikimedia.org/r/1042215

Change #1042215 merged by Brouberol:

[operations/puppet@production] helmfile: fix typo in python script puppet path

https://gerrit.wikimedia.org/r/1042215

Change #1042224 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] helmfile: set HELM environment variables for the admin-ng systemd jobs

https://gerrit.wikimedia.org/r/1042224

Change #1042224 merged by Brouberol:

[operations/puppet@production] helmfile: set HELM environment variables for the admin-ng systemd jobs

https://gerrit.wikimedia.org/r/1042224

Change #1042285 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] helmfile: don't schedule admin-ng diff check jobs for the staging k8s cluster

https://gerrit.wikimedia.org/r/1042285

Change #1042286 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] helmfile: remove temporary else block once resources were absented

https://gerrit.wikimedia.org/r/1042286

Change #1042296 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] hemlfile: export admin-ng pending diff metrics hourly

https://gerrit.wikimedia.org/r/1042296

Change #1042286 abandoned by Brouberol:

[operations/puppet@production] helmfile: remove temporary else block once resources were absented

Reason:

The parent patch was heavily rebased. I'll recreate a new CR.

https://gerrit.wikimedia.org/r/1042286

Change #1042336 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] helmfile: remove temporary else block once resources were absented

https://gerrit.wikimedia.org/r/1042336

Change #1042285 merged by Brouberol:

[operations/puppet@production] helmfile: don't schedule admin-ng diff check jobs for aliases of k8s clusters

https://gerrit.wikimedia.org/r/1042285

Change #1042336 merged by Brouberol:

[operations/puppet@production] helmfile: remove temporary else block once resources were absented

https://gerrit.wikimedia.org/r/1042336

Change #1042296 merged by Brouberol:

[operations/puppet@production] hemlfile: export admin-ng pending diff metrics hourly

https://gerrit.wikimedia.org/r/1042296

Change #1046593 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/alerts@master] monitor admin_ng pending changes for dse-k8s-eqiad

https://gerrit.wikimedia.org/r/1046593

Change #1046593 merged by Brouberol:

[operations/alerts@master] monitor admin_ng pending changes for dse-k8s-eqiad

https://gerrit.wikimedia.org/r/1046593

Change #1070483 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/alerts@master] Define a catchall monitor for pending admin_ng changes

https://gerrit.wikimedia.org/r/1070483

Change #1070483 merged by Brouberol:

[operations/alerts@master] Define a catchall monitor for pending admin_ng changes

https://gerrit.wikimedia.org/r/1070483

We now have an alert firing if an admin_ng pending change has been left undeployed for > 24h.