Until now we have mostly relied on Kubernetes-native network policies, especially when building charts. While this increases the interoperability of charts, it leaves features unused that could help us:
- T275035: Refactor common_templates/0.2/default-network-policy-conf.yaml into a GlobalNetworkPolicy
- T287491: Allow to address Kubernetes API servers from NetworkPolicy
This task is meant to collect opinions/objections on moving further into the "calico space". One additional use case I can think of: "sync" commonly used network objects (groups of hosts, like kafka, restbase, ...) from external sources (Puppet/Netbox/...) into Kubernetes clusters, representing them as objects that can then be referenced from network policies, without having to copy IPs into charts and/or re-deploy services to Kubernetes because of IP changes.
Outside k8s infrastructure abstraction
There are currently three recurring pain points where infrastructure outside of k8s needs to be synced to multiple k8s deployments:
- mariadb databases T340843: WikiKube: Investigate how to abstract misc Mariadb clusters host/ip information so that no deployment of apps is needed when a master is failed over
- kafka clusters T253058: DRY kafka broker declaration in helmfiles, T213561: Discovery for Kafka cluster brokers
- zookeeper clusters 960662
This is basically done by:
- Reading the data from hiera
- Mangling it into a more or less consistent format
- Writing it out to YAML files on the deployment servers
- Consuming those YAML files via helmfile
- Using the values in the helm charts (either via modules, for network policies, or directly)
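As a sketch of this pipeline, the generated file and its consumption could look roughly like the following. The path, file layout and keys are hypothetical, not the actual ones used on the deployment servers; the IPs are the kafka-main examples from further below:

```yaml
# Hypothetical values file generated from hiera on a deployment server,
# e.g. /etc/helmfile-defaults/kafka.yaml
kafka:
  main-eqiad:
    - host: kafka-main1001.eqiad.wmnet
      ip: 10.64.0.200
      port: 9092
    - host: kafka-main1002.eqiad.wmnet
      ip: 10.64.16.37
      port: 9092
---
# Hypothetical helmfile.yaml fragment consuming that file as chart values
releases:
  - name: my-service
    chart: wmf-stable/my-service
    values:
      - /etc/helmfile-defaults/kafka.yaml
```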
With this we try to solve one or two problems:
- Provide data (usually IPv4/v6 and port pairs) to be used in network policies
- Provide a list of FQDN/port pairs to be used in connection strings within the applications (currently in the works for zookeeper)
This has one big shortcoming:
Network policies generated this way reflect changes to the outside infrastructure only after a re-deployment of the charts using them. That requires somebody to know about the change, and also to know that a specific deployment uses the changed infrastructure (i.e. needs to be re-deployed).
Network policies
Puppet is probably the only source of truth spanning all of the current (and possibly future) use cases that knows the network facts (IPs/ports) as well as how to group them (into clusters etc.).
We could build a system that generates Kubernetes Service and Endpoints objects for the external infrastructure, which can then be referenced both in calico network policies (Kubernetes-native network policies don't support referencing services, just pods) and by the actual workload. For example:
```yaml
kind: Service
apiVersion: v1
metadata:
  name: main-eqiad
  namespace: kafka
spec:
  ports:
    - name: kafka
      protocol: TCP
      port: 9092
      targetPort: 9092
  selector: {} # this makes the service not select any pods as endpoints
---
kind: Endpoints
apiVersion: v1
metadata:
  name: main-eqiad
  namespace: kafka
subsets:
  - addresses:
      - ip: 10.64.0.200 # kafka-main1001.eqiad.wmnet
      - ip: 10.64.16.37 # kafka-main1002.eqiad.wmnet
    ports:
      - port: 9092
        name: kafka
```
A service deployment would then create a calico network policy like the following, without needing to know the actual kafka IPs:
```yaml
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-kafka-main-eqiad
  namespace: my-service
spec:
  selector: all()
  egress:
    - action: Allow
      destination:
        services:
          name: main-eqiad
          namespace: kafka
```
The workload could then also use main-eqiad.kafka.svc.cluster.local:9092 to connect to one of the brokers (round robin) via the Kubernetes service.
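Concretely, a chart value that previously carried a list of broker IPs could then be reduced to the cluster-internal service name (the key name here is hypothetical):

```yaml
# Hypothetical chart values: the connection string points at the
# selectorless Service instead of individual broker IPs
kafka:
  bootstrap_servers: main-eqiad.kafka.svc.cluster.local:9092
```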
Connection strings
It would be nice to leverage the above to provide central, agnostic endpoints in all Kubernetes clusters, using the Kubernetes Service objects, load balancing and internal DNS. Unfortunately, the actual applications have different requirements regarding their connection strings, so this might not always be sufficient.
Zookeeper
The connect string needed to connect to Apache ZooKeeper is a comma-separated list of hostname:port pairs, for example localhost:2181,localhost:2182,localhost:2183. It should contain all ZooKeeper instances in the ZooKeeper quorum.
Kafka
We usually provide a list of broker:port pairs to connect to, but ultimately one broker is enough.
MariaDB
<section>-master.<DC>.wmnet:<section-port or 3306>
Where the FQDN is actually a CNAME pointing to some dbproxy host.
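Following the kafka example above, the same indirection could be moved in-cluster with a selectorless Service per section. A sketch, where the section name, port and IP are hypothetical:

```yaml
kind: Service
apiVersion: v1
metadata:
  name: s1-master
  namespace: mariadb
spec:
  ports:
    - name: mysql
      protocol: TCP
      port: 3306
      targetPort: 3306
  selector: {} # selectorless, endpoints are managed externally
---
kind: Endpoints
apiVersion: v1
metadata:
  name: s1-master
  namespace: mariadb
subsets:
  - addresses:
      - ip: 10.64.0.1 # hypothetical IP of the dbproxy behind the CNAME
    ports:
      - port: 3306
        name: mysql
```

A master failover would then only require updating the Endpoints object, with no re-deployment of the consuming services.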
Outlook
There are other use cases for this as well; one that comes to mind is synchronizing Prometheus hosts to Kubernetes clusters in the same way, in order to restrict access to metrics ports to just them.
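A rough sketch of that idea, assuming Prometheus hosts were synced into a prometheus namespace as a selectorless Service named ops, and assuming the clusters run a calico version that supports service matches in ingress rules (all names and the port are hypothetical):

```yaml
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-prometheus-metrics
  namespace: my-service
spec:
  selector: all()
  ingress:
    - action: Allow
      protocol: TCP
      source:
        services:
          name: ops
          namespace: prometheus
      destination:
        ports:
          - 9090 # hypothetical metrics port of the workload
```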