Currently, every project monitored by the Prometheus instance in metricsinfra relies on manually added security group rules to allow scraping. Before rolling out to projects that are not managed by WMCS staff or active trusted volunteers, we'll need a way to automate managing those rules.
I don't see an option in Horizon to permit traffic from a security group in a separate project, but a few alternatives come to mind:
- The current monitoring host was given a reserved address in T250206#6056467. Expand that to a larger block of reserved addresses, say a /29 or a /28 (so that we can add redundancy and scale beyond one box), and add a rule permitting traffic from that block to all existing projects and to the defaults for new projects. If the rule is restricted to the node-exporter port, we'd need to tell everyone to manually add rules for scraping non-default targets. The downside is that a production root would be needed whenever we create a new Prometheus VM.
- Give the configuration tooling in metricsinfra permission to manage security groups in all projects. This is dangerous, since a bug in that tooling could open up any project, but it would let us automate most actions.
- Add the missing feature to Neutron so that a security group rule can permit traffic from a security group in a separate project. Then add a rule using it to all existing projects and to the defaults for new projects.
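To make the first option (and the automation in the second) more concrete, here is a minimal Python sketch of the rules the tooling would need to ensure exist in each project. It assumes the node-exporter default port 9100 and uses the attribute names of a Neutron security group rule; `192.0.2.0/29` is only a placeholder from the documentation range, since the real reserved block isn't decided. The helper names `scrape_rules` and `missing_rules` are hypothetical. The sketch only builds and diffs rule bodies; actually creating them (for example through openstacksdk's `create_security_group_rule`) is left out.

```python
import ipaddress
from typing import Dict, Iterable, List

# Default listening port of the Prometheus node_exporter (assumption: targets use the default).
NODE_EXPORTER_PORT = 9100

Rule = Dict[str, object]


def scrape_rules(cidr: str, ports: Iterable[int] = (NODE_EXPORTER_PORT,)) -> List[Rule]:
    """Hypothetical helper: build ingress rule bodies (Neutron security group
    rule fields) letting the Prometheus address block reach each TCP port."""
    net = ipaddress.ip_network(cidr)  # strict: rejects host bits, e.g. "192.0.2.1/29"
    ethertype = "IPv4" if net.version == 4 else "IPv6"
    return [
        {
            "direction": "ingress",
            "ethertype": ethertype,
            "protocol": "tcp",
            "port_range_min": port,
            "port_range_max": port,
            "remote_ip_prefix": str(net),
        }
        for port in ports
    ]


def missing_rules(existing: Iterable[Rule], desired: Iterable[Rule]) -> List[Rule]:
    """Hypothetical helper: return the desired rules not already present in a
    project, so re-running the tooling against it is idempotent."""
    keys = ("direction", "ethertype", "protocol",
            "port_range_min", "port_range_max", "remote_ip_prefix")
    # Compare only the fields we care about; extra fields such as "id" are ignored.
    have = {tuple(rule.get(k) for k in keys) for rule in existing}
    return [r for r in desired if tuple(r.get(k) for k in keys) not in have]


if __name__ == "__main__":
    block = "192.0.2.0/29"  # placeholder block, not the real reservation
    print(ipaddress.ip_network(block).num_addresses)  # 8 addresses in a /29
    desired = scrape_rules(block)
    print(missing_rules([], desired) == desired)  # True: nothing applied yet
    print(missing_rules(desired, desired))  # []: already applied
```

A /28 would give 16 addresses instead of 8, which is the headroom the first option is trading for redundancy and scaling. Passing extra ports to `scrape_rules` is one way the tooling could cover non-default targets instead of asking every project to add those rules by hand.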