2024 Q4 Goal: Revert Risk models are supported by caching in production
Open, Needs Triage, Public

Event Timeline

Update:

  • Merged Puppet machinery that allows network policies to be generated for assorted clusters, so we can generate the network policy automatically without the 60 lines of Istio config.
  • Will merge the change to the network policy that allows Istio to talk to Cassandra.

After this we should have a minimal test service live.

Update:

  • Rebased code after prototype.
  • Waiting for the Istio change needed to create a new service, which is imminent.
  • Need to add a new virtual service that uses TCP.

Update:

  • Working on the plumbing on staging; should be done within a week
  • Feeling good about it

Update:

  • Connections from isvc namespaces on staging to the Cassandra machines now work, including TLS certs and SNI
  • Next step: have an actual inference service talk to the cache, likely with code from https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/995001
  • Still need to figure out long-term maintenance of Cassandra server-side config (users, passwords, namespaces, schemas); may hand off/soft-donate the machines to Data Persistence Team
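
As a rough illustration of the "inference service talks to the cache" step, a cache-aside lookup could look like the sketch below. This is not the code in the Gerrit change linked above. The key layout (wiki, revision ID, model version), the class `InMemoryScoreCache`, and the function `cached_score` are all hypothetical, and an in-memory dict stands in for the Cassandra-backed store so the example runs on its own.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical cache key: (wiki, revision id, model version).
CacheKey = Tuple[str, int, str]


class InMemoryScoreCache:
    """Stand-in for the Cassandra-backed cache (assumption: a real
    implementation would query the cluster instead of a dict)."""

    def __init__(self) -> None:
        self._rows: Dict[CacheKey, float] = {}

    def get(self, key: CacheKey) -> Optional[float]:
        # Returns None on a cache miss.
        return self._rows.get(key)

    def put(self, key: CacheKey, score: float) -> None:
        self._rows[key] = score


def cached_score(
    cache: InMemoryScoreCache,
    key: CacheKey,
    compute: Callable[[CacheKey], float],
) -> float:
    """Cache-aside lookup: return the stored score if present,
    otherwise run the model once and store the result."""
    hit = cache.get(key)
    if hit is not None:
        return hit
    score = compute(key)
    cache.put(key, score)
    return score
```

Including the model version in the key is one way to keep scores from an older model from being served after a model update; whether the production schema does this is not stated here.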

Update:

  • Trying to fix up a Calico networking issue in Kubernetes
  • Once credentials are in place, will send the patched Revert Risk server to ml-staging

Current state:

Three changes open/wip: