Event Timeline
Update:
- Merged the Puppet machinery that allows network policies to be generated for assorted clusters, so we can generate the network policy automatically instead of maintaining the 60 lines of Istio config by hand.
- Will merge the change to the network policy that allows Istio to talk to Cassandra.
After this we should have a minimal test service live.
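For illustration only, a minimal sketch of what generating an egress policy from a host list can look like. This is not the Puppet machinery mentioned above; the function name, policy name, namespace, CIDRs, and port 9042 (Cassandra's default native transport port) are all assumptions. Only the standard Kubernetes NetworkPolicy fields are used.

```python
from typing import Any, Dict, List


def egress_policy(name: str, namespace: str, cidrs: List[str], port: int) -> Dict[str, Any]:
    """Build a Kubernetes NetworkPolicy manifest (as a dict) that allows
    TCP egress from every pod in `namespace` to the given CIDRs on `port`.

    Hypothetical helper: all argument values are supplied by the caller.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # An empty podSelector selects all pods in the namespace.
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [
                {
                    "to": [{"ipBlock": {"cidr": cidr}} for cidr in cidrs],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


# Placeholder values, not the real cluster layout.
policy = egress_policy(
    name="allow-cassandra-egress",
    namespace="example-isvc-namespace",
    cidrs=["10.0.0.1/32", "10.0.0.2/32"],
    port=9042,
)
```

The resulting dict can be serialized to JSON or YAML and applied like any hand-written manifest.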
Update:
- Rebased code after prototype.
- Waiting for the Istio change that creates a new service; it is imminent.
- Need to add a new virtual service that is TCP.
Update:
- Working on the plumbing on staging; should be done within a week
- Feeling good about it
Update:
- Connections from isvc namespaces on staging to the Cassandra machines now work, including TLS certs and SNI
- Next step: have a real inference service talk to the cache, likely with code from https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/995001
- Still need to figure out long-term maintenance of Cassandra server-side config (users, passwords, namespaces, schemas); may hand off/soft-donate the machines to Data Persistence Team
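A rough sketch of how the TLS and SNI part of such a connection can be checked by hand from a pod, using only the Python standard library. The host, port, SNI name, and CA bundle are supplied by the caller and are not taken from this task; the helper names are hypothetical.

```python
import socket
import ssl
from typing import Optional


def make_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Create a client-side TLS context that verifies the server certificate.

    If ca_file is None, the system default CA bundle is used.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def check_tls(host: str, port: int, sni_name: str,
              ctx: ssl.SSLContext, timeout: float = 5.0) -> Optional[str]:
    """Open a TCP connection to host:port, perform a TLS handshake that sends
    sni_name as the SNI value, and return the negotiated TLS version.

    Raises ssl.SSLError (or a subclass) if the certificate does not verify
    against the context's CA bundle or does not match sni_name.
    """
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=sni_name) as tls:
            return tls.version()


ctx = make_tls_context()
# Example call with placeholder values (assumptions, not real hosts):
# check_tls("cassandra-host.example", 9042, "cassandra-host.example", ctx)
```

This only exercises the handshake; it does not speak the Cassandra protocol or test credentials.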
Update:
- Trying to fix up a Calico networking issue in Kubernetes
- Once the credentials are in place, will deploy the patched revert risk server to ml-staging
Current state:
Three changes open/wip:
- https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/995001 -- Actual code that would add cache to RRML
- private repo where the actual credentials live
- deployment charts for the credentials, not ready yet (depends on the two above)