2024 Q4 Goal: Revert Risk models are supported by caching in production
Open, Needs Triage, Public

Authored By: calbon, Apr 16 2024, 2:50 PM

Update:

Merged puppet machinery to allow network policies to be generated for assorted clusters, so we can automatically generate the network policy without the 60 lines of Istio config.
Next, will merge the network policy change that allows Istio to talk to Cassandra.

After this we should have a minimal test service live.

Update:

Connections from isvc namespaces on staging to the Cassandra machines now work, including TLS certs and SNI
Next step: have an actual inference service talk to the cache, likely with code from https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/995001
Still need to figure out long-term maintenance of Cassandra server-side config (users, passwords, namespaces, schemas); may hand off/soft-donate the machines to Data Persistence Team
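
For illustration, the "inference service talks to the cache" step above can be sketched as a read-through cache in front of the model call. This is a minimal sketch under stated assumptions, not the code in the Gerrit change: `CachedPredictor`, `InMemoryStore`, the `(wiki, rev_id)` key, and the TTL are all hypothetical names and choices, and the in-memory store only stands in for a Cassandra-backed table.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, int]  # hypothetical cache key: (wiki, revision id)


class InMemoryStore:
    """Stand-in for a Cassandra-backed cache table (assumption, for illustration)."""

    def __init__(self) -> None:
        self._rows: Dict[Key, Tuple[float, dict]] = {}

    def get(self, key: Key, now: float) -> Optional[dict]:
        row = self._rows.get(key)
        if row is None:
            return None
        expires_at, value = row
        if now >= expires_at:
            # Expired entries are dropped and treated as a miss.
            del self._rows[key]
            return None
        return value

    def put(self, key: Key, value: dict, expires_at: float) -> None:
        self._rows[key] = (expires_at, value)


class CachedPredictor:
    """Read-through cache: look up first, call the model only on a miss."""

    def __init__(
        self,
        predict: Callable[[str, int], dict],
        store: InMemoryStore,
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._predict = predict
        self._store = store
        self._ttl = ttl_seconds
        self._clock = clock
        self.hits = 0
        self.misses = 0

    def score(self, wiki: str, rev_id: int) -> dict:
        key = (wiki, rev_id)
        now = self._clock()
        cached = self._store.get(key, now)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        result = self._predict(wiki, rev_id)
        self._store.put(key, result, now + self._ttl)
        return result
```

Keeping the store behind a small `get`/`put` interface means the in-memory stand-in could later be swapped for a real Cassandra client without touching the prediction path.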

Update:

Current state:

Three changes open/wip:

https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/995001 -- Actual code that would add cache to RRML
private repo where the actual credentials live
deployment charts for credentials: not ready yet (depends on the two above).
