calbon (Chris Albon)
Director of Machine Learning

User Details

User Since
Jun 25 2020, 6:43 PM (229 w, 1 d)
Availability
Available
IRC Nick
chrisalbon
LDAP User
Calbon
MediaWiki User
CAlbon (WMF) [ Global Accounts ]

Recent Activity

Thu, Oct 24

calbon closed T377067: vscode remote ssh into ml-lab freezes as Resolved.
Thu, Oct 24, 3:41 PM · Machine-Learning-Team
calbon added a comment to T377067: vscode remote ssh into ml-lab freezes.

I checked again this morning. Cursor and VSCode work fine. Reading the VS Code forum, I think there was an update in 0.41 that fixed the issue.

Thu, Oct 24, 3:40 PM · Machine-Learning-Team
calbon added a comment to T376585: Access to deploy recommendation API ML service for kartik.

Approved

Thu, Oct 24, 2:56 PM · Machine-Learning-Team, SRE, LPL Essential (LPL Essential 2024 Jul-Oct), SRE-Access-Requests
calbon added a comment to T377067: vscode remote ssh into ml-lab freezes.

I tested this with VSCode and Cursor today. Both seem to work. I wonder if the install time was just really long.

Screenshot 2024-10-23 at 19.17.23.png (3×3 px, 1 MB)

Thu, Oct 24, 2:19 AM · Machine-Learning-Team

Oct 12 2024

calbon updated the task description for T377067: vscode remote ssh into ml-lab freezes.
Oct 12 2024, 5:37 PM · Machine-Learning-Team
calbon created T377067: vscode remote ssh into ml-lab freezes.
Oct 12 2024, 5:12 PM · Machine-Learning-Team

Oct 11 2024

calbon created T376974: ml-lab should have documentation.
Oct 11 2024, 2:18 AM · Machine-Learning-Team

Oct 10 2024

calbon created T376967: ml-lab can't install rocm torch.
Oct 10 2024, 10:58 PM · Machine-Learning-Team

Sep 24 2024

calbon moved T359066: Add Licensing and Open Source requirement/strong preference to Lift Wing model deployment documentations from Ready To Go to 2024-2025 Q2 Done on the Machine-Learning-Team board.
Sep 24 2024, 2:50 PM · Documentation, Software-Licensing, Machine-Learning-Team
calbon closed T359066: Add Licensing and Open Source requirement/strong preference to Lift Wing model deployment documentations as Resolved.
Sep 24 2024, 2:50 PM · Documentation, Software-Licensing, Machine-Learning-Team
calbon moved T353974: LLM that specializes in assisting Wikimedia/MediaWiki technical contributors from In Progress to Backlog/Lift Wing on the Machine-Learning-Team board.
Sep 24 2024, 2:43 PM · artificial-intelligence, Machine-Learning-Team
calbon added a comment to T353974: LLM that specializes in assisting Wikimedia/MediaWiki technical contributors.

Update: Right now we don't have the resources to prioritize this. I'm moving it to the backlog.

Sep 24 2024, 2:43 PM · artificial-intelligence, Machine-Learning-Team

Aug 27 2024

calbon added a comment to T371398: Goal 4: Support product teams in deploying production models..
Aug 27 2024, 2:28 PM · Goal, Machine-Learning-Team
calbon added a comment to T371397: Goal 3: Operational Excellence - Improve base monitoring, alerting and logging of Lift Wing services..
  • Slow revscoring: started logging queries on the pod side, so those logs are gone when the pod is killed.
  • Answer "Is there a reason we are not logging the query into logstash?"
Aug 27 2024, 2:18 PM · Goal, Machine-Learning-Team
calbon added a comment to T371396: Goal 2: People outside the ML team can ssh into an ml-lab machine, run a Jupyter Notebook, and run PyTorch powered by a GPU..
  • Machines are racked but not set up. Will set up one first to figure out the disk layout, then the other one. Then will release them to the research team.
Aug 27 2024, 2:15 PM · Data-Platform-SRE, Goal, Machine-Learning-Team
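
Goal 2 above ends with running PyTorch on a GPU. A minimal sketch of how someone might check that from a notebook or shell is shown here. The function name `gpu_report` is hypothetical, and nothing in it is specific to ml-lab; it only assumes a standard PyTorch install, where ROCm builds expose AMD GPUs through the `torch.cuda` API.

```python
import importlib.util


def gpu_report():
    """Return a one-line summary of whether PyTorch can see a GPU."""
    # Avoid an ImportError on machines where torch is not installed.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    # ROCm builds of PyTorch report AMD GPUs through torch.cuda as well.
    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: no GPU visible"

    # torch.version.hip is set on ROCm builds and None otherwise.
    backend = "ROCm" if getattr(torch.version, "hip", None) else "CUDA"
    name = torch.cuda.get_device_name(0)
    return f"torch {torch.__version__}: {backend} GPU visible ({name})"


if __name__ == "__main__":
    print(gpu_report())
```
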
calbon added a comment to T371395: Goal 1: Non-technical users can make a request to a Hugging Face Large Language Model that is fast in production..
  • GPU hosts are racked but not set up yet
  • Software side slower
Aug 27 2024, 2:11 PM · Goal, Machine-Learning-Team

Aug 13 2024

calbon added a comment to T371398: Goal 4: Support product teams in deploying production models..

Update

  • Modernized recommendation API has been deployed to production
  • API gateway setup underway
  • Article quality LA: Ready on staging and want to bring it into production. Should we group models into common namespaces? Suggestion: create namespaces per area where the model is used: articles, revisions, images, etc.
Aug 13 2024, 2:56 PM · Goal, Machine-Learning-Team
calbon added a comment to T371396: Goal 2: People outside the ML team can ssh into an ml-lab machine, run a Jupyter Notebook, and run PyTorch powered by a GPU..

Update:

  • Waiting for ml-lab machines to be delivered to the eqiad data center.
Aug 13 2024, 2:35 PM · Data-Platform-SRE, Goal, Machine-Learning-Team
calbon renamed T371395: Goal 1: Non-technical users can make a request to a Hugging Face Large Language Model that is fast in production. from Goal 1: Non-technical users can make a request to a Hugging Face Large Language Model that uses an inference optimization engine in production. to Goal 1: Non-technical users can make a request to a Hugging Face Large Language Model that is fast in production..
Aug 13 2024, 2:33 PM · Goal, Machine-Learning-Team
calbon added a comment to T371395: Goal 1: Non-technical users can make a request to a Hugging Face Large Language Model that is fast in production..

Infra

  • Setting up the puppet roles
  • Can't commit puppet roles until the machines are there
  • Reached out to vendor
Aug 13 2024, 2:32 PM · Goal, Machine-Learning-Team

Jul 31 2024

calbon added a project to T371398: Goal 4: Support product teams in deploying production models.: Goal.
Jul 31 2024, 3:15 PM · Goal, Machine-Learning-Team
calbon added a project to T371397: Goal 3: Operational Excellence - Improve base monitoring, alerting and logging of Lift Wing services.: Goal.
Jul 31 2024, 3:14 PM · Goal, Machine-Learning-Team
calbon added a project to T371396: Goal 2: People outside the ML team can ssh into an ml-lab machine, run a Jupyter Notebook, and run PyTorch powered by a GPU.: Goal.
Jul 31 2024, 3:14 PM · Data-Platform-SRE, Goal, Machine-Learning-Team
calbon added a project to T371395: Goal 1: Non-technical users can make a request to a Hugging Face Large Language Model that is fast in production.: Goal.
Jul 31 2024, 3:14 PM · Goal, Machine-Learning-Team

Jul 30 2024

calbon moved T369712: Request to update Readability model on Lift Wing from Unsorted to Ready To Go on the Machine-Learning-Team board.
Jul 30 2024, 2:56 PM · Lift-Wing, Machine-Learning-Team
calbon assigned T369712: Request to update Readability model on Lift Wing to AikoChou.
Jul 30 2024, 2:56 PM · Lift-Wing, Machine-Learning-Team
calbon created T371398: Goal 4: Support product teams in deploying production models..
Jul 30 2024, 2:28 PM · Goal, Machine-Learning-Team
calbon created T371397: Goal 3: Operational Excellence - Improve base monitoring, alerting and logging of Lift Wing services..
Jul 30 2024, 2:28 PM · Goal, Machine-Learning-Team
calbon created T371396: Goal 2: People outside the ML team can ssh into an ml-lab machine, run a Jupyter Notebook, and run PyTorch powered by a GPU..
Jul 30 2024, 2:28 PM · Data-Platform-SRE, Goal, Machine-Learning-Team
calbon created T371395: Goal 1: Non-technical users can make a request to a Hugging Face Large Language Model that is fast in production..
Jul 30 2024, 2:28 PM · Goal, Machine-Learning-Team

Jun 18 2024

calbon moved T366528: Deployment of model updates from Unsorted to Backlog/Lift Wing on the Machine-Learning-Team board.
Jun 18 2024, 2:55 PM · Research-engineering, Machine-Learning-Team, Research
calbon assigned T366772: Solve revscoring models increased latencies for big revision sizes to AikoChou.
Jun 18 2024, 2:55 PM · Machine-Learning-Team
calbon reassigned T367293: Update blubber version in docker images from klausman to isarantopoulos.
Jun 18 2024, 2:54 PM · Machine-Learning-Team
calbon assigned T367293: Update blubber version in docker images to klausman.
Jun 18 2024, 2:53 PM · Machine-Learning-Team
calbon assigned T367537: Cloud VPS "machine-learning" project Buster deprecation to klausman.
Jun 18 2024, 2:50 PM · Machine-Learning-Team, Cloud-VPS (Debian Buster Deprecation)
calbon moved T367537: Cloud VPS "machine-learning" project Buster deprecation from Unsorted to Backlog/SRE on the Machine-Learning-Team board.
Jun 18 2024, 2:50 PM · Machine-Learning-Team, Cloud-VPS (Debian Buster Deprecation)
calbon moved T367562: Cloud VPS "wikilabels" project Buster deprecation from Unsorted to Watching on the Machine-Learning-Team board.
Jun 18 2024, 2:49 PM · Wikilabels, Machine-Learning-Team, Cloud-VPS (Debian Buster Deprecation)
calbon moved T367875: Reimage all ml-serve machines with Bookworm from Unsorted to Backlog/SRE on the Machine-Learning-Team board.
Jun 18 2024, 2:46 PM · Machine-Learning-Team

May 21 2024

calbon added a comment to T359140: 2024 Q4: Users can "pip install liftwing" and access 20% of models.

People can now pip install and use models. Right now we only have a few models - the number of models should increase over time.

May 21 2024, 2:49 PM · Goal, Machine-Learning-Team
calbon moved T363505: Pass the maximum number of uploads to the logo detection service from Unsorted to Watching on the Machine-Learning-Team board.
May 21 2024, 2:48 PM · Machine-Learning-Team, Structured-Data-Backlog
calbon moved T364089: Have problem with migrating to LiftWing from ores from Unsorted to Watching on the Machine-Learning-Team board.
May 21 2024, 2:48 PM · Machine-Learning-Team
calbon assigned T363505: Pass the maximum number of uploads to the logo detection service to kevinbazira.
May 21 2024, 2:47 PM · Machine-Learning-Team, Structured-Data-Backlog
calbon assigned T364089: Have problem with migrating to LiftWing from ores to isarantopoulos.
May 21 2024, 2:46 PM · Machine-Learning-Team
calbon moved T365226: Investigate a way to return other 2xx status code from predict in kserve from Unsorted to Backlog/Other on the Machine-Learning-Team board.
May 21 2024, 2:45 PM · Machine-Learning-Team
calbon assigned T365226: Investigate a way to return other 2xx status code from predict in kserve to achou.
May 21 2024, 2:44 PM · Machine-Learning-Team
calbon moved T365166: Update Pytorch base image to 2.3.0 from Unsorted to Ready To Go on the Machine-Learning-Team board.
May 21 2024, 2:34 PM · Machine-Learning-Team
calbon moved T365246: Upgrade Huggingface image to kserve 0.13-rc0 (torch 2.3.0 ROCm 6.0) from Unsorted to Ready To Go on the Machine-Learning-Team board.
May 21 2024, 2:34 PM · Machine-Learning-Team
calbon set the point value for T365246: Upgrade Huggingface image to kserve 0.13-rc0 (torch 2.3.0 ROCm 6.0) to 1.
May 21 2024, 2:33 PM · Machine-Learning-Team
calbon set the point value for T365166: Update Pytorch base image to 2.3.0 to 1.
May 21 2024, 2:33 PM · Machine-Learning-Team
calbon assigned T365253: Allow Kubernetes workers to be deployed on Bookworm to elukey.
May 21 2024, 2:32 PM · Machine-Learning-Team, serviceops, Kubernetes
calbon moved T365253: Allow Kubernetes workers to be deployed on Bookworm from Unsorted to Ready To Go on the Machine-Learning-Team board.
May 21 2024, 2:32 PM · Machine-Learning-Team, serviceops, Kubernetes
calbon set the point value for T365253: Allow Kubernetes workers to be deployed on Bookworm to 3.
May 21 2024, 2:31 PM · Machine-Learning-Team, serviceops, Kubernetes
calbon moved T365291: ml-serve2002 memory errors on DIMM_B1 from Unsorted to Watching on the Machine-Learning-Team board.
May 21 2024, 2:29 PM · SRE, Machine-Learning-Team, ops-codfw, DC-Ops
calbon moved T365439: Investigate why article-descriptions LiftWing API returns 404 when encoded colon is used in request URL from Unsorted to Watching on the Machine-Learning-Team board.
May 21 2024, 2:25 PM · Machine-Learning-Team
calbon added a comment to T362674: 2024 Q4 Goal: Operational Excellence - Improve base monitoring, alerting and logging of Lift Wing services.
  • Calico improvements make the whole workflow more streamlined
  • Improve our incident response procedure
  • Investigate CPU spikes
May 21 2024, 2:18 PM · Goal, Machine-Learning-Team
calbon added a comment to T362670: 2024 Q4 Goal: An HuggingFace 7B LLM is hosted on ml-staging on Lift Wing powered by GPU.
  • Still can't use GPU with ROCm. But we figured out what the bug is - if the control version is upgraded to Bookworm it will be fixed.
  • Next step is to upgrade ml-staging to Bookworm then test.
  • Working on upgrading HF to newer versions with ROCm 6.0. Tested them and they work; will be posting a patch.
  • Goal is to utilize GPU so we can deploy models from HuggingFace.
May 21 2024, 2:16 PM · Goal, Machine-Learning-Team
calbon added a comment to T362672: 2024 Q4 Goal: Revert Risk models are supported by caching in production.
  • Trying to fix up a Calico networking issue in Kubernetes
    • After credentials, will send patched revert risk server to ml-staging
May 21 2024, 2:07 PM · Goal, Machine-Learning-Team

May 7 2024

calbon placed T360455: Add Article Quality Model to LiftWing up for grabs.
May 7 2024, 2:24 PM · Patch-For-Review, Content-Transform-Team, Research, Machine-Learning-Team
calbon assigned T360455: Add Article Quality Model to LiftWing to kevinbazira.
May 7 2024, 2:23 PM · Patch-For-Review, Content-Transform-Team, Research, Machine-Learning-Team
calbon added a comment to T362674: 2024 Q4 Goal: Operational Excellence - Improve base monitoring, alerting and logging of Lift Wing services.
  • Narrowed down the cause of the spike in CPU usage to feature extraction in the revscoring isvc. It might be caused by some specific revids.
May 7 2024, 2:19 PM · Goal, Machine-Learning-Team
calbon added a comment to T362670: 2024 Q4 Goal: An HuggingFace 7B LLM is hosted on ml-staging on Lift Wing powered by GPU.
  • Wait for vendor (Supermicro) to finalize order of 2x for ml-staging.
    • Chris's guess is ml-staging installed at end of quarter
May 7 2024, 2:10 PM · Goal, Machine-Learning-Team
calbon added a comment to T362672: 2024 Q4 Goal: Revert Risk models are supported by caching in production.
  • Working on plumbing on staging, should be done within a week
    • Feeling good about it
May 7 2024, 2:08 PM · Goal, Machine-Learning-Team

Apr 30 2024

calbon added a comment to T362674: 2024 Q4 Goal: Operational Excellence - Improve base monitoring, alerting and logging of Lift Wing services.

Logging queries, and logging when things are slow, is the short-term goal. Knowing WHY a query takes a long time is a future question.

Apr 30 2024, 2:22 PM · Goal, Machine-Learning-Team
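
The short-term goal in the T362674 comment above (log every query, and flag the slow ones) can be sketched roughly as follows. This is an illustration only, not the Lift Wing implementation: the decorator `log_query`, the handler `fake_predict`, the logger name, and the threshold values are all hypothetical.

```python
import functools
import logging
import time

logger = logging.getLogger("query_timing")

SLOW_THRESHOLD_SECONDS = 1.0  # hypothetical cutoff, not taken from the task


def log_query(threshold=SLOW_THRESHOLD_SECONDS):
    """Log every call to the wrapped handler and warn when it is slow."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(query):
            start = time.perf_counter()
            try:
                return handler(query)
            finally:
                elapsed = time.perf_counter() - start
                # Always record the query and how long it took.
                logger.info("query=%r elapsed=%.3fs", query, elapsed)
                # Flag it separately when it exceeds the threshold.
                if elapsed > threshold:
                    logger.warning("slow query=%r elapsed=%.3fs", query, elapsed)
        return wrapper
    return decorator


@log_query(threshold=0.05)
def fake_predict(query):
    # Stand-in for a model call; sleeps to imitate a large revision.
    time.sleep(0.1 if query.get("big") else 0.0)
    return {"rev_id": query["rev_id"], "score": 0.5}
```

This records that a query was slow, not why; the latter is the "future question" in the comment.
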
calbon added a comment to T362670: 2024 Q4 Goal: An HuggingFace 7B LLM is hosted on ml-staging on Lift Wing powered by GPU.

We have a theory that the ROCm drivers in the Debian package are not required.

Apr 30 2024, 2:19 PM · Goal, Machine-Learning-Team
calbon added a comment to T362670: 2024 Q4 Goal: An HuggingFace 7B LLM is hosted on ml-staging on Lift Wing powered by GPU.

Decision point: Do we upgrade ROCm drivers?

Apr 30 2024, 2:15 PM · Goal, Machine-Learning-Team
calbon added a comment to T362670: 2024 Q4 Goal: An HuggingFace 7B LLM is hosted on ml-staging on Lift Wing powered by GPU.

Update: No update

Apr 30 2024, 2:14 PM · Goal, Machine-Learning-Team
calbon added a comment to T362672: 2024 Q4 Goal: Revert Risk models are supported by caching in production.
  • Rebased code after prototype.
    • Waiting for istio change for making a new service, which is imminent
    • Need to add a new virtual service that is TCP
Apr 30 2024, 2:13 PM · Goal, Machine-Learning-Team

Apr 25 2024

calbon moved T360455: Add Article Quality Model to LiftWing from Watching to Unsorted on the Machine-Learning-Team board.
Apr 25 2024, 5:07 PM · Patch-For-Review, Content-Transform-Team, Research, Machine-Learning-Team

Apr 23 2024

calbon added a comment to T362670: 2024 Q4 Goal: An HuggingFace 7B LLM is hosted on ml-staging on Lift Wing powered by GPU.
  • GPU order for the first GPU 2x chassis is close to complete. There are some supply issues with the chassis, so the question is going to be if we want to use an upgraded chassis for the ml-staging server.
Apr 23 2024, 2:25 PM · Goal, Machine-Learning-Team
calbon added a comment to T362672: 2024 Q4 Goal: Revert Risk models are supported by caching in production.
  • Merged puppet machinery to allow network policies to be generated for assorted clusters, so we can automatically generate the network policy without the 60 lines of istio config.
  • Will merge change to network policy to allow Istio to talk to Cassandra.
Apr 23 2024, 2:18 PM · Goal, Machine-Learning-Team

Apr 16 2024

calbon renamed T359140: 2024 Q4: Users can "pip install liftwing" and access 20% of models from 2024 Q4: Lift Wing Python Package to 2024 Q4: Users can "pip install liftwing" and access 20% of models.
Apr 16 2024, 2:59 PM · Goal, Machine-Learning-Team
calbon added a project to T362674: 2024 Q4 Goal: Operational Excellence - Improve base monitoring, alerting and logging of Lift Wing services: Goal.
Apr 16 2024, 2:58 PM · Goal, Machine-Learning-Team
calbon created T362674: 2024 Q4 Goal: Operational Excellence - Improve base monitoring, alerting and logging of Lift Wing services.
Apr 16 2024, 2:57 PM · Goal, Machine-Learning-Team
calbon renamed T359140: 2024 Q4: Users can "pip install liftwing" and access 20% of models from Q4: Lift Wing Python Package to 2024 Q4: Lift Wing Python Package.
Apr 16 2024, 2:57 PM · Goal, Machine-Learning-Team
calbon moved T348153: Q3 2024 Goal: Lift Wing users can request multiple predictions using a single request. from 2023-2024 Q4 Quarter Goals to Previous Quarter Goals on the Machine-Learning-Team board.
Apr 16 2024, 2:53 PM · Goal, Machine-Learning-Team
calbon moved T353333: Q3 2024 Goal: Implement caching for revertrisk-language-agnostic from 2023-2024 Q4 Quarter Goals to Previous Quarter Goals on the Machine-Learning-Team board.
Apr 16 2024, 2:53 PM · Goal, Machine-Learning-Team
calbon moved T353337: Q3 2024 Goal: Inference Optimization for Hugging face/Pytorch models from 2023-2024 Q4 Quarter Goals to Previous Quarter Goals on the Machine-Learning-Team board.
Apr 16 2024, 2:53 PM · Goal, Machine-Learning-Team
calbon moved T353338: Q3 2024 Goal: Expand Lift Wing Cluster and add GPU capacity to production from 2023-2024 Q4 Quarter Goals to Previous Quarter Goals on the Machine-Learning-Team board.
Apr 16 2024, 2:53 PM · Goal, Machine-Learning-Team
calbon moved T353814: Q3 2024 Goal: A plan for a training infrastructure from 2023-2024 Q4 Quarter Goals to Previous Quarter Goals on the Machine-Learning-Team board.
Apr 16 2024, 2:52 PM · Goal, Machine-Learning-Team
calbon renamed T348153: Q3 2024 Goal: Lift Wing users can request multiple predictions using a single request. from Goal: Lift Wing users can request multiple predictions using a single request. to Q3 2024 Goal: Lift Wing users can request multiple predictions using a single request..
Apr 16 2024, 2:52 PM · Goal, Machine-Learning-Team
calbon renamed T353333: Q3 2024 Goal: Implement caching for revertrisk-language-agnostic from Goal: Implement caching for revertrisk-language-agnostic to Q3 2024 Goal: Implement caching for revertrisk-language-agnostic.
Apr 16 2024, 2:52 PM · Goal, Machine-Learning-Team
calbon renamed T353337: Q3 2024 Goal: Inference Optimization for Hugging face/Pytorch models from Goal: Inference Optimization for Hugging face/Pytorch models to Q3 2024 Goal: Inference Optimization for Hugging face/Pytorch models.
Apr 16 2024, 2:51 PM · Goal, Machine-Learning-Team
calbon renamed T353338: Q3 2024 Goal: Expand Lift Wing Cluster and add GPU capacity to production from Goal: Expand Lift Wing Cluster and add GPU capacity to production to Q3 2024 Goal: Expand Lift Wing Cluster and add GPU capacity to production .
Apr 16 2024, 2:51 PM · Goal, Machine-Learning-Team
calbon renamed T353814: Q3 2024 Goal: A plan for a training infrastructure from Goal: A plan for a training infrastructure to Q3 2024 Goal: A plan for a training infrastructure .
Apr 16 2024, 2:51 PM · Goal, Machine-Learning-Team
calbon created T362672: 2024 Q4 Goal: Revert Risk models are supported by caching in production.
Apr 16 2024, 2:51 PM · Goal, Machine-Learning-Team
calbon moved T362671: ------ from 2023-2024 Q4 Quarter Goals to Task Archive on the Machine-Learning-Team board.
Apr 16 2024, 2:46 PM · Machine-Learning-Team
calbon closed T362671: ------ as Declined.
Apr 16 2024, 2:45 PM · Machine-Learning-Team
calbon created T362671: ------.
Apr 16 2024, 2:45 PM · Machine-Learning-Team
calbon created T362670: 2024 Q4 Goal: An HuggingFace 7B LLM is hosted on ml-staging on Lift Wing powered by GPU.
Apr 16 2024, 2:45 PM · Goal, Machine-Learning-Team

Mar 26 2024

calbon renamed T353333: Q3 2024 Goal: Implement caching for revertrisk-language-agnostic from Goal: Implement caching for revertrisk-multilingual to Goal: Implement caching for revertrisk-language-agnostic.
Mar 26 2024, 2:41 PM · Goal, Machine-Learning-Team
calbon added a comment to T353338: Q3 2024 Goal: Expand Lift Wing Cluster and add GPU capacity to production .

At risk because we don't have a GPU in the data centers yet.

Mar 26 2024, 2:40 PM · Goal, Machine-Learning-Team
calbon moved T360455: Add Article Quality Model to LiftWing from Unsorted to Watching on the Machine-Learning-Team board.
Mar 26 2024, 2:35 PM · Patch-For-Review, Content-Transform-Team, Research, Machine-Learning-Team
calbon moved T360593: Create an examples directory in the repository and add a basic README.md from Unsorted to Ready To Go on the Machine-Learning-Team board.
Mar 26 2024, 2:31 PM · Machine-Learning-Team
calbon moved T360637: Bump memory for registry[12]00[34] VMs from Unsorted to Ready To Go on the Machine-Learning-Team board.
Mar 26 2024, 2:27 PM · Patch-For-Review, serviceops, Machine-Learning-Team
calbon moved T360638: Create a Pytorch base image from Unsorted to Ready To Go on the Machine-Learning-Team board.
Mar 26 2024, 2:27 PM · Patch-For-Review, Machine-Learning-Team
calbon set the point value for T360638: Create a Pytorch base image to 3.
Mar 26 2024, 2:23 PM · Patch-For-Review, Machine-Learning-Team
calbon assigned T360894: Investigate temporary high latency in revscoring service for wikidata to klausman.
Mar 26 2024, 2:16 PM · Machine-Learning-Team
calbon moved T360990: drafttopic has two issue trackers from Unsorted to Backlog/Revscoring on the Machine-Learning-Team board.
Mar 26 2024, 2:15 PM · drafttopic-modeling, Machine-Learning-Team
calbon assigned T360990: drafttopic has two issue trackers to isarantopoulos.
Mar 26 2024, 2:14 PM · drafttopic-modeling, Machine-Learning-Team

Mar 19 2024

calbon moved T359879: SLO dashboards for Lift Wing showing unexpected values from Unsorted to Ready To Go on the Machine-Learning-Team board.
Mar 19 2024, 2:55 PM · Machine-Learning-Team, Observability-Metrics