

[25Q1] Production monitoring – logging
Open, Medium, Public

Description

Background

The disparity between the beta and production environments causes problems during troubleshooting, root cause analysis, and subsequent fixes.

Improving Wikifunctions logging will give us enough detail about API requests and responses to identify production issues.

Approach

  • Add logging/metrics for wikilambda-fetch to determine whether it contributes to slow function executions.
  • Consistently use the WMF logging standard and adjust the log dashboards accordingly.
  • Include the request ID in every log message for API requests, so that we can pinpoint problem areas more precisely.

Acceptance Criteria/Success Metrics

  • Log messages are clear, informative, and easy to read and search in Logstash:
    • They point to where the error or message was raised.
    • They include insight into what triggered the current status.
    • They clearly state their log level (debug, warning, error, or error with stack trace), and this standard and pattern is visible on both BE services.

Goals & Success Metrics

  • Improved ability to debug production issues and identify root causes.
    • As a measure of success, find actual use cases in Phab and provide the logging needed to debug them:
      • We can effectively detect where requests lag or hang: T364413
      • Logging improvements support the broader effort to improve overall performance: T367933

Event Timeline

Jdforrester-WMF renamed this task from [25Q1] Production monitoring: logging to [25Q1] Production monitoring – logging. Jul 8 2024, 1:56 PM