[25Q1] Production monitoring – logging
Open, Medium, Public
Authored By

	Sharvaniharan
	Jun 19 2024, 10:20 PM

Description

Background

The disparity between the beta and production environments causes problems during troubleshooting, root cause analysis and subsequent fixes.

Improving Wikifunctions logging will give us enough detail about API requests and responses to identify production issues.

Approach

Add logging/metrics for wikilambda-fetch to determine whether it contributes to slow function executions.
Consistently use the WMF standard for logging and adjust log dashboards accordingly.
Include request IDs in every log message for API requests so that we can pinpoint problem areas more precisely.
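
As a rough illustration of the request ID and timing ideas above, here is a minimal Python sketch. It is not the orchestrator's actual code (the BE services are not written in Python), and every name in it (request_id_var, RequestIdFilter, timed, handle_request, the "orchestrator-sketch" logger and the log format) is hypothetical. It only shows the pattern: attach the current request ID to every log record, and log how long a fetch step took.

```python
import contextvars
import io
import logging
import time
from contextlib import contextmanager

# Hypothetical: holds the ID of the API request currently being handled.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every log record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


def build_logger(stream):
    """Create a logger whose every line carries the request ID."""
    logger = logging.getLogger("orchestrator-sketch")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(RequestIdFilter())
    handler.setFormatter(
        logging.Formatter("%(levelname)s [%(request_id)s] %(message)s")
    )
    logger.addHandler(handler)
    return logger


@contextmanager
def timed(logger, step):
    """Log the wall-clock duration of the wrapped step, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("%s took %.1f ms", step, elapsed_ms)


def handle_request(logger, request_id, fetch):
    """Run a (stand-in) fetch under a request ID and time it."""
    token = request_id_var.set(request_id)
    try:
        with timed(logger, "fetch"):
            result = fetch()
        logger.info("request finished")
        return result
    finally:
        request_id_var.reset(token)


def demo():
    """Handle one fake request and return the captured log lines."""
    stream = io.StringIO()
    logger = build_logger(stream)
    handle_request(logger, "req-123", lambda: 42)
    return stream.getvalue().splitlines()
```

Calling demo() yields two lines, both tagged with req-123: one reporting the fetch duration and one marking the end of the request. Filtering on the request ID in a log dashboard would then show every step of a single request together.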

Acceptance Criteria/Success Metrics

Log messages are clear, informative, and easily readable and searchable on Logstash:
- They point to where the error or message was raised.
- The message includes insight into what triggered the current status.
- They clearly state the log level (debug, warning, stack trace, or legitimate error), and this standard and pattern is applied consistently in both back-end (BE) services.
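
One way to picture a log entry that meets the criteria above is the following Python sketch. The field names (level, message, where, stack) and the JsonFormatter class are assumptions made for illustration; they are not the WMF logging standard or the format the services emit. The sketch only shows a record that carries its level, its source location, a message describing the trigger, and a stack trace when an exception is attached.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render a log record as one JSON object per line (hypothetical schema)."""

    def format(self, record):
        entry = {
            # Severity, so the level is explicit in every message.
            "level": record.levelname,
            # Human-readable description of what triggered the log.
            "message": record.getMessage(),
            # Where the message was raised: module.function:line.
            "where": f"{record.module}.{record.funcName}:{record.lineno}",
        }
        # Attach the stack trace only when an exception was logged.
        if record.exc_info:
            entry["stack"] = self.formatException(record.exc_info)
        return json.dumps(entry)
```

Attaching such a formatter to a handler with handler.setFormatter(JsonFormatter()) would make every line machine-parseable, so a log search could filter on level or where directly.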

Goals & Success Metrics

Improved ability to debug production issues and identify root causes.
- As a measure of success, find actual use cases in Phabricator and provide the logging needed to debug them:
  - We can effectively detect where requests lag or hang: T364413
  - Logging improvements support the broader effort to improve overall performance: T367933

Status	Assigned	Task
Open	ecarg	T368000 [25Q1] Production monitoring – logging
Resolved	ecarg	T364413 Improve the logging we're doing in the orchestrator and evaluator to have a better idea of where the slowness is coming from
Resolved	ecarg	T369001 Logging epic nice-to-have: Adding stack trace for every log output
Resolved	ecarg	T369213 Logging epic nice-to-have: add equivalent logs for docker logs
Resolved	ecarg	T369560 Check if message form is from an object type in Logger param
Resolved	cmassaro	T369956 Add logging data when db fetch in orchestrator
Resolved	Jdforrester-WMF	T370351 Replace old console.logs/errors that log to LogStash with 'throw error'
Duplicate	ecarg	T370910 Create (or activate) alert monitoring based on logs
In Progress	ecarg	T372847 Implement and add duration times for function calls
Resolved	Jdforrester-WMF	T374737 Add env var to switch on function_orchestrator_function_duration_seconds metrics in Prod
Resolved	ecarg	T372926 Audit/adjust log severity levels to team standard
Open	None	T369770 Migrate recent logs in BE services into new metrics