Programs & Events Dashboard has gone down repeatedly over the last several weeks. Even after the database was moved to a separate server, the system has encountered repeated cases of database corruption, which may be related to the data update cycle for programs with very large numbers of edits.
Current status
The system appears to be stable when updates for the largest, slowest programs are disabled. Sage Ross is investigating options for bringing back updates for these programs.
Initial report
VPS: programs-and-events-dashboard.globaleducation.eqiad.wmflabs
Error
Accessing the service yields:
"This website is under heavy load (queue full)
We're sorry, too many people are accessing this website at the same time. We're working on this problem. Please try again later."
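To tell this specific failure apart from other outages, a monitoring check could look for the error text quoted above in the response body. The following is a minimal sketch under stated assumptions, not part of the dashboard's actual tooling: the function names are hypothetical, the URL to probe is left to the caller, and it assumes the message appears verbatim in the returned page.

```python
import urllib.error
import urllib.request

# Marker copied from the error page quoted in this report.
QUEUE_FULL_MARKER = "This website is under heavy load (queue full)"


def is_queue_full_page(body: str) -> bool:
    """Return True if the page body contains the queue-full message."""
    return QUEUE_FULL_MARKER in body


def fetch_body(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and return its body as text (hypothetical helper).

    Error responses are read as well, since the overload page may be
    served with a non-2xx status code.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.read().decode("utf-8", errors="replace")
```

A caller would pass the dashboard's public URL to `fetch_body` and feed the result to `is_queue_full_page`. Network failures other than HTTP error responses (timeouts, DNS errors) are deliberately left to propagate, since they indicate a different kind of outage.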
Impact
Wikimedia program organizers, who use the tool to track impact, cannot access it while the service is down.
Notes
I understand this is a cloud service and not MW core. It is still a "production error" in my understanding, as it's not a functionality bug but an operational failure (itself possibly caused by a bug, but that's TBD) of a mature service many volunteers rely on for their daily work.