[25Q1] Performance improvements
Open, Medium, Public

Description

Background

Long function runtimes, particularly for functions that fail due to timeouts, are degrading the user experience on Wikifunctions.

Two factors contribute to this problem:
[1] Lack of control over runtime resources, such as being unable to allocate the memory we need to execute on priority. This is beyond our control.
[2] Some functions do a lot of work, including many REST-based calls, which slows them down.

This epic focuses on the second factor [2]. It will contain tasks to identify and improve areas within our function calls where work can be done more efficiently.

This task is not for

  • Re-architecting code to completely eliminate usage of REST calls.
  • Measuring the frequency and recency of function calls. These are not measurable with just a cache in the backend, and a solution might not be feasible as part of this work.
  • Implementing every improvement the spikes surface. The spikes we run could reveal many possibilities for improvement; we might not have time to get to all of them as part of this work, but the spikes themselves should be thorough.

Approach

  • We will run a spike to identify potential areas of improvement.
  • We have already identified a few areas that could be improved by restructuring our code as part of this work.
  • We will implement the initial ideas identified for enhancing cache management, incorporating the insights gained from our Q4 metrics analysis.
  • We will continue cache improvements.
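
As an illustration of the kind of redundant work the spike might look for, the sketch below wraps a fetch function so that repeated lookups of the same key are served from memory and counted. It is only a minimal sketch: `CountingCache`, `fake_fetch`, and the keys are hypothetical names invented for this example, not part of the Wikifunctions codebase, and the fetch function stands in for a REST call.

```python
from typing import Callable, Dict, List


class CountingCache:
    """Serve repeated lookups from memory and count hits and misses.

    Hypothetical helper for illustration only.
    """

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        # Return the cached value if this key was already fetched.
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Otherwise call the underlying fetch once and remember the result.
        self.misses += 1
        value = self._fetch(key)
        self._store[key] = value
        return value


# Stand-in for a REST call; it records every key it is asked for.
fetch_log: List[str] = []


def fake_fetch(key: str) -> str:
    fetch_log.append(key)
    return "payload-for-" + key


cache = CountingCache(fake_fetch)
for key in ["a", "b", "a", "a", "b"]:
    cache.get(key)
```

Comparing `cache.misses` with the total number of lookups gives a rough sense of how many calls were repeats within one execution.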

Acceptance Criteria

  • Based on the spike, we have a full understanding of the underlying issues, both those that are fixable and those that are not feasible to fix, and we have documented them.
  • We have fixed at least one major area where function execution code was doing more work than required.
  • Once fixes are in, we have reached out to 3 users to see if they’ve noticed a change.

Goals & Success Metrics

  • Fixing one major area where function execution is doing too much work results in X functions running faster.
  • We identify a fixed set of functions (e.g. deploy tests) that will not change, and measure their runtimes before and after making the performance improvements; we would like to see the average runtime decrease by N%.
  • We have heard positive feedback from the users we reached out to about our performance improvements.
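
The before-and-after comparison could be computed along these lines. This is a minimal sketch under stated assumptions: the function name `average_runtime_decrease`, the function labels, and the runtimes are all made up for illustration, and it assumes "average runtime" means the mean of each function's mean runtime over the fixed set.

```python
from statistics import mean
from typing import Dict, List


def average_runtime_decrease(
    before_ms: Dict[str, List[float]],
    after_ms: Dict[str, List[float]],
) -> float:
    """Return the percentage decrease in average runtime over a fixed set.

    Each dict maps a function label to its measured runtimes in milliseconds.
    """
    # The comparison is only meaningful if the same functions are measured.
    if before_ms.keys() != after_ms.keys():
        raise ValueError("the set of functions must be identical before and after")
    # Average each function's runs first so that functions with more
    # samples do not dominate the overall figure.
    before_avg = mean(mean(runs) for runs in before_ms.values())
    after_avg = mean(mean(runs) for runs in after_ms.values())
    return (before_avg - after_avg) / before_avg * 100.0


# Made-up measurements for two hypothetical functions.
before = {"fn_a": [200, 220], "fn_b": [400, 380]}
after = {"fn_a": [150, 170], "fn_b": [300, 340]}
decrease = average_runtime_decrease(before, after)
```

With these made-up numbers the overall average falls from 300 ms to 240 ms, a 20% decrease, which would then be compared against the target N%.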