Discussion: Diagnosis accuracy & how to deal with hidden state, closures and more... #33

foreseaz · 2022-09-27T07:34:55Z

Hi memlab devs, I have a question on how memlab deals with the hidden state that cannot be captured by simply taking heap snapshots. As we know that, many native methods (implemented in C++) do not expose their state to JavaScript heap snapshots. ~~For example~~ Besides, the variables in closures cannot be easily diagnosed because of scoping. BLeak paper has a solution for this by rewriting the closure variables. https://github.com/plasma-umass/BLeak/blob/master/src/lib/closure_state_transform.ts As far as I go through the memlab codebase, I haven't seen the logic related to it. Will this issue affect the ~~detection~~ diagnosis accuracy? Is this kind of feature on the roadmap?

JacksonGL · 2022-09-27T16:10:43Z

Looks like the following two statements are talking about two different problems:

As we know that, many native methods (implemented in C++) do not expose their state to JavaScript heap snapshots.

MemLab is not designed to detect memory leaks in browser/JS engine/native library. I think rewriting JavaScript probably won't expose C++ native state.

For example, the variables in closures cannot be easily diagnosed because of scoping.

Heap snapshot captures shared context (although less intuitive for debugging). Correct me if I am wrong, the purpose of rewriting the closure variables in BLeak is to collect additional information to assist debugging, not to improve detection accuracy.

Instrumentation or rewriting may bring subtle correctness issue, introduce new memory leaks that doesn't exist in the original code, or hide memory leaks from the original code. This actually decreases detection accuracy. I would be interested to see a more concrete example where the heap snapshot fails to capture a memory leak in JavaScript heap. If that happens, we might want to file a bug report or feature request to Chromium or V8.

foreseaz · 2022-09-27T23:52:30Z

Thank @JacksonGL for the clarification and answers. Sorry for the confusion and let me rephrase my questions a little bit, my questions are in the context of a more accurate diagnosis, instead of detection.

JacksonGL · 2022-09-28T00:10:15Z

@foreseaz If you are talking specifically about the closure scope, I agree the closure context in heap snapshot is less intuitive for debugging. But in terms of accuracy, the same argument also applies to debugging, since the heap snapshot faithfully records what's happening in JS heap. Instrumentation may change heap layout and bring extra noises that confuse the debugging process.

Most of the time shared context in the retainer trace actually includes the information to narrow down to the source location. For example, each closure object has a context field, which contains a scope_info field, it contains the closure variable defined in the closure scope that defines the closure instance, there is also a previous field pointing to the previous closure context in the closure scope chain.

About JS instrumentation, can you share a small code example on what kind of instrumentation is on your mind and explain why it would help debugging?

mdblr · 2022-09-28T19:04:31Z

Yes, that will increase the memory foot print of the program. However, Bleak collects heap snapshots, detects leaks, and finds the leaks paths prior to changing the memory of the program in order to avoid this.

Although we are aware that Memlab focuses on front-end javascript, we are exploring how we may be able to leverage it to use with nodejs programs.

In this context, code rewriting and memory mirroring would have been especially helpful for leaks obfuscated by objects like anonymous or external. In theory, we would see something like what the paper includes:

37 body.jQuery223056319336220622061.events.mouseup in closure of $widgetContent.__proto__.mwheelIntent

I do not currently have a code block or heap snapshot that I can share with you since they are work related. Nevertheless, the BLeak paper reports that it outperformed other metrics used for growth reduction at the time, which the metaprogramming contributed to.

All this is to say, we're just interested in leveraging Memlab if we can to make it easier to debug the situations I mentioned earlier.

Thanks for sharing your insights into this.

JacksonGL · 2022-09-30T21:00:23Z

@mdblr Bleak seems to intercept and rewrite JavaScript with mitmproxy. MemLab could potentially do that by intercepting and rewriting JavaScript in puppeteer for front-end JS leak detection and debugging. One ideas is to inject source information as a special closure variable in closure context, so that it's easier to link back to the source location when debugging.

For Node.js that's a different story, you may need to take heap snapshots in Node.js by yourself and then pass the snapshots to MemLab for heap analysis. One potential idea is to overwrite require so that you can intercept all require calls and rewrite JavaScript before the modules were loaded by node.js (I did something similar before for a different project) and it's feasible.

MemLab doesn't have any "front-end" that drives a node.js program (like something using puppeteer to drive browser). Let us know what kind of APIs you would like for better Node.js support if you have any ideas.

foreseaz changed the title ~~Discussion: Detection accuracy & how to deal with hidden state~~ Discussion: Diagnosis accuracy & how to deal with hidden state Sep 27, 2022

foreseaz changed the title ~~Discussion: Diagnosis accuracy & how to deal with hidden state~~ Discussion: Diagnosis accuracy & how to deal with hidden state, closures and more... Sep 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Discussion: Diagnosis accuracy & how to deal with hidden state, closures and more... #33

Discussion: Diagnosis accuracy & how to deal with hidden state, closures and more... #33

Discussion: Diagnosis accuracy & how to deal with hidden state, closures and more... #33

Discussion: Diagnosis accuracy & how to deal with hidden state, closures and more... #33

Comments