Understanding Worker and BackgroundService instances #96
Hello, first of all, thank you for your interest in the project and for taking the time to ask these kinds of questions.
You may run several services on the same worker, but indeed, a worker is a single thread, so only one function call can do work at a given time. The exception would be yielding, which may occur at any time.
Disposing the worker kills the background services, so disposing them separately should only be necessary if you plan on keeping the worker. Worker disposal is crucial for applications that load dynamic code.
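For illustration, a rough sketch of what that looks like. The factory and extension method names follow the project's README (CreateAsync, CreateBackgroundServiceAsync, RunAsync); the two service classes are made up, and I'm assuming here that the worker is async-disposable:

```csharp
public class MathService
{
    public double Sum(int n)
    {
        double total = 0;
        for (var i = 1; i <= n; i++) total += i;
        return total;
    }
}

public class TextService
{
    public string Shout(string s) => s.ToUpperInvariant();
}

// e.g. in a component, with the library's IWorkerFactory injected:
public async Task RunBothAsync(IWorkerFactory workerFactory)
{
    var worker = await workerFactory.CreateAsync();

    // Two services, one worker.
    var math = await worker.CreateBackgroundServiceAsync<MathService>();
    var text = await worker.CreateBackgroundServiceAsync<TextService>();

    // Same worker = same thread: these calls run one after the other, not in parallel.
    var sum = await math.RunAsync(s => s.Sum(1_000_000));
    var shout = await text.RunAsync(s => s.Shout("hello"));

    // Disposing the worker also tears down both services running on it.
    await worker.DisposeAsync();
}
```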
As stated in the first answer, it's not a 1-1 relationship, and I think it would be unwise to change that. Even if it's a niche use case, this entire project is very niche. That said, most of the startup code is already stowed away in extension methods, and creating a new one that creates a worker implicitly would not be very difficult, even in user code. It may require some small intermediate object, I guess.
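Something along the lines of this hypothetical user-code extension (names made up, library using directives omitted), where the returned tuple plays the role of that small intermediate object:

```csharp
using System.Threading.Tasks;

public static class WorkerFactoryExtensions
{
    // Hypothetical helper: spins up a dedicated worker and immediately creates a
    // background service on it. The caller keeps the worker handle so it can
    // dispose it (and with it the service) when done.
    public static async Task<(IWorker Worker, IWorkerBackgroundService<T> Service)>
        CreateServiceOnDedicatedWorkerAsync<T>(this IWorkerFactory factory) where T : class
    {
        var worker = await factory.CreateAsync();
        var service = await worker.CreateBackgroundServiceAsync<T>();
        return (worker, service);
    }
}
```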
Indeed, serializing is very expensive. There is always the option of using the core module and optimizing serialization for your application. But the general rule, I think, should be that you must avoid freezing the UI at all cost, so any task that has a cycle slower than 200ms could be a candidate, but like you said, you must be able to serialize in less time than that. The 2x serialize/deserialize also adds to the overall execution time, of course. The top layer (BackgroundService) is developed with usability rather than speed in mind: serialization of expressions is nice to use but slow, and the messages have a lot of overhead. There are alternatives; I'm planning to create a second, alternative layer that uses Roslyn at compile time instead, and use MessagePack for serialization. Regarding caching, you would have to use something universally available like IndexedDB or the Cache API.
Hi! Thanks for the detailed response. You answered my questions quite clearly and I don't really have any follow-ups on that. Regarding practical problems, indeed serializing is very expensive. I very much like what this library brings and the ease of use, but I may have underestimated the limitations that even the underlying JavaScript Web Worker API enforces. Such limitations are of course the need for serialization + deserialization every time, and no access to the DOM and/or the global window/document objects of the JavaScript environment. These things vastly limit what use we can get out of this library, unfortunately. Let me briefly explain our scenario. To summarize:
Even if using the Core library may reduce some of the overhead, we still cannot avoid the issue mentioned above. I'm not looking for a magical solution, but if you have any suggestions on how I could mitigate any of the issues, please, I am all ears.
I'm curious as to why you say that data must be deserialized on the main thread and why it cannot be avoided. You could try optimizing by downloading the data directly to something that can be shared by everyone (IndexedDB or the Cache API). That way, you could theoretically do the deserialization in parallel (on the worker(s) and the main thread), but I cannot tell if it's too much work or even if it's going to pay off. Also, IndexedDB is not super easy to work with.
Sorry, I may have left out some details. It is essential for the functionality of our web app that we have this data in memory, as it may need to be accessed depending on user interactions. This data of ours contains both metadata and geometrical primitives. The primitives are 2D in our data, but with a simple offset we can calculate the extruded 3D shapes, which is what we visualize in our web app and which is what takes quite long to calculate.
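Not our actual code, but roughly the kind of computation I mean, assuming a simple vertical extrusion of a closed 2D outline into side-wall triangles, output as a flat float array:

```csharp
using System.Collections.Generic;

public static class Extrusion
{
    // Turns a closed 2D outline into the side walls of an extruded 3D shape,
    // emitting a flat float array of (x, y, z) triplets, two triangles per edge.
    // Top/bottom cap triangulation is left out of this sketch.
    public static float[] ExtrudeSides(IReadOnlyList<(float X, float Y)> outline, float height)
    {
        var verts = new List<float>();
        for (var i = 0; i < outline.Count; i++)
        {
            var a = outline[i];
            var b = outline[(i + 1) % outline.Count]; // wrap around to close the loop

            // Triangle 1: bottom of a, bottom of b, top of b.
            verts.AddRange(new[] { a.X, a.Y, 0f, b.X, b.Y, 0f, b.X, b.Y, height });
            // Triangle 2: bottom of a, top of b, top of a.
            verts.AddRange(new[] { a.X, a.Y, 0f, b.X, b.Y, height, a.X, a.Y, height });
        }
        return verts.ToArray();
    }
}
```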
I am not familiar with either of these at all, so I don't think we have the time to investigate them further at the moment, unfortunately, with our release coming up rather quickly. Ultimately I believe the best solution for us, if we would like to get our solution working with workers, would be to split our model, either physically or "virtually" (for example, geometrical data delivered separately from metadata by the API), in order to avoid some of the deserialization + serialization overhead. That way we could, as you say, only deserialize the heavy stuff on the workers where the data is needed, and never deserialize the complex shapes on the main thread. The main thread would just need to receive the calculated 3D geometry data in the end, in order for it to be passed on to JS. A lot of hoops to jump through in the end; not impossible, but not trivial.
You could probably serialize from the worker and send directly to the JS main thread without doing yet another deserialize/serialize on the .NET running on main. Not sure how much it will save you, though. Have you tried to measure the time spent in (de)serialization? Might be some flag I could provide for debugging in a future version; I'm not sure how easy it is to set up performance counters.
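Roughly what that could look like; GeometryService, BuildVertices and the JS function renderLayer are made-up names, and geometryService/jsRuntime are assumed to be the background service proxy and the injected IJSRuntime:

```csharp
// The worker does the heavy deserialization + geometry work and hands back a plain
// float[]; the .NET main thread forwards it straight to JavaScript via IJSRuntime
// instead of rebuilding the full domain model first.
var layerId = 42;
float[] vertices = await geometryService.RunAsync(s => s.BuildVertices(layerId));

// A float[] crosses the JS interop boundary as a plain number/typed array.
await jsRuntime.InvokeVoidAsync("renderLayer", vertices);
```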
Hmm, yeah, that is true; I probably would not need to deserialize it in between on .NET main as long as I send it in a simple enough form for it to be easily deserializable in JavaScript (float arrays). I have tried measuring the deserialization and serialization times using Stopwatch. It depends quite a lot on the data, as I grouped it by "layer" (PCB CAD layers): one layer may contain anything from a few shapes to thousands and thousands. The general trend, though, was more than 200ms, sometimes up to 700ms, for deserialization in the worker. And that is of course without the transfer overhead (if such exists?). In the end, I have not yet managed to group it in any way that would actually improve the loading speed; probably the overheads outweigh the benefit from parallelisation.
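For reference, the kind of measurement I mean, using System.Text.Json and a placeholder Layer type:

```csharp
using System;
using System.Diagnostics;
using System.Text.Json;

var layer = LoadOneLayer(); // placeholder: one PCB CAD layer's worth of shapes

var sw = Stopwatch.StartNew();
var json = JsonSerializer.Serialize(layer);
sw.Stop();
Console.WriteLine($"serialize:   {sw.ElapsedMilliseconds} ms ({json.Length} chars)");

sw.Restart();
var roundTripped = JsonSerializer.Deserialize<Layer>(json);
sw.Stop();
Console.WriteLine($"deserialize: {sw.ElapsedMilliseconds} ms");
```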
I was wondering if you'd considered using MemoryPack for serialization? It's an order of magnitude faster than JSON (it essentially sends over something similar to the underlying in-memory representation of the data). As you'll be opening it up in .NET, you'll also get all the advantages of the speed increase upon deserialization too. The transfer overhead remains, of course.
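The basic shape of it, with Layer and its members as placeholders for the real model:

```csharp
using System;
using MemoryPack;

[MemoryPackable]
public partial class Layer
{
    public string Name { get; set; } = "";
    public float[] Vertices { get; set; } = Array.Empty<float>();
}

public static class LayerPacking
{
    // Binary round trip; typically much faster and more compact than JSON text.
    public static Layer? RoundTrip(Layer layer)
    {
        byte[] packed = MemoryPackSerializer.Serialize(layer);
        return MemoryPackSerializer.Deserialize<Layer>(packed);
    }
}
```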
I have never actually heard of MemoryPack, if I am being completely honest with you. I will surely look it up now that you've mentioned it, but I must admit that the use of workers in Blazor is a project that we have shelved for now ahead of other more important topics. Kind of hoping that some Blazor WASM "native" multithreading will get released before we get around to it XD
I have a few questions regarding the use of `IWorker` instances and `IWorkerBackgroundService` instances.

- Once I have created a background service from a `Worker` instance, is there any point in keeping the worker instance around? Can I for example spawn multiple BackgroundServices from the same Worker instance? If so, I suppose they can only run one at a time, since they are on the same Worker?
- Is disposing of the `IWorker` instance sufficient, or do I separately need to dispose of the `IWorkerBackgroundService`s? (BTW, is there an actual need for disposing workers in practice?)

Had high hopes to improve performance in our application using workers, but I am starting to see some of the limitations really becoming problems. For example, the cost of serialization + deserialization to pass data may at times outweigh the benefit of doing calculations in parallel, probably depending on the complexity of the task. I got some PoC code working with our app but managed to decrease performance by quite a lot... I guess due to the massive amount of serialization work as well as the inability to utilize our cache well enough (as it would be global and workers cannot access global things).