We can probably just store them in a table, perhaps:
- function ZID
- implementation ZID
- implementation revision
- tester ZID
- tester revision
- result bool
- result details (full returned JSON object, with e.g. timing data or failure reason)
This way we can quickly index on passing/not passing, and when the implementation or tester is edited we can dump the old data once the new revision has an updated result. I don't think we'd need to expose the tester metadata (e.g. how fast it ran) in a query; something like "show me all tests that took longer than 100ms" can be an offline query and doesn't need to be a production feature.
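As a rough sketch of what that table and index could look like, here's a SQLite version. All names (table, columns, the example ZIDs) are made up for illustration; the real schema would follow whatever conventions the production database uses:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema mirroring the columns listed above.
# One row per (function, implementation revision, tester revision) pair;
# re-running after an edit inserts rows for the new revision, and the
# old-revision rows can be dumped once the new results land.
conn.execute("""
CREATE TABLE test_results (
    function_zid            TEXT    NOT NULL,
    implementation_zid      TEXT    NOT NULL,
    implementation_revision INTEGER NOT NULL,
    tester_zid              TEXT    NOT NULL,
    tester_revision         INTEGER NOT NULL,
    passing                 INTEGER NOT NULL,  -- result bool (0/1)
    details                 TEXT,              -- full returned JSON object
    PRIMARY KEY (function_zid, implementation_zid, implementation_revision,
                 tester_zid, tester_revision)
)
""")

# Index on pass/fail so "show all failing tests" stays a quick lookup.
conn.execute("CREATE INDEX idx_passing ON test_results (passing)")

# Record one example result (ZIDs here are invented placeholders).
conn.execute(
    "INSERT INTO test_results VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("Z801", "Z902", 5, "Z903", 3, 1,
     json.dumps({"duration_ms": 12})),
)

failing = conn.execute(
    "SELECT COUNT(*) FROM test_results WHERE passing = 0"
).fetchone()[0]
```

The timing data stays inside the `details` JSON blob rather than getting its own column, which matches the idea that "slower than 100ms" style questions are offline queries, not indexed production features.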