Introduce reusable query buffer for client reads #13488
Conversation
This PR optimizes client query buffer handling in Valkey by introducing a shared query buffer that is used by default for client reads. This reduces memory usage by ~20KB per client by avoiding allocations for most clients using short (<16KB) complete commands. For larger or partial commands, the client still gets its own private buffer.

The primary changes are:
* Adding a shared query buffer `shared_qb` that clients use by default
* Modifying client querybuf initialization and reset logic
* Copying any partial query from the shared to the private buffer before command execution
* Freeing idle client query buffers when empty, to allow reuse of the shared buffer
* Keeping master client query buffers private, as their contents need to be preserved for the replication stream

In addition to the memory savings, this change shows a 3% improvement in latency and throughput when running with 1000 active clients. The memory reduction may also help reduce the need to evict clients when reaching the max memory limit, as the query buffer is the main per-client memory consumer.

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
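The subtle step above is copying a partial command out of the shared buffer before command execution. A minimal C sketch of that pattern, with illustrative names rather than the actual Valkey internals:

```c
#include <stdlib.h>
#include <string.h>

#define SHARED_QB_SIZE (16 * 1024)

/* One buffer reused for reads across all clients (illustrative). */
static char shared_qb[SHARED_QB_SIZE];

typedef struct client {
    char *querybuf;  /* points at shared_qb or a private allocation */
    size_t qb_len;   /* bytes of unparsed input currently buffered */
    int uses_shared; /* nonzero while reading into the shared buffer */
} client;

/* Called before command execution: if parsing left a partial command in
 * the shared buffer, copy it into a private buffer so the shared buffer
 * is free for the next client's read. */
static void detach_shared_querybuf(client *c) {
    if (!c->uses_shared) return;
    if (c->qb_len > 0) {
        char *priv = malloc(c->qb_len);
        memcpy(priv, shared_qb, c->qb_len);
        c->querybuf = priv;
    } else {
        c->querybuf = NULL; /* nothing pending; drop the reference */
    }
    c->uses_shared = 0;
}
```

Complete short commands never hit the `malloc` path, which is where the ~20KB-per-client saving comes from.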
(redis#593) The `query buffer resized correctly` test started [failing](https://github.com/valkey-io/valkey/actions/runs/9278013807) with non-jemalloc allocators after PR #258.

With jemalloc, we allocate ~20KB for the query buffer. In the test, we read 1 byte initially and then ensure there is at least 16KB of free space in the buffer for the second read, which is satisfied by jemalloc's 20KB allocation. However, with non-jemalloc allocators, the first read allocates exactly 16KB. When we check again, we don't have 16KB free due to the 1 byte already read. This triggers a greedy reallocation (doubling the requested size of 16KB+1), causing the query buffer size to exceed the 32KB limit and failing the test condition.

This PR adjusts the test's query buffer upper limit to 32KB + 2.

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
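The arithmetic behind the new limit is easiest to see in a small sketch of the greedy growth step (illustrative, not the actual sds internals):

```c
#include <stdio.h>

#define PROTO_IOBUF_LEN (16 * 1024) /* generic read size per iteration */

/* Greedy growth: when free space is short, size the buffer to twice
 * the total required length. */
static size_t greedy_grow(size_t used, size_t needed) {
    return (used + needed) * 2;
}

int main(void) {
    size_t qb_size = PROTO_IOBUF_LEN; /* non-jemalloc: exactly 16KB */
    size_t used = 1;                  /* the test's initial 1-byte read */

    /* Second read wants 16KB free, but only 16KB-1 remains. */
    if (qb_size - used < PROTO_IOBUF_LEN)
        qb_size = greedy_grow(used, PROTO_IOBUF_LEN);

    /* Prints 32770, i.e. 32KB + 2. */
    printf("query buffer grew to %zu bytes\n", qb_size);
    return 0;
}
```

Since 2 * (16KB + 1) = 32770 bytes, the test's upper bound moves from 32KB to exactly 32KB + 2.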
Revert "…_qb_used": ProcessingEventsWhileBlocked is not thread safe. This reverts commit 9d70fa3.
Co-authored-by: oranagra <oran@redislabs.com>
We've been seeing some pretty consistent failures from `test-valgrind-test` and `test-sanitizer-address` because of the querybuf test periodically failing. I tracked it down to the test periodically taking too long and the client cron getting triggered. A simple solution is to just disable the cron during the key race condition. I was able to run this locally for 100 iterations without seeing a failure.

Examples: https://github.com/valkey-io/valkey/actions/runs/9474458354/job/26104103514 and https://github.com/valkey-io/valkey/actions/runs/9474458354/job/26104106830.

Signed-off-by: Madelyn Olson <matolson@amazon.com>
CE Performance Automation: step 1 of 2 (build) DONE. This comment was automatically generated given a benchmark was triggered. You can check a comparison in detail via the Grafana link.

CE Performance Automation: step 2 of 2 (benchmark) FINISHED. This comment was automatically generated given a benchmark was triggered. Started the benchmark suite at 2024-10-19 17:38:24.397375; it took 5012.352847 seconds to finish. In total, 135 benchmarks will run.
@sundb I already reviewed this content, right? Is there something different from what I already approved?

@oranagra There are no changes after your last review; we can merge it.
### New Features in binary distributions
- 7 new data structures: JSON, Time series, Bloom filter, Cuckoo filter, Count-min sketch, Top-k, t-digest
- Redis scalable query engine (including vector search)

### Potentially breaking changes
- #12272 `GETRANGE` returns an empty bulk when the negative end index is out of range
- #12395 Optimize `SCAN` command when matching data type

### Bug fixes
- #13510 Fix `RM_RdbLoad` to enable AOF after RDB loading is completed
- #13489 `ACL CAT` - return module commands
- #13476 Fix a race condition in the `cache_memory` of `functionsLibCtx`
- #13473 Fix incorrect lag due to trimming stream via `XTRIM` command
- #13338 Fix incorrect lag field in `XINFO` when tombstone is after the `last_id` of the consumer group
- #13470 On `HDEL` of last field - update the global hash field expiration data structure
- #13465 Cluster: Pass extensions to node if extension processing is handled by it
- #13443 Cluster: Ensure validity of myself when loading cluster config
- #13422 Cluster: Fix `CLUSTER SHARDS` command returns empty array

### Modules API
- #13509 New API calls: `RM_DefragAllocRaw`, `RM_DefragFreeRaw`, and `RM_RegisterDefragCallbacks` - defrag API to allocate and free raw memory

### Performance and resource utilization improvements
- #13503 Avoid overhead of comparison function pointer calls in listpack `lpFind`
- #13505 Optimize `STRING` datatype write commands
- #13499 Optimize `SMEMBERS` command
- #13494 Optimize `GEO*` commands reply
- #13490 Optimize `HELLO` command
- #13488 Optimize client query buffer
- #12395 Optimize `SCAN` command when matching data type
- #13529 Optimize `LREM`, `LPOS`, `LINSERT`, and `LINDEX` commands
- #13516 Optimize `LRANGE` and other commands that perform several writes to client buffers per call
- #13431 Avoid `used_memory` contention when updating from multiple threads

### Other general improvements
- #13495 Reply `-LOADING` on replica while flushing the db

### CLI tools
- #13411 redis-cli: Fix wrong `dbnum` shown after the client reconnected

### Notes
- No backward compatibility for replication or persistence.
- Additional distributions, upgrade paths, features, and improvements will be introduced in upcoming pre-releases.
- With the GA release of 8.0 we will deprecate Redis Stack.
This PR is based on the commits from PRs valkey-io/valkey#258, valkey-io/valkey#593, and valkey-io/valkey#639.
This PR optimizes client query buffer handling in Redis by introducing
a reusable query buffer that is used by default for client reads. This
reduces memory usage by ~20KB per client by avoiding allocations for
most clients using short (<16KB) complete commands. For larger or
partial commands, the client still gets its own private buffer.
The primary changes are:
* Adding a reusable query buffer `thread_shared_qb` that clients use by default.

In addition to the memory savings, this change shows a 3% improvement in
latency and throughput when running with 1000 active clients.
The memory reduction may also help reduce the need to evict clients when
reaching max memory limit, as the query buffer is the main memory
consumer per client.
This PR is different from valkey-io/valkey#258: it uses a `thread_shared_qb_used` flag to avoid multiple clients requiring the reusable query buffer at the same time (see the sketch after the sign-offs below).

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Uri Yagelnik <uriy@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
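A minimal sketch of how a single `thread_shared_qb_used` flag could gate access to the reusable buffer; the names mirror the PR, but the logic is illustrative rather than the actual implementation:

```c
#include <stdbool.h>
#include <stddef.h>

#define PROTO_IOBUF_LEN (16 * 1024)

static char thread_shared_qb[PROTO_IOBUF_LEN];
static bool thread_shared_qb_used = false;

/* Hand out the reusable buffer only when no other client holds it;
 * a NULL return tells the caller to allocate a private buffer. */
static char *acquire_reusable_qb(void) {
    if (thread_shared_qb_used) return NULL;
    thread_shared_qb_used = true;
    return thread_shared_qb;
}

/* Release once the read has been fully parsed or copied out. */
static void release_reusable_qb(void) {
    thread_shared_qb_used = false;
}
```

A guard like this only works if acquire and release cannot interleave unexpectedly, which is presumably why the earlier commit touching `…_qb_used` was reverted: ProcessingEventsWhileBlocked can re-enter the read path, making the flag toggle unsafe there.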