[go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reintroduce grpclb re-resolution with fix #14438

Conversation

AspirinSJL
Copy link
Member
@AspirinSJL AspirinSJL commented Feb 15, 2018

Reverts #14412

The new commit fixes several issues with the old test:

  1. When testing the re-resolution caused by a balancer disconnection, we want the old balancer to have sent an initial response before the backend failure, so that the backend failure doesn't cause any unexpected re-resolution. But by default, the balancer in our test doesn't send initial response unless the client load reporting is enabled (which should be fixed in another PR), so I have added a new test config UpdatesWithClientLoadReportingTest for ReresolveDeadBalancer.
  2. Since re-resolution is introduced, it may happen in the test that a single balancer has two pending BalanceLoad() invocations, each in a separate thread but sharing the same balancer instance. During shutdown, we call NotifyDoneWithServerlists() to notify the condition change to servserverlist_ready_ to unblock BalanceLoad(). But resetting servserverlist_ready_ in BalanceLoad() after the notification will cause the second BalanceLoad() thread waiting forever. So the resetting is removed. I have run GRPC_POLL_STRATEGY=epoll1 tools/run_tests/run_tests.py -l c++ -r 'grpclb_end2end_test' -n inf -c opt -a 8 overnight (>239k runs). No new flake is introduced.

===============================================

Debugging this timeout issue is like demystifying how condition variable works. Below is the reasoning about the fix, for future reference.

Basically, we should remove the resetting anyway, because it makes more sense to not reset. Once we are done with the serverlist, we are done with the serverlist.

If we remove the resetting, the timeout issue can also be fixed.

And the reason why using notify_all() but keeping the resetting can't fix the timeout flake is that the threads waiting for the notification still need to wake up one by one, since the first one resets the condition before releasing the mutex, the second one will wake up with the condition being false.

Also, if we keep the resetting, but increase the number of NotifyDoneWithServerlists() which (sets condition + notifies), the flakiness will decrease but still exist. That's because of the idempotency of NotifyDoneWithServerlists(). No matter how many NotifyDoneWithServerlists() we add, it's possible that the first thread wakes up after all of them. And unfortunately, the first thread resets the condition before releasing the mutex. Even though the second thread wakes up multiple times, the condition is always false when it wakes up.

@grpc-testing
Copy link
****************************************************************

libgrpc.so

     VM SIZE                                                                                            FILE SIZE
 ++++++++++++++ GROWING                                                                              ++++++++++++++
  +9.1%     +45 src/core/ext/filters/client_channel/lb_policy.cc                                         +45  +9.1%
      +883%     +53 grpc_lb_policy_set_reresolve_closure_locked                                              +53  +883%

 -------------- SHRINKING                                                                            --------------
  -0.0%    -117 [None]                                                                               -1.28Ki  -0.0%
      -0.0%     -93 [Unmapped]                                                                           -1.25Ki  -0.0%
      -9.1%      -8 glb_lb_policy_vtable                                                                      -8  -9.1%
      -9.1%      -8 pick_first_lb_policy_vtable                                                               -8  -9.1%
      -9.1%      -8 round_robin_lb_policy_vtable                                                              -8  -9.1%
  -2.1%    -112 src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc                  -112  -2.1%
      [DEL]    -138 start_picking_locked(pick_first_lb_policy*) [clone .isra.1]                             -138  [DEL]
      [DEL]    -112 pf_set_reresolve_closure_locked                                                         -112  [DEL]
      [DEL]     -89 destroy_unselected_subchannels_locked(pick_first_lb_policy*) [clone .isra.2]             -89  [DEL]
  -1.7%    -112 src/core/ext/filters/client_channel/lb_policy/round_robin/round_robin.cc                -112  -1.7%
      [DEL]    -112 rr_set_reresolve_closure_locked                                                         -112  [DEL]
      [DEL]    -105 start_picking_locked(round_robin_lb_policy*) [clone .isra.1]                            -105  [DEL]
  -0.5%     -80 src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc                           -80  -0.5%
      [DEL]    -801 build_lb_channel_args(grpc_lb_addresses const*, grpc_core::FakeResolverResponseGener    -801  [DEL]
      [DEL]    -303 update_lb_connectivity_status_locked                                                    -303  [DEL]
      [DEL]    -210 extract_backend_addresses_locked(grpc_lb_addresses const*) [clone .isra.2]              -210  [DEL]
      [DEL]    -145 lb_call_data_shutdown(glb_lb_policy*) [clone .isra.4]                                   -145  [DEL]
      [DEL]    -143 glb_set_reresolve_closure_locked                                                        -143  [DEL]
      [DEL]     -98 on_rr_connectivity_changed_locked                                                        -98  [DEL]
      -2.6%     -40 rr_handover_locked(glb_lb_policy*) [clone .part.17]                                      -40  -2.6%
      -3.9%     -32 glb_shutdown_locked                                                                      -32  -3.9%
      -2.2%     -16 glb_update_locked                                                                        -16  -2.2%
      -3.0%      -8 glb_destroy                                                                               -8  -3.0%
      -2.0%      -8 glb_pick_locked                                                                           -8  -2.0%
      -7.4%      -8 glb_ping_one_locked                                                                       -8  -7.4%
      -2.9%      -8 glb_lb_channel_on_connectivity_changed_cb                                                 -8  -2.9%
      -0.9%      -4 [Unmapped]                                                                                -4  -0.9%
     -25.0%      -3 glb_check_connectivity_locked                                                             -3 -25.0%
     -25.0%      -3 glb_notify_on_state_change_locked                                                         -3 -25.0%

  -0.0%    -376 TOTAL                                                                                -1.53Ki  -0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



Copy link
Member
@vjpai vjpai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix PR title to indicate that this is the fix PR, and update the PR description to say the changes since the last attempt. Approval for changes in test/cpp/end2end

@grpc-testing
Copy link
[trickle] No significant performance differences

@AspirinSJL AspirinSJL changed the title Revert "Revert "Revert "Revert "grpclb re-resolution"""" Readd grpclb re-resolution with fix Feb 15, 2018
@grpc-testing
Copy link
****************************************************************

libgrpc.so

     VM SIZE                                                                                            FILE SIZE
 ++++++++++++++ GROWING                                                                              ++++++++++++++
  +9.1%     +45 src/core/ext/filters/client_channel/lb_policy.cc                                         +45  +9.1%
      +883%     +53 grpc_lb_policy_set_reresolve_closure_locked                                              +53  +883%

 -------------- SHRINKING                                                                            --------------
  -0.0%    -117 [None]                                                                               -1.28Ki  -0.0%
      -0.0%     -93 [Unmapped]                                                                           -1.25Ki  -0.0%
      -9.1%      -8 glb_lb_policy_vtable                                                                      -8  -9.1%
      -9.1%      -8 pick_first_lb_policy_vtable                                                               -8  -9.1%
      -9.1%      -8 round_robin_lb_policy_vtable                                                              -8  -9.1%
  -2.1%    -112 src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc                  -112  -2.1%
      [DEL]    -138 start_picking_locked(pick_first_lb_policy*) [clone .isra.1]                             -138  [DEL]
      [DEL]    -112 pf_set_reresolve_closure_locked                                                         -112  [DEL]
      [DEL]     -89 destroy_unselected_subchannels_locked(pick_first_lb_policy*) [clone .isra.2]             -89  [DEL]
  -1.7%    -112 src/core/ext/filters/client_channel/lb_policy/round_robin/round_robin.cc                -112  -1.7%
      [DEL]    -112 rr_set_reresolve_closure_locked                                                         -112  [DEL]
      [DEL]    -105 start_picking_locked(round_robin_lb_policy*) [clone .isra.1]                            -105  [DEL]
  -0.5%     -80 src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc                           -80  -0.5%
      [DEL]    -801 build_lb_channel_args(grpc_lb_addresses const*, grpc_core::FakeResolverResponseGener    -801  [DEL]
      [DEL]    -303 update_lb_connectivity_status_locked                                                    -303  [DEL]
      [DEL]    -210 extract_backend_addresses_locked(grpc_lb_addresses const*) [clone .isra.2]              -210  [DEL]
      [DEL]    -145 lb_call_data_shutdown(glb_lb_policy*) [clone .isra.4]                                   -145  [DEL]
      [DEL]    -143 glb_set_reresolve_closure_locked                                                        -143  [DEL]
      [DEL]     -98 on_rr_connectivity_changed_locked                                                        -98  [DEL]
      -2.6%     -40 rr_handover_locked(glb_lb_policy*) [clone .part.17]                                      -40  -2.6%
      -3.9%     -32 glb_shutdown_locked                                                                      -32  -3.9%
      -2.2%     -16 glb_update_locked                                                                        -16  -2.2%
      -3.0%      -8 glb_destroy                                                                               -8  -3.0%
      -2.0%      -8 glb_pick_locked                                                                           -8  -2.0%
      -7.4%      -8 glb_ping_one_locked                                                                       -8  -7.4%
      -2.9%      -8 glb_lb_channel_on_connectivity_changed_cb                                                 -8  -2.9%
      -0.9%      -4 [Unmapped]                                                                                -4  -0.9%
     -25.0%      -3 glb_check_connectivity_locked                                                             -3 -25.0%
     -25.0%      -3 glb_notify_on_state_change_locked                                                         -3 -25.0%

  -0.0%    -376 TOTAL                                                                                -1.53Ki  -0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
[trickle] No significant performance differences

@grpc-testing
Copy link
[microbenchmarks] Performance differences noted:
Benchmark                            cpu_time    real_time
-----------------------------------  ----------  -----------
BM_ErrorStringOnNewError<ErrorNone>  +9%         +9%

@dgquintas dgquintas changed the title Readd grpclb re-resolution with fix Reintroduce grpclb re-resolution with fix Feb 15, 2018
@AspirinSJL
Copy link
Member Author

@vjpai Done.

@grpc-testing
Copy link
[microbenchmarks] Performance differences noted:
Benchmark                                                                              cpu_time    real_time
-------------------------------------------------------------------------------------  ----------  -----------
BM_ErrorGetPresentInt                                                                  +12%        +12%
BM_HasClearGrpcStatus<ErrorWithGrpcStatus>                                             -10%        -10%
BM_StreamingPingPong<InProcess, NoOpMutator, NoOpMutator>/262144/2                     -4%         -4%
BM_StreamingPingPongWithCoalescingApi<InProcess, NoOpMutator, NoOpMutator>/262144/2/0  -5%         -5%

@grpc-testing
Copy link
****************************************************************

libgrpc.so

     VM SIZE                                                                                            FILE SIZE
 ++++++++++++++ GROWING                                                                              ++++++++++++++
  +9.1%     +45 src/core/ext/filters/client_channel/lb_policy.cc                                         +45  +9.1%
      +883%     +53 grpc_lb_policy_set_reresolve_closure_locked                                              +53  +883%

 -------------- SHRINKING                                                                            --------------
  -0.0%    -117 [None]                                                                               -1.29Ki  -0.0%
      -0.0%     -93 [Unmapped]                                                                           -1.26Ki  -0.0%
      -9.1%      -8 glb_lb_policy_vtable                                                                      -8  -9.1%
      -9.1%      -8 pick_first_lb_policy_vtable                                                               -8  -9.1%
      -9.1%      -8 round_robin_lb_policy_vtable                                                              -8  -9.1%
  -2.1%    -112 src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc                  -112  -2.1%
      [DEL]    -138 start_picking_locked(pick_first_lb_policy*) [clone .isra.1]                             -138  [DEL]
      [DEL]    -112 pf_set_reresolve_closure_locked                                                         -112  [DEL]
      [DEL]     -89 destroy_unselected_subchannels_locked(pick_first_lb_policy*) [clone .isra.2]             -89  [DEL]
  -1.7%    -112 src/core/ext/filters/client_channel/lb_policy/round_robin/round_robin.cc                -112  -1.7%
      [DEL]    -112 rr_set_reresolve_closure_locked                                                         -112  [DEL]
      [DEL]    -105 start_picking_locked(round_robin_lb_policy*) [clone .isra.1]                            -105  [DEL]
  -0.5%     -80 src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc                           -80  -0.5%
      [DEL]    -801 build_lb_channel_args(grpc_lb_addresses const*, grpc_core::FakeResolverResponseGener    -801  [DEL]
      [DEL]    -303 update_lb_connectivity_status_locked                                                    -303  [DEL]
      [DEL]    -210 extract_backend_addresses_locked(grpc_lb_addresses const*) [clone .isra.2]              -210  [DEL]
      [DEL]    -145 lb_call_data_shutdown(glb_lb_policy*) [clone .isra.4]                                   -145  [DEL]
      [DEL]    -143 glb_set_reresolve_closure_locked                                                        -143  [DEL]
      [DEL]     -98 on_rr_connectivity_changed_locked                                                        -98  [DEL]
      -2.6%     -40 rr_handover_locked(glb_lb_policy*) [clone .part.17]                                      -40  -2.6%
      -3.9%     -32 glb_shutdown_locked                                                                      -32  -3.9%
      -2.2%     -16 glb_update_locked                                                                        -16  -2.2%
      -3.0%      -8 glb_destroy                                                                               -8  -3.0%
      -2.0%      -8 glb_pick_locked                                                                           -8  -2.0%
      -7.4%      -8 glb_ping_one_locked                                                                       -8  -7.4%
      -2.9%      -8 glb_lb_channel_on_connectivity_changed_cb                                                 -8  -2.9%
      -0.9%      -4 [Unmapped]                                                                                -4  -0.9%
     -25.0%      -3 glb_check_connectivity_locked                                                             -3 -25.0%
     -25.0%      -3 glb_notify_on_state_change_locked                                                         -3 -25.0%

  -0.0%    -376 TOTAL                                                                                -1.54Ki  -0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
[trickle] No significant performance differences

@grpc-testing
Copy link
[microbenchmarks] No significant performance differences

@AspirinSJL AspirinSJL force-pushed the revert-14412-revert-14376-revert-14356-revert-13671-grpclb_reresolution branch from baedcfb to 10142e4 Compare February 16, 2018 22:29
@grpc-testing
Copy link
****************************************************************

libgrpc.so

     VM SIZE                                                                                            FILE SIZE
 ++++++++++++++ GROWING                                                                              ++++++++++++++
  +9.1%     +45 src/core/ext/filters/client_channel/lb_policy.cc                                         +45  +9.1%
      +883%     +53 grpc_lb_policy_set_reresolve_closure_locked                                              +53  +883%

 -------------- SHRINKING                                                                            --------------
  -0.0%    -117 [None]                                                                               -1.29Ki  -0.0%
      -0.0%     -93 [Unmapped]                                                                           -1.26Ki  -0.0%
      -9.1%      -8 glb_lb_policy_vtable                                                                      -8  -9.1%
      -9.1%      -8 pick_first_lb_policy_vtable                                                               -8  -9.1%
      -9.1%      -8 round_robin_lb_policy_vtable                                                              -8  -9.1%
  -2.1%    -112 src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc                  -112  -2.1%
      [DEL]    -138 start_picking_locked(pick_first_lb_policy*) [clone .isra.1]                             -138  [DEL]
      [DEL]    -112 pf_set_reresolve_closure_locked                                                         -112  [DEL]
      [DEL]     -89 destroy_unselected_subchannels_locked(pick_first_lb_policy*) [clone .isra.2]             -89  [DEL]
  -1.7%    -112 src/core/ext/filters/client_channel/lb_policy/round_robin/round_robin.cc                -112  -1.7%
      [DEL]    -112 rr_set_reresolve_closure_locked                                                         -112  [DEL]
      [DEL]    -105 start_picking_locked(round_robin_lb_policy*) [clone .isra.1]                            -105  [DEL]
  -0.5%     -80 src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc                           -80  -0.5%
      [DEL]    -801 build_lb_channel_args(grpc_lb_addresses const*, grpc_core::FakeResolverResponseGener    -801  [DEL]
      [DEL]    -303 update_lb_connectivity_status_locked                                                    -303  [DEL]
      [DEL]    -210 extract_backend_addresses_locked(grpc_lb_addresses const*) [clone .isra.2]              -210  [DEL]
      [DEL]    -145 lb_call_data_shutdown(glb_lb_policy*) [clone .isra.4]                                   -145  [DEL]
      [DEL]    -143 glb_set_reresolve_closure_locked                                                        -143  [DEL]
      [DEL]     -98 on_rr_connectivity_changed_locked                                                        -98  [DEL]
      -2.6%     -40 rr_handover_locked(glb_lb_policy*) [clone .part.17]                                      -40  -2.6%
      -3.9%     -32 glb_shutdown_locked                                                                      -32  -3.9%
      -2.2%     -16 glb_update_locked                                                                        -16  -2.2%
      -3.0%      -8 glb_destroy                                                                               -8  -3.0%
      -2.0%      -8 glb_pick_locked                                                                           -8  -2.0%
      -7.4%      -8 glb_ping_one_locked                                                                       -8  -7.4%
      -2.9%      -8 glb_lb_channel_on_connectivity_changed_cb                                                 -8  -2.9%
      -0.9%      -4 [Unmapped]                                                                                -4  -0.9%
     -25.0%      -3 glb_check_connectivity_locked                                                             -3 -25.0%
     -25.0%      -3 glb_notify_on_state_change_locked                                                         -3 -25.0%

  -0.0%    -376 TOTAL                                                                                -1.54Ki  -0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
****************************************************************

libgrpc.so

     VM SIZE                                                                                            FILE SIZE
 ++++++++++++++ GROWING                                                                              ++++++++++++++
  +9.1%     +45 src/core/ext/filters/client_channel/lb_policy.cc                                         +45  +9.1%
      +883%     +53 grpc_lb_policy_set_reresolve_closure_locked                                              +53  +883%

 -------------- SHRINKING                                                                            --------------
  -0.0%    -117 [None]                                                                               -1.29Ki  -0.0%
      -0.0%     -93 [Unmapped]                                                                           -1.26Ki  -0.0%
      -9.1%      -8 glb_lb_policy_vtable                                                                      -8  -9.1%
      -9.1%      -8 pick_first_lb_policy_vtable                                                               -8  -9.1%
      -9.1%      -8 round_robin_lb_policy_vtable                                                              -8  -9.1%
  -2.1%    -112 src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc                  -112  -2.1%
      [DEL]    -138 start_picking_locked(pick_first_lb_policy*) [clone .isra.1]                             -138  [DEL]
      [DEL]    -112 pf_set_reresolve_closure_locked                                                         -112  [DEL]
      [DEL]     -89 destroy_unselected_subchannels_locked(pick_first_lb_policy*) [clone .isra.2]             -89  [DEL]
  -1.7%    -112 src/core/ext/filters/client_channel/lb_policy/round_robin/round_robin.cc                -112  -1.7%
      [DEL]    -112 rr_set_reresolve_closure_locked                                                         -112  [DEL]
      [DEL]    -105 start_picking_locked(round_robin_lb_policy*) [clone .isra.1]                            -105  [DEL]
  -0.5%     -80 src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc                           -80  -0.5%
      [DEL]    -801 build_lb_channel_args(grpc_lb_addresses const*, grpc_core::FakeResolverResponseGener    -801  [DEL]
      [DEL]    -303 update_lb_connectivity_status_locked                                                    -303  [DEL]
      [DEL]    -210 extract_backend_addresses_locked(grpc_lb_addresses const*) [clone .isra.2]              -210  [DEL]
      [DEL]    -145 lb_call_data_shutdown(glb_lb_policy*) [clone .isra.4]                                   -145  [DEL]
      [DEL]    -143 glb_set_reresolve_closure_locked                                                        -143  [DEL]
      [DEL]     -98 on_rr_connectivity_changed_locked                                                        -98  [DEL]
      -2.6%     -40 rr_handover_locked(glb_lb_policy*) [clone .part.17]                                      -40  -2.6%
      -3.9%     -32 glb_shutdown_locked                                                                      -32  -3.9%
      -2.2%     -16 glb_update_locked                                                                        -16  -2.2%
      -3.0%      -8 glb_destroy                                                                               -8  -3.0%
      -2.0%      -8 glb_pick_locked                                                                           -8  -2.0%
      -7.4%      -8 glb_ping_one_locked                                                                       -8  -7.4%
      -2.9%      -8 glb_lb_channel_on_connectivity_changed_cb                                                 -8  -2.9%
      -0.9%      -4 [Unmapped]                                                                                -4  -0.9%
     -25.0%      -3 glb_check_connectivity_locked                                                             -3 -25.0%
     -25.0%      -3 glb_notify_on_state_change_locked                                                         -3 -25.0%

  -0.0%    -376 TOTAL                                                                                -1.54Ki  -0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
[trickle] No significant performance differences

@grpc-testing
Copy link
[microbenchmarks] No significant performance differences

@grpc-testing
Copy link
[microbenchmarks] Performance differences noted:
Benchmark                       cpu_time    real_time
------------------------------  ----------  -----------
BM_ErrorGetStatus<SimpleError>  +4%         +4%

@AspirinSJL AspirinSJL merged commit 84f94c1 into master Feb 20, 2018
@AspirinSJL AspirinSJL deleted the revert-14412-revert-14376-revert-14356-revert-13671-grpclb_reresolution branch February 20, 2018 03:19
@AspirinSJL AspirinSJL mentioned this pull request Feb 20, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Jan 20, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants