[go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout: c++_linux_dbg_native.bins/dbg/json_run_localhost --scenarios_json '{"scenarios": [{"name": "cpp_protobuf_sync_streaming_from_server_qps_unconstrained_secure" ... #12510

Closed
matt-kwong opened this issue Sep 12, 2017 · 15 comments · Fixed by #12856
Assignees
Labels
disposition/BUILDNURSE For all buildnurse related build/test failures and flakes

Comments

@matt-kwong
Copy link
Contributor
2017-09-12 01:27:42,642 START: bins/dbg/json_run_localhost --scenarios_json '{"scenarios": [{"name": "cpp_protobuf_sync_streaming_from_server_qps_unconstrained_secure", "warmup_seconds": 0, "benchmark_seconds": 1, "num_servers": 1, "server_config": {"async_server_threads": 0, "channel_args": [{"str_value": "throughput", "name": "grpc.optimization_target"}], "security_params": {"use_test_ca": true, "server_host_override": "foo.test.google.fr"}, "threads_per_cq": 0, "server_type": "SYNC_SERVER"}, "num_clients": 0, "client_config": {"security_params": {"use_test_ca": true, "server_host_override": "foo.test.google.fr"}, "channel_args": [{"str_value": "throughput", "name": "grpc.optimization_target"}], "async_client_threads": 0, "outstanding_rpcs_per_channel": 16, "rpc_type": "STREAMING_FROM_SERVER", "payload_config": {"simple_params": {"resp_size": 0, "req_size": 0}}, "client_channels": 64, "threads_per_cq": 0, "load_params": {"closed_loop": {}}, "client_type": "SYNC_CLIENT", "histogram_params": {"max_possible": 60000000000.0, "resolution": 0.01}}}]}' GRPC_POLL_STRATEGY=poll
2017-09-12 01:33:42,684 D0912 01:27:42.647937913   30753 ev_posix.c:107]             Using polling engine: poll
D0912 01:27:42.648066175   30753 dns_resolver.c:301]         Using native dns resolver
D0912 01:27:42.649613830   30753 ev_posix.c:107]             Using polling engine: poll
D0912 01:27:42.649645486   30753 dns_resolver.c:301]         Using native dns resolver
D0912 01:27:42.651490019   30757 ev_posix.c:107]             Using polling engine: poll
D0912 01:27:42.651583876   30757 dns_resolver.c:301]         Using native dns resolver
I0912 01:27:42.651648499   30757 server_builder.cc:269]      Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000
D0912 01:27:42.656477672   30764 ev_posix.c:107]             Using polling engine: poll
D0912 01:27:42.656558016   30764 dns_resolver.c:301]         Using native dns resolver
I0912 01:27:42.656617989   30764 server_builder.cc:269]      Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000
RUNNING SCENARIO: cpp_protobuf_sync_streaming_from_server_qps_unconstrained_secure
I0912 01:27:42.660069853   30765 driver.cc:266]              Starting server on localhost:4067 (worker #0)
D0912 01:27:42.660280920   30765 ev_posix.c:107]             Using polling engine: poll
D0912 01:27:42.660332687   30765 dns_resolver.c:301]         Using native dns resolver
I0912 01:27:42.661436973   30762 qps_worker.cc:231]          RunServerBody: about to create server
I0912 01:27:42.661800807   30762 qps_worker.cc:69]           Starting server of type SYNC_SERVER
I0912 01:27:42.662968858   30762 server_builder.cc:269]      Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000
I0912 01:27:42.665868817   30762 qps_worker.cc:236]          RunServerBody: server created
I0912 01:27:42.665934784   30762 qps_worker.cc:243]          RunServerBody: creation status reported
I0912 01:27:42.666033884   30765 driver.cc:313]              Starting client on localhost:20636 (worker #1)
D0912 01:27:42.666098323   30765 driver.cc:330]              Client 0 gets 64 channels
I0912 01:27:42.666709832   30768 qps_worker.cc:188]          RunClientBody: about to create client
I0912 01:27:42.667073032   30768 qps_worker.cc:50]           Starting client of type SYNC_CLIENT STREAMING_FROM_SERVER 0
I0912 01:27:42.667295069   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:42.675746768   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:42.682866116   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:42.690043794   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:42.697300793   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:42.704236195   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:42.711199861   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:42.718277784   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:42.725411422   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:42.732483353   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.670595940   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.677888970   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.685030670   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.692102836   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.699069991   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.705999560   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.713012928   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.719934641   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.726994452   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.733997144   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.741127291   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.748215814   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.755327070   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.762395545   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.769591329   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.776927561   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.783883342   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.791027872   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.798092631   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.805141247   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.812189824   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.819325801   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.826458589   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.834241189   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.841351111   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.848344503   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.855282185   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.862224272   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.869138543   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.876056742   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.883169201   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.890136701   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.897165576   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.904159268   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.911193529   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.918158879   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.925116066   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.932109826   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.939176661   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.946086370   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.953101181   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.960191464   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.967275504   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:43.974307399   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:44.015770812   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:44.023077325   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:44.064233024   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:44.071289996   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:44.078353668   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:44.085374641   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:44.092526596   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:44.099688771   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:44.106820338   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:44.114084602   30768 client.h:416]               Connecting to localhost:11888
I0912 01:27:45.121982321   30777 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.122066367   30778 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.122113737   30780 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.122142836   30779 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.122564429   30781 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.122690724   30785 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.122673807   30783 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.122662039   30786 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.122682409   30782 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.122701099   30787 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.122724367   30784 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.123112371   30788 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.123130077   30789 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.123244321   30790 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.123183926   30791 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.123273414   30794 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.123292004   30793 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.123283581   30792 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.123381119   30795 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.123422646   30799 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.123638553   30808 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.123585329   30804 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.123598021   30802 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.123610624   30800 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.123649229   30801 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.123659056   30806 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.123696072   30797 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.123678385   30805 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.123725308   30796 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.123812526   30798 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.123922815   30814 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.123940700   30803 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.124103310   30811 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.124125246   30812 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.124444710   30824 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.124431242   30815 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.124467670   30825 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.124478242   30817 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.124522496   30826 client.h:306]               Waiting for benchmark to start
I0912 01:27:45.124541096   30828 client.h:306]               Waiting for benchmark to start

It looks like the test gets stuck waiting for benchmark to start before timing out.

@matt-kwong matt-kwong added the disposition/BUILDNURSE For all buildnurse related build/test failures and flakes label Sep 12, 2017
@jtattermusch
Copy link
Contributor

This seems to be one of the causes why sanitizers and C++ tests are taking too long.
I routinely see runs of this test case to take >4000 sec (and up to 7600 sec), and it happens on opt, asan, tsan and possibly others.

Two things need to be done:

  1. set reasonable timeouts for these tests (so they get killed if they run for too long).
  2. fix the reason why they are taking so long.

this is a high priority issue and needs to be addressed ASAP.

@dgquintas
Copy link
Contributor

Fails 100% of the time. See dashboard for details.

@matt-kwong
Copy link
Contributor Author

One reason for the increase in failures in this test recent is probably related to #12419. As seen from the logs,

Update CPU cost for json_run_localhost:cpp_protobuf_sync_streaming_qps_unconstrained_secure GRPC_POLL_STRATEGY=epollsig: 1024.000000 -> 3.610000
Update CPU cost for json_run_localhost:cpp_protobuf_sync_streaming_qps_unconstrained_secure GRPC_POLL_STRATEGY=epoll1: 1024.000000 -> 2.910000
Update CPU cost for json_run_localhost:cpp_protobuf_sync_streaming_qps_unconstrained_secure GRPC_POLL_STRATEGY=poll: 1024.000000 -> 3.610000
Update CPU cost for json_run_localhost:cpp_protobuf_sync_streaming_qps_unconstrained_secure GRPC_POLL_STRATEGY=poll-cv: 1024.000000 -> 1.310000

this test no longer runs exclusively and is likely running with 15 other tests concurrently. I'm not sure if it's a resource issue or the test expects to be run exclusively, but if it's the former, I'm planning on upgrading the VMs today or tomorrow and that might help.

@vjpai
Copy link
Member
vjpai commented Nov 28, 2017

@jtattermusch / @dgquintas : Suggesting removing priority on this issue since current dashboard indicates 0.11% failure for streaming_from_server tests

@dgquintas
Copy link
Contributor

SGTM

@apolcyn
Copy link
Contributor
apolcyn commented Dec 6, 2017

cpp_protobuf_sync_streaming_from_server_qps_unconstrained_secure_low_thread_count GRPC_POLL_STRATEGY=epollsig

scenario timeout in https://sponge.corp.google.com/target?id=59f1c0ba-d1b3-47a2-adeb-f2d44e2d2c8a&target=github/grpc/c%2B%2B_linux_tsan_native&searchFor=&show=ALL&sortBy=STATUS

@murgatroid99
Copy link
Member

Asan failure json_run_localhost:cpp_protobuf_sync_streaming_from_server_qps_unconstrained_secure_low_thread_count GRPC_POLL_STRATEGY=epoll1: https://sponge.corp.google.com/target?id=3206a61e-b6fb-49d4-b9df-e0afb8fbf95f&target=github/grpc/c%2B%2B_linux_asan_native&searchFor=&show=ALL&sortBy=STATUS

@jtattermusch
Copy link
Contributor

Not seen since 2018-01-23.

@lock lock bot locked as resolved and limited conversation to collaborators Sep 30, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
disposition/BUILDNURSE For all buildnurse related build/test failures and flakes
Projects
None yet
8 participants