Skip to content

Conversation

@olavloite
Copy link

Adds a trace for waiting for a session from the session pool. The maximum wait time is default 30 seconds, and this time can be changed through SessionPoolOptions#setBlockForSessionTimeoutMillis(long).

@olavloite olavloite requested a review from a team as a code owner July 8, 2019 16:45
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Jul 8, 2019

private PooledSession take() throws SpannerException {
SessionOrError s = Uninterruptibles.takeUninterruptibly(waiter);
SessionOrError s = takeUninterruptiblyWithTimeout();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how come this needs to change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Uninterruptibles.takeUninterruptibly(waiter) will block indefinitely if the resource pool is exhausted because of a session leak. In order to be able to log that a session request is blocked, we need a trigger that will end the trace span and do the actual logging. That is what this timeout does.

This timeout is currently also propagated to the user as a DEADLINE_EXCEEDED SpannerException. We could choose to handle this internally and retry the session request after logging the timeout. That would retain the current behavior where the application will potentially block indefinitely in case of a session leak. My opinion is however that it would be better to propagate this error to the user in order to give the user more information on what is going on.

kolea2
kolea2 previously requested changes Jul 8, 2019
this.writeSessionsFraction = builder.writeSessionsFraction;
this.actionOnExhaustion = builder.actionOnExhaustion;
this.actionOnSessionNotFound = builder.actionOnSessionNotFound;
this.waitForSessionTimeoutMillis = builder.blockForSessionTimeoutMillis;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make these variables all 'wait...' or all 'block..'

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to 'wait...'.

@codecov
Copy link

codecov bot commented Jul 9, 2019

Codecov Report

Merging #5703 into master will increase coverage by 0.71%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #5703      +/-   ##
============================================
+ Coverage     46.71%   47.43%   +0.71%     
- Complexity    24630    27180    +2550     
============================================
  Files          2351     2522     +171     
  Lines        256127   274264   +18137     
  Branches      29325    31355    +2030     
============================================
+ Hits         119650   130089   +10439     
- Misses       127558   134565    +7007     
- Partials       8919     9610     +691
Impacted Files Coverage Δ Complexity Δ
...n/java/com/google/cloud/bigquery/BigQueryImpl.java 66.78% <0%> (-13.22%) 57% <0%> (ø)
.../java/com/google/cloud/testing/CommandWrapper.java 87.87% <0%> (-9.1%) 13% <0%> (ø)
...a/com/google/cloud/firestore/FirestoreOptions.java 36.97% <0%> (-7.82%) 7% <0%> (ø)
.../cloud/datastore/testing/LocalDatastoreHelper.java 80.59% <0%> (-4.48%) 17% <0%> (ø)
...ud/storage/contrib/nio/testing/FakeStorageRpc.java 60.94% <0%> (-3.65%) 44% <0%> (+2%)
...toring/v3/stub/NotificationChannelServiceStub.java 7.69% <0%> (-2.31%) 1% <0%> (ø)
...itoring/v3/NotificationChannelServiceSettings.java 12.28% <0%> (-2.31%) 2% <0%> (ø)
.../cloud/compute/v1/InsertTargetPoolHttpRequest.java 22.08% <0%> (-1.85%) 8% <0%> (-1%)
...able/gaxx/reframing/ReframingResponseObserver.java 88.99% <0%> (-1.84%) 29% <0%> (-1%)
...c/main/java/com/google/cloud/logging/SinkInfo.java 82.03% <0%> (-1.78%) 19% <0%> (-2%)
... and 713 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c5d709c...7681477. Read the comment docs.

@olavloite olavloite requested a review from kolea2 July 12, 2019 15:41
throw newSpannerException(s.e);
long currentTimeout = options.getInitialWaitForSessionTimeoutMillis();
while (true) {
Tracer tracer = Tracing.getTracer();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: tracer can be made a private static field.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

<groupId>io.opencensus</groupId>
<artifactId>opencensus-impl</artifactId>
<version>${opencensus.version}</version>
<scope>test</scope>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be runtime only.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to runtime.

// End the span with status DEADLINE_EXCEEDED and retry.
tracer
.getCurrentSpan()
.end(EndSpanOptions.builder().setStatus(Status.DEADLINE_EXCEEDED).build());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.end(EndSpanOptions.builder().setStatus(Status.DEADLINE_EXCEEDED).build());
.setStatus(Status.DEADLINE_EXCEEDED);

You don't need to end the span here. When scope is closed, span will be ended automatically. See https://github.com/census-instrumentation/opencensus-java/blob/916adcbaba580cb223cb37ca34cf1f902903ede9/api/src/main/java/io/opencensus/trace/CurrentSpanUtils.java#L97-L103.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

if (s.e != null) {
throw newSpannerException(s.e);
}
tracer.getCurrentSpan().end();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. You don't need to call this explicitly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

long currentTimeout = options.getInitialWaitForSessionTimeoutMillis();
while (true) {
Tracer tracer = Tracing.getTracer();
Scope waitScope = tracer.spanBuilder(WAIT_FOR_SESSION).startScopedSpan();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The standard way to create a scoped span is in a try block and then you don't explicitly have to close the scope [example].

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to try-with-resources block.

SessionOrError s = pollUninterruptiblyWithTimeout(currentTimeout);
if (s == null) {
// End the span with status DEADLINE_EXCEEDED and retry.
tracer

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of creating a span each time pollUninterruptiblyWithTimeout() is called, an alternative is to create a single span in take() then each time around the loop, add a trace annotation to indicate that the operation timed out. You can also add attributes to the annotation and which gives you the option to record the current timeout at each failure [example].

There might be a good reason why you chose to do multiple spans and I'd like to understand it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason that I chose to create multiple spans, is that the take() method may block indefinitely, which would mean that the span would never be ended. Does a span that never ends, but that does add a trace annotation every 30 seconds, get flushed and sent to the server each time an annotation is added? If so, then I agree that having a single span in the take() would be the better solution. But to my (limited) knowledge of OpenCensus, the span needs to be closed before it is flushed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked with @g-easy and you're right. The span will never be flushed if take() can block indefinitely so your implementation is the only way to get the trace information in this case.

if (s == null) {
// End the span with status DEADLINE_EXCEEDED and retry.
tracer
.getCurrentSpan()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Here and below, instead of calling tracer.getCurrentSpan() repeatedly, you can save it in a variable above all the if/else logic.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the other changes after the review, the number of calls to tracer.getCurrentSpan() is now reduced to two, and both are only called if an exception occurs, so I would prefer to keep these two calls. But I agree with you that in the previous version of the code, that would have been better.

@Override
public void export(Collection<SpanData> spanDataList) {
for (SpanData sd : spanDataList) {
if (sd.getStatus() == Status.DEADLINE_EXCEEDED) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this counter only be incremented for spans that match WAIT_FOR_SESSION?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, that makes the test case less prone to false failures.

// finish.
checkedOutSession.close();
fut.get();
executor.shutdown();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

close() and shutdown() should be done in a finally as we should be cleaning up regardless of whether the timeout happened or not.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

Copy link

@skuruppu skuruppu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @olavloite for making the changes. This PR looks good from my perspective but I'll let @kolea2 do the final approval.

@kolea2 kolea2 dismissed their stale review August 15, 2019 15:34

stale

Copy link

@skuruppu skuruppu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kolea2 gave me the go ahead to approve this change.

@olavloite olavloite requested review from kolea2 and removed request for kolea2 August 16, 2019 07:10
@olavloite
Copy link
Author

@kolea2 Apparently, GitHub requires your explicit approval of this PR to allow merging :-)

@olavloite olavloite removed the request for review from kolea2 August 20, 2019 08:28
@olavloite olavloite merged commit 9bfc925 into googleapis:master Aug 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes This human has signed the Contributor License Agreement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants