Skip to content

Conversation

@Perulish8
Copy link
Contributor

We have been experiencing poor performance with our graphql-java application under heavy load during scale testing. During these tests, several other backends have also struggled which results in timeout or circuit breaker exceptions being thrown and mapped to GraphQL errors added to the response. We started profiling with Java Flight Recorder and reviewing the results in Mission Control. One finding is that there is significant blocking on the ExecutionContext.addErrors() method, which is backed by a CopyOnWriteArrayList. Researching CopyOnWriteArrayList vs. other options, CopyOnWriteArrayList blocks on write operations and is expensive because it has to create a new copy of the array on every write. It is ideal for situations with limited concurrent writes and many concurrent reads. For how the list is being used in this context, I would expect many concurrent writes and single threaded reads. This PR updates the ExecutionContext.error field to use Collections.synchronizedList(new ArrayList()), which is more performant way to support a synchronized list with concurrent writes. We ran a perf test with this change off of a local build and Mission Control was no longer reporting blocking in ExecutionContext.addErrors().

Sources:

@bbakerman
Copy link
Member

CopyOnWriteArrayList blocks on write operations and is expensive because it has to create a new copy of the array on every write. It is ideal for situations with limited concurrent writes and many concurrent reads.

You have me intrigued here - how many concurrent DataFetchers do you have in this system and how many of them are producing errors.

And ExecutionContext object is created per request - so contention can only be within threads on working on the same request I guess - we would expect that not that many errors get written in graphql processing to be a problem but do you generate a lot of errors??

Can you post any of the Flight Recorder details on this please for my own edification>

I am inclined at this stage to accept this PR based on your evidence that "it was slow and then it wasnt" but I would love to see more detailed evidence

@bbakerman bbakerman added this to the 16.0 milestone Aug 25, 2020
@bbakerman
Copy link
Member

The general read write patterns in graphql-java is

  • one request thread reading the results back into the ExecutionResult
  • many data fetching threads that could write errors into the error list *and hence needing synchronization)

While we expect N errors, we dont expect it to be in the 100s or more - so number of writes per request should be low

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants