Elasticsearch for Java Developers: Elasticsearch from Java
This article is part of our Academy Course titled Elasticsearch Tutorial for Java Developers.
In this course, we provide a series of tutorials so that you can develop your own Elasticsearch based applications. We cover a wide range of topics, from installation and operations, to Java API Integration and reporting. With our straightforward tutorials, you will be able to get your own projects up and running in minimum time. Check it out here!
1. Introduction
In the previous part of the tutorial we mastered the skill of holding meaningful conversations with Elasticsearch by leveraging its numerous RESTful APIs, using command line tools only. This is very useful knowledge; however, when you are developing Java / JVM applications, you will need better options than the command line. Luckily, Elasticsearch has more than one offering in this area.
In this part of the tutorial we are going to learn how to talk to Elasticsearch by means of native Java APIs. Our approach will be to code and work through a couple of Java applications, using Apache Maven for build management, the terrific Spring Framework for dependency wiring and inversion of control, and the awesome JUnit / AssertJ as test scaffolding.
2. Using Java Client API
Since its early versions, Elasticsearch has distributed a dedicated Java client API with each release, also known as the transport client. It speaks Elasticsearch's native transport protocol and, as such, imposes the constraint that the version of the client library should at least match the major version of the Elasticsearch distribution you are using (ideally, the client should have exactly the same version).
As we are using Elasticsearch version 5.2.0, it would make sense to add the respective client version dependency to our pom.xml file.
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>transport</artifactId>
<version>5.2.0</version>
</dependency>
Since we have chosen Spring Framework to power our application, literally the only thing we need is a transport client configuration.
@Configuration
public class ElasticsearchClientConfiguration {
@Bean(destroyMethod = "close")
TransportClient transportClient() throws UnknownHostException {
return new PreBuiltTransportClient(
Settings.builder()
.put(ClusterName.CLUSTER_NAME_SETTING.getKey(), "es-catalog")
.build()
)
.addTransportAddress(new InetSocketTransportAddress(
InetAddress.getByName("localhost"), 9300));
}
}
The PreBuiltTransportClient follows the builder pattern (as do most of the classes we are going to see soon) to construct a TransportClient instance, and once it is there, we can use the injection techniques supported by the Spring Framework to access it:
@Autowired private TransportClient client;
The CLUSTER_NAME_SETTING is worth our attention: it should exactly match the name of the Elasticsearch cluster we are connecting to, which in our case is es-catalog.
Great, we have our transport client initialized, so what can we do with it? Essentially, the transport client exposes a whole bunch of methods (following the fluent interface style) that open up access to all of the Elasticsearch APIs from Java code. Getting one step ahead, it should be noted that the transport client draws an explicit separation between regular APIs and admin APIs. The latter are available by invoking the admin() method on the transport client instance.
Before rolling up our sleeves and getting our hands dirty, it is necessary to mention that the Elasticsearch Java APIs are designed to be fully asynchronous and, as such, are centered around two key abstractions: ActionFuture<?> and ListenableActionFuture<?>. In fact, ActionFuture<?> is just a plain old Java Future<?> with a couple of handy methods added; stay tuned for that. On the other hand, ListenableActionFuture<?> is a more powerful abstraction, with the ability to accept callbacks and notify the caller about the result of the execution.
Picking one style over the other is entirely dictated by the needs of your application, as both have their own pros and cons. Without further ado, let us go ahead and make sure our Elasticsearch cluster is healthy and ready to rock.
final ClusterHealthResponse response = client
.admin()
.cluster()
.health(
Requests
.clusterHealthRequest()
.waitForGreenStatus()
.timeout(TimeValue.timeValueSeconds(5))
)
.actionGet();
assertThat(response.isTimedOut())
.withFailMessage("The cluster is unhealthy: %s", response.getStatus())
.isFalse();
The example is pretty simple and straightforward. What we do is ask the Elasticsearch cluster about its status, explicitly agreeing to wait at most 5 seconds for the status to become green (if that is not already the case). Under the hood, client.admin().cluster().health(...) returns an ActionFuture<?>, so we have to call one of the actionGet methods to get the response.
Here is another, slightly different way to use Elasticsearch Java API, this time employing the prepareXxx methods family.
final ClusterHealthResponse response = client
.admin()
.cluster()
.prepareHealth()
.setWaitForGreenStatus()
.setTimeout(TimeValue.timeValueSeconds(5))
.execute()
.actionGet();
assertThat(response.isTimedOut())
.withFailMessage("The cluster is unhealthy: %s", response.getStatus())
.isFalse();
Although both code snippets lead to absolutely identical results, the latter calls the client.admin().cluster().prepareHealth().execute() method at the end of the chain, which returns a ListenableActionFuture<?>. It does not make much difference in this example, but please keep it in mind, as we are going to see more interesting use cases where this detail becomes a real game changer.
And finally, last but not least: the asynchronous nature of any API (and the Elasticsearch Java API is no exception) assumes that invoking an operation will take some time, and it becomes the responsibility of the caller to decide how to deal with that. What we have used so far is just calling actionGet on the instance of ActionFuture<?>, which effectively transforms the asynchronous execution into a blocking (or, to put it another way, synchronous) call. Moreover, we did not specify our expectations in terms of how long we are willing to wait for the execution to complete before giving up. We can do better than that, and in the rest of this section we are going to address both of these points.
Once our Elasticsearch cluster status is all green, it is time to create some indices, much like we did in the previous part of the tutorial, but this time using the Java APIs only. It would be a good idea to ensure that the catalog index does not exist yet before creating one.
final IndicesExistsResponse response = client
.admin()
.indices()
.prepareExists("catalog")
.get(TimeValue.timeValueMillis(100));
if (!response.isExists()) {
...
}
Please notice that in the snippet above we provided an explicit timeout for the operation to complete, get(TimeValue.timeValueMillis(100)), which is essentially a shortcut for execute().actionGet(TimeValue.timeValueMillis(100)).
For the catalog index settings and mapping types we are going to use exactly the same JSON file, catalog-index.json, which we had been using in the previous part of the tutorial. We are going to place it into src/test/resources folder, following Apache Maven conventions.
@Value("classpath:catalog-index.json")
private Resource index;
Fortunately, the Spring Framework greatly simplifies the injection of classpath resources, so there is not much we need to do here to gain access to the catalog-index.json content and feed it directly to the Elasticsearch Java API.
try (final ByteArrayOutputStream out = new ByteArrayOutputStream()) {
Streams.copy(index.getInputStream(), out);
final CreateIndexResponse response = client
.admin()
.indices()
.prepareCreate("catalog")
.setSource(out.toByteArray())
.setTimeout(TimeValue.timeValueSeconds(1))
.get(TimeValue.timeValueSeconds(2));
assertThat(response.isAcknowledged())
.withFailMessage("The index creation has not been acknowledged")
.isTrue();
}
This code block illustrates yet another way to approach the Elasticsearch Java APIs, by utilizing the setSource method call. In a nutshell, we just supply the request payload ourselves in the form of an opaque blob (or string), and it is sent to the Elasticsearch node(s) as is. However, we could have used pure Java data structures instead, for example:
final CreateIndexResponse response = client
.admin()
.indices()
.prepareCreate("catalog")
.setSettings(...)
.addMapping("books", ...)
.addMapping("authors", ...)
.setTimeout(TimeValue.timeValueSeconds(1))
.get(TimeValue.timeValueSeconds(2));
Good, with that we conclude the transport client admin APIs and switch over to the document and search APIs, as those are the ones you will use most of the time. As we remember, Elasticsearch speaks JSON, so we have to somehow convert books and authors to a JSON representation in Java. In fact, the Elasticsearch Java API helps with that by supporting a generic abstraction over content named XContent, for example:
final XContentBuilder source = JsonXContent
.contentBuilder()
.startObject()
.field("title", "Elasticsearch: The Definitive Guide. ...")
.startArray("categories")
.startObject().field("name", "analytics").endObject()
.startObject().field("name", "search").endObject()
.startObject().field("name", "database store").endObject()
.endArray()
.field("publisher", "O'Reilly")
.field("description", "Whether you need full-text search or ...")
.field("published_date", new LocalDate(2015, 02, 07).toDate())
.field("isbn", "978-1449358549")
.field("rating", 4)
.endObject();
Having the document representation, we can send it over to Elasticsearch for indexing. To keep our promises, this time we are going to go the truly asynchronous way and not wait for the response, providing a notification callback in the shape of an ActionListener<IndexResponse> instead.
client
.prepareIndex("catalog", "books")
.setId("978-1449358549")
.setContentType(XContentType.JSON)
.setSource(source)
.setOpType(OpType.INDEX)
.setRefreshPolicy(RefreshPolicy.WAIT_UNTIL)
.setTimeout(TimeValue.timeValueMillis(100))
.execute(new ActionListener<IndexResponse>() {
@Override
public void onResponse(IndexResponse response) {
LOG.info("The document has been indexed with the result: {}",
response.getResult());
}
@Override
public void onFailure(Exception ex) {
LOG.error("The document has not been indexed", ex);
}
});
Nice, so we have our first document in the books collection! What about the authors though? Well, just as a reminder, the book in question has more than one author, so it is a perfect occasion to use document bulk indexing.
final XContentBuilder clintonGormley = JsonXContent
.contentBuilder()
.startObject()
.field("first_name", "Clinton")
.field("last_name", "Gormley")
.endObject();
final XContentBuilder zacharyTong = JsonXContent
.contentBuilder()
.startObject()
.field("first_name", "Zachary")
.field("last_name", "Tong")
.endObject();
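As an aside, the same author documents could come from plain Java classes serialized to JSON rather than hand-built XContent. The sketch below is purely illustrative (the Author class and its hand-rolled toJson() are not part of the tutorial's code base); in a real project you would more likely annotate the class and let a library such as Jackson or Gson produce the JSON automatically.

```java
// Illustrative only: a minimal Author class producing the same JSON shape
// as the XContentBuilder snippets above.
class Author {
    private final String firstName;
    private final String lastName;

    Author(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    // Hand-rolled serialization; a JSON library would normally do this.
    String toJson() {
        return String.format("{\"first_name\":\"%s\",\"last_name\":\"%s\"}",
            firstName, lastName);
    }

    public static void main(String[] args) {
        System.out.println(new Author("Clinton", "Gormley").toJson());
    }
}
```

The resulting string could then be passed to setSource / source just like any other payload.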
The XContent part is clear enough and, frankly, you may never use such an option, preferring instead to model real classes and use one of the terrific Java libraries for automatic to / from JSON conversion. But the following snippet is really interesting.
final BulkResponse response = client
.prepareBulk()
.add(
Requests
.indexRequest("catalog")
.type("authors")
.id("1")
.source(clintonGormley)
.parent("978-1449358549")
.opType(OpType.INDEX)
)
.add(
Requests
.indexRequest("catalog")
.type("authors")
.id("2")
.source(zacharyTong)
.parent("978-1449358549")
.opType(OpType.INDEX)
)
.setRefreshPolicy(RefreshPolicy.WAIT_UNTIL)
.setTimeout(TimeValue.timeValueMillis(500))
.get(TimeValue.timeValueSeconds(1));
assertThat(response.hasFailures())
.withFailMessage("Bulk operation reported some failures: %s",
response.buildFailureMessage())
.isFalse();
We are sending two index requests for the authors collection in a single batch. You might be wondering what parent("978-1449358549") means, and to answer this question we have to recall that books and authors are modeled using a parent / child relationship. So the parent key in this case is a reference (by the _id property) to the respective parent document in the books collection.
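Note that the parent / child relationship only works if the mapping declares it. We did not reproduce catalog-index.json here, but on Elasticsearch 5.x the authors mapping type must declare its parent through the _parent meta-field; an illustrative sketch of the relevant fragment (the field definitions are hypothetical, only the _parent declaration is essential):

```json
{
  "mappings": {
    "books": {
      "properties": {
        "title": { "type": "text" }
      }
    },
    "authors": {
      "_parent": { "type": "books" },
      "properties": {
        "first_name": { "type": "keyword" },
        "last_name": { "type": "keyword" }
      }
    }
  }
}
```

Without such a declaration, parent(...) on the index request and has_child queries will be rejected by the cluster.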
Well done, so we know how to work with indices and how to index the documents using Elasticsearch transport client Java APIs. It is search time now!
final SearchResponse response = client
.prepareSearch("catalog")
.setTypes("books")
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setQuery(QueryBuilders.matchAllQuery())
.setFrom(0)
.setSize(10)
.setTimeout(TimeValue.timeValueMillis(100))
.get(TimeValue.timeValueMillis(200));
assertThat(response.getHits().hits())
.withFailMessage("Expecting at least one book to be returned")
.isNotEmpty();
The simplest search criterion one can come up with is to match all documents and this is what we have done in the snippet above (please notice that we explicitly limited the number of results returned to 10 documents).
Luckily, the Elasticsearch Java API has a full-fledged implementation of the Query DSL in the form of the QueryBuilders and QueryBuilder classes, so writing (and maintaining) complex queries is exceptionally easy. As an exercise, we are going to build the same compound query that we came up with in the previous part of the tutorial:
final QueryBuilder query = QueryBuilders
.boolQuery()
.must(
QueryBuilders
.rangeQuery("rating")
.gte(4)
)
.must(
QueryBuilders
.nestedQuery(
"categories",
QueryBuilders.matchQuery("categories.name", "analytics"),
ScoreMode.Total
)
)
.must(
QueryBuilders
.hasChildQuery(
"authors",
QueryBuilders.termQuery("last_name", "Gormley"),
ScoreMode.Total
)
);
The code looks pretty, concise and human-readable. If you are keen on the static imports feature of the Java programming language, the query will look even more compact.
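For illustration, here is how the same query might read with the QueryBuilders factory methods statically imported (a sketch against the 5.x API):

```java
import static org.elasticsearch.index.query.QueryBuilders.boolQuery;
import static org.elasticsearch.index.query.QueryBuilders.hasChildQuery;
import static org.elasticsearch.index.query.QueryBuilders.matchQuery;
import static org.elasticsearch.index.query.QueryBuilders.nestedQuery;
import static org.elasticsearch.index.query.QueryBuilders.rangeQuery;
import static org.elasticsearch.index.query.QueryBuilders.termQuery;

// The same compound query as above, minus the QueryBuilders noise.
final QueryBuilder query = boolQuery()
    .must(rangeQuery("rating").gte(4))
    .must(nestedQuery("categories",
        matchQuery("categories.name", "analytics"), ScoreMode.Total))
    .must(hasChildQuery("authors",
        termQuery("last_name", "Gormley"), ScoreMode.Total));
```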
final SearchResponse response = client
.prepareSearch("catalog")
.setTypes("books")
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setQuery(query)
.setFrom(0)
.setSize(10)
.setFetchSource(
new String[] { "title", "publisher" }, /* includes */
new String[0] /* excludes */
)
.setTimeout(TimeValue.timeValueMillis(100))
.get(TimeValue.timeValueMillis(200));
assertThat(response.getHits().hits())
.withFailMessage("Expecting at least one book to be returned")
.extracting("sourceAsString", String.class)
.hasOnlyOneElementSatisfying(source -> {
assertThat(source).contains("Elasticsearch: The Definitive Guide.");
});
To keep both versions of the query identical, we also hinted to the search request, through the setFetchSource method, that we are interested only in the title and publisher properties of the document source.
The curious reader might be wondering how to use aggregations along with search requests. This is an excellent topic, so let us talk about it for a moment. Along with the Query DSL, the Elasticsearch Java API also supplies an aggregations DSL, revolving around the AggregationBuilders and AggregationBuilder classes. For example, this is how we could build a bucketed aggregation by the publisher property.
final AggregationBuilder aggregation = AggregationBuilders
.terms("publishers")
.field("publisher")
.size(10);
Having the aggregation defined, we can inject it into the search request using the addAggregation method call, as shown in the code snippet below:
final SearchResponse response = client
.prepareSearch("catalog")
.setTypes("books")
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setQuery(QueryBuilders.matchAllQuery())
.addAggregation(aggregation)
.setFrom(0)
.setSize(10)
.setTimeout(TimeValue.timeValueMillis(100))
.get(TimeValue.timeValueMillis(200));
final StringTerms publishers = response.getAggregations().get("publishers");
assertThat(publishers.getBuckets())
.extracting("keyAsString", String.class)
.contains("O'Reilly");
The results of the aggregations are available in the response and can be retrieved by referencing the aggregation name, publishers in our case. However, be cautious and use the proper aggregation types in order not to get surprises in the form of a ClassCastException. Because our publishers aggregation has been defined to group terms into buckets, we are safe casting the one from the response to a StringTerms class instance.
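If you prefer not to rely on the implicit cast, a defensive variant might look like the following sketch (same response object as above):

```java
// Look the aggregation up by name and verify its runtime type before
// casting, instead of letting a wrong assumption surface as a
// ClassCastException somewhere down the line.
final Aggregation aggregation = response.getAggregations().get("publishers");
if (aggregation instanceof StringTerms) {
    final StringTerms publishers = (StringTerms) aggregation;
    for (final Terms.Bucket bucket : publishers.getBuckets()) {
        LOG.info("{} -> {} document(s)",
            bucket.getKeyAsString(), bucket.getDocCount());
    }
}
```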
3. Using Java Rest Client
One of the drawbacks related to the usage of the Elasticsearch Java client API is the requirement to be binary compatible with the version of Elasticsearch (either standalone or cluster) you are running.
Fortunately, since the first release of the 5.0.0 branch, Elasticsearch brings another option to the table: the Java REST client. It uses the HTTP protocol to talk to Elasticsearch by invoking its RESTful API endpoints and is oblivious to the version of Elasticsearch (literally, it is compatible with all Elasticsearch versions).
It should be noted, though, that the Java REST client is pretty low-level and is not as convenient to use as the Java client API, far from it in fact. However, there are quite a few reasons why one may prefer the Java REST client over the Java client API to communicate with Elasticsearch, so it is worth its own discussion. To start off, let us include the respective dependency in our Apache Maven pom.xml file.
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>rest</artifactId>
<version>5.2.0</version>
</dependency>
From the configuration perspective we only need to construct the instance of RestClient by calling RestClient.builder method.
@Configuration
public class ElasticsearchClientConfiguration {
@Bean(destroyMethod = "close")
RestClient restClient() {
return RestClient
.builder(new HttpHost("localhost", 9200))
.setRequestConfigCallback(new RequestConfigCallback() {
@Override
public Builder customizeRequestConfig(Builder builder) {
return builder
.setConnectTimeout(1000)
.setSocketTimeout(5000);
}
})
.build();
}
}
We are jumping a bit ahead here, but please pay particular attention to configuring proper timeouts, because the Java REST client does not provide a way (at least at the moment) to specify them on a per-request basis. With that, we can inject the RestClient instance anywhere, using the same wiring techniques the Spring Framework kindly provides to us:
@Autowired private RestClient client;
To make a fair comparison between Java client API and Java REST client, we are going to dissect a couple of examples we have looked at in the previous section, setting out the stage by checking the Elasticsearch cluster health.
@Test
public void esClusterIsHealthy() throws Exception {
final Response response = client
.performRequest(HttpGet.METHOD_NAME, "_cluster/health", emptyMap());
final Object json = defaultConfiguration()
.jsonProvider()
.parse(EntityUtils.toString(response.getEntity()));
assertThat(json, hasJsonPath("$.status", equalTo("green")));
}
Indeed, the difference is obvious. As you may guess, the Java REST client is actually a thin wrapper around the more generic, well-known and respected Apache HttpClient library. The response is returned as a string or byte array, and it becomes the responsibility of the caller to transform it to JSON and extract the necessary pieces of data. To deal with that in test assertions, we have on-boarded the wonderful JsonPath library, but you are free to make your own choice here.
A family of performRequest methods is the typical way to do synchronous (blocking) communication with the Java REST client API. Alternatively, there is a family of performRequestAsync methods, which are supposed to be used in fully asynchronous flows. In the next example we are going to use one of those in order to index a document into the books collection.
The simplest way to represent a JSON-like structure in the Java language is using a plain old Map<String, Object>, as demonstrated in the code fragment below.
final Map<String, Object> source = new LinkedHashMap<>();
source.put("title", "Elasticsearch: The Definitive Guide. ...");
source.put("categories",
new Map[] {
singletonMap("name", "analytics"),
singletonMap("name", "search"),
singletonMap("name", "database store")
}
);
source.put("publisher", "O'Reilly");
source.put("description", "Whether you need full-text search or ...");
source.put("published_date", "2015-02-07");
source.put("isbn", "978-1449358549");
source.put("rating", 4);
Now we need to convert this Java structure into a valid JSON string. There are dozens of ways to do so, but we are going to leverage the json-smart library, for the reason that it is already available as a transitive dependency of the JsonPath library.
final HttpEntity payload = new NStringEntity(JSONObject.toJSONString(source),
ContentType.APPLICATION_JSON);
Having the payload ready, nothing prevents us from invoking the Index API of Elasticsearch to add a book into the books collection.
client.performRequestAsync(
HttpPut.METHOD_NAME,
"catalog/books/978-1449358549",
emptyMap(),
payload,
new ResponseListener() {
@Override
public void onSuccess(Response response) {
LOG.info("The document has been indexed successfully");
}
@Override
public void onFailure(Exception ex) {
LOG.error("The document has not been indexed", ex);
}
});
This time we decided not to wait for the response but to supply a callback (an instance of ResponseListener) instead, keeping the flow truly asynchronous. To finish up, it would be great to understand what it takes to perform a more or less realistic search request and parse the results.
As you might expect, the Java REST client does not provide any fluent API around the Query DSL, so we have to fall back to Map<String, Object> one more time in order to construct the search criteria.
final Map<String, Object> authors = new LinkedHashMap<>();
authors.put("type", "authors");
authors.put("query",
singletonMap("term",
singletonMap("last_name", "Gormley")
)
);
final Map<String, Object> categories = new LinkedHashMap<>();
categories.put("path", "categories");
categories.put("query",
singletonMap("match",
singletonMap("categories.name", "search")
)
);
final Map<String, Object> query = new LinkedHashMap<>();
query.put("size", 10);
query.put("_source", new String[] { "title", "publisher" });
query.put("query",
singletonMap("bool",
singletonMap("must", new Map[] {
singletonMap("range",
singletonMap("rating",
singletonMap("gte", 4)
)
),
singletonMap("has_child", authors),
singletonMap("nested", categories)
})
)
);
The price to pay for tackling the problem this way is a lot of cumbersome and error-prone code. In this regard, the consistency and conciseness of the Java client API really make a huge difference. You may argue that in reality one would rely on much simpler and safer techniques, like data transfer objects, value objects, or even JSON search query templates with placeholders, but the point stands: little help is offered by the Java REST client at the moment.
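To make the template idea concrete, here is a minimal, purely illustrative sketch (the QueryTemplate helper is hypothetical and not part of the tutorial's code base): the query is kept as a JSON string with placeholders and the values are substituted at runtime.

```java
// Hypothetical helper: renders a search query from a JSON template with
// placeholders instead of assembling nested Maps by hand.
class QueryTemplate {
    static String render(int size, int minRating, String lastName) {
        final String template =
            "{ \"size\": %d, \"_source\": [\"title\", \"publisher\"],"
          + "  \"query\": { \"bool\": { \"must\": ["
          + "    { \"range\": { \"rating\": { \"gte\": %d } } },"
          + "    { \"has_child\": { \"type\": \"authors\","
          + "        \"query\": { \"term\": { \"last_name\": \"%s\" } } } }"
          + "  ] } } }";
        return String.format(template, size, minRating, lastName);
    }

    public static void main(String[] args) {
        System.out.println(render(10, 4, "Gormley"));
    }
}
```

The rendered string could then be wrapped into an NStringEntity exactly like the Map-based payload, which the examples below continue to use.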
final HttpEntity payload = new NStringEntity(JSONObject.toJSONString(query),
ContentType.APPLICATION_JSON);
final Response response = client
.performRequest(HttpPost.METHOD_NAME, "catalog/books/_search",
emptyMap(), payload);
final Object json = defaultConfiguration()
.jsonProvider()
.parse(EntityUtils.toString(response.getEntity()));
assertThat(json, hasJsonPath("$.hits.hits[0]._source.title",
containsString("Elasticsearch: The Definitive Guide.")));
Not much to add here.



How are the three projects, elasticsearch-client-rest, elasticsearch-testing, elasticsearch-client-java run?
Are they to be run as spring-boot apps with mvn spring-boot:run…?
That’s right, elasticsearch-client-rest and elasticsearch-client-java are typical Spring Boot applications (which could be run using mvn spring-boot:run), while elasticsearch-testing is just a set of JUnit tests, not really a runnable application. Thanks.
Best Regards,
Andriy Redko
Thanks for your response.
So, we need to create a main class (@SpringBootApplication) for elasticsearch-client-rest and elasticsearch-client-java. What exactly should be there in the main method..?
My apologies, my bad: elasticsearch-client-rest and elasticsearch-client-java are NOT Spring Boot applications, just sets of JUnit tests. However, here is how you could convert them to a Spring Boot application (for example, in the case of elasticsearch-client-rest):
@SpringBootApplication
public class ElasticsearchClientApp {
public static void main(String[] args) {
try (ConfigurableApplicationContext context = SpringApplication.run(ElasticsearchClientConfiguration.class, args)) {
final RestClient client = context.getBean(RestClient.class);
client.performRequest(…);
}
}
}
Thanks.
In ElasticsearchClientTest, in the test esSearch(), you use POST instead of GET. And you forgot the / for catalog (which makes http://localhost:9200catalog/books/_search instead of http://localhost:9200/catalog/books/_search). When I fix these things it still does not work. I get:
org.elasticsearch.client.ResponseException: GET http://localhost:9200/catalog/books/_search: HTTP/1.1 400 Bad Request {"error":{"root_cause":[{"type":"query_shard_exception","reason":"[has_child] no join field has been configured","index_uuid":"tF_VhtchRlWXmH2lGbbBhg","index":"catalog"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"catalog","node":"Lf08SYusSrWShlLYZIVxJw","reason":{"type":"query_shard_exception","reason":"[has_child] no join field has been configured","index_uuid":"tF_VhtchRlWXmH2lGbbBhg","index":"catalog"}}]},"status":400}
when copy and paste in browser I see the results like expected.
{"took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"catalog","_type":"books","_id":"978-1449358549","_score":1.0,"_source":{"title":"Elasticsearch: The Definitive Guide. A Distribute etc etc
Hi marcel,
Thanks a lot for your comment. I assume that you meant the Elasticsearch REST client example. Indeed, we use POST because we are submitting a query (although you could use GET as well, in this case filling the query string with the search criteria). The catalog part is also present; I am just pasting the snippet from elasticsearch-client-rest:
public void esSearch() throws IOException {
…
final Response response = client
.performRequest(HttpPost.METHOD_NAME, "catalog/books/_search", emptyMap(), payload);
…
}
Would be good to get a bit more details about the version of Elasticsearch you are using. Thank you.
Best Regards,
Andriy Redko
When using the SearchResponse class, how to access the data fields which are returned from the search?
Hi,
The SearchResponse has a getHits() method, which returns another object, SearchHits. It holds the array of SearchHit instances (returned by the hits() method) corresponding to the documents that matched the search criteria. The data is right in there. Thanks.
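For example, a short sketch against the 5.x API (title is one of the fields indexed in this tutorial):

```java
for (final SearchHit hit : response.getHits().hits()) {
    // Each hit exposes the document identifier and its source,
    // available as a raw string or as a Map of fields.
    final Map<String, Object> source = hit.getSource();
    LOG.info("{} -> {}", hit.getId(), source.get("title"));
}
```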
Best Regards,
Andriy Redko