Introduction to elasticsearch

By Melvyn Peignon

What will I cover?
- Company and products presentation
- Elasticsearch architecture
- Presentation of Kibana
- Presentation of the search API
- Analyzer
- TF/IDF and relevance
- Elasticsearch use case
- Conclusion

Elastic
- Founded in 2012
- Is behind:
  - Kibana
  - Elasticsearch
  - Logstash
  - Beats

What is elasticsearch?
- Full text search engine
- Based on Lucene
- Highly available
- Distributed
- Scalable
- RESTful
- Open Source
- Created by Shay Banon

Search engine popularity trends (Elasticsearch in blue)

CRUD
CREATE
READ
UPDATE
DELETE
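
The four operations map onto HTTP verbs of the REST API. A minimal sketch, assuming the typeless `_doc` / `_update` paths of recent Elasticsearch versions (older versions put a type name in the path); `crud_request` and the `talks` index are hypothetical names used only for illustration, and nothing is sent over the network:

```python
def crud_request(operation, index, doc_id, body=None):
    """Describe the HTTP request for a CRUD operation (assumed 7.x+ paths)."""
    routes = {
        "create": ("PUT", f"/{index}/_doc/{doc_id}"),
        "read": ("GET", f"/{index}/_doc/{doc_id}"),
        "update": ("POST", f"/{index}/_update/{doc_id}"),
        "delete": ("DELETE", f"/{index}/_doc/{doc_id}"),
    }
    method, path = routes[operation]
    return {"method": method, "path": path, "body": body}


# Hypothetical document in a hypothetical "talks" index.
create = crud_request("create", "talks", "1", {"title": "Introduction to elasticsearch"})
read = crud_request("read", "talks", "1")
# A partial update wraps the changed fields in a "doc" object.
update = crud_request("update", "talks", "1", {"doc": {"title": "Intro to ES"}})
delete = crud_request("delete", "talks", "1")

for request in (create, read, update, delete):
    print(request["method"], request["path"])
```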

Some concepts to know
- Near real time (NRT)
- Cluster
- Node
- Index
- Document
- Shards and Replicas

Documents, Types, Indexes
- An index is a collection of documents that share similar properties.
- A document is the basic unit of information that can be indexed.
- A type is a logical partition of the data in an index.

Cluster, Nodes, Shards and Replicas
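[Diagram 1] Cluster with one node: Node 1 holds shards S1, S2, S3, S4.

[Diagram 2] Cluster with two nodes: the four shards are split between Node 1 and Node 2, two shards each.

[Diagram 3] Cluster with four nodes: each node holds one primary shard and the replica of another shard.
Node 1: S1, R2
Node 2: S2, R1
Node 3: S3, R4
Node 4: S4, R3

[Diagram 4] Same four-node cluster: the nodes exchange Ping / Pong messages.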

Responsibilities of the master
- Cluster health
- Index creation
- Allocation of the shards
- Allocation of the replicas

Cluster recommendation
- Keep your servers in the same data center
- Put your machines on different racks
- Keep at least 3 master-eligible nodes (with only 2, the quorum is also 2, so losing either node blocks master election)
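
The quorum argument can be checked with a few lines. A minimal sketch, assuming the usual majority rule (half of the master-eligible nodes, rounded down, plus one); the function names are made up for illustration:

```python
def quorum(master_eligible_nodes):
    """Smallest majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes):
    """How many master-eligible nodes can be lost while keeping a quorum."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


for nodes in (1, 2, 3, 5):
    print(nodes, "nodes: quorum", quorum(nodes),
          "tolerated failures", tolerated_failures(nodes))
```

With 2 master-eligible nodes no failure is tolerated, which is why 3 is the practical minimum.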

What’s Kibana?
- Another Elastic product
- A tool that lets you communicate with your Elasticsearch cluster in a more “human” way
- A product that lets you build dashboards and data visualizations

Let’s go for a demonstration

Demonstration done in Kibana
The queries can be found on GitHub:

The analyzer
Standard Analyzer, inverted index after indexing document id_0:
{"a": [id_0], "walk": [id_0], "in": [id_0], "the": [id_0], "wood": [id_0]}

The analyzer
Standard Analyzer, inverted index after indexing document id_1:
{"a": [id_0, id_1], "walk": [id_0], "in": [id_0], "the": [id_0], "wood": [id_0], "probability": [id_1], "complete": [id_1], "guide": [id_1]}

The analyzer
{"a": [id_0, id_1], "walk": [id_0], "in": [id_0], "the": [id_0], "wood": [id_0], "probability": [id_1], "complete": [id_1], "guide": [id_1]}
A search that matches both documents returns: [id_0, id_1]

The analyzer
{"a": [id_0, id_1], "walk": [id_0], "in": [id_0], "the": [id_0], "wood": [id_0], "probability": [id_1], "complete": [id_1], "guide": [id_1]}
A search that matches no indexed token returns: []
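
The inverted index on these slides can be rebuilt with a short sketch. It assumes a simplified stand-in for the standard analyzer (lowercase, split on non-word characters); the two document texts are guesses reconstructed from the tokens shown, and the `"walking"` query is only an example of a term that is not in the index:

```python
import re
from collections import defaultdict


def standard_analyze(text):
    """Simplified stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each token to the list of ids of the documents that contain it."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        # dict.fromkeys removes duplicate tokens while keeping their order.
        for token in dict.fromkeys(standard_analyze(text)):
            index[token].append(doc_id)
    return dict(index)


def search(index, query):
    """Return the ids of documents that contain at least one query token."""
    hits = []
    for token in standard_analyze(query):
        for doc_id in index.get(token, []):
            if doc_id not in hits:
                hits.append(doc_id)
    return hits


# Assumed document texts, reconstructed from the tokens on the slides.
docs = {"id_0": "A walk in the wood", "id_1": "Probability: a complete guide"}
index = build_inverted_index(docs)

print(index)
print(search(index, "a"))        # token present in both documents
print(search(index, "walking"))  # "walking" was never indexed, only "walk"
```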

The English analyzer
English Analyzer, inverted index after indexing document id_0:
{"walk": [id_0], "wood": [id_0]}

The English analyzer
{"walk": [id_0], "wood": [id_0]}
A search that matches no indexed token returns: []
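
Why only "walk" and "wood" survive can be sketched as follows. This is a rough approximation, not the real English analyzer: the stop-word list is a small assumed subset and `naive_stem` is a toy suffix stripper standing in for a proper stemmer; the document text is the same guess as before:

```python
import re

# Small assumed subset of English stop words, for illustration only.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def naive_stem(token):
    """Toy stemmer: strip a trailing "ing" or "s" when enough letters remain."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def english_analyze(text):
    """Lowercase, tokenize, drop stop words, then stem what is left."""
    tokens = re.findall(r"\w+", text.lower())
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]


print(english_analyze("A walk in the wood"))
print(english_analyze("Walking in the woods"))
```

Because the same analysis is applied at index time and at search time, both phrases end up as the same two tokens.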

What is relevance?
Two theories to know:
- Boolean model
- Vector space model

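Boolean model
Original documents:
O0 = “Eric is ... always feeding”
O1 = “Jherez is ... with the friends”
….
O6 = “Manage Idea… to Melvyn)”
Query terms and query operator:
QT = {“lab”, “manager”}, QO = “OR”
Terms:
T = {t1: “lab”, t2: “manager”, t3: “Idea”, …, t4: “feeding”}
Tokenized documents:
D = {D0, D1, …, D6}
D0 = {Eric, is, …, feeding}
D1 = {Jherez, is, …, friends}
D6 = {Manage, idea, …, Melvyn}
Documents matching each query term:
S1 = {D0, D1, D6} (documents containing “lab”)
S2 = {D0, D6} (documents containing “manager”)
Result of the OR query:
SF = S1 ∪ S2 = S1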
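
The OR query above is a union of sets. A minimal sketch: the token sets below are hypothetical placeholders (the real document texts are elided on the slide), chosen only so that "lab" appears in D0, D1, D6 and "manager" in D0, D6; D2 is an invented non-matching document:

```python
def matching_docs(term, docs):
    """Ids of the documents whose token set contains the term."""
    return {doc_id for doc_id, tokens in docs.items() if term in tokens}


def boolean_or(query_terms, docs):
    """Boolean model with the OR operator: union of the per-term matches."""
    result = set()
    for term in query_terms:
        result |= matching_docs(term, docs)
    return result


# Hypothetical token sets; only the placement of "lab" and "manager" matters.
docs = {
    "D0": {"eric", "lab", "manager", "feeding"},
    "D1": {"jherez", "lab", "friends"},
    "D2": {"unrelated", "document"},
    "D6": {"manage", "idea", "lab", "manager", "melvyn"},
}

s1 = matching_docs("lab", docs)
s2 = matching_docs("manager", docs)
sf = boolean_or(["lab", "manager"], docs)
print(sorted(s1), sorted(s2), sorted(sf))
```

The Boolean model only says whether a document matches; it gives no ordering between D0, D1 and D6.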

Vector space model
S1 = {D0, D1, D6}
T0 = D0 ∩ QT = {“lab”, “manager”} ⇒ V0 = (L0, M0)
T1 = D1 ∩ QT = {“lab”} ⇒ V1 = (L1, 0)
T6 = D6 ∩ QT = {“lab”, “manager”} ⇒ V6 = (L6, M6)
Lx and Mx are the weights of “lab” and “manager” in document Dx.

Weight of a token in a document
- Term frequency
TF = √Frequency
- Inverse Document Frequency
IDF = 1 + log(numDocs / (docFrequency + 1))
- Field length
FL = 1 / √TokenInField
Weight = TF x IDF x FL
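
The three factors can be written out directly. A minimal sketch, assuming the classic Lucene-style definitions with a natural logarithm and `num_docs` as the total number of documents in the index; the function names and the sample numbers are illustrative only:

```python
import math


def term_frequency(frequency):
    """TF: square root of the number of times the token occurs in the field."""
    return math.sqrt(frequency)


def inverse_document_frequency(num_docs, doc_frequency):
    """IDF: the more documents contain the token, the lower its weight."""
    return 1 + math.log(num_docs / (doc_frequency + 1))


def field_length_norm(tokens_in_field):
    """FL: a match in a short field counts more than one in a long field."""
    return 1 / math.sqrt(tokens_in_field)


def weight(frequency, num_docs, doc_frequency, tokens_in_field):
    """Weight = TF x IDF x FL."""
    return (term_frequency(frequency)
            * inverse_document_frequency(num_docs, doc_frequency)
            * field_length_norm(tokens_in_field))


# Illustrative numbers: a token seen 4 times in a 16-token field,
# present in 2 of the 7 documents of the index.
print(weight(frequency=4, num_docs=7, doc_frequency=2, tokens_in_field=16))
```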

Relevance
Vq = [1, 1.47]
V0 = [0.81, 0.85]
V1 = [0.37, 0]
V6 = [0.8, 1.2]
Relevance(Vq, Vx) = cos(Vq, Vx) = (Vq · Vx) / (‖Vq‖ · ‖Vx‖)
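
Applying the cosine formula to the vectors on this slide gives the ranking. A minimal sketch using only the standard library; the helper name `cosine` is an arbitrary choice:

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors: dot product over norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


vq = [1, 1.47]
documents = {"V0": [0.81, 0.85], "V1": [0.37, 0], "V6": [0.8, 1.2]}

scores = {name: cosine(vq, vector) for name, vector in documents.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(name, round(scores[name], 4))
```

V6 comes first because its two weights are in almost the same proportion as those of the query; V1 comes last because it contains only one of the two terms.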

Let’s Kaggle with elasticsearch
https://www.kaggle.com/c/whats-cooking

Results of our “Classifier”
Explanation of the methodology:
http://melvyn.pythonanywhere.com/posts/1/

Final advice
- Mapping: I highly recommend defining a mapping explicitly. You cannot change the type of a field once it is defined in the mapping.
- Elasticsearch as a database: I prefer having both ES and a database. It makes reindexing easier and gives me a backup; I do my search and analytics on ES and use my database for identification, etc.
- Elasticsearch as a NoSQL database: I wouldn’t do it on a serious project, but it is nice to have if you want a quick implementation for a POC.
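
As an illustration of the mapping advice, here is what an explicit mapping could look like as a request body. A sketch only: it assumes the typeless mapping layout of recent Elasticsearch versions (older versions nest `properties` under a type name), and the field names are an assumption based on the What's Cooking dataset, not taken from the talk:

```python
import json

# Assumed fields: "cuisine" as an exact-match label, "ingredients" as
# full text analyzed with the built-in English analyzer.
recipes_mapping = {
    "mappings": {
        "properties": {
            "cuisine": {"type": "keyword"},
            "ingredients": {"type": "text", "analyzer": "english"},
        }
    }
}

# This JSON would be the body sent when creating the index.
print(json.dumps(recipes_mapping, indent=2))
```

Since a field type cannot be changed afterwards, fixing a wrong type means creating a new index with the corrected mapping and reindexing the data.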

Hope you enjoyed the presentation!
Thank you for your attention!
Questions?
