Elasticsearch + Python
Getting started with Elasticsearch
Valeria Chemtai
Software Developer, Andela
@valeriachemtai
Search…
What is search?
What is the main objective of search?
How search works
Technologies involved - web crawlers, inverted index, scoring, search
Inverted Index
[Image: inverted index illustration, sourced from Stack Overflow]
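To make the idea concrete, here is a minimal sketch of an inverted index in plain Python. It is a toy illustration, not how Elasticsearch or Lucene implement it; the sample documents and the function names (`tokenize`, `build_inverted_index`, `search`) are invented for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample documents, invented for this illustration.
docs = {
    1: "Elasticsearch is built on top of Apache Lucene",
    2: "Apache Solr is also built on Lucene",
    3: "Python has a client library for Elasticsearch",
}
index = build_inverted_index(docs)
print(sorted(search(index, "lucene built")))  # ids of docs containing both terms
```

A lookup goes from term to documents, so a query only touches the entries for its own terms instead of scanning every document.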
Expectations:
1. Understand Elasticsearch and its basic
concepts
2. Install and set up a node server
3. Index, update and delete documents
4. Incorporate Elasticsearch into a simple python
application.
Prerequisites
1. Command line knowledge
2. Familiarity with RESTful APIs
Presentation materials
GitHub: https://github.com/valeria-chemtai/python_meetup
Elasticsearch is a highly scalable, open-source full-text
search and analytics engine.
It can be considered a distributed NoSQL full-text
database.
Elasticsearch is built on top of Apache Lucene,
which is free and open source.
Other Technologies similar to Elasticsearch
1. Apache Solr
2. Nutch
3. CrateDB - open source SQL distributed DB
Why Use Elasticsearch
● Speed - Operates in near real time
● Easy to use with REST API calls
● Scalable
● Robust search - Flexible Query DSL
● Offers statistical analysis tools
● Many extensions - cloud services, client libraries in many languages
Limitations of Elasticsearch
1. Very poor documentation
2. Not good for use with relational data
3. Only supports JSON data
Where can I use Elasticsearch
Elasticsearch is mainly used for non-relational
data:
● Blogs
● Data analytics
● Documents with schemaless structure
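Basic Concepts
1. Document - A unit of information to be indexed, expressed as JSON.
2. Type - Logical grouping of documents within an index (deprecated in Elasticsearch 6.x and removed in later major versions).
3. Index - Collection of documents that share similar characteristics.
4. Node - Single server that takes part in indexing and search.
5. Shard - A partition of an index; an index is split into shards, which are spread across the nodes.
6. Replica - A copy of a shard.
7. Cluster - A collection of nodes.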
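How documents end up in different shards can be pictured with a simplified model: hash the document id and take the remainder modulo the number of primary shards. The sketch below is an illustration only. `zlib.crc32` is a stand-in hash chosen for this example, not the hash Elasticsearch uses, and `pick_shard` and `distribute` are hypothetical helper names.

```python
import zlib


def pick_shard(doc_id, number_of_primary_shards):
    """Choose a primary shard for a document id.

    zlib.crc32 is a stand-in hash chosen for this sketch; it is not
    the hash function Elasticsearch itself uses.
    """
    return zlib.crc32(str(doc_id).encode("utf-8")) % number_of_primary_shards


def distribute(doc_ids, number_of_primary_shards):
    """Group document ids by the shard they are routed to."""
    shards = {n: [] for n in range(number_of_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, number_of_primary_shards)].append(doc_id)
    return shards


# Ten hypothetical document ids spread over three primary shards.
layout = distribute(range(1, 11), number_of_primary_shards=3)
for shard, ids in layout.items():
    print(f"shard {shard}: {ids}")
```

Because the shard is derived from the id, the same document always lands on the same shard, which is how a later lookup by id knows where to go.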
SQL Database vs Elasticsearch
Installation and Setup
1. Install Java 8 or later (Elasticsearch 5.x requires at least Java 8) from:
https://www.java.com/en/
2. Install Elasticsearch version 5 or above:
https://www.elastic.co/downloads/elasticsearch
3. Then configure system environment variables (for example JAVA_HOME).
Exploring Elasticsearch with curl
1. Check cluster health
2. Create an index
3. Add documents to an index
4. Retrieve documents
5. Update documents
6. Delete documents and an entire index
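The six steps above map onto plain REST calls. As a rough sketch, the same requests can be prepared in Python with only the standard library. Assumptions: a node listening on `http://localhost:9200`, the typeless endpoints of Elasticsearch 7 or later (in 5.x and 6.x a type name takes the place of `_doc`, and the update path is `/<index>/<type>/<id>/_update`), and a made-up index called `meetup` with made-up fields. The block only builds and prints the requests; nothing is sent unless `send` is called.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed default local address


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# "meetup" and the document fields are hypothetical names.
steps = [
    build_request("GET", "/_cluster/health"),                               # 1. cluster health
    build_request("PUT", "/meetup"),                                        # 2. create an index
    build_request("PUT", "/meetup/_doc/1", {"title": "Hello", "views": 0}), # 3. add a document
    build_request("GET", "/meetup/_doc/1"),                                 # 4. retrieve it
    build_request("POST", "/meetup/_update/1", {"doc": {"views": 1}}),      # 5. partial update
    build_request("DELETE", "/meetup/_doc/1"),                              # 6. delete the document
    build_request("DELETE", "/meetup"),                                     #    and the entire index
]

for request in steps:
    print(request.get_method(), request.full_url)
    # With a node running, call send(request) here to execute the step.
```

Each line printed corresponds to one curl invocation: the HTTP method goes after `-X`, and the JSON body, where there is one, goes after `-d`.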
Code Implementation
Exploring Elasticsearch with Python
Create a virtual environment
Install the elasticsearch package in the environment:
● $ pip install elasticsearch
Invoke the Python shell:
● $ python
Exploring Elasticsearch with Python
1. Create an index
2. Add documents to an index
3. Retrieve documents
4. Update documents
5. Delete documents and an entire index
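A possible shape for these five steps with the official Python client is sketched below. It is not the code from the presentation. Assumptions: version 8 of the `elasticsearch` package (earlier versions use different keyword names, such as `body` and `doc_type`), a node reachable at `http://localhost:9200`, and a made-up index name and document. The helper names `connect` and `run_steps` are hypothetical.

```python
def connect(url="http://localhost:9200"):
    """Create a client; the URL is an assumed local default."""
    from elasticsearch import Elasticsearch  # installed with pip above
    return Elasticsearch(url)


def run_steps(es, index="meetup"):
    """Walk through the five steps against a connected client.

    The index name and document fields are hypothetical. Keyword
    names follow version 8 of the client; older versions differ.
    """
    # 1. Create an index
    es.indices.create(index=index)
    # 2. Add a document to the index
    es.index(index=index, id="1", document={"title": "Hello", "views": 0})
    # 3. Retrieve the document
    before = es.get(index=index, id="1")["_source"]
    # 4. Update the document (partial update of one field)
    es.update(index=index, id="1", doc={"views": 1})
    after = es.get(index=index, id="1")["_source"]
    # 5. Delete the document, then the entire index
    es.delete(index=index, id="1")
    es.indices.delete(index=index)
    return before, after
```

With a node running, `run_steps(connect())` executes the whole sequence and returns the document as it looked before and after the update.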
Code Implementation
Code Implementation
Wrapping it all up into a simple
command line app
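One way such a command line wrapper might be laid out is sketched here with `argparse`. This is not the app from the presentation repository. To keep the example runnable without a node, a plain dict stands in for the Elasticsearch index; the program name `esdemo`, the subcommands, and the helper names are all invented for illustration.

```python
import argparse
import json


def build_parser():
    """Define the subcommands of the hypothetical `esdemo` tool."""
    parser = argparse.ArgumentParser(prog="esdemo")  # hypothetical name
    commands = parser.add_subparsers(dest="command", required=True)

    add = commands.add_parser("add", help="index a JSON document")
    add.add_argument("doc_id")
    add.add_argument("document", type=json.loads)

    get = commands.add_parser("get", help="retrieve a document")
    get.add_argument("doc_id")

    delete = commands.add_parser("delete", help="delete a document")
    delete.add_argument("doc_id")
    return parser


def run(argv, store):
    """Parse argv and apply the command to the store.

    `store` is a plain dict standing in for an Elasticsearch index.
    """
    args = build_parser().parse_args(argv)
    if args.command == "add":
        store[args.doc_id] = args.document
        return f"indexed {args.doc_id}"
    if args.command == "get":
        return json.dumps(store.get(args.doc_id))
    removed = store.pop(args.doc_id, None)
    return "deleted" if removed is not None else "not found"


store = {}
print(run(["add", "1", '{"title": "Hello"}'], store))
print(run(["get", "1"], store))
print(run(["delete", "1"], store))
```

Swapping the dict for real client calls would only change the three branches inside `run`; the argument parsing stays the same.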
Questions?
More on Elasticsearch
Getting started with ElasticSearch-Python :: Part One
Getting started with ElasticSearch-Python :: Part Two
Elastic Website
More
Questions?
valeriachemtai28@gmail.com
gitter.im/valeria-chemtai
medium.com/@valeriachemtai28


Editor's Notes

  • #3 The goal is to find the documents most relevant to our search terms. A search engine has to know that a document exists, index it, judge its relevance, and return matching documents ranked by relevance. Text is tokenized into individual terms (from text to words), which are stored in a data structure called the inverted index.
  • #4 The inverted index is the heart of every search engine.
  • #8 NoSQL: document-based. It has no tables or columns, no fixed schema, and no structured query language for searching. Apache Lucene: a high-performance indexing and full-text search library.
  • #10 Easy to scale horizontally.
  • #13 An index is like a database. Clustering is what makes Elasticsearch easy to scale horizontally.
  • #14 SQL: a table with a schema. Elasticsearch: documents in JSON format.
  • #16 Scaling Elasticsearch