Basic Introduction and Usage

Reference

Elasticsearch full-text search engine tutorial (全文搜索引擎 Elasticsearch 入门教程), Ruan Yifeng's blog (ruanyifeng.com)

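# Installation

Download Elasticsearch from the official website: Download Elasticsearch Free | Get Started Now | Elastic

Download the latest version, upload it to the server, and extract it. Then change into the extracted directory and start Elasticsearch: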

 ./bin/elasticsearch

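Note that Elasticsearch cannot be started as the root user. Once it is running, open a second shell and send a curl request to port 9200, for example curl localhost:9200.

If data comes back, the node started successfully.

To allow remote access, edit config/elasticsearch.yml in the Elastic installation directory: uncomment network.host, set its value to 0.0.0.0, and restart Elastic. If startup then fails, which typically happens because binding the transport layer to a non-loopback address turns on Elasticsearch's bootstrap checks, use the following settings instead. They expose HTTP on all interfaces but keep transport on localhost: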

network.host: 0.0.0.0
http.port: 9200
transport.host: localhost
transport.tcp.port: 9300

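Elasticsearch can now be reached remotely.

# Basic Concepts

# Node and Cluster

Elastic is essentially a distributed database that lets multiple servers work together, and each server can run multiple Elastic instances.

A single Elastic instance is called a node. A group of nodes forms a cluster.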

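# Index and Document

An Index is comparable to a collection in MongoDB, and a Document is comparable to a MongoDB document.

Elastic indexes every field and, after processing, writes the result into an inverted index. Searches are answered by looking up that index directly.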
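
To make the idea of an inverted index concrete, here is a toy sketch in Python. It is not how Elasticsearch is implemented: the whitespace tokenizer, the function names build_inverted_index and search, and the sample documents are all assumptions made for illustration. It only shows the core idea that a lookup goes from a term to the documents containing it, without scanning the documents themselves.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each token to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Toy tokenizer (an assumption): lowercase and split on whitespace.
        for token in text.lower().split():
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


def search(index, term):
    """Return the ids of documents containing the term via a dictionary lookup."""
    return index.get(term.lower(), [])


# Hypothetical sample documents, keyed by id.
docs = {
    1: "database management",
    2: "system management",
    3: "database management and software development",
}
inverted = build_inverted_index(docs)
print(search(inverted, "database"))    # [1, 3]
print(search(inverted, "management"))  # [1, 2, 3]
```

A real analyzer does much more than split on whitespace, which is why Chinese text needs a dedicated segmentation step.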

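Accordingly, the top-level unit of data management in Elastic is the Index. It is roughly the equivalent of a single database. The name of every Index (that is, every database) must be lowercase. The following command lists all Indexes on the current node.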

curl -X GET 'http://localhost:9200/_cat/indices?v'

A single record inside an Index is called a Document. Many Documents make up an Index. A Document is expressed in JSON; here is an example.

{
  "user": "张三",
  "title": "工程师",
  "desc": "数据库管理"
}

Documents in the same Index are not required to share the same structure (schema), but keeping them uniform is recommended because it helps search efficiency.

# Hands-on Operations

# Creating and Deleting an Index

For example, create an Index named xiaoyou:

xiaoyou@xiaoyou:~$ curl -X PUT 'localhost:9200/xiaoyou'

{"acknowledged":true,"shards_acknowledged":true,"index":"xiaoyou"}

A DELETE request removes an Index. The following command deletes an Index named weather:

$ curl -X DELETE 'localhost:9200/weather'

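# Chinese Word Segmentation

When a document is stored, ES uses an analyzer to extract tokens from it, and these tokens support index storage and search. For Chinese text, a Chinese analysis plugin can perform the segmentation.

# Adding a Record

Sending a PUT request to /Index/Type/Id adds a record to the Index. For example, a request to /accounts/person/1 adds a person record with Id 1. (Custom types such as person were deprecated in Elasticsearch 7.x and removed in 8.0, where the path becomes /Index/_doc/Id.)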

$ curl -X PUT 'localhost:9200/accounts/person/1' -H 'Content-Type: application/json' -d '
{
  "user": "张三",
  "title": "工程师",
  "desc": "数据库管理"
}'

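These requests can also be sent and debugged with a tool such as Apifox.

The server responds with a JSON object describing the operation.

A record can also be added without specifying an Id; in that case the request must be a POST instead.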

$ curl -X POST 'localhost:9200/accounts/person' -H 'Content-Type: application/json' -d '
{
  "user": "李四",
  "title": "工程师",
  "desc": "系统管理"
}'

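The response then contains an automatically generated _id, a random-looking string. For example, this is what I got back (from my own xiaoyou Index):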

{
    "_index": "xiaoyou",
    "_type": "person",
    "_id": "5_j0QXkB5D9ZXiqHo1tI",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 1,
    "_primary_term": 1
}
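
The PUT and POST variants differ only in whether the Id is part of the path. A minimal Python sketch of that rule, using only the standard library, might look like the following. The helper names build_index_request and send are hypothetical, and the base URL assumes a node on localhost:9200 as in the commands above. The request sets a Content-Type header because Elasticsearch 6.0 and later reject JSON bodies sent without one.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc, doc_id=None):
    """Build (but do not send) the HTTP request that stores one Document.

    With doc_id the request is a PUT to /index/type/id; without it the
    request is a POST to /index/type and Elasticsearch generates the id.
    """
    url = f"{base_url}/{index}/{doc_type}"
    method = "POST"
    if doc_id is not None:
        url = f"{url}/{doc_id}"
        method = "PUT"
    body = json.dumps(doc, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send the request to a running node and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


doc = {"user": "李四", "title": "工程师", "desc": "系统管理"}
req = build_index_request("http://localhost:9200", "accounts", "person", doc)
print(req.get_method(), req.full_url)
# To actually store the Document (requires a running node): send(req)
```

Keeping request construction separate from sending makes it easy to inspect the method and URL before anything reaches the server.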

# Retrieving a Record

Sending a GET request to /Index/Type/Id returns that record.

curl 'localhost:9200/accounts/person/1?pretty=true'

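The request above fetches the record /accounts/person/1; the URL parameter pretty=true asks for human-readable output. In the response, the found field indicates whether the record was found, and the _source field holds the original record.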

# Deleting a Record

Simply send a DELETE request:

$ curl -X DELETE 'localhost:9200/accounts/person/1'

# Updating a Record

To update a record, send a PUT request with the full data again:

$ curl -X PUT 'localhost:9200/accounts/person/1' -H 'Content-Type: application/json' -d '
{
    "user" : "张三",
    "title" : "工程师",
    "desc" : "数据库管理，软件开发"
}'

The response looks like this:

{
  "_index":"accounts",
  "_type":"person",
  "_id":"1",
  "_version":2,
  "result":"updated",
  "_shards":{"total":2,"successful":1,"failed":0},
  "created":false
}

In the request above, the desc value changed from "数据库管理" to "数据库管理，软件开发". Several fields in the response changed accordingly.

"_version" : 2,
"result" : "updated",
"created" : false

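The record's Id is unchanged, but the version (_version) went from 1 to 2, the operation type (result) changed from created to updated, and the created field became false because this request did not create a new record. Newer releases, such as the one that produced the POST response shown earlier, no longer return the created field, so result is the field to rely on.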
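
As a small illustration of reading these fields in client code, the sketch below interprets a write response. The function name summarize_write is hypothetical; the sample is the update response shown above. It relies on result rather than created, since result appears in both sample responses in this post.

```python
import json


def summarize_write(response):
    """Return (was_created, version) for an index or update response."""
    return response["result"] == "created", response["_version"]


# The update response from the text above.
update_response = json.loads("""
{
  "_index": "accounts",
  "_type": "person",
  "_id": "1",
  "_version": 2,
  "result": "updated",
  "_shards": {"total": 2, "successful": 1, "failed": 0},
  "created": false
}
""")
print(summarize_write(update_response))  # (False, 2)
```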

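# Searching Data

A GET request sent directly to /Index/Type/_search returns all records (by default, only the first 10 hits are included in the response).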

curl 'http://192.168.123.64:9200/xiaoyou/person/_search'

{
    "took": 1097,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 1,
        "hits": [
            {
                "_index": "xiaoyou",
                "_type": "person",
                "_id": "5_j0QXkB5D9ZXiqHo1tI",
                "_score": 1,
                "_source": {
                    "name": "小游",
                    "title": "这个是标题"
                }
            },
            {
                "_index": "xiaoyou",
                "_type": "person",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "name": "小游",
                    "title": "这个是标题1"
                }
            }
        ]
    }
}

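In the response above, the took field is the time the operation took (in milliseconds), timed_out indicates whether it timed out, and hits holds the matched records. Its sub-fields mean the following.

total: the number of matching records (total.value), 2 in this example.
max_score: the highest relevance score, 1.0 in this example.
hits: an array of the returned records.

Each returned record has a _score field that indicates how well it matches; by default the results are sorted by this field in descending order.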
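
The fields described above can be pulled out of a response with a few lines of Python. This is only a sketch: summarize_hits is a hypothetical helper, and the sample is a trimmed copy of the response shown earlier. The total handling covers both the object form seen here and the bare integer returned by releases before 7.0.

```python
import json


def summarize_hits(response):
    """Extract the headline numbers and stored documents from a search response."""
    hits = response["hits"]
    total = hits["total"]
    # 7.x wraps the count as {"value": ..., "relation": ...}; older releases return an integer.
    count = total["value"] if isinstance(total, dict) else total
    records = [(hit["_id"], hit["_score"], hit["_source"]) for hit in hits["hits"]]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "total": count,
        "max_score": hits["max_score"],
        "records": records,
    }


# Trimmed copy of the search response from the text above.
sample = json.loads("""
{
  "took": 1097,
  "timed_out": false,
  "hits": {
    "total": {"value": 2, "relation": "eq"},
    "max_score": 1,
    "hits": [
      {"_id": "5_j0QXkB5D9ZXiqHo1tI", "_score": 1,
       "_source": {"name": "小游", "title": "这个是标题"}},
      {"_id": "1", "_score": 1,
       "_source": {"name": "小游", "title": "这个是标题1"}}
    ]
  }
}
""")
summary = summarize_hits(sample)
print(summary["total"], summary["max_score"])
for doc_id, score, source in summary["records"]:
    print(doc_id, score, source["title"])
```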

# Full-Text Search

Elastic queries are unusual: they use Elastic's own query syntax (the Query DSL) and require the GET request to carry a body.

$ curl 'localhost:9200/accounts/person/_search' -H 'Content-Type: application/json' -d '
{
  "query" : { "match" : { "desc" : "软件" }}
}'

The request above uses a Match query, with the condition that the desc field contains the term "软件". The response lists the matching records under hits.
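
The same Match query can be assembled programmatically. The sketch below builds the query body and the request without sending it; match_query and build_search_request are hypothetical helper names, and the base URL again assumes a node on localhost:9200.

```python
import json
import urllib.request


def match_query(field, text):
    """Return the body of a Match query on a single field."""
    return {"query": {"match": {field: text}}}


def build_search_request(base_url, index, doc_type, query):
    """Build (but do not send) a search request that carries the query as its body."""
    body = json.dumps(query, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/{doc_type}/_search",
        data=body,
        method="GET",
        headers={"Content-Type": "application/json"},
    )


query = match_query("desc", "软件")
request = build_search_request("http://localhost:9200", "accounts", "person", query)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending it requires a running node: urllib.request.urlopen(request)
```

The _search endpoint also accepts POST, which helps with HTTP clients that cannot attach a body to a GET request.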

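# Visualization Tools

One option is:

360EntSecGroup-Skylar/ElasticHD: an Elasticsearch visualization dashboard that supports ES monitoring, real-time search, quick replacement and editing of Index templates, viewing index list information, SQL-to-DSL conversion, and more (github.com)

It seems decent enough.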
