
Commit 5b49061

zwangsheng authored and ulysses-you committed
[KYUUBI apache#815] [DOC] [KUBERNETES] Doc for spark-block-cleaner
### _Why are the changes needed?_

Add docs for the Kyuubi tool spark-block-cleaner:
* Explain the parameters
* Introduce basic startup
* Give an example

### _How was this patch tested?_

- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [X] [Run test](https://kyuubi.readthedocs.io/en/latest/tools/testing.html#running-tests) locally before making a pull request

Closes apache#815 from zwangsheng/doc/spark_block_cleaner.

Closes apache#815

1ec6795 [Binjie Yang] delete todo
bbf4d6e [Binjie Yang] make it common
9cf3e15 [Binjie Yang] format
0803995 [Binjie Yang] straighten out the article
f834b38 [Binjie Yang] refactor
25be318 [Binjie Yang] fix
7304e59 [Binjie Yang] docs for spark-block-cleaner

Authored-by: Binjie Yang <2213335496@qq.com>
Signed-off-by: ulysses-you <ulyssesyou18@gmail.com>
1 parent df8da82 commit 5b49061

11 files changed: +139 −10 lines
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.

docs/develop_tools/index.rst

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
.. image:: ../imgs/kyuubi_logo.png
   :align: center

Develop Tools
=============

.. toctree::
   :maxdepth: 2
   :numbered: 3

   building
   distribution
   build_document
   testing
   debugging
   community
   developer
File renamed without changes.

docs/index.rst

Lines changed: 2 additions & 1 deletion
@@ -90,6 +90,7 @@ Kyuubi provides both high availability and load balancing solutions based on Zoo
    integrations/index
    monitor/index
    sql/index
+   tools/index

 .. toctree::
    :caption: Kyuubi Insider
@@ -101,7 +102,7 @@ Kyuubi provides both high availability and load balancing solutions based on Zoo
    :caption: Contributing
    :maxdepth: 2

-   tools/index
+   develop_tools/index
    community/index

 .. toctree::

docs/tools/index.rst

Lines changed: 2 additions & 8 deletions
@@ -1,17 +1,11 @@
 .. image:: ../imgs/kyuubi_logo.png
    :align: center

-Develop Tools
+Tools
 ===========

 .. toctree::
    :maxdepth: 2
    :numbered: 3

-   building
-   distribution
-   build_document
-   testing
-   debugging
-   community
-   developer
+   spark_block_cleaner

docs/tools/spark_block_cleaner.md

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
<div align=center>

![](../imgs/kyuubi_logo.png)

</div>

# Kubernetes Tools Spark Block Cleaner
## Requirements

You should be familiar with the following before using spark-block-cleaner:

* This article
* An active Kubernetes cluster
* [Kubectl](https://kubernetes.io/docs/reference/kubectl/overview/)
* [Docker](https://www.docker.com/)
## Scenario

When you run Spark on Kubernetes in client mode and do not use `emptyDir` as the Spark `local-dir` type, you may find that executor pods are deleted without all of their block files being cleaned up, which can eventually fill the disk.

Therefore, we use Spark Block Cleaner to clear the block files accumulated by Spark.
## Principle

When deploying Spark Block Cleaner, we configure volumes for the target folders. Spark Block Cleaner discovers those folders through the `CACHE_DIRS` parameter.

Spark Block Cleaner scans the configured folders in a fixed loop (the interval can be configured by `SCHEDULE_INTERVAL`). It selects folders whose names start with `blockmgr` or `spark` for deletion, matching the naming logic Spark uses when it creates them.

Before deleting those files, Spark Block Cleaner checks whether each file was recently modified (i.e. whether it has been untouched for longer than the time configured by `FILE_EXPIRED_TIME`). Only files beyond that time interval are deleted.

After cleaning, Spark Block Cleaner checks the disk utilization. If the remaining free space is less than the value configured by `FREE_SPACE_THRESHOLD`, it triggers a deep clean, whose file expiration time is configured by `DEEP_CLEAN_FILE_EXPIRED_TIME`.
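The cleaning pass described above can be sketched in a few lines of shell. This is only an illustration of the rule (match `blockmgr-*`/`spark-*` folders, delete files older than the expiration interval), not the actual spark-block-cleaner implementation; the function name and the use of `find` are assumptions for demonstration:

```shell
#!/bin/sh
# Illustrative sketch of one cleaning pass (NOT the real implementation).
clean_pass() {
  cache_dir=$1        # one entry from CACHE_DIRS
  expired_seconds=$2  # FILE_EXPIRED_TIME (or DEEP_CLEAN_FILE_EXPIRED_TIME)
  expired_minutes=$((expired_seconds / 60))
  # Only folders named like Spark's block manager dirs are considered.
  for dir in "$cache_dir"/blockmgr-* "$cache_dir"/spark-*; do
    [ -d "$dir" ] || continue
    # Delete only files untouched for longer than the expiration interval.
    find "$dir" -type f -mmin "+$expired_minutes" -delete
  done
}
```

For example, `clean_pass /data/data1 604800` would remove only week-old files under `/data/data1/blockmgr-*` and `/data/data1/spark-*`, leaving other folders untouched.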
## Usage

Before you start using Spark Block Cleaner, you should build its Docker image.

### Build Block Cleaner Docker Image

In the `KYUUBI_HOME` directory, you can use the following command to build the Docker image:

```shell
docker build ./tools/spark-block-cleaner/kubernetes/docker
```
### Modify spark-block-cleaner.yml

You need to modify `${KYUUBI_HOME}/tools/spark-block-cleaner/kubernetes/spark-block-cleaner.yml` to fit your environment.

In Kyuubi tools, we recommend running it as a `DaemonSet`, and we offer a default YAML file written in the DaemonSet style.

Basic file structure:

```yaml
apiVersion
kind
metadata
  name
  namespace
spec
  selector
  template
    metadata
    spec
      containers
        - image
        - volumeMounts
        - env
      volumes
```
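Filling in that skeleton, a complete `DaemonSet` might look like the sketch below. The image name, namespace, labels, and paths are illustrative placeholders, not the contents of the shipped `spark-block-cleaner.yml`; adapt them to your environment:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: spark-block-cleaner
  namespace: default          # placeholder namespace
spec:
  selector:
    matchLabels:
      name: spark-block-cleaner
  template:
    metadata:
      labels:
        name: spark-block-cleaner
    spec:
      containers:
        - name: cleaner
          image: spark-block-cleaner:latest   # the image you built above
          volumeMounts:
            - name: block-files-dir-1
              mountPath: /data/data1
          env:
            - name: CACHE_DIRS
              value: /data/data1
            - name: FILE_EXPIRED_TIME
              value: "604800"
      volumes:
        - name: block-files-dir-1
          hostPath:
            path: /spark/shuffle1             # a Spark local-dir on the host
```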
You can tune the behavior of Spark Block Cleaner by configuring the parameters in the containers' `env` section of `spark-block-cleaner.yml`:

```yaml
env:
  - name: CACHE_DIRS
    value: /data/data1,/data/data2
  - name: FILE_EXPIRED_TIME
    value: "604800"
  - name: DEEP_CLEAN_FILE_EXPIRED_TIME
    value: "432000"
  - name: FREE_SPACE_THRESHOLD
    value: "60"
  - name: SCHEDULE_INTERVAL
    value: "3600"
```
Most importantly, configure `volumeMounts` and `volumes` to correspond to the Spark local dirs.

For example, if Spark uses /spark/shuffle1 as a local dir, you can configure:

```yaml
volumes:
  - name: block-files-dir-1
    hostPath:
      path: /spark/shuffle1
```
```yaml
volumeMounts:
  - name: block-files-dir-1
    mountPath: /data/data1
```
```yaml
env:
  - name: CACHE_DIRS
    value: /data/data1
```
### Start the DaemonSet

After finishing the above modifications, you can start the DaemonSet with the command `kubectl apply -f ${KYUUBI_HOME}/tools/spark-block-cleaner/kubernetes/spark-block-cleaner.yml`.
## Related parameters

Name | Default | Unit | Meaning
--- | --- | --- | ---
CACHE_DIRS | /data/data1,/data/data2 | | The target directories (container paths) whose block files will be cleaned.
FILE_EXPIRED_TIME | 604800 | seconds | The cleaner deletes block files whose last-modified time is more than this far in the past.
DEEP_CLEAN_FILE_EXPIRED_TIME | 432000 | seconds | During a deep clean, the cleaner deletes block files whose last-modified time is more than this far in the past.
FREE_SPACE_THRESHOLD | 60 | % | After the first clean, if the free space is lower than this threshold, a deep clean is triggered.
SCHEDULE_INTERVAL | 3600 | seconds | The interval the cleaner sleeps between cleaning passes.
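The deep-clean trigger in the table above amounts to a simple threshold check. The helper below is a hypothetical illustration of that rule, not code shipped with the tool:

```shell
#!/bin/sh
# Succeeds (exit 0) when a deep clean should run: the free-space percentage
# has fallen below FREE_SPACE_THRESHOLD after the first clean.
should_deep_clean() {
  used_percent=$1  # e.g. parsed from `df --output=pcent <cache dir>`
  threshold=$2     # FREE_SPACE_THRESHOLD, default 60
  free_percent=$((100 - used_percent))
  [ "$free_percent" -lt "$threshold" ]
}
```

With the default threshold of 60, a disk that is 50% used (50% free) triggers a deep clean, while one that is only 30% used (70% free) does not.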
