Commit febea9d

better proftrigger situation
* use new version which uses bytes consumed by process, see Dieterbe/profiletrigger@e8a1450
* set default threshold of 25GB which should work great for 32GB systems
1 parent 0951546 commit febea9d

9 files changed

Lines changed: 23 additions & 20 deletions

docs/config.md

Lines changed: 1 addition & 1 deletion
@@ -109,7 +109,7 @@ proftrigger-path = /tmp
 # minimum time between triggered profiles
 proftrigger-min-diff = 1h
 # if this many bytes allocated, trigger a heap profile
-proftrigger-heap-thresh = 10000000
+proftrigger-heap-thresh = 25000000000
 # only log incoming requests if their timerange is at least this duration. Use 0 to disable
 log-min-dur = 5min
 # only log log-level and higher. 0=TRACE|1=DEBUG|2=INFO|3=WARN|4=ERROR|5=CRITICAL|6=FATAL

docs/metrics.md

Lines changed: 2 additions & 4 deletions
@@ -14,11 +14,9 @@ your (infrequent) updates. The primary won't add them to its in-memory chunks,
 a counter of total amount of bytes allocated during process lifetime. (incl freed data)
 * `bytes_alloc.not_freed`:
 a gauge of currently allocated (within the runtime) memory.
-it does not include freed data and drops at every GC run.
-this is what is inspected by the profiletrigger
-note that total memory used by the process can be about 2x this.
+it does not include freed data so it drops at every GC run.
 * `bytes_sys`:
-the amount of bytes currently obtained from the system
+the amount of bytes currently obtained from the system by the process. This is what the profiletrigger looks at.
 * `cluster.promotion_wait`:
 how long a candidate (secondary node) has to wait until it can become a primary
 When the timer becomes 0 it means the in-memory buffer has been able to fully populate so that if you stop a primary
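
The three memory metrics in this hunk line up naturally with fields of Go's `runtime.MemStats`. The sketch below shows one plausible mapping; the pairing of each metric with a field is an assumption for illustration, and `memSnapshot` and `readMem` are hypothetical names that do not come from the metrictank source.

```go
package main

import (
	"fmt"
	"runtime"
)

// memSnapshot holds the three values described in docs/metrics.md.
// The mapping to runtime.MemStats fields is assumed, not taken from the commit.
type memSnapshot struct {
	AllocNotFreed  uint64 // bytes_alloc.not_freed  (assumed: MemStats.Alloc)
	AllocInclFreed uint64 // bytes_alloc.incl_freed (assumed: MemStats.TotalAlloc)
	Sys            uint64 // bytes_sys              (assumed: MemStats.Sys)
}

// readMem takes one reading of the runtime's memory statistics.
func readMem() memSnapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return memSnapshot{
		AllocNotFreed:  m.Alloc,
		AllocInclFreed: m.TotalAlloc,
		Sys:            m.Sys,
	}
}

func main() {
	s := readMem()
	fmt.Printf("not_freed=%d incl_freed=%d sys=%d\n",
		s.AllocNotFreed, s.AllocInclFreed, s.Sys)
}
```

Under this mapping, `Sys` includes memory the runtime has obtained from the OS but is not currently using for live objects, which is why it tracks process-level consumption more closely than `Alloc`.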

docs/operations.md

Lines changed: 4 additions & 1 deletion
@@ -41,7 +41,10 @@ Metrictank crashed. What to do?
 1) Check `dmesg` to see if it was killed by the kernel, maybe it was consuming too much RAM
 If it was, check the grafana dashboard which may explain why. (sudden increase in ingested data? increase in requests or the amount of data requested? slow requests?)
 Tips:
-* The [profiletrigger](https://github.com/raintank/metrictank/blob/master/docs/config.md#profiling-instrumentation-and-logging) functionality can automatically trigger a memory profile and save it to disk. This can be very helpful if suddently memory usage spikes up and then metrictank gets killed in seconds or minutes. It helps diagnose problems in the code base that may lead to memory savings. The profiletrigger looks at the `bytes_alloc.not_freed` metric which is just memory allocated within the runtime. Amount of memory consumed by the process may be twice this. So as a rule of thumb, if you have a system running just metrictank, set it to about 40% of available memory.
+* The [profiletrigger](https://github.com/raintank/metrictank/blob/master/docs/config.md#profiling-instrumentation-and-logging) functionality can automatically trigger
+a memory profile and save it to disk. This can be very helpful if suddently memory usage spikes up and then metrictank gets killed in seconds or minutes.
+It helps diagnose problems in the codebase that may lead to memory savings. The profiletrigger looks at the `bytes_sys` metric which is
+the amount of memory consumed by the process.
 * Use [rollups](https://github.com/raintank/metrictank/blob/master/docs/consolidation.md#rollups) to be able to answer queries for long timeframes with less data
 2) Check the metrictank log.
 If it exited due to a panic, you should probably open a [ticket](https://github.com/raintank/metrictank/issues) with the output of `metrictank --version`, the panic, and perhaps preceeding log data.

metrictank-sample.ini

Lines changed: 3 additions & 2 deletions
@@ -94,8 +94,9 @@ proftrigger-freq = 60s
 proftrigger-path = /tmp
 # minimum time between triggered profiles
 proftrigger-min-diff = 1h
-# if this many bytes allocated, trigger a heap profile
-proftrigger-heap-thresh = 10000000
+# if process consumes this many bytes (see bytes_sys in dashboard), trigger a heap profile for developer diagnosis
+# set it higher than your typical memory usage, but lower than how much RAM the process can take before its get killed
+proftrigger-heap-thresh = 25000000000
 
 # only log incoming requests if their timerange is at least this duration. Use 0 to disable
 log-min-dur = 5min
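
The guidance in the new comment (above typical usage, below the point where the process gets killed) can be expressed as a simple sanity check. The sketch below is illustrative only: `checkThreshold` is a hypothetical helper, and the 12 GiB typical usage figure is invented for the example.

```go
package main

import "fmt"

// checkThreshold verifies that typical < thresh < limit, all in bytes.
// typical is the usual bytes_sys reading; limit is how much RAM the process
// can take before it gets killed.
func checkThreshold(typical, thresh, limit uint64) error {
	if thresh <= typical {
		return fmt.Errorf("threshold %d is not above typical usage %d", thresh, typical)
	}
	if thresh >= limit {
		return fmt.Errorf("threshold %d is not below the kill limit %d", thresh, limit)
	}
	return nil
}

func main() {
	const gib = 1 << 30
	// Hypothetical host: 12 GiB typical usage, 32 GiB of RAM.
	if err := checkThreshold(12*gib, 25000000000, 32*gib); err != nil {
		fmt.Println("bad threshold:", err)
		return
	}
	fmt.Println("threshold looks reasonable")
}
```

With these example numbers, the old default of 10000000 (10 MB) would fail the first check, since practically any running process exceeds it.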

metrictank.go

Lines changed: 3 additions & 5 deletions
@@ -105,7 +105,7 @@ var (
 proftrigPath = flag.String("proftrigger-path", "/tmp", "path to store triggered profiles")
 proftrigFreqStr = flag.String("proftrigger-freq", "60s", "inspect status frequency. set to 0 to disable")
 proftrigMinDiffStr = flag.String("proftrigger-min-diff", "1h", "minimum time between triggered profiles")
-proftrigHeapThresh = flag.Int("proftrigger-heap-thresh", 10000000, "if this many bytes allocated, trigger a profile")
+proftrigHeapThresh = flag.Int("proftrigger-heap-thresh", 25000000000, "if this many bytes allocated, trigger a profile")
 
 logMinDurStr = flag.String("log-min-dur", "5min", "only log incoming requests if their timerange is at least this duration. Use 0 to disable")
 
@@ -124,13 +124,11 @@ var (
 points met.Gauge
 
 // metric bytes_alloc.not_freed is a gauge of currently allocated (within the runtime) memory.
-// it does not include freed data and drops at every GC run.
-// this is what is inspected by the profiletrigger
-// note that total memory used by the process can be about 2x this.
+// it does not include freed data so it drops at every GC run.
 alloc met.Gauge
 // metric bytes_alloc.incl_freed is a counter of total amount of bytes allocated during process lifetime. (incl freed data)
 totalAlloc met.Gauge
-// metric bytes_sys is the amount of bytes currently obtained from the system
+// metric bytes_sys is the amount of bytes currently obtained from the system by the process. This is what the profiletrigger looks at.
 sysBytes met.Gauge
 clusterPrimary met.Gauge
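
One thing worth noting about the new default: `flag.Int` yields a platform-sized `int`, and the constant 25000000000 does not fit in a 32-bit `int`, so this declaration would fail to compile for 32-bit targets. If such builds matter, a fixed-width flag is one option. The sketch below is not part of the commit; `newFlags` is a hypothetical helper and the help text is adapted from the sample config comment.

```go
package main

import (
	"flag"
	"fmt"
)

// newFlags declares the threshold as an int64 so the 25GB default is
// representable regardless of the platform's int width.
func newFlags() (*flag.FlagSet, *int64) {
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	thresh := fs.Int64("proftrigger-heap-thresh", 25000000000,
		"if process consumes this many bytes, trigger a heap profile")
	return fs, thresh
}

func main() {
	fs, thresh := newFlags()
	// Override the default the way a command-line argument would.
	if err := fs.Parse([]string{"-proftrigger-heap-thresh=30000000000"}); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(*thresh)
}
```

Whether the value then needs converting before it is handed to the profiletrigger depends on that library's constructor signature, which this diff does not show.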

scripts/config/metrictank-docker.ini

Lines changed: 1 addition & 1 deletion
@@ -92,7 +92,7 @@ proftrigger-path = /tmp
 # minimum time between triggered profiles
 proftrigger-min-diff = 1h
 # if this many bytes allocated, trigger a heap profile
-proftrigger-heap-thresh = 10000000
+proftrigger-heap-thresh = 25000000000
 
 # only log incoming requests if their timerange is at least this duration. Use 0 to disable
 log-min-dur = 5min

scripts/config/metrictank-package.ini

Lines changed: 1 addition & 1 deletion
@@ -92,7 +92,7 @@ proftrigger-path = /tmp
 # minimum time between triggered profiles
 proftrigger-min-diff = 1h
 # if this many bytes allocated, trigger a heap profile
-proftrigger-heap-thresh = 10000000
+proftrigger-heap-thresh = 25000000000
 
 # only log incoming requests if their timerange is at least this duration. Use 0 to disable
 log-min-dur = 5min

vendor/github.com/Dieterbe/profiletrigger/heap/heap.go

Lines changed: 4 additions & 2 deletions
Some generated files are not rendered by default.

vendor/vendor.json

Lines changed: 4 additions & 3 deletions
@@ -9,10 +9,11 @@
 "revisionTime": "2015-09-30T14:07:41Z"
 },
 {
-"checksumSHA1": "GbgPwWPwxTliU7wwIwp93GcU+E4=",
+"checksumSHA1": "P8h2SWEK3NsJleSPe+mVNNpLG6Y=",
+"origin": "github.com/raintank/metrictank/vendor/github.com/Dieterbe/profiletrigger/heap",
 "path": "github.com/Dieterbe/profiletrigger/heap",
-"revision": "49951b329b2f0508075c2e8505599470bf7b20e3",
-"revisionTime": "2016-05-24T13:14:35Z"
+"revision": "d90c4b0cfeed756381675e85cc6e6b8a02cb01a6",
+"revisionTime": "2016-10-07T15:24:48Z"
 },
 {
 "checksumSHA1": "AAXMx9vb6vmVZF2ieqNepAfeJFM=",
