
I am running a Jenkins controller in Kubernetes, and I have noticed that the controller has been restarting a lot.

kgp jkmaster-0
NAME                  READY   STATUS    RESTARTS   AGE
jkmaster-0            1/1     Running   8          30m

The memory allocation for the pod is as follows:

    Limits:
      memory:  2500M
    Requests:
      cpu:      300m
      memory:   1G
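
For context, those values correspond to a resources stanza roughly like the following in the StatefulSet's container spec (a sketch; the container and image names below are illustrative):

    # sketch of the resources stanza, mirroring the values above;
    # container/image names are illustrative
    containers:
      - name: jenkins
        image: jenkins/jenkins:lts
        resources:
          requests:
            cpu: 300m
            memory: 1G
          limits:
            memory: 2500M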

As long as the controller is idle, I don't see any spikes. But as soon as I start spawning jobs, memory spikes appear, and each spike results in an OOM error and a restart.

(screenshot: pod memory usage spiking while jobs run)

kgp jkmaster-0
NAME                  READY   STATUS      RESTARTS   AGE
jkmaster-0            0/1     OOMKilled   3          3h8m
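
The termination reason can also be read straight from the container status (a sketch; it assumes the Jenkins container is the first one in the pod):

    kubectl get pod jkmaster-0 \
      -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
    # expected to print OOMKilled, matching the status above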

In order to look into this further, I would like to generate a heap dump, so I have added the following

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/srv/jenkins/

to JAVA_OPTS. I am expecting that the next time the Jenkins controller hits an OOM, it should generate a heap dump under /srv/jenkins/, but there is none. Any idea if there is something I have missed?

There is no file of the form java_pid<pid>.hprof under /srv/jenkins/ after a restart.
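
One thing worth double-checking is whether the running JVM actually picked up these flags and whether the Jenkins user can write to the dump path. A sketch of that check, assuming Jenkins runs as PID 1 in the container and the image ships jcmd:

    # confirm the dump flags are set on the live JVM (Jenkins assumed to be PID 1)
    kubectl exec jkmaster-0 -- jcmd 1 VM.flags | tr ' ' '\n' | grep -i heapdump

    # confirm the dump path on the PVC is writable by the Jenkins user
    kubectl exec jkmaster-0 -- touch /srv/jenkins/.dump-write-test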

The full JAVA_OPTS:

JAVA_OPTS: -Djava.awt.headless=true -XX:InitialRAMPercentage=10.0 -XX:MaxRAMPercentage=60.0 -server -XX:NativeMemoryTracking=summary -XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication \
-XX:+UnlockDiagnosticVMOptions -XX:G1SummarizeRSetStatsPeriod=1 -XX:+PrintFlagsFinal -Djenkins.install.runSetupWizard=false -Dhudson.DNSMultiCast.disabled=true \
-Dhudson.slaves.NodeProvisioner.initialDelay=5000 -Dsecurerandom.source=file:/dev/urandom \
-Xlog:gc:file=/srv/jenkins/gc-%t.log -Xlog:gc*=debug -XX:+AlwaysPreTouch -XX:+DisableExplicitGC \
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/srv/jenkins/ -Dhudson.model.ParametersAction.keepUndefinedParameters=true -Dhudson.model.DownloadService.noSignatureCheck=true
  • How much RAM do your nodes have? Commented Feb 2, 2021 at 18:08
  • @paltaa 25GB memory allocatable Commented Feb 2, 2021 at 18:11
  • Well, unless you have volume mounted /srv/jenkins to a hostPath: or a PVC, it is very likely the Pod bounce is resetting the root FS in your container Commented Feb 3, 2021 at 3:50
  • @mdaniel /srv/jenkins is mounted on a PVC. Commented Feb 3, 2021 at 15:45
  • I just now realized the disconnect: OOMKilled is something kubelet does to your container, and not something that the JVM does to itself. That process was kill -9-ed (in fact, I don't know of any "warning shot" k8s offers the Pod); if you are interested in having the JVM participate in the OOM triage, you'll want to lower the Xmx below the Pod's resource boundary, so the JVM exhausts itself before k8s steps in with a more violent outcome Commented Feb 3, 2021 at 16:58
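
A minimal sketch of what that last comment suggests: pinning the heap ceiling explicitly below the 2500M container limit so the JVM throws OutOfMemoryError (and writes the dump) before the kubelet kills the container. The 1500m figure is illustrative only; it leaves roughly 1000M of the limit for metaspace, thread stacks, and other native memory:

    # illustrative only: an explicit heap ceiling in place of the RAMPercentage flags,
    # kept well under the 2500M container limit
    JAVA_OPTS: ... -Xmx1500m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/srv/jenkins/ ...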
