Netdata Helm chart for Kubernetes deployments

Version: 3.7.105

AppVersion: v1.47.3

Based on the work of varyumin (https://github.com/varyumin/netdata).

Introduction

This chart bootstraps a Netdata deployment on a Kubernetes cluster using the Helm package manager.

By default, the chart installs:

  • A Netdata child pod on each node of a cluster, using a DaemonSet.
  • A Netdata k8s state monitoring pod on one node, using a Deployment. This virtual node is called netdata-k8s-state.
  • A Netdata parent pod on one node, using a Deployment. This virtual node is called netdata-parent.

Disabled by default:

  • A Netdata restarter CronJob. Its main purpose is to automatically update Netdata when using nightly releases.

The child pods and the state pod function as headless collectors that collect all the metrics and forward them to the parent pod. The parent pod uses persistent volumes to store metrics and alarms, handles alarm notifications, and serves the Netdata UI through an ingress controller.
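As a rough illustration of how these components map onto chart values, the following values.yaml fragment toggles each one with the enabled parameters listed in the Configuration section. The restarter is switched on here purely as an example; it is disabled by default.

```yaml
# Sketch of a values.yaml fragment. Parameter names come from the
# Configuration table; restarter.enabled is changed only for illustration.
parent:
  enabled: true            # parent Deployment: stores metrics, serves the UI
child:
  enabled: true            # child DaemonSet: one collector pod per node
k8sState:
  enabled: true            # k8s state monitoring Deployment
restarter:
  enabled: true            # CronJob that restarts Netdata pods (default: false)
  schedule: "00 06 * * *"  # documented default schedule, in Cron format
```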

Please validate that the settings are suitable for your cluster before using them in production.

Prerequisites

Installing the Helm chart

You can install the Helm chart via our Helm repository, or by cloning this repository.

Installing via our Helm repository (recommended)

To use Netdata's Helm repository, run the following commands:

helm repo add netdata https://netdata.github.io/helmchart/
helm install netdata netdata/netdata

See our install Netdata on Kubernetes documentation for detailed installation and configuration instructions. The remainder of this document assumes you installed the Helm chart by cloning this repository, and thus uses slightly different helm install/helm upgrade commands.

Install by cloning the repository

Clone the repository locally.

git clone https://github.com/netdata/helmchart.git netdata-helmchart

To install the chart with the release name netdata:

helm install netdata ./netdata-helmchart/charts/netdata

The command deploys Netdata, including an Ingress, on the Kubernetes cluster in the default configuration. The configuration section lists the parameters that can be configured during installation.

Tip: List all releases using helm list.

Uninstalling the Chart

To uninstall/delete the netdata release:

helm delete netdata

The command removes all the Kubernetes components associated with the chart and deletes the release.

Configuration

The following table lists the configurable parameters of the netdata chart and their default values.

| Parameter | Description | Default |
| --- | --- | --- |
| kubeVersion | Kubernetes version | Autodetected |
| replicaCount | Number of replicas for the parent netdata Deployment | 1 |
| imagePullSecrets | An optional list of references to secrets in the same namespace to use for pulling any of the images | [] |
| image.repository | Container image repo | netdata/netdata |
| image.tag | Container image tag | Latest stable netdata release |
| image.pullPolicy | Container image pull policy | Always |
| service.type | Parent service type | ClusterIP |
| service.port | Parent service port | 19999 |
| service.loadBalancerIP | Static LoadBalancer IP, only to be used with service type=LoadBalancer | "" |
| service.loadBalancerSourceRanges | List of allowed IPs for LoadBalancer | [] |
| service.externalTrafficPolicy | Denotes if this Service desires to route external traffic to node-local or cluster-wide endpoints | Cluster |
| service.healthCheckNodePort | Specifies the health check node port | Allocated a port from your cluster's NodePort range |
| service.clusterIP | Specific cluster IP when service type is cluster IP. Use None for a headless service | Allocated an IP from your cluster's service IP range |
| service.annotations | Additional annotations to add to the service | {} |
| ingress.enabled | Create Ingress to access the netdata web UI | true |
| ingress.apiVersion | apiVersion for the Ingress | Depends on Kubernetes version |
| ingress.annotations | Associate annotations to the Ingress | kubernetes.io/ingress.class: nginx and kubernetes.io/tls-acme: "true" |
| ingress.path | URL path for the ingress. If changed, a proxy server needs to be configured in front of netdata to translate the path from a custom one to / | / |
| ingress.pathType | pathType for your ingress controller. The default value is correct for nginx. If you use your own ingress controller, check the correct value | Prefix |
| ingress.hosts | URL hostnames for the ingress (they need to resolve to the external IP of the ingress controller) | netdata.k8s.local |
| ingress.spec | Spec section for the ingress object. Everything there will be included in the object on deployment | {} |
| ingress.spec.ingressClassName | Ingress class declaration for Kubernetes version 1.19+. The ingress.class annotation should be removed if this type of declaration is used | nginx |
| rbac.create | If true, create and use RBAC resources | true |
| rbac.pspEnabled | Specifies whether a PodSecurityPolicy should be created | true |
| serviceAccount.create | If true, create a service account | true |
| serviceAccount.name | The name of the service account to use. If not set and create is true, a name is generated using the fullname template | netdata |
| clusterrole.name | Name of the cluster role linked with the service account | netdata |
| APIKEY | The key shared between the parent and the child netdata for streaming | 11111111-2222-3333-4444-555555555555 |
| restarter.enabled | Install CronJob to update Netdata Pods | false |
| restarter.schedule | The schedule in Cron format | 00 06 * * * |
| restarter.image.repository | Container image repo | bitnami/kubectl |
| restarter.image.tag | Container image tag | 1.25 |
| restarter.image.pullPolicy | Container image pull policy | Always |
| restarter.image.restartPolicy | Container restart policy | Never |
| restarter.image.resources | Container resources | {} |
| restarter.concurrencyPolicy | Specifies how to treat concurrent executions of a job | Forbid |
| restarter.startingDeadlineSeconds | Optional deadline in seconds for starting the job if it misses scheduled time for any reason | 60 |
| restarter.successfulJobsHistoryLimit | The number of successful finished jobs to retain | 3 |
| restarter.failedJobsHistoryLimit | The number of failed finished jobs to retain | 3 |
| parent.enabled | Install parent Deployment to receive metrics from children nodes | true |
| parent.port | Parent's listen port | 19999 |
| parent.resources | Resources for the parent deployment | {} |
| parent.livenessProbe.initialDelaySeconds | Number of seconds after the container has started before liveness probes are initiated | 0 |
| parent.livenessProbe.failureThreshold | When a liveness probe fails, Kubernetes will try failureThreshold times before giving up. Giving up the liveness probe means restarting the container | 3 |
| parent.livenessProbe.periodSeconds | How often (in seconds) to perform the liveness probe | 30 |
| parent.livenessProbe.successThreshold | Minimum consecutive successes for the liveness probe to be considered successful after having failed | 1 |
| parent.livenessProbe.timeoutSeconds | Number of seconds after which the liveness probe times out | 1 |
| parent.readinessProbe.initialDelaySeconds | Number of seconds after the container has started before readiness probes are initiated | 0 |
| parent.readinessProbe.failureThreshold | When a readiness probe fails, Kubernetes will try failureThreshold times before giving up. Giving up the readiness probe means marking the Pod Unready | 3 |
| parent.readinessProbe.periodSeconds | How often (in seconds) to perform the readiness probe | 30 |
| parent.readinessProbe.successThreshold | Minimum consecutive successes for the readiness probe to be considered successful after having failed | 1 |
| parent.readinessProbe.timeoutSeconds | Number of seconds after which the readiness probe times out | 1 |
| parent.terminationGracePeriodSeconds | Duration in seconds the pod needs to terminate gracefully | 300 |
| parent.nodeSelector | Node selector for the parent deployment | {} |
| parent.tolerations | Tolerations settings for the parent deployment | [] |
| parent.affinity | Affinity settings for the parent deployment | {} |
| parent.priorityClassName | Pod priority class name for the parent deployment | "" |
| parent.database.persistence | Whether the parent should use a persistent volume for the DB | true |
| parent.database.storageclass | The storage class for the persistent volume claim of the parent's database store, mounted to /var/cache/netdata | the default storage class |
| parent.database.volumesize | The storage space for the PVC of the parent database | 5Gi |
| parent.alarms.persistence | Whether the parent should use a persistent volume for the alarms log | true |
| parent.alarms.storageclass | The storage class for the persistent volume claim of the parent's alarm log, mounted to /var/lib/netdata | the default storage class |
| parent.alarms.volumesize | The storage space for the PVC of the parent alarm log | 1Gi |
| parent.env | Set environment parameters for the parent deployment | {} |
| parent.envFrom | Set environment parameters for the parent deployment from ConfigMap and/or Secrets | [] |
| parent.podLabels | Additional labels to add to the parent pods | {} |
| parent.podAnnotations | Additional annotations to add to the parent pods | {} |
| parent.dnsPolicy | DNS policy for pod | Default |
| parent.configs | Manage custom parent's configs | See Configuration files. |
| parent.claiming.enabled | Enable parent claiming for netdata cloud | false |
| parent.claiming.token | Claim token | "" |
| parent.claiming.room | Comma-separated list of claim room IDs | "" |
| parent.extraVolumeMounts | Additional volumeMounts to add to the parent pods | [] |
| parent.extraVolumes | Additional volumes to add to the parent pods | [] |
| k8sState.enabled | Install this Deployment to gather data from the K8s cluster | yes |
| k8sState.port | Listen port | service.port (same as parent's listen port) |
| k8sState.resources | Compute resources required by this Deployment | {} |
| k8sState.livenessProbe.initialDelaySeconds | Number of seconds after the container has started before liveness probes are initiated | 0 |
| k8sState.livenessProbe.failureThreshold | When a liveness probe fails, Kubernetes will try failureThreshold times before giving up. Giving up the liveness probe means restarting the container | 3 |
| k8sState.livenessProbe.periodSeconds | How often (in seconds) to perform the liveness probe | 30 |
| k8sState.livenessProbe.successThreshold | Minimum consecutive successes for the liveness probe to be considered successful after having failed | 1 |
| k8sState.livenessProbe.timeoutSeconds | Number of seconds after which the liveness probe times out | 1 |
| k8sState.readinessProbe.initialDelaySeconds | Number of seconds after the container has started before readiness probes are initiated | 0 |
| k8sState.readinessProbe.failureThreshold | When a readiness probe fails, Kubernetes will try failureThreshold times before giving up. Giving up the readiness probe means marking the Pod Unready | 3 |
| k8sState.readinessProbe.periodSeconds | How often (in seconds) to perform the readiness probe | 30 |
| k8sState.readinessProbe.successThreshold | Minimum consecutive successes for the readiness probe to be considered successful after having failed | 1 |
| k8sState.readinessProbe.timeoutSeconds | Number of seconds after which the readiness probe times out | 1 |
| k8sState.terminationGracePeriodSeconds | Duration in seconds the pod needs to terminate gracefully | 30 |
| k8sState.terminationGracePeriodSeconds | Duration in seconds the pod needs to terminate gracefully | 300 |
| k8sState.nodeSelector | Node selector | {} |
| k8sState.tolerations | Tolerations settings | [] |
| k8sState.affinity | Affinity settings | {} |
| k8sState.priorityClassName | Pod priority class name | "" |
| k8sState.podLabels | Additional labels | {} |
| k8sState.podAnnotations | Additional annotations | {} |
| k8sState.podAnnotationAppArmor.enabled | Whether or not to include the AppArmor security annotation | true |
| k8sState.dnsPolicy | DNS policy for pod | ClusterFirstWithHostNet |
| k8sState.persistence.enabled | Whether to use a persistent volume for /var/lib/netdata | true |
| k8sState.persistence.storageclass | The storage class for the persistent volume claim of /var/lib/netdata | the default storage class |
| k8sState.persistence.volumesize | The storage space for the PVC of /var/lib/netdata | 1Gi |
| k8sState.env | Set environment parameters | {} |
| k8sState.envFrom | Set environment parameters from ConfigMap and/or Secrets | [] |
| k8sState.configs | Manage custom configs | See Configuration files. |
| k8sState.claiming.enabled | Enable claiming for netdata cloud | false |
| k8sState.claiming.token | Claim token | "" |
| k8sState.claiming.room | Comma-separated list of claim room IDs | "" |
| k8sState.extraVolumeMounts | Additional volumeMounts to add to the k8sState pods | [] |
| k8sState.extraVolumes | Additional volumes to add to the k8sState pods | [] |
| child.enabled | Install child DaemonSet to gather data from nodes | true |
| child.port | Children's listen port | service.port (same as parent's listen port) |
| child.updateStrategy | An update strategy to replace existing DaemonSet pods with new pods | {} |
| child.resources | Resources for the child DaemonSet | {} |
| child.livenessProbe.initialDelaySeconds | Number of seconds after the container has started before liveness probes are initiated | 0 |
| child.livenessProbe.failureThreshold | When a liveness probe fails, Kubernetes will try failureThreshold times before giving up. Giving up the liveness probe means restarting the container | 3 |
| child.livenessProbe.periodSeconds | How often (in seconds) to perform the liveness probe | 30 |
| child.livenessProbe.successThreshold | Minimum consecutive successes for the liveness probe to be considered successful after having failed | 1 |
| child.livenessProbe.timeoutSeconds | Number of seconds after which the liveness probe times out | 1 |
| child.readinessProbe.initialDelaySeconds | Number of seconds after the container has started before readiness probes are initiated | 0 |
| child.readinessProbe.failureThreshold | When a readiness probe fails, Kubernetes will try failureThreshold times before giving up. Giving up the readiness probe means marking the Pod Unready | 3 |
| child.readinessProbe.periodSeconds | How often (in seconds) to perform the readiness probe | 30 |
| child.readinessProbe.successThreshold | Minimum consecutive successes for the readiness probe to be considered successful after having failed | 1 |
| child.readinessProbe.timeoutSeconds | Number of seconds after which the readiness probe times out | 1 |
| child.terminationGracePeriodSeconds | Duration in seconds the pod needs to terminate gracefully | 30 |
| child.nodeSelector | Node selector for the child daemonsets | {} |
| child.tolerations | Tolerations settings for the child daemonsets | - operator: Exists with effect: NoSchedule |
| child.affinity | Affinity settings for the child daemonsets | {} |
| child.priorityClassName | Pod priority class name for the child daemonsets | "" |
| child.env | Set environment parameters for the child daemonset | {} |
| child.envFrom | Set environment parameters for the child daemonset from ConfigMap and/or Secrets | [] |
| child.podLabels | Additional labels to add to the child pods | {} |
| child.podAnnotations | Additional annotations to add to the child pods | {} |
| child.hostNetwork | Usage of host networking and ports | true |
| child.dnsPolicy | DNS policy for pod. Should be ClusterFirstWithHostNet if child.hostNetwork = true | ClusterFirstWithHostNet |
| child.podAnnotationAppArmor.enabled | Whether or not to include the AppArmor security annotation | true |
| child.persistence.hostPath | Host node directory for storing child instance data | /var/lib/netdata-k8s-child |
| child.persistence.enabled | Whether or not to persist /var/lib/netdata in the child.persistence.hostPath | true |
| child.podsMetadata.useKubelet | Send requests to the Kubelet /pods endpoint instead of Kubernetes API server to get pod metadata | false |
| child.podsMetadata.kubeletUrl | Kubelet URL | https://localhost:10250 |
| child.configs | Manage custom child's configs | See Configuration files. |
| child.claiming.enabled | Enable child claiming for netdata cloud | false |
| child.claiming.token | Claim token | "" |
| child.claiming.room | Comma-separated list of claim room IDs | "" |
| child.extraVolumeMounts | Additional volumeMounts to add to the child pods | [] |
| child.extraVolumes | Additional volumes to add to the child pods | [] |
| notifications.slack.webhook_url | Slack webhook URL | "" |
| notifications.slack.recipient | Slack recipient list | "" |
| initContainersImage.repository | Init containers' image repository | alpine |
| initContainersImage.tag | Init containers' image tag | latest |
| initContainersImage.pullPolicy | Init containers' image pull policy | Always |
| sysctlInitContainer.enabled | Enable an init container to modify Kernel settings | false |
| sysctlInitContainer.command | sysctl init container command to execute | [] |
| sysctlInitContainer.resources | sysctl init container CPU/Memory resource requests/limits | {} |
| sd.image.repository | Service-discovery image repo | netdata/agent-sd |
| sd.image.tag | Service-discovery image tag | Latest stable release (e.g. v0.2.2) |
| sd.image.pullPolicy | Service-discovery image pull policy | Always |
| sd.child.enabled | Add service-discovery sidecar container to the netdata child pod definition | true |
| sd.child.resources | Child service-discovery container CPU/Memory resource requests/limits | {resources: {limits: {cpu: 50m, memory: 150Mi}, requests: {cpu: 50m, memory: 100Mi}}} |
| sd.child.configmap.name | Child service-discovery ConfigMap name | netdata-child-sd-config-map |
| sd.child.configmap.key | Child service-discovery ConfigMap key | config.yml |
| sd.child.configmap.from.file | File to use for child service-discovery configuration generation | sdconfig/sd-child.yml |
| sd.child.configmap.from.value | Value to use for child service-discovery configuration generation | {} |

Example of setting parameters from the command line:

$ helm install my-release ./netdata \
  --set notifications.slack.webhook_url=MySlackAPIURL \
  --set notifications.slack.recipient="@MyUser MyChannel"

Another example, to set a different ingress controller.

By default, kubernetes.io/ingress.class is set to nginx, but you can use Traefik as your ingress controller by setting ingress.annotations.

$ helm install my-release ./netdata \
  --set 'ingress.annotations.kubernetes\.io/ingress\.class=traefik'

Instead of passing each variable on the command line, you can provide a YAML file that specifies the parameter values when installing the chart. For example:

$ helm install my-release ./netdata -f values.yaml

Tip: You can use the default values.yaml.
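For comparison, here is a minimal sketch of a values.yaml that carries the same settings as the two command-line examples above. MySlackAPIURL, @MyUser and MyChannel are the placeholders used in those examples, not real values.

```yaml
# values.yaml (sketch): Slack notifications plus a Traefik ingress class.
# All values are placeholders taken from the command-line examples.
notifications:
  slack:
    webhook_url: MySlackAPIURL
    recipient: "@MyUser MyChannel"
ingress:
  annotations:
    kubernetes.io/ingress.class: traefik
```

Keeping such overrides in a file makes them easier to review and reuse across helm install and helm upgrade runs than a long list of --set flags.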

Note: To opt out of anonymous statistics, set the DO_NOT_TRACK environment variable to a non-zero or non-empty value in the parent.env/child.env configuration (e.g. DO_NOT_TRACK: 1), or uncomment the corresponding line in values.yaml.
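A values fragment along these lines should apply the opt-out to both the parent and the child pods. It is a sketch that only uses the parent.env and child.env parameters from the table above.

```yaml
# Sketch: opt out of anonymous statistics on parent and child pods.
parent:
  env:
    DO_NOT_TRACK: 1
child:
  env:
    DO_NOT_TRACK: 1
```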

Configuration files

| Parameter | Description | Default |
| --- | --- | --- |
| parent.configs.netdata | Contents of the parent's netdata.conf | memory mode = dbengine |
| parent.configs.stream | Contents of the parent's stream.conf | Store child data, accept all connections, and issue alarms for child data. |
| parent.configs.health | Contents of health_alarm_notify.conf | Email disabled, a sample of the required settings for Slack notifications |
| parent.configs.exporting | Contents of exporting.conf | Disabled |
| k8sState.configs.netdata | Contents of netdata.conf | No persistent storage, no alarms |
| k8sState.configs.stream | Contents of stream.conf | Send metrics to the parent at netdata:{{ service.port }} |
| k8sState.configs.exporting | Contents of exporting.conf | Disabled |
| k8sState.configs.go.d | Contents of go.d.conf | Only k8s_state enabled |
| k8sState.configs.go.d-k8s_state | Contents of go.d/k8s_state.conf | k8s_state configuration |
| child.configs.netdata | Contents of the child's netdata.conf | No persistent storage, no alarms, no UI |
| child.configs.stream | Contents of the child's stream.conf | Send metrics to the parent at netdata:{{ service.port }} |
| child.configs.exporting | Contents of the child's exporting.conf | Disabled |
| child.configs.kubelet | Contents of the child's go.d/k8s_kubelet.conf that drives the kubelet collector | Update metrics every second, do not retry to detect the endpoint, look for the kubelet metrics at http://127.0.0.1:10255/metrics |
| child.configs.kubeproxy | Contents of the child's go.d/k8s_kubeproxy.conf that drives the kubeproxy collector | Update metrics every second, do not retry to detect the endpoint, look for the kubeproxy metrics at http://127.0.0.1:10249/metrics |

To deploy additional netdata user configuration files, add similar entries to either the parent.configs or the child.configs arrays. Whether the config files reside directly under /etc/netdata or in a subdirectory such as /etc/netdata/go.d, you can use the already provided configurations as a reference. For example, the parent.configs array includes an example alarm that would be triggered if the python.d example module were enabled. When a configuration contains sensitive data, such as database credentials, you can store it in a Kubernetes Secret by specifying storedType: secret in the selected configuration. By default, all configuration is placed in a Kubernetes ConfigMap.
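The fragment below sketches what such an extra entry might look like when its contents should end up in a Secret. Only storedType: secret is taken from this document. The entry name mysql, the enabled/path/data keys, and the collector settings are assumptions for illustration, so compare them with the entries in the chart's default values.yaml before relying on them.

```yaml
# Sketch only. Assumptions: the entry name "mysql" and the enabled/path/data
# keys mirror the layout of the existing entries in the default values.yaml.
child:
  configs:
    mysql:
      enabled: true
      path: /etc/netdata/go.d/mysql.conf
      storedType: secret   # keep this file in a Secret instead of a ConfigMap
      data: |
        jobs:
          - name: local
            dsn: "netdata:CHANGE_ME@tcp(127.0.0.1:3306)/"  # placeholder credentials
```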

Note that with the default configuration of this chart, the parent does the health checks and triggers alarms, but does not collect much data. As a result, the only other configuration files that might make sense to add to the parent are the alarm and alarm template definitions, under /etc/netdata/health.d.

Tip: Pay attention to the indentation of the config file contents, as it matters for the parsing of the YAML file. Note that the first line under var: | must be indented with two more spaces relative to the preceding line:

data: |-
  config line 1 #Need those two spaces
      config line 2 #No problem indenting more here

Persistent volumes

There are two different persistent volumes on the parent node by design (not counting any ConfigMap/Secret mounts). Both can be used, but they don't have to be. Keep in mind that whenever a persistent volume for the parent is not used, all the data for that volume is lost when the pod is removed.

  1. database (/var/cache/netdata) - all metrics data is stored here. The performance of this volume affects query timings.
  2. alarms (/var/lib/netdata) - the alarm log is stored here. If this volume is not persistent, recreating the pod makes the parent appear as a new node in netdata.cloud (because ./registry/ and ./cloud.d/ are removed).

The child instance is a bit simpler. By default, the hostPath /var/lib/netdata-k8s-child is mounted on the child at /var/lib/netdata. You can disable it, but this option is pretty much required in a real-life scenario, because without it each pod deletion results in a new replication node for the parent.
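The persistence settings described above could be written in a values file roughly as follows. Sizes and the host path are the documented defaults. The storage class name fast-ssd is hypothetical and stands in for whatever class your cluster offers.

```yaml
# Sketch: persistence-related values for parent and child.
parent:
  database:
    persistence: true
    storageclass: fast-ssd   # hypothetical; omit to use the default storage class
    volumesize: 5Gi          # documented default
  alarms:
    persistence: true
    volumesize: 1Gi          # documented default
child:
  persistence:
    enabled: true
    hostPath: /var/lib/netdata-k8s-child   # documented default
```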

Service discovery and supported services

Netdata's service discovery, which is installed as part of the Helm chart installation, finds what services are running on a cluster's pods, converts that into configuration files, and exports them, so they can be monitored.

Applications

Service discovery currently supports the following applications via their associated collector:

Prometheus endpoints

Service discovery supports Prometheus endpoints via the Prometheus collector.

Annotations on pods allow fine control of the scraping process:

  • prometheus.io/scrape: The default configuration scrapes all pods. If set to false, this annotation excludes the pod from the scraping process.
  • prometheus.io/path: If the metrics path is not /metrics, define it with this annotation.
  • prometheus.io/port: Scrape the pod on the indicated port instead of the pod's declared ports.
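To make the three annotations concrete, here is a minimal Pod manifest sketch that uses all of them. The pod name, image, metrics path, and port numbers are hypothetical. Annotation values are quoted because Kubernetes requires them to be strings.

```yaml
# Sketch: a pod that exposes metrics on a non-default path and port.
apiVersion: v1
kind: Pod
metadata:
  name: example-app                       # hypothetical pod name
  annotations:
    prometheus.io/scrape: "true"          # set to "false" to exclude this pod
    prometheus.io/path: "/custom/metrics" # hypothetical non-default path
    prometheus.io/port: "9102"            # hypothetical metrics port
spec:
  containers:
    - name: example-app
      image: example/app:1.0              # hypothetical image
      ports:
        - containerPort: 8080
```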

Configure service discovery

If your cluster runs services on non-default ports or uses non-default names, you may need to configure service discovery to start collecting metrics from your services. You have to edit the default ConfigMap that is shipped with the Helm chart and deploy that to your cluster.

First, copy netdata-helmchart/sdconfig/child.yml to a new location outside the netdata-helmchart directory. The destination can be anywhere you like, but the following examples assume it resides next to the netdata-helmchart directory.

cp netdata-helmchart/sdconfig/child.yml .

Edit the new child.yml file according to your needs. See the Helm chart configuration and the file itself for details. You can then run helm install/helm upgrade with the --set-file argument to use your configured child.yml file instead of the default, changing the path if you copied it elsewhere.

helm install --set-file sd.child.configmap.from.value=./child.yml netdata ./netdata-helmchart/charts/netdata
helm upgrade --set-file sd.child.configmap.from.value=./child.yml netdata ./netdata-helmchart/charts/netdata

Now that you have pushed an edited ConfigMap to your cluster, service discovery should find and set up metrics collection from your non-default service.

Custom pod labels and annotations

Occasionally, you will want to add specific labels and annotations to the parent and/or child pods. You might want to do this to tell other applications on the cluster how to treat your pods, or simply to categorize applications on your cluster. You can label and annotate the parent and child pods by using the podLabels and podAnnotations dictionaries under the parent and child objects, respectively.

For example, suppose you're installing Netdata on all your database nodes, and you'd like the child pods to be labeled with workload: database so that you're able to recognize this.

At the same time, say you've configured chaoskube to kill all pods annotated with chaoskube.io/enabled: true, and you'd like chaoskube to be enabled for the parent pod but not the children.

You would do this by installing as:

$ helm install \
  --set child.podLabels.workload=database \
  --set 'child.podAnnotations.chaoskube\.io/enabled=false' \
  --set 'parent.podAnnotations.chaoskube\.io/enabled=true' \
  netdata ./netdata-helmchart/charts/netdata
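The same labels and annotations can also be kept in a values file. The fragment below is a sketch of the equivalent settings. The annotation values are quoted here so they stay strings, which is an editorial choice rather than something the chart documents.

```yaml
# Sketch: values-file equivalent of the --set flags above.
child:
  podLabels:
    workload: database
  podAnnotations:
    chaoskube.io/enabled: "false"
parent:
  podAnnotations:
    chaoskube.io/enabled: "true"
```

In a values file the dots in chaoskube.io need no escaping, which avoids the backslashes required with --set.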

Contributing

If you want to contribute, we are humbled!