Suppress Nomad's logging overhead

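When managing Docker containers with Nomad, each task's logs go through two helper processes: docker_logger, which reads the container's output from Docker, and logmon, which writes it to rotated files in the allocation directory.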

This is very inefficient, mostly because of the logmon and docker_logger processes: each of them consumes around 50 to 60 MB, which adds up to 100 to 120 MB of overhead per container! If you run a few big containers, this might not be a big problem, but when you use a lot of small sidecars (like I do: not only Envoy proxies for the service mesh, but also a lot of pgbouncer, redis, valkey, memcached, nginx, etc.), this is insane. On some of my nodes, this overhead represented ~30% of the total used RAM.

Nomad lets you disable log collection, either globally, in the agent config:

plugin "docker" {
  config {
    disable_log_collection = true
  }
}

Or for an individual task, in the job file:

    task "metrics-proxy" {
      driver = "docker"

      # Reduce Docker logs collection (huge) overhead
      logs {
        disabled = true
      }
[...]

This removes the overhead, but also disables all log collection. You can collect logs through other means, but you won't be able to display them from the Nomad API or the web interface, which can make debugging harder.

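Here is how I removed the overhead while still being able to check logs. First, in the agent config, disable Nomad's own log collection and have Docker send the logs through its fluentd logging driver: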

plugin "docker" {
  config {
    disable_log_collection = true
    logging {
      type = "fluentd"
      config {
        # Send logs to a local fluentd service, running on port 4224
        fluentd-address = "127.0.0.1:4224"
        fluentd-async = true
        # This is important for the fluentd service to access the metadata
        env = "NOMAD_JOB_NAME,NOMAD_GROUP_NAME,NOMAD_DC,NOMAD_REGION,NOMAD_TASK_NAME,NOMAD_ALLOC_INDEX,NOMAD_ALLOC_ID,NOMAD_NAMESPACE"
      }
    }
  }
}

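You will have to restart the Nomad agent and all running containers for the change to take effect.

Then a local Vector instance receives these logs on the same port and writes them where Nomad expects to find them: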

sources:
  in_fluent:
    type: fluent
    address: 127.0.0.1:4224

transforms: 
  transform_fluent:
    type: remap
    inputs: ["in_fluent"]
    # Map .log content in .message so we can use the text codec in sinks
    source: |
      .message = del(.log)
    
sinks:
  # Duplicate fluentd logs to files so Nomad API can read it
  out_nomad_files:
    type: file
    path: /opt/nomad/data/alloc/{{ .NOMAD_ALLOC_ID }}/alloc/logs/{{ .NOMAD_TASK_NAME }}.{{ .source }}.0
    inputs: ["transform_fluent"]
    encoding:
      codec: text
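
To make the mapping above more concrete, here is a small Python sketch that mimics what the remap transform and the file sink's path template do to a single event. It is only an illustration, not how Vector is implemented: the sample event is hypothetical (the allocation ID and container name are made up), and it assumes the record carries the usual fields of Docker's fluentd log driver (container_id, container_name, source, log) plus the environment variables listed in the env option.

```python
import re

# Hypothetical event, shaped like a record from Docker's fluentd log driver.
# The NOMAD_* keys are present because of the "env" option in the agent config.
SAMPLE_EVENT = {
    "container_id": "0123456789ab",
    "container_name": "/metrics-proxy-example",
    "source": "stdout",
    "log": "GET /metrics 200",
    "NOMAD_ALLOC_ID": "5f3c9d2e-0000-4000-8000-000000000000",
    "NOMAD_TASK_NAME": "metrics-proxy",
}

# Same template as the "path" option of the file sink
PATH_TEMPLATE = (
    "/opt/nomad/data/alloc/{{ .NOMAD_ALLOC_ID }}/alloc/logs/"
    "{{ .NOMAD_TASK_NAME }}.{{ .source }}.0"
)


def remap(event):
    """Mimic `.message = del(.log)`: move the log line to the message field."""
    out = dict(event)
    out["message"] = out.pop("log")
    return out


def render_path(template, event):
    """Replace each {{ .FIELD }} placeholder with the event's value."""
    return re.sub(
        r"\{\{\s*\.?(\w+)\s*\}\}",
        lambda match: str(event[match.group(1)]),
        template,
    )


def render_line(event):
    """Mimic the text codec: only the message, followed by a newline."""
    return event["message"] + "\n"


event = remap(SAMPLE_EVENT)
print(render_path(PATH_TEMPLATE, event))
print(render_line(event), end="")
```

The point is that stdout and stderr end up in two separate files per task, named after the source field, which is the layout Nomad reads from when serving logs.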

You will have to handle log rotation yourself, for example with logrotate.
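
logrotate is the natural tool here. Purely to illustrate what the rotation has to achieve, here is a Python sketch of a different, simpler technique: trimming a file in place and keeping only its tail, so the file name Nomad reads from never changes. The function name and the size thresholds are hypothetical, and the sketch assumes the writer appends to the file; lines written while the trim is running could be lost.

```python
import os


def trim_log(path, max_bytes, keep_bytes):
    """Trim `path` in place to roughly its last `keep_bytes` bytes.

    Nothing happens while the file is not larger than `max_bytes`.
    Returns True if the file was trimmed, False otherwise.
    """
    if keep_bytes > max_bytes:
        raise ValueError("keep_bytes must not exceed max_bytes")

    size = os.path.getsize(path)
    if size <= max_bytes:
        return False

    with open(path, "r+b") as handle:
        # Read the tail we want to keep
        handle.seek(size - keep_bytes)
        tail = handle.read()

        # Drop the first, probably partial, line of the tail
        newline = tail.find(b"\n")
        if newline != -1:
            tail = tail[newline + 1:]

        # Rewrite the file with the tail only
        handle.seek(0)
        handle.write(tail)
        handle.truncate()
    return True
```

It could be run periodically (from cron or a systemd timer, for instance) over the files under /opt/nomad/data/alloc/*/alloc/logs/.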

With this in place, Vector creates the log files for Nomad, just like Nomad's own log collector would have done, but in a much more efficient way. You can query logs using nomad alloc logs, or from the web interface. And the nice thing is that you can also send logs elsewhere (for example to an Elasticsearch instance or a Loki server; Vector supports a lot of different sinks).
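
Since the files sit where Nomad expects them, they should also be readable through Nomad's HTTP API, which is what the CLI and the web interface use. As a quick check, here is a minimal Python sketch that reads them from the /v1/client/fs/logs/:alloc_id endpoint. The agent address (the default 127.0.0.1:4646), the function names, and the optional ACL token are assumptions to adapt to your setup.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: a Nomad agent listening on its default HTTP address
DEFAULT_NOMAD_ADDR = "http://127.0.0.1:4646"


def build_logs_url(alloc_id, task, stream="stdout", base=DEFAULT_NOMAD_ADDR):
    """Build the URL of the log endpoint for one task of an allocation."""
    # plain=true asks for raw text instead of JSON-framed chunks
    query = urlencode({"task": task, "type": stream, "plain": "true"})
    return f"{base}/v1/client/fs/logs/{alloc_id}?{query}"


def fetch_logs(alloc_id, task, stream="stdout", token=None, base=DEFAULT_NOMAD_ADDR):
    """Return the task's logs as text (sends the ACL token if one is given)."""
    request = Request(build_logs_url(alloc_id, task, stream, base))
    if token:
        request.add_header("X-Nomad-Token", token)
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, fetch_logs("<alloc id>", "metrics-proxy", stream="stderr") should return the same content as nomad alloc logs -stderr for that task.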


Revision #8
Created 12 May 2024 13:15:33 by Daniel Berteaud
Updated 31 May 2024 08:19:32 by Daniel Berteaud