matt maier edited this page Apr 24, 2015 · 6 revisions

Files

etc-collectd/mesos.yaml.example
etc-collectd/mesos-cli.yaml.example
etc-collectd/conf.d/mesos-exec.conf.example
etc-collectd/conf.d/mesos-plugin.conf.example

Enabling and configuring the Mesos plugin(s) is straightforward, and most settings default to acceptable values. The plugin is optional and only applies to hosts that run a Mesos daemon (master or slave).

As with the Collectd collector, there are two options for running the Mesos collector.

  1. As a Python plugin
    • + More efficient use of resources.
    • - No ability to manipulate the host portion of the metric name.
  2. As an Exec Plugin
    • - Less efficient use of resources: a process is spawned to collect metrics at each interval.
    • + Ability to manipulate the host portion of the metric name to assist in maintaining metric continuity.
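
To make the Exec trade-off concrete: Collectd's Exec plugin reads `PUTVAL` lines from the spawned process's standard output, and the identifier in each line starts with the host name, so the script itself decides what that host portion is. The sketch below only formats such a line. The helper name `format_putval`, the plugin name `mesos`, and the sample values are illustrative assumptions, not taken from the cadvisor-collectd code.

```python
def format_putval(host, plugin, type_name, type_instance, value, interval=10):
    """Build one PUTVAL line in Collectd's plain-text protocol.

    Hypothetical helper for illustration; not part of cadvisor-collectd.
    """
    # Identifier layout: host/plugin/type-type_instance
    identifier = "{}/{}/{}-{}".format(host, plugin, type_name, type_instance)
    # "N" asks Collectd to timestamp the value with the current time.
    return 'PUTVAL "{}" interval={} N:{}'.format(identifier, interval, value)


if __name__ == "__main__":
    # The host portion ("mesos.master") is whatever the script passes in.
    print(format_putval("mesos.master", "mesos", "gauge", "master.cpus_used", 2.5))
```

Because the host is just a string argument here, an Exec-style collector can keep it stable across restarts or fail-overs, which is the continuity benefit listed above.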

Basic plugin configuration options


Common configuration options

Both the Python and Exec plugins support the same core set of configuration options. As with the CAdvisor plugin, only the file format differs: Collectd configuration syntax for the Python plugin and YAML for the Exec plugin.

| Setting | Default | Description |
| --- | --- | --- |
| Host | "docker:gateway" | Where to reach the Mesos server. Accepted values are listed below the table. |
| Port | (unset) | Port for connections to the Mesos server. Commented out by default, in which case the port is set from the role of the plugin using the Mesos defaults of master:5050 and slave:5051. |
| Separator | "." | Separator character to use when using Mesos metric names as Collectd type-instances. |
| ConfigFile | "/etc/collectd/mesos.yaml" | Location of the main Mesos configuration file (context is the cadvisor-collectd container). |
| TrackingName | "mesos.master" | Only applicable to the active Mesos master. Keeps the host portion of the metric name consistent, regardless of which Mesos master instance is active, for master metric continuity. |

Accepted Host values:

  • "docker:gateway" - when the Mesos server is running on the host, use the gateway from the cadvisor-collectd container.
  • "docker/(name|id)" - retrieve the IP from a specific docker container using its name or id, e.g. "docker/mesos".
  • "IP" - connect using an explicit IP address.
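
The Host and Port rules above can be sketched as two small functions. This is a minimal illustration under stated assumptions, not the plugin's actual code: the names `parse_host_setting` and `effective_port` are hypothetical, and the sketch only classifies the setting without performing the gateway or container lookup.

```python
import ipaddress

# Mesos default ports named in the Port description above.
DEFAULT_PORTS = {"master": 5050, "slave": 5051}


def parse_host_setting(value):
    """Classify a Host setting (hypothetical helper).

    Returns ("gateway", None), ("container", name_or_id) or ("ip", address).
    """
    if value == "docker:gateway":
        return ("gateway", None)
    if value.startswith("docker/"):
        name = value[len("docker/"):]
        if not name:
            raise ValueError("docker/ must be followed by a container name or id")
        return ("container", name)
    # Anything else must be an explicit IP address; ip_address() raises
    # ValueError when the string is not a valid IPv4/IPv6 address.
    return ("ip", str(ipaddress.ip_address(value)))


def effective_port(role, port=None):
    """Use the configured port if given, otherwise the default for the role."""
    if port is not None:
        return int(port)
    return DEFAULT_PORTS[role]


if __name__ == "__main__":
    print(parse_host_setting("docker/mesos"), effective_port("slave"))
```

Leaving Port commented out in the configuration corresponds to calling `effective_port(role)` with no explicit port.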

### Using the Python plugin

Copy the example configuration file, then edit it: uncomment the block that applies to this system (master or slave) and update any other configuration options as needed.

cd etc-collectd/conf.d && cp mesos-plugin.conf.example mesos.conf
vi mesos.conf

Mesos Master system example

<LoadPlugin "python">
    Globals true
</LoadPlugin>

<Plugin "python">
    ModulePath "/opt/collectd/python/"
    Import "mesos-master"
    <Module "mesos-master">
        Host "docker:gateway"
        # Port 5050
        TrackingName "mesos.master"
        ConfigFile "/etc/collectd/mesos.yaml"
        Separator "."
    </Module>
</Plugin>

Mesos Slave system example

<LoadPlugin "python">
    Globals true
</LoadPlugin>

<Plugin "python">
    ModulePath "/opt/collectd/python/"
    Import "mesos-slave"
    <Module "mesos-slave">
        Host "docker:gateway"
        # Port 5051
        ConfigFile "/etc/collectd/mesos.yaml"
        Separator "."
    </Module>
</Plugin>

### Using the Exec plugin

mesos.conf needs no changes: copying the example file is enough to enable the plugin when Collectd (re)starts. The Exec plugin's command line client has one additional configuration option, Role, which must be set in mesos-cli.yaml.

| Setting | Default | Description |
| --- | --- | --- |
| Role | "master" | Role of the daemon, master or slave, from which metrics will be collected. The role is implied when using the Python plugin, determined by which module is loaded. For the command line client, this must be explicitly set. |

cd etc-collectd/conf.d && cp mesos-exec.conf.example mesos.conf
cd .. && cp mesos-cli.yaml.example mesos-cli.yaml
vi mesos-cli.yaml

Mesos Master example

Role: "master"
Host: "docker:gateway"
# Port: 5050
Separator: "."
ConfigFile: "/etc/collectd/mesos.yaml"
TrackingName: "mesos.master"

Mesos Slave example

Role: "slave"
Host: "docker:gateway"
# Port: 5051
Separator: "."
ConfigFile: "/etc/collectd/mesos.yaml"
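
As a rough illustration of how the settings in the two examples fit together, the sketch below fills in the documented defaults for an already-parsed mesos-cli.yaml and insists on an explicit Role. The function name `normalize_cli_settings` is hypothetical, loading the YAML file is left out, and dropping TrackingName for a slave is an assumption based on the note that the option only applies to the master.

```python
# Defaults taken from the common configuration options table.
DEFAULTS = {
    "Host": "docker:gateway",
    "Separator": ".",
    "ConfigFile": "/etc/collectd/mesos.yaml",
}


def normalize_cli_settings(settings):
    """Return a copy of the settings with defaults applied (hypothetical helper)."""
    role = settings.get("Role")
    if role not in ("master", "slave"):
        raise ValueError("Role must be 'master' or 'slave', got {!r}".format(role))
    merged = dict(DEFAULTS)
    merged.update(settings)
    if role == "slave":
        # ASSUMPTION: TrackingName is only meaningful for the master, so drop it.
        merged.pop("TrackingName", None)
    return merged


if __name__ == "__main__":
    print(normalize_cli_settings({"Role": "slave"}))
```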

Metrics configuration

mesos.yaml controls how Mesos metrics are translated to Collectd metric types. It also provides a way to ignore specific Mesos metrics (informational, redundant, etc.).

cd etc-collectd && cp mesos.yaml.example mesos.yaml
vi mesos.yaml

Default configuration

#
# metrics definitions
#
# the basic premise is to only define metrics which
# are *not* collectd type 'gauge'. providing a more
# dynamic collection method.
#
# 1. metrics will show up when they are in /metrics/snapshot
# 2. changes to upstream metrics do not require the plugin to
#    be changed, only the configuration.
#
# format:
#   mesos_metric_name: collectd_type
#
#   mesos_metric_name:
#     - name of the metric as returned by /metrics/snapshot
#
#   collectd_type:
#     1. as defined in Collectd's default types.db
#     2. as defined in mesos-types.db (custom types added for mesos)
#     3. ignore, *do not* submit metric to collectd
#        (e.g. master/elected - information,
#        system/mem_free_bytes, system/mem_total_bytes - redundant
#        if system level metrics are already being collected by cadvisor)
#
default_metric_type: gauge

master/cpus_percent: percent
master/disk_percent: percent
master/dropped_messages: counter
master/elected: ignore
master/invalid_framework_to_executor_messages: counter
master/invalid_status_update_acknowledgements: counter
master/invalid_status_updates: counter
master/mem_percent: percent
master/messages_authenticate: counter
master/messages_deactivate_framework: counter
master/messages_exited_executor: counter
master/messages_framework_to_executor: counter
master/messages_kill_task: counter
master/messages_launch_tasks: counter
master/messages_reconcile_tasks: counter
master/messages_register_framework: counter
master/messages_register_slave: counter
master/messages_reregister_framework: counter
master/messages_reregister_slave: counter
master/messages_resource_request: counter
master/messages_revive_offers: counter
master/messages_status_update: counter
master/messages_status_update_acknowledgement: counter
master/messages_unregister_framework: counter
master/messages_unregister_slave: counter
master/recovery_slave_removals: counter
master/slave_registrations: counter
master/slave_removals: counter
master/slave_reregistrations: counter
master/tasks_failed: counter
master/tasks_finished: counter
master/tasks_killed: counter
master/tasks_lost: counter
master/uptime_secs: uptime
master/valid_framework_to_executor_messages: counter
master/valid_status_update_acknowledgements: counter
master/valid_status_updates: counter
registrar/registry_size_bytes: bytes
slave/cpus_percent: percent
slave/disk_percent: percent
slave/executors_terminated: counter
slave/invalid_framework_messages: counter
slave/invalid_status_updates: counter
slave/mem_percent: percent
slave/recovery_errors: counter
slave/tasks_failed: counter
slave/tasks_finished: counter
slave/tasks_killed: counter
slave/tasks_lost: counter
slave/valid_framework_messages: counter
slave/valid_status_updates: counter
system/cpus_total: ignore
system/load_15min: ignore
system/load_1min: ignore
system/load_5min: ignore
# system/mem_free_bytes: bytes
system/mem_free_bytes: ignore
# system/mem_total_bytes: bytes
system/mem_total_bytes: ignore
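
The lookup rules described in the file's comments (explicit entry, else `default_metric_type`, with `ignore` meaning "do not submit") can be sketched as follows. This is an illustration rather than the plugin's implementation: the function names are hypothetical, the sample snapshot values are made up, parsing mesos.yaml is not shown, and replacing "/" with the Separator when building the type-instance is an assumption.

```python
def collectd_type_for(metric, type_map):
    """Return the Collectd type for a Mesos metric, or None when it is ignored."""
    default = type_map.get("default_metric_type", "gauge")
    ctype = type_map.get(metric, default)
    return None if ctype == "ignore" else ctype


def translate(snapshot, type_map, separator="."):
    """Yield (type, type_instance, value) for every metric that is not ignored."""
    for name in sorted(snapshot):
        ctype = collectd_type_for(name, type_map)
        if ctype is None:
            continue  # listed as "ignore" in the configuration
        # ASSUMPTION: the separator replaces "/" in the Mesos metric name.
        yield ctype, name.replace("/", separator), snapshot[name]


if __name__ == "__main__":
    # A few entries from the default configuration, as an already-parsed dict.
    type_map = {
        "default_metric_type": "gauge",
        "master/elected": "ignore",
        "master/dropped_messages": "counter",
    }
    # Made-up values in the flat name -> number shape of /metrics/snapshot.
    snapshot = {
        "master/elected": 1.0,
        "master/dropped_messages": 3.0,
        "master/cpus_used": 0.5,
    }
    for row in translate(snapshot, type_map):
        print(row)
```

A metric that is absent from the map, such as `master/cpus_used` here, falls through to the default type, which is why only non-gauge metrics need to be listed.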