Metrics#

Thread pool and read/write latency statistics#

https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/operations/opsThreadPoolStats.html

Cassandra maintains distinct thread pools for different stages of execution. Each thread pool provides statistics on the number of active, pending, and completed tasks. A rising trend in the pending tasks column indicates when to add capacity. After a baseline is established, configure alarms for any increase above normal in the pending tasks column. Use nodetool tpstats on the command line to view the thread pool details shown in the following table.

Thread Pool

Description

AntiEntropyStage

Tasks related to repair

CacheCleanupExecutor

Tasks related to cache maintenance (counter cache, row cache)

CompactionExecutor

Tasks related to compaction

CounterMutationStage

Tasks related to leading counter writes

GossipStage

Tasks related to the gossip protocol

HintsDispatcher

Tasks related to sending hints

InternalResponseStage

Tasks related to miscellaneous internal task responses

MemtableFlushWriter

Tasks related to flushing memtables

MemtablePostFlush

Tasks related to maintenance after memtable flush completion

MemtableReclaimMemory

Tasks related to reclaiming memtable memory

MigrationStage

Tasks related to schema maintenance

MiscStage

Tasks related to miscellaneous tasks, including snapshots and removing hosts

MutationStage

Tasks related to writes

Native-Transport-Requests

Tasks related to client requests from CQL

PendingRangeCalculator

Tasks related to recalculating range ownership after bootstraps/decommissions

PerDiskMemtableFlushWriter_*

Tasks related to flushing memtables to a given disk

ReadRepairStage

Tasks related to performing read repairs

ReadStage

Tasks related to reads

RequestResponseStage

Tasks for callbacks from intra-node requests

Sampler

Tasks related to sampling statistics

SecondaryIndexManagement

Tasks related to secondary index maintenance

ValidationExecutor

Tasks related to validation compactions

ViewMutationStage

Tasks related to maintaining materialized views
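
As a rough illustration of watching the pending tasks column, the sketch below parses the thread pool section of nodetool tpstats output and lists pools whose pending count exceeds a per-pool baseline. The column layout of SAMPLE is an assumption based on typical 3.x output, the numbers are made up, and the names parse_tpstats, pools_over_baseline, and PoolStats are hypothetical helpers, not part of Cassandra.

```python
from typing import Dict, List, NamedTuple


class PoolStats(NamedTuple):
    active: int
    pending: int
    completed: int
    blocked: int
    all_time_blocked: int


# Illustrative text only; the layout is assumed, the values are invented.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0          98213         0                 0
ReadStage                         2        17         451002         0                 0
CompactionExecutor                1         4           1203         0                 0

Message type           Dropped
READ                         3
MUTATION                     0
"""


def parse_tpstats(text: str) -> Dict[str, PoolStats]:
    """Parse the thread pool rows of `nodetool tpstats` style output."""
    pools: Dict[str, PoolStats] = {}
    in_pool_section = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            # A blank line after the pool rows ends the section we care about.
            if in_pool_section and pools:
                break
            continue
        if fields[:2] == ["Pool", "Name"]:
            in_pool_section = True
            continue
        # A pool row is a name followed by five integer columns.
        if in_pool_section and len(fields) == 6 and all(f.isdigit() for f in fields[1:]):
            pools[fields[0]] = PoolStats(*(int(f) for f in fields[1:]))
    return pools


def pools_over_baseline(pools: Dict[str, PoolStats], baseline: Dict[str, int]) -> List[str]:
    """Return pool names whose pending count exceeds the baseline (default 0)."""
    return sorted(name for name, s in pools.items() if s.pending > baseline.get(name, 0))


if __name__ == "__main__":
    stats = parse_tpstats(SAMPLE)
    print(pools_over_baseline(stats, {"ReadStage": 5, "CompactionExecutor": 10}))
```

In practice the text would come from running nodetool tpstats on a node; the baseline dictionary stands in for whatever "normal" values were established for that cluster.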

Metrics of a thread pool#

https://www.eginnovations.com/documentation/Cassandra-Database/Cassandra-Thread-Pools-Test.htm

Measurement

Description

Measurement Unit

Interpretation

Active tasks

Indicates the number of active tasks in this thread pool.

Number

Compare the value of this measure across the thread pools to figure out the thread pool that contains the maximum number of threads that are active.

Completed tasks

Indicates the rate at which tasks were completed in this thread pool.

Tasks/sec

Compare the value of this measure across the thread pools to figure out the thread pool that has completed the maximum number of tasks per second.

Current blocked tasks

Indicates the number of tasks that are currently blocked in this thread pool.

Number

A high value for this measure indicates that the thread pool has no free threads left to service tasks. To prevent tasks from being blocked, administrators can size the thread pool so that it has enough threads to service them.

Pending tasks

Indicates the number of tasks that are pending in this thread pool.

Number

Compare the value across the thread pools to identify the thread pool with the most pending tasks. A sudden or gradual increase in this measure indicates that administrators should add threads to the thread pool.

Total blocked tasks

Indicates the rate at which tasks were blocked in this thread pool during the last measurement period.

Tasks/sec

A low value is desired for this measure.
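
One possible way to turn "increase above normal" into a concrete alarm is a threshold derived from baseline samples of the pending tasks measure. The sketch below uses the baseline mean plus k standard deviations; that rule, the default k of 3, and the function names are assumptions chosen for illustration, not something the sources above prescribe.

```python
from statistics import mean, pstdev
from typing import Sequence


def pending_threshold(baseline_samples: Sequence[int], k: float = 3.0) -> float:
    """Alarm threshold: baseline mean plus k population standard deviations."""
    return mean(baseline_samples) + k * pstdev(baseline_samples)


def is_abnormal(current_pending: int, baseline_samples: Sequence[int], k: float = 3.0) -> bool:
    """True when the current pending count is above the baseline threshold."""
    return current_pending > pending_threshold(baseline_samples, k)


if __name__ == "__main__":
    baseline = [0, 1, 0, 2, 1, 0, 1, 1]  # made-up pending counts from a quiet period
    print(is_abnormal(2, baseline), is_abnormal(40, baseline))
```

A fixed per-pool limit would work equally well; the statistical threshold only saves picking the number by hand.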

https://www.instaclustr.com/support/documentation/cassandra/cassandra-monitoring/thread-pool-metrics/

Dropped Messages#

The Dropped Messages metric represents the total number of messages dropped across all stages of Cassandra's staged event-driven architecture (SEDA).

https://cassandra.apache.org/doc/3.11/cassandra/operating/metrics.html#dropped-metrics

Metrics specific to tracking dropped messages for different types of requests. Dropped writes are stored and retried by Hinted Handoff

Reported name format:

  • Metric Name

    org.apache.cassandra.metrics.DroppedMessage.<MetricName>.<Type>

  • JMX MBean

    org.apache.cassandra.metrics:type=DroppedMessage scope=<Type> name=<MetricName>

Name

Type

Description

CrossNodeDroppedLatency

Timer

The dropped latency across nodes.

InternalDroppedLatency

Timer

The dropped latency within node.

Dropped

Meter

Number of dropped messages.

The different types of messages tracked are:

Name

Description

BATCH_STORE

Batchlog write

BATCH_REMOVE

Batchlog cleanup (after successfully applied)

COUNTER_MUTATION

Counter writes

HINT

Hint replay

MUTATION

Regular writes

READ

Regular reads

READ_REPAIR

Read repair

PAGED_SLICE

Paged read

RANGE_SLICE

Token range read

REQUEST_RESPONSE

RPC Callbacks

_TRACE

Tracing writes

Read/Write latency metrics#

Cassandra tracks latency (averages and totals) of read, write, and slicing operations at the server level through StorageProxyMBean.

Table Metrics#

https://murukeshm.github.io/cassandra/3.10/operating/metrics.html#table-metrics

Each table in Cassandra has metrics responsible for tracking its state and performance.

The metric names are all appended with the specific Keyspace and Table name.

Reported name format:

  • Metric Name

    org.apache.cassandra.metrics.Table.<MetricName>.<Keyspace>.<Table>

  • JMX MBean

    org.apache.cassandra.metrics:type=Table keyspace=<Keyspace> scope=<Table> name=<MetricName>

Note

There is a special table called 'all' without a keyspace. This represents the aggregation of metrics across all tables and keyspaces on the node.

Name

Type

Description

MemtableOnHeapSize

Gauge

Total amount of data stored in the memtable that resides on-heap, including column related overhead and partitions overwritten.

MemtableOffHeapSize

Gauge

Total amount of data stored in the memtable that resides off-heap, including column related overhead and partitions overwritten.

MemtableLiveDataSize

Gauge

Total amount of live data stored in the memtable, excluding any data structure overhead.

AllMemtablesOnHeapSize

Gauge

Total amount of data stored in the memtables (2i and pending flush memtables included) that resides on-heap.

AllMemtablesOffHeapSize

Gauge

Total amount of data stored in the memtables (2i and pending flush memtables included) that resides off-heap.

AllMemtablesLiveDataSize

Gauge

Total amount of live data stored in the memtables (2i and pending flush memtables included) that resides off-heap, excluding any data structure overhead.

MemtableColumnsCount

Gauge

Total number of columns present in the memtable.

MemtableSwitchCount

Counter

Number of times flush has resulted in the memtable being switched out.

CompressionRatio

Gauge

Current compression ratio for all SSTables.

EstimatedPartitionSizeHistogram

Gauge<long[]>

Histogram of estimated partition size (in bytes).

EstimatedPartitionCount

Gauge

Approximate number of keys in table.

EstimatedColumnCountHistogram

Gauge<long[]>

Histogram of estimated number of columns.

SSTablesPerReadHistogram

Histogram

Histogram of the number of sstable data files accessed per read.

ReadLatency

Latency

Local read latency for this table.

RangeLatency

Latency

Local range scan latency for this table.

WriteLatency

Latency

Local write latency for this table.

CoordinatorReadLatency

Timer

Coordinator read latency for this table.

CoordinatorScanLatency

Timer

Coordinator range scan latency for this table.

PendingFlushes

Counter

Estimated number of flush tasks pending for this table.

BytesFlushed

Counter

Total number of bytes flushed since server [re]start.

CompactionBytesWritten

Counter

Total number of bytes written by compaction since server [re]start.

PendingCompactions

Gauge

Estimate of number of pending compactions for this table.

LiveSSTableCount

Gauge

Number of SSTables on disk for this table.

LiveDiskSpaceUsed

Counter

Disk space used by SSTables belonging to this table (in bytes).

TotalDiskSpaceUsed

Counter

Total disk space used by SSTables belonging to this table, including obsolete ones waiting to be GC’d.

MinPartitionSize

Gauge

Size of the smallest compacted partition (in bytes).

MaxPartitionSize

Gauge

Size of the largest compacted partition (in bytes).

MeanPartitionSize

Gauge

Size of the average compacted partition (in bytes).

BloomFilterFalsePositives

Gauge

Number of false positives on table’s bloom filter.

BloomFilterFalseRatio

Gauge

False positive ratio of table’s bloom filter.

BloomFilterDiskSpaceUsed

Gauge

Disk space used by bloom filter (in bytes).

BloomFilterOffHeapMemoryUsed

Gauge

Off-heap memory used by bloom filter.

IndexSummaryOffHeapMemoryUsed

Gauge

Off-heap memory used by index summary.

CompressionMetadataOffHeapMemoryUsed

Gauge

Off-heap memory used by compression meta data.

KeyCacheHitRate

Gauge

Key cache hit rate for this table.

TombstoneScannedHistogram

Histogram

Histogram of tombstones scanned in queries on this table.

LiveScannedHistogram

Histogram

Histogram of live cells scanned in queries on this table.

ColUpdateTimeDeltaHistogram

Histogram

Histogram of column update time delta on this table.

ViewLockAcquireTime

Timer

Time taken acquiring a partition lock for materialized view updates on this table.

ViewReadTime

Timer

Time taken during the local read of a materialized view update.

TrueSnapshotsSize

Gauge

Disk space used by snapshots of this table including all SSTable components.

RowCacheHitOutOfRange

Counter

Number of table row cache hits that do not satisfy the query filter, thus went to disk.

RowCacheHit

Counter

Number of table row cache hits.

RowCacheMiss

Counter

Number of table row cache misses.

CasPrepare

Latency

Latency of paxos prepare round.

CasPropose

Latency

Latency of paxos propose round.

CasCommit

Latency

Latency of paxos commit round.

PercentRepaired

Gauge

Percent of table data that is repaired on disk.

SpeculativeRetries

Counter

Number of times speculative retries were sent for this table.

WaitingOnFreeMemtableSpace

Histogram

Histogram of time spent waiting for free memtable space, either on- or off-heap.

DroppedMutations

Counter

Number of dropped mutations on this table.

Keyspace Metrics#

Each keyspace in Cassandra has metrics responsible for tracking its state and performance.

These metrics are the same as the Table Metrics above, only they are aggregated at the Keyspace level.

Reported name format:

  • Metric Name

    org.apache.cassandra.metrics.keyspace.<MetricName>.<Keyspace>

  • JMX MBean

    org.apache.cassandra.metrics:type=Keyspace scope=<Keyspace> name=<MetricName>
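
The reported name formats above can be assembled mechanically. The sketch below builds the dotted Table metric name and a JMX ObjectName string. It assumes the space-separated MBean key properties shown in this document stand for the comma-separated form that JMX ObjectName syntax uses; the keyspace "shop" and table "users" are hypothetical, as are the helper names.

```python
from typing import Optional

DOMAIN = "org.apache.cassandra.metrics"


def table_metric_name(metric: str, keyspace: str, table: str) -> str:
    """Dotted reporter name for a per-table metric."""
    return f"{DOMAIN}.Table.{metric}.{keyspace}.{table}"


def jmx_object_name(mbean_type: str, metric: str,
                    scope: Optional[str] = None,
                    keyspace: Optional[str] = None) -> str:
    """JMX ObjectName string; key properties are joined with commas."""
    parts = [f"type={mbean_type}"]
    if keyspace is not None:
        parts.append(f"keyspace={keyspace}")
    if scope is not None:
        parts.append(f"scope={scope}")
    parts.append(f"name={metric}")
    return f"{DOMAIN}:" + ",".join(parts)


if __name__ == "__main__":
    print(table_metric_name("ReadLatency", "shop", "users"))
    print(jmx_object_name("Table", "ReadLatency", scope="users", keyspace="shop"))
    print(jmx_object_name("Keyspace", "ReadLatency", scope="shop"))
```

The same jmx_object_name helper covers the other metric groups that follow the type, scope, name pattern.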

Client Request Metrics#

https://murukeshm.github.io/cassandra/3.10/operating/metrics.html#client-request-metrics

Client requests have their own set of metrics that encapsulate the work happening at coordinator level.

Different types of client requests are broken down by RequestType.

Reported name format:

  • Metric Name

    org.apache.cassandra.metrics.ClientRequest.<MetricName>.<RequestType>

  • JMX MBean

    org.apache.cassandra.metrics:type=ClientRequest scope=<RequestType> name=<MetricName>

RequestType:

CASRead

Description:

Metrics related to transactional read requests.

Metrics:

Name

Type

Description

Timeouts

Counter

Number of timeouts encountered.

Failures

Counter

Number of transaction failures encountered.

(unnamed)

Latency

Transaction read latency.

Unavailables

Counter

Number of unavailable exceptions encountered.

UnfinishedCommit

Counter

Number of transactions that were committed on read.

ConditionNotMet

Counter

Number of transaction preconditions that did not match current values.

ContentionHistogram

Histogram

How many contended reads were encountered.

RequestType:

CASWrite

Description:

Metrics related to transactional write requests.

Metrics:

Name

Type

Description

Timeouts

Counter

Number of timeouts encountered.

Failures

Counter

Number of transaction failures encountered.

(unnamed)

Latency

Transaction write latency.

UnfinishedCommit

Counter

Number of transactions that were committed on write.

ConditionNotMet

Counter

Number of transaction preconditions that did not match current values.

ContentionHistogram

Histogram

How many contended writes were encountered.

RequestType:

Read

Description:

Metrics related to standard read requests.

Metrics:

Name

Type

Description

Timeouts

Counter

Number of timeouts encountered.

Failures

Counter

Number of read failures encountered.

(unnamed)

Latency

Read latency.

Unavailables

Counter

Number of unavailable exceptions encountered.

RequestType:

RangeSlice

Description:

Metrics related to token range read requests.

Metrics:

Name

Type

Description

Timeouts

Counter

Number of timeouts encountered.

Failures

Counter

Number of range query failures encountered.

(unnamed)

Latency

Range query latency.

Unavailables

Counter

Number of unavailable exceptions encountered.

RequestType:

Write

Description:

Metrics related to regular write requests.

Metrics:

Name

Type

Description

Timeouts

Counter

Number of timeouts encountered.

Failures

Counter

Number of write failures encountered.

(unnamed)

Latency

Write latency.

Unavailables

Counter

Number of unavailable exceptions encountered.

RequestType:

ViewWrite

Description:

Metrics related to materialized view writes.

Metrics:

Name

Type

Description

Timeouts

Counter

Number of timeouts encountered.

Failures

Counter

Number of transaction failures encountered.

Unavailables

Counter

Number of unavailable exceptions encountered.

ViewReplicasAttempted

Counter

Total number of attempted view replica writes.

ViewReplicasSuccess

Counter

Total number of succeeded view replica writes.

ViewPendingMutations

Gauge

ViewReplicasAttempted - ViewReplicasSuccess.

ViewWriteLatency

Timer

Time between when mutation is applied to base table and when CL.ONE is achieved on view.
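
The ViewPendingMutations gauge is defined above as the difference between two counters. A minimal sketch of that arithmetic, with a hypothetical function name and made-up sample values:

```python
def view_pending_mutations(replicas_attempted: int, replicas_success: int) -> int:
    """ViewPendingMutations = ViewReplicasAttempted - ViewReplicasSuccess."""
    return replicas_attempted - replicas_success


if __name__ == "__main__":
    # 1500 attempted view replica writes, 1492 succeeded so far.
    print(view_pending_mutations(1500, 1492))
```

A value that keeps growing between samples suggests view replica writes are falling behind base table writes.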

Cache Metrics#

Cassandra caches have metrics to track their effectiveness, though the Table Metrics might be more useful.

Reported name format:

  • Metric Name

    org.apache.cassandra.metrics.Cache.<MetricName>.<CacheName>

  • JMX MBean

    org.apache.cassandra.metrics:type=Cache scope=<CacheName> name=<MetricName>

Name

Type

Description

Capacity

Gauge

Cache capacity in bytes.

Entries

Gauge

Total number of cache entries.

FifteenMinuteCacheHitRate

Gauge

15m cache hit rate.

FiveMinuteCacheHitRate

Gauge

5m cache hit rate.

OneMinuteCacheHitRate

Gauge

1m cache hit rate.

HitRate

Gauge

All time cache hit rate.

Hits

Meter

Total number of cache hits.

Misses

Meter

Total number of cache misses.

MissLatency

Timer

Latency of misses.

Requests

Gauge

Total number of cache requests.

Size

Gauge

Total size of occupied cache, in bytes.

The following caches are covered:

Name

Description

CounterCache

Keeps hot counters in memory for performance.

ChunkCache

In process uncompressed page cache.

KeyCache

Cache for partition to sstable offsets.

RowCache

Cache for rows kept in memory.

Note

Misses and MissLatency are only defined for the ChunkCache

CQL Metrics#

Metrics specific to CQL prepared statement caching.

Reported name format:

  • Metric Name

    org.apache.cassandra.metrics.CQL.<MetricName>

  • JMX MBean

    org.apache.cassandra.metrics:type=CQL name=<MetricName>

Name

Type

Description

PreparedStatementsCount

Gauge

Number of cached prepared statements.

PreparedStatementsEvicted

Counter

Number of prepared statements evicted from the prepared statement cache

PreparedStatementsExecuted

Counter

Number of prepared statements executed.

RegularStatementsExecuted

Counter

Number of non prepared statements executed.

PreparedStatementsRatio

Gauge

Percentage of statements that are prepared vs unprepared.
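
As an illustration of how PreparedStatementsRatio relates to the two execution counters, the sketch below assumes the ratio is prepared executions divided by all executions (prepared plus regular). That formula and the zero-traffic fallback are assumptions made for this example; the table above only describes the metric as prepared versus unprepared.

```python
def prepared_statements_ratio(prepared_executed: int, regular_executed: int) -> float:
    """Share of executed statements that were prepared (assumed definition)."""
    total = prepared_executed + regular_executed
    if total == 0:
        # No statements executed yet; report 0.0 rather than dividing by zero.
        return 0.0
    return prepared_executed / total


if __name__ == "__main__":
    print(prepared_statements_ratio(750, 250))
```

A low ratio points at clients sending unprepared statements, which forces the node to parse each query again.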

Streaming Metrics#

Metrics reported during Streaming operations, such as repair, bootstrap, rebuild.

These metrics are specific to a peer endpoint, with the source node being the node you are pulling the metrics from.

Reported name format:

  • Metric Name

    org.apache.cassandra.metrics.Streaming.<MetricName>.<PeerIP>

  • JMX MBean

    org.apache.cassandra.metrics:type=Streaming scope=<PeerIP> name=<MetricName>

Name

Type

Description

IncomingBytes

Counter

Number of bytes streamed to this node from the peer.

OutgoingBytes

Counter

Number of bytes streamed to the peer endpoint from this node.

Compaction Metrics#

https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/operations/opsCompactionMetrics.html

Monitoring compaction performance is an important aspect of knowing when to add capacity to your cluster. The following attributes are exposed through CompactionManagerMBean:

Attribute

Description

BytesCompacted

Total number of bytes compacted since server [re]start

CompletedTasks

Number of completed compactions since server [re]start

PendingTasks

Estimated number of compactions remaining to perform

TotalCompactionsCompleted

Total number of compactions since server [re]start

Metrics specific to Compaction work.

Reported name format:

  • Metric Name

    org.apache.cassandra.metrics.Compaction.<MetricName>

  • JMX MBean

    org.apache.cassandra.metrics:type=Compaction name=<MetricName>

Name

Type

Description

BytesCompacted

Counter

Total number of bytes compacted since server [re]start.

PendingTasks

Gauge

Estimated number of compactions remaining to perform.

CompletedTasks

Gauge

Number of completed compactions since server [re]start.

TotalCompactionsCompleted

Meter

Throughput of completed compactions since server [re]start.

PendingTasksByTableName

Gauge<Map<String, Map<String, Integer>>>

Estimated number of compactions remaining to perform, grouped by keyspace and then table name. This info is also kept in Table Metrics.
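
Because BytesCompacted is a counter that accumulates since server [re]start, a monitoring script usually converts two samples into a rate. The sketch below shows one way to do that; the function name, the use of seconds for timestamps, and the choice to return None when the counter goes backwards (treated here as a restart) are assumptions for illustration.

```python
from typing import Optional


def counter_rate(prev_value: int, prev_ts: float,
                 curr_value: int, curr_ts: float) -> Optional[float]:
    """Per-second rate between two samples of a since-restart counter.

    Returns None when the counter decreased, which is treated as a restart.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    if curr_value < prev_value:
        return None
    return (curr_value - prev_value) / elapsed


if __name__ == "__main__":
    # Two made-up BytesCompacted samples taken 60 seconds apart.
    print(counter_rate(1_000_000, 0.0, 4_000_000, 60.0))
```

Comparing this rate with the trend of PendingTasks gives a rough sense of whether compaction is keeping up with incoming writes.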

CommitLog Metrics#

Metrics specific to the CommitLog

Reported name format:

  • Metric Name

    org.apache.cassandra.metrics.CommitLog.<MetricName>

  • JMX MBean

    org.apache.cassandra.metrics:type=CommitLog name=<MetricName>

Name

Type

Description

CompletedTasks

Gauge

Total number of commit log messages written since [re]start.

PendingTasks

Gauge

Number of commit log messages written but yet to be fsync’d.

TotalCommitLogSize

Gauge

Current size, in bytes, used by all the commit log segments.

WaitingOnSegmentAllocation

Timer

Time spent waiting for a CommitLogSegment to be allocated - under normal conditions this should be zero.

WaitingOnCommit

Timer

The time spent waiting on CL fsync; for Periodic this only occurs when the sync is lagging its sync interval.

Storage Metrics#

Metrics specific to the storage engine.

Reported name format:

  • Metric Name

    org.apache.cassandra.metrics.Storage.<MetricName>

  • JMX MBean

    org.apache.cassandra.metrics:type=Storage name=<MetricName>

Name

Type

Description

Exceptions

Counter

Number of internal exceptions caught. Under normal conditions this should be zero.

Load

Counter

Size, in bytes, of the on disk data size this node manages.

TotalHints

Counter

Number of hint messages written to this node since [re]start. Includes one entry for each host to be hinted per hint.

TotalHintsInProgress

Counter

Number of hints currently attempting to be sent.

HintedHandoff Metrics#

Metrics specific to Hinted Handoff. There are also some metrics related to hints tracked in Storage Metrics

These metrics include the peer endpoint in the metric name

Reported name format:

  • Metric Name

    org.apache.cassandra.metrics.HintedHandOffManager.<MetricName>

  • JMX MBean

    org.apache.cassandra.metrics:type=HintedHandOffManager name=<MetricName>

Name

Type

Description

Hints_created-<PeerIP>

Counter

Number of hints on disk for this peer.

Hints_not_stored-<PeerIP>

Counter

Number of hints not stored for this peer, due to being down past the configured hint window.

SSTable Index Metrics#

Metrics specific to the SSTable index metadata.

Reported name format:

  • Metric Name

    org.apache.cassandra.metrics.Index.<MetricName>.RowIndexEntry

  • JMX MBean

    org.apache.cassandra.metrics:type=Index scope=RowIndexEntry name=<MetricName>

Name

Type

Description

IndexedEntrySize

Histogram

Histogram of the on-heap size, in bytes, of the index across all SSTables.

IndexInfoCount

Histogram

Histogram of the number of on-heap index entries managed across all SSTables.

IndexInfoGets

Histogram

Histogram of the number of index seeks performed per SSTable.

BufferPool Metrics#

Metrics specific to the internal recycled buffer pool Cassandra manages. This pool is meant to keep allocations and GC lower by recycling on and off heap buffers.

Reported name format:

  • Metric Name

    org.apache.cassandra.metrics.BufferPool.<MetricName>

  • JMX MBean

    org.apache.cassandra.metrics:type=BufferPool name=<MetricName>

Name

Type

Description

Size

Gauge

Size, in bytes, of the managed buffer pool

Misses

Meter

The rate of misses in the pool. The higher this is the more allocations incurred.

Client Metrics#

Metrics specific to client management.

Reported name format:

  • Metric Name

    org.apache.cassandra.metrics.Client.<MetricName>

  • JMX MBean

    org.apache.cassandra.metrics:type=Client name=<MetricName>

Name

Type

Description

connectedNativeClients

Counter

Number of clients connected to this node's native protocol server

connectedThriftClients

Counter

Number of clients connected to this node's thrift protocol server

JVM Metrics#

JVM metrics such as memory and garbage collection statistics can either be accessed by connecting to the JVM using JMX or can be exported using Metric Reporters.

BufferPool#

  • Metric Name

    jvm.buffers.<direct|mapped>.<MetricName>

  • JMX MBean

    java.nio:type=BufferPool name=<direct|mapped>

Name

Type

Description

Capacity

Gauge

Estimated total capacity of the buffers in this pool

Count

Gauge

Estimated number of buffers in the pool

Used

Gauge

Estimated memory that the Java virtual machine is using for this buffer pool

FileDescriptorRatio#

  • Metric Name

    jvm.fd.<MetricName>

  • JMX MBean

    java.lang:type=OperatingSystem name=<OpenFileDescriptorCount|MaxFileDescriptorCount>

Name

Type

Description

Usage

Ratio

Ratio of used to total file descriptors

GarbageCollector#

  • Metric Name

    jvm.gc.<gc_type>.<MetricName>

  • JMX MBean

    java.lang:type=GarbageCollector name=<gc_type>

Name

Type

Description

Count

Gauge

Total number of collections that have occurred

Time

Gauge

Approximate accumulated collection elapsed time in milliseconds
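
Count and Time are both cumulative, so an average pause per collection over a monitoring interval comes from the difference between two samples. The sketch below is one way to compute it; the function name and the decision to raise an error when the values go backwards (for example after a JVM restart) are assumptions for this example.

```python
from typing import Tuple


def gc_interval_stats(prev_count: int, prev_time_ms: int,
                      curr_count: int, curr_time_ms: int) -> Tuple[int, int, float]:
    """Return (collections, total_ms, avg_ms_per_collection) between two samples."""
    collections = curr_count - prev_count
    total_ms = curr_time_ms - prev_time_ms
    if collections < 0 or total_ms < 0:
        raise ValueError("GC values went backwards; was the JVM restarted?")
    avg_ms = total_ms / collections if collections else 0.0
    return collections, total_ms, avg_ms


if __name__ == "__main__":
    # Made-up samples: 10 collections taking 400 ms in total during the interval.
    print(gc_interval_stats(100, 5000, 110, 5400))
```

Tracking the average per interval, rather than the lifetime totals, makes a sudden rise in pause time visible soon after it starts.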

Memory#

  • Metric Name

    jvm.memory.<heap/non-heap/total>.<MetricName>

  • JMX MBean

    java.lang:type=Memory

Name

Type

Description

Committed

Gauge

Amount of memory in bytes that is committed for the JVM to use

Init

Gauge

Amount of memory in bytes that the JVM initially requests from the OS

Max

Gauge

Maximum amount of memory in bytes that can be used for memory management

Usage

Ratio

Ratio of used to maximum memory

Used

Gauge

Amount of used memory in bytes

MemoryPool#

  • Metric Name

    jvm.memory.pools.<memory_pool>.<MetricName>

  • JMX MBean

    java.lang:type=MemoryPool name=<memory_pool>

Name

Type

Description

Committed

Gauge

Amount of memory in bytes that is committed for the JVM to use

Init

Gauge

Amount of memory in bytes that the JVM initially requests from the OS

Max

Gauge

Maximum amount of memory in bytes that can be used for memory management

Usage

Ratio

Ratio of used to maximum memory

Used

Gauge

Amount of used memory in bytes