Metrics#

Thread pool and read/write latency statistics#

https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/operations/opsThreadPoolStats.html

Cassandra maintains distinct thread pools for different stages of execution. Each thread pool provides statistics on the number of tasks that are active, pending, and completed. An upward trend in the pending tasks column for these pools indicates when to add capacity. After a baseline is established, configure alarms for any increases above normal in the pending tasks column. Use nodetool tpstats on the command line to view the thread pool details shown in the following table.

Thread Pool	Description
AntiEntropyStage	Tasks related to repair
CacheCleanupExecutor	Tasks related to cache maintenance (counter cache, row cache)
CompactionExecutor	Tasks related to compaction
CounterMutationStage	Tasks related to leading counter writes
GossipStage	Tasks related to the gossip protocol
HintsDispatcher	Tasks related to sending hints
InternalResponseStage	Tasks related to miscellaneous internal task responses
MemtableFlushWriter	Tasks related to flushing memtables
MemtablePostFlush	Tasks related to maintenance after memtable flush completion
MemtableReclaimMemory	Tasks related to reclaiming memtable memory
MigrationStage	Tasks related to schema maintenance
MiscStage	Tasks related to miscellaneous tasks, including snapshots and removing hosts
MutationStage	Tasks related to writes
Native-Transport-Requests	Tasks related to client requests from CQL
PendingRangeCalculator	Tasks related to recalculating range ownership after bootstraps/decommissions
PerDiskMemtableFlushWriter_*	Tasks related to flushing memtables to a given disk
ReadRepairStage	Tasks related to performing read repairs
ReadStage	Tasks related to reads
RequestResponseStage	Tasks for callbacks from intra-node requests
Sampler	Tasks related to sampling statistics
SecondaryIndexManagement	Tasks related to secondary index maintenance
ValidationExecutor	Tasks related to validation compactions
ViewMutationStage	Tasks related to maintaining materialized views
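
The baseline-and-alarm advice above can be sketched in a few lines. This is a minimal illustration, not an official tool: the sample text only imitates the column layout that nodetool tpstats commonly prints (the exact layout varies by Cassandra version), and the function names and baseline values are hypothetical.

```python
# Illustrative sample only; real `nodetool tpstats` output varies by version.
SAMPLE_TPSTATS = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0         385618         0                 0
ReadStage                         2        37         537370         0                 0
CompactionExecutor                1         4           1523         0                 0
"""

COLUMNS = ("active", "pending", "completed", "blocked", "all_time_blocked")


def parse_tpstats(text):
    """Return {pool_name: {column: int}} for every thread pool row in the text."""
    pools = {}
    for line in text.splitlines():
        fields = line.split()
        # A pool row is a name followed by exactly five integer columns;
        # the header row and any other lines are skipped.
        if len(fields) == 6 and all(f.isdigit() for f in fields[1:]):
            pools[fields[0]] = dict(zip(COLUMNS, map(int, fields[1:])))
    return pools


def pools_over_baseline(pools, baseline, default_limit=0):
    """Return the names of pools whose pending count exceeds their baseline."""
    return sorted(
        name
        for name, stats in pools.items()
        if stats["pending"] > baseline.get(name, default_limit)
    )


pools = parse_tpstats(SAMPLE_TPSTATS)
# Hypothetical baselines, established by observing the cluster under normal load.
alarms = pools_over_baseline(pools, {"ReadStage": 10, "CompactionExecutor": 5})
print(alarms)
```

In practice the text would come from running nodetool tpstats on each node at a fixed interval, with the baseline derived from historical samples rather than hard-coded.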

Metrics of a thread pool#

https://www.eginnovations.com/documentation/Cassandra-Database/Cassandra-Thread-Pools-Test.htm

Measurement	Description	Measurement Unit	Interpretation
Active tasks	Indicates the number of active tasks in this thread pool.	Number	Compare the value of this measure across the thread pools to figure out the thread pool that contains the maximum number of threads that are active.
Completed tasks	Indicates the rate at which tasks were completed in this thread pool.	Tasks/sec	Compare the value of this measure across the thread pools to figure out the thread pool that has completed the maximum number of tasks per second.
Current blocked tasks	Indicates the number of tasks that are currently blocked in this thread pool.	Number	A high value for this measure indicates that there are no free threads in the thread pool to service the tasks. To prevent tasks from being blocked, administrators can size the thread pool to accommodate enough threads to service the tasks.
Pending tasks	Indicates the number of tasks that are pending in this thread pool.	Number	Compare the value across the thread pools to identify the thread pool on which the maximum number of tasks are pending. A sudden or gradual increase in this measure is an indication for administrators to add threads to the thread pool.
Total blocked tasks	Indicates the rate at which tasks were blocked in this thread pool during the last measurement period.	Tasks/sec	A low value is desired for this measure.

https://www.instaclustr.com/support/documentation/cassandra/cassandra-monitoring/thread-pool-metrics/

Dropped Messages#

The Dropped Messages metric represents the total number of dropped messages from all stages in the SEDA.

https://cassandra.apache.org/doc/3.11/cassandra/operating/metrics.html#dropped-metrics

Metrics specific to tracking dropped messages for different types of requests. Dropped writes are stored and retried by Hinted Handoff.

Reported name format:

Metric Name

org.apache.cassandra.metrics.DroppedMessage.<MetricName>.<Type>
JMX MBean

org.apache.cassandra.metrics:type=DroppedMessage scope=<Type> name=<MetricName>

Name	Type	Description
CrossNodeDroppedLatency	Timer	The dropped latency across nodes.
InternalDroppedLatency	Timer	The dropped latency within node.
Dropped	Meter	Number of dropped messages.

The different types of messages tracked are:

Name	Description
BATCH_STORE	Batchlog write
BATCH_REMOVE	Batchlog cleanup (after successfully applied)
COUNTER_MUTATION	Counter writes
HINT	Hint replay
MUTATION	Regular writes
READ	Regular reads
READ_REPAIR	Read repair
PAGED_SLICE	Paged read
RANGE_SLICE	Token range read
REQUEST_RESPONSE	RPC Callbacks
_TRACE	Tracing writes
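
The two name formats above follow the same pattern in every section of this page, so they can be generated by a small helper. The sketch below is an illustration with hypothetical function names. One caveat is worth stating: in standard JMX ObjectName syntax the key properties are separated by commas, whereas this page renders them with spaces.

```python
DOMAIN = "org.apache.cassandra.metrics"


def reported_name(metric_type, metric_name, *scopes):
    """Build the dotted reporter name: <domain>.<Type>.<MetricName>[.<scope>...]."""
    return ".".join([DOMAIN, metric_type, metric_name, *scopes])


def jmx_object_name(metric_type, metric_name, scope=None, keyspace=None):
    """Build a JMX ObjectName string with comma-separated key properties."""
    props = [("type", metric_type)]
    if keyspace is not None:
        props.append(("keyspace", keyspace))
    if scope is not None:
        props.append(("scope", scope))
    props.append(("name", metric_name))
    return DOMAIN + ":" + ",".join(f"{key}={value}" for key, value in props)


# Dropped regular writes, using the MUTATION message type from the table above.
print(reported_name("DroppedMessage", "Dropped", "MUTATION"))
print(jmx_object_name("DroppedMessage", "Dropped", scope="MUTATION"))
```

The same helper covers metric groups without a scope (for example CQL or Compaction) by leaving scope unset.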

Read/Write latency metrics#

Cassandra tracks latency (averages and totals) of read, write, and slicing operations at the server level through StorageProxyMBean.

Table Metrics#

https://murukeshm.github.io/cassandra/3.10/operating/metrics.html#table-metrics

Each table in Cassandra has metrics responsible for tracking its state and performance.

The metric names are all appended with the specific Keyspace and Table name.

Reported name format:

Metric Name

org.apache.cassandra.metrics.Table.<MetricName>.<Keyspace>.<Table>
JMX MBean

org.apache.cassandra.metrics:type=Table keyspace=<Keyspace> scope=<Table> name=<MetricName>

Note

There is a special table called 'all' without a keyspace. This represents the aggregation of metrics across all tables and keyspaces on the node.

Name	Type	Description
MemtableOnHeapSize	Gauge	Total amount of data stored in the memtable that resides on-heap, including column related overhead and partitions overwritten.
MemtableOffHeapSize	Gauge	Total amount of data stored in the memtable that resides off-heap, including column related overhead and partitions overwritten.
MemtableLiveDataSize	Gauge	Total amount of live data stored in the memtable, excluding any data structure overhead.
AllMemtablesOnHeapSize	Gauge	Total amount of data stored in the memtables (2i and pending flush memtables included) that resides on-heap.
AllMemtablesOffHeapSize	Gauge	Total amount of data stored in the memtables (2i and pending flush memtables included) that resides off-heap.
AllMemtablesLiveDataSize	Gauge	Total amount of live data stored in the memtables (2i and pending flush memtables included), excluding any data structure overhead.
MemtableColumnsCount	Gauge	Total number of columns present in the memtable.
MemtableSwitchCount	Counter	Number of times flush has resulted in the memtable being switched out.
CompressionRatio	Gauge	Current compression ratio for all SSTables.
EstimatedPartitionSizeHistogram	Gauge<long[]>	Histogram of estimated partition size (in bytes).
EstimatedPartitionCount	Gauge	Approximate number of keys in table.
EstimatedColumnCountHistogram	Gauge<long[]>	Histogram of estimated number of columns.
SSTablesPerReadHistogram	Histogram	Histogram of the number of sstable data files accessed per read.
ReadLatency	Latency	Local read latency for this table.
RangeLatency	Latency	Local range scan latency for this table.
WriteLatency	Latency	Local write latency for this table.
CoordinatorReadLatency	Timer	Coordinator read latency for this table.
CoordinatorScanLatency	Timer	Coordinator range scan latency for this table.
PendingFlushes	Counter	Estimated number of flush tasks pending for this table.
BytesFlushed	Counter	Total number of bytes flushed since server [re]start.
CompactionBytesWritten	Counter	Total number of bytes written by compaction since server [re]start.
PendingCompactions	Gauge	Estimate of number of pending compactions for this table.
LiveSSTableCount	Gauge	Number of SSTables on disk for this table.
LiveDiskSpaceUsed	Counter	Disk space used by SSTables belonging to this table (in bytes).
TotalDiskSpaceUsed	Counter	Total disk space used by SSTables belonging to this table, including obsolete ones waiting to be GC’d.
MinPartitionSize	Gauge	Size of the smallest compacted partition (in bytes).
MaxPartitionSize	Gauge	Size of the largest compacted partition (in bytes).
MeanPartitionSize	Gauge	Size of the average compacted partition (in bytes).
BloomFilterFalsePositives	Gauge	Number of false positives on table’s bloom filter.
BloomFilterFalseRatio	Gauge	False positive ratio of table’s bloom filter.
BloomFilterDiskSpaceUsed	Gauge	Disk space used by bloom filter (in bytes).
BloomFilterOffHeapMemoryUsed	Gauge	Off-heap memory used by bloom filter.
IndexSummaryOffHeapMemoryUsed	Gauge	Off-heap memory used by index summary.
CompressionMetadataOffHeapMemoryUsed	Gauge	Off-heap memory used by compression meta data.
KeyCacheHitRate	Gauge	Key cache hit rate for this table.
TombstoneScannedHistogram	Histogram	Histogram of tombstones scanned in queries on this table.
LiveScannedHistogram	Histogram	Histogram of live cells scanned in queries on this table.
ColUpdateTimeDeltaHistogram	Histogram	Histogram of column update time delta on this table.
ViewLockAcquireTime	Timer	Time taken acquiring a partition lock for materialized view updates on this table.
ViewReadTime	Timer	Time taken during the local read of a materialized view update.
TrueSnapshotsSize	Gauge	Disk space used by snapshots of this table including all SSTable components.
RowCacheHitOutOfRange	Counter	Number of table row cache hits that do not satisfy the query filter, thus went to disk.
RowCacheHit	Counter	Number of table row cache hits.
RowCacheMiss	Counter	Number of table row cache misses.
CasPrepare	Latency	Latency of paxos prepare round.
CasPropose	Latency	Latency of paxos propose round.
CasCommit	Latency	Latency of paxos commit round.
PercentRepaired	Gauge	Percent of table data that is repaired on disk.
SpeculativeRetries	Counter	Number of times speculative retries were sent for this table.
WaitingOnFreeMemtableSpace	Histogram	Histogram of time spent waiting for free memtable space, either on- or off-heap.
DroppedMutations	Counter	Number of dropped mutations on this table.
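
Several of the counters above are most useful when combined. As a small sketch, a per-table row cache hit ratio can be derived from RowCacheHit and RowCacheMiss. The function name and sample values are hypothetical, and whether RowCacheHitOutOfRange should be counted as a miss is a judgment call left out here.

```python
def row_cache_hit_ratio(hits, misses):
    """Return hits / (hits + misses), or None when the cache was never queried."""
    total = hits + misses
    if total == 0:
        return None  # no lookups yet, so a ratio would be meaningless
    return hits / total


# Hypothetical counter values read from RowCacheHit and RowCacheMiss.
print(row_cache_hit_ratio(90, 10))
```

Because both counters are cumulative since the last restart, computing the ratio over the difference between two samples gives a better picture of recent behavior than the raw totals.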

Keyspace Metrics#

Each keyspace in Cassandra has metrics responsible for tracking its state and performance.

These metrics are the same as the Table Metrics above, only they are aggregated at the Keyspace level.

Reported name format:

Metric Name

org.apache.cassandra.metrics.keyspace.<MetricName>.<Keyspace>
JMX MBean

org.apache.cassandra.metrics:type=Keyspace scope=<Keyspace> name=<MetricName>

Client Request Metrics#

https://murukeshm.github.io/cassandra/3.10/operating/metrics.html#client-request-metrics

Client requests have their own set of metrics that encapsulate the work happening at coordinator level.

Different types of client requests are broken down by RequestType.

Reported name format:

Metric Name

org.apache.cassandra.metrics.ClientRequest.<MetricName>.<RequestType>
JMX MBean

org.apache.cassandra.metrics:type=ClientRequest scope=<RequestType> name=<MetricName>

RequestType:	CASRead
Description:	Metrics related to transactional read requests.
Metrics:
Name	Type	Description
Timeouts	Counter	Number of timeouts encountered.
Failures	Counter	Number of transaction failures encountered.
	Latency	Transaction read latency.
Unavailables	Counter	Number of unavailable exceptions encountered.
UnfinishedCommit	Counter	Number of transactions that were committed on read.
ConditionNotMet	Counter	Number of transaction preconditions that did not match current values.
ContentionHistogram	Histogram	How many contended reads were encountered.
RequestType:	CASWrite
Description:	Metrics related to transactional write requests.
Metrics:
Name	Type	Description
Timeouts	Counter	Number of timeouts encountered.
Failures	Counter	Number of transaction failures encountered.
	Latency	Transaction write latency.
UnfinishedCommit	Counter	Number of transactions that were committed on write.
ConditionNotMet	Counter	Number of transaction preconditions that did not match current values.
ContentionHistogram	Histogram	How many contended writes were encountered.
RequestType:	Read
Description:	Metrics related to standard read requests.
Metrics:
Name	Type	Description
Timeouts	Counter	Number of timeouts encountered.
Failures	Counter	Number of read failures encountered.
	Latency	Read latency.
Unavailables	Counter	Number of unavailable exceptions encountered.
RequestType:	RangeSlice
Description:	Metrics related to token range read requests.
Metrics:
Name	Type	Description
Timeouts	Counter	Number of timeouts encountered.
Failures	Counter	Number of range query failures encountered.
	Latency	Range query latency.
Unavailables	Counter	Number of unavailable exceptions encountered.
RequestType:	Write
Description:	Metrics related to regular write requests.
Metrics:
Name	Type	Description
Timeouts	Counter	Number of timeouts encountered.
Failures	Counter	Number of write failures encountered.
	Latency	Write latency.
Unavailables	Counter	Number of unavailable exceptions encountered.
RequestType:	ViewWrite
Description:	Metrics related to materialized view writes.
Metrics:
Name	Type	Description
Timeouts	Counter	Number of timeouts encountered.
Failures	Counter	Number of transaction failures encountered.
Unavailables	Counter	Number of unavailable exceptions encountered.
ViewReplicasAttempted	Counter	Total number of attempted view replica writes.
ViewReplicasSuccess	Counter	Total number of succeeded view replica writes.
ViewPendingMutations	Gauge	ViewReplicasAttempted - ViewReplicasSuccess.
ViewWriteLatency	Timer	Time between when mutation is applied to base table and when CL.ONE is achieved on view.

Cache Metrics#

Cassandra caches have metrics to track the effectiveness of the caches, though the Table Metrics might be more useful.

Reported name format:

Metric Name

org.apache.cassandra.metrics.Cache.<MetricName>.<CacheName>
JMX MBean

org.apache.cassandra.metrics:type=Cache scope=<CacheName> name=<MetricName>

Name	Type	Description
Capacity	Gauge	Cache capacity in bytes.
Entries	Gauge	Total number of cache entries.
FifteenMinuteCacheHitRate	Gauge	15m cache hit rate.
FiveMinuteCacheHitRate	Gauge	5m cache hit rate.
OneMinuteCacheHitRate	Gauge	1m cache hit rate.
HitRate	Gauge	All time cache hit rate.
Hits	Meter	Total number of cache hits.
Misses	Meter	Total number of cache misses.
MissLatency	Timer	Latency of misses.
Requests	Gauge	Total number of cache requests.
Size	Gauge	Total size of occupied cache, in bytes.

The following caches are covered:

Name	Description
CounterCache	Keeps hot counters in memory for performance.
ChunkCache	In process uncompressed page cache.
KeyCache	Cache for partition to sstable offsets.
RowCache	Cache for rows kept in memory.

Note

Misses and MissLatency are only defined for the ChunkCache.

CQL Metrics#

Metrics specific to CQL prepared statement caching.

Reported name format:

Metric Name

org.apache.cassandra.metrics.CQL.<MetricName>
JMX MBean

org.apache.cassandra.metrics:type=CQL name=<MetricName>

Name	Type	Description
PreparedStatementsCount	Gauge	Number of cached prepared statements.
PreparedStatementsEvicted	Counter	Number of prepared statements evicted from the prepared statement cache.
PreparedStatementsExecuted	Counter	Number of prepared statements executed.
RegularStatementsExecuted	Counter	Number of non prepared statements executed.
PreparedStatementsRatio	Gauge	Percentage of statements that are prepared vs unprepared.

DroppedMessage Metrics#

Metrics specific to tracking dropped messages for different types of requests. Dropped writes are stored and retried by Hinted Handoff.

Reported name format:

Metric Name

org.apache.cassandra.metrics.DroppedMessage.<MetricName>.<Type>
JMX MBean

org.apache.cassandra.metrics:type=DroppedMessage scope=<Type> name=<MetricName>

Name	Type	Description
CrossNodeDroppedLatency	Timer	The dropped latency across nodes.
InternalDroppedLatency	Timer	The dropped latency within node.
Dropped	Meter	Number of dropped messages.

The different types of messages tracked are:

Name	Description
BATCH_STORE	Batchlog write
BATCH_REMOVE	Batchlog cleanup (after successfully applied)
COUNTER_MUTATION	Counter writes
HINT	Hint replay
MUTATION	Regular writes
READ	Regular reads
READ_REPAIR	Read repair
PAGED_SLICE	Paged read
RANGE_SLICE	Token range read
REQUEST_RESPONSE	RPC Callbacks
_TRACE	Tracing writes

Streaming Metrics#

Metrics reported during Streaming operations, such as repair, bootstrap, rebuild.

These metrics are specific to a peer endpoint, with the source node being the node you are pulling the metrics from.

Reported name format:

Metric Name

org.apache.cassandra.metrics.Streaming.<MetricName>.<PeerIP>
JMX MBean

org.apache.cassandra.metrics:type=Streaming scope=<PeerIP> name=<MetricName>

Name	Type	Description
IncomingBytes	Counter	Number of bytes streamed to this node from the peer.
OutgoingBytes	Counter	Number of bytes streamed to the peer endpoint from this node.
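
Because each streaming metric is reported once per peer, a collector usually has to regroup the samples by peer address. The sketch below assumes the reported names follow the dotted format shown above and that the peer address is embedded unchanged, which may not hold for every reporter. The function name and sample values are hypothetical.

```python
PREFIX = "org.apache.cassandra.metrics.Streaming."


def bytes_by_peer(samples):
    """Group {reported_name: value} samples into {peer: {metric_name: value}}."""
    peers = {}
    for name, value in samples.items():
        if not name.startswith(PREFIX):
            continue  # not a streaming metric
        # Split only at the first dot: the peer address itself contains dots.
        metric_name, _, peer = name[len(PREFIX):].partition(".")
        if not peer:
            continue  # malformed name without a peer component
        peers.setdefault(peer, {})[metric_name] = value
    return peers


# Hypothetical sample values, not real measurements.
samples = {
    PREFIX + "IncomingBytes.10.0.0.2": 1_048_576,
    PREFIX + "OutgoingBytes.10.0.0.2": 524_288,
    PREFIX + "IncomingBytes.10.0.0.3": 2_097_152,
}
by_peer = bytes_by_peer(samples)
total_incoming = sum(m.get("IncomingBytes", 0) for m in by_peer.values())
print(by_peer, total_incoming)
```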

Compaction Metrics#

https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/operations/opsCompactionMetrics.html

Monitoring compaction performance is an important aspect of knowing when to add capacity to your cluster. The following attributes are exposed through CompactionManagerMBean:

Attribute	Description
BytesCompacted	Total number of bytes compacted since server [re]start
CompletedTasks	Number of completed compactions since server [re]start
PendingTasks	Estimated number of compactions remaining to perform
TotalCompactionsCompleted	Total number of compactions since server [re]start

Metrics specific to Compaction work.

Reported name format:

Metric Name

org.apache.cassandra.metrics.Compaction.<MetricName>
JMX MBean

org.apache.cassandra.metrics:type=Compaction name=<MetricName>

Name	Type	Description
BytesCompacted	Counter	Total number of bytes compacted since server [re]start.
PendingTasks	Gauge	Estimated number of compactions remaining to perform.
CompletedTasks	Gauge	Number of completed compactions since server [re]start.
TotalCompactionsCompleted	Meter	Throughput of completed compactions since server [re]start.
PendingTasksByTableName	Gauge<Map<String, Map<String, Integer>>>	Estimated number of compactions remaining to perform, grouped by keyspace and then table name. This info is also kept in `Table Metrics`.
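
PendingTasksByTableName is a nested map (keyspace, then table, then count), which is awkward to read directly. As a rough sketch, the map can be flattened and sorted to show where the compaction backlog sits. The function name and sample data are hypothetical.

```python
def busiest_tables(pending_by_table, top=3):
    """Flatten {keyspace: {table: pending}} and return the largest backlogs first."""
    rows = [
        (keyspace, table, pending)
        for keyspace, tables in pending_by_table.items()
        for table, pending in tables.items()
        if pending > 0  # tables with nothing pending are not interesting
    ]
    # Sort by pending count (descending), then by name for a stable order.
    rows.sort(key=lambda row: (-row[2], row[0], row[1]))
    return rows[:top]


# Hypothetical gauge value, shaped like Map<String, Map<String, Integer>>.
sample = {
    "shop": {"orders": 12, "carts": 0},
    "logs": {"events": 40, "audit": 3},
}
print(busiest_tables(sample, top=2))
```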

CommitLog Metrics#

Metrics specific to the CommitLog.

Reported name format:

Metric Name

org.apache.cassandra.metrics.CommitLog.<MetricName>
JMX MBean

org.apache.cassandra.metrics:type=CommitLog name=<MetricName>

Name	Type	Description
CompletedTasks	Gauge	Total number of commit log messages written since [re]start.
PendingTasks	Gauge	Number of commit log messages written but yet to be fsync’d.
TotalCommitLogSize	Gauge	Current size, in bytes, used by all the commit log segments.
WaitingOnSegmentAllocation	Timer	Time spent waiting for a CommitLogSegment to be allocated - under normal conditions this should be zero.
WaitingOnCommit	Timer	The time spent waiting on CL fsync; for Periodic this only occurs when the sync is lagging its sync interval.

Storage Metrics#

Metrics specific to the storage engine.

Reported name format:

Metric Name

org.apache.cassandra.metrics.Storage.<MetricName>
JMX MBean

org.apache.cassandra.metrics:type=Storage name=<MetricName>

Name	Type	Description
Exceptions	Counter	Number of internal exceptions caught. Under normal conditions this should be zero.
Load	Counter	Size, in bytes, of the on disk data size this node manages.
TotalHints	Counter	Number of hint messages written to this node since [re]start. Includes one entry for each host to be hinted per hint.
TotalHintsInProgress	Counter	Number of hints currently attempting to be sent.

HintedHandoff Metrics#

Metrics specific to Hinted Handoff. There are also some metrics related to hints tracked in Storage Metrics.

These metrics include the peer endpoint in the metric name.

Reported name format:

Metric Name

org.apache.cassandra.metrics.HintedHandOffManager.<MetricName>
JMX MBean

org.apache.cassandra.metrics:type=HintedHandOffManager name=<MetricName>

Name	Type	Description
Hints_created-<PeerIP>	Counter	Number of hints on disk for this peer.
Hints_not_stored-<PeerIP>	Counter	Number of hints not stored for this peer, due to being down past the configured hint window.

SSTable Index Metrics#

Metrics specific to the SSTable index metadata.

Reported name format:

Metric Name

org.apache.cassandra.metrics.Index.<MetricName>.RowIndexEntry
JMX MBean

org.apache.cassandra.metrics:type=Index scope=RowIndexEntry name=<MetricName>

Name	Type	Description
IndexedEntrySize	Histogram	Histogram of the on-heap size, in bytes, of the index across all SSTables.
IndexInfoCount	Histogram	Histogram of the number of on-heap index entries managed across all SSTables.
IndexInfoGets	Histogram	Histogram of the number of index seeks performed per SSTable.

BufferPool Metrics#

Metrics specific to the internal recycled buffer pool Cassandra manages. This pool is meant to keep allocations and GC lower by recycling on and off heap buffers.

Reported name format:

Metric Name

org.apache.cassandra.metrics.BufferPool.<MetricName>
JMX MBean

org.apache.cassandra.metrics:type=BufferPool name=<MetricName>

Name	Type	Description
Size	Gauge	Size, in bytes, of the managed buffer pool
Misses	Meter	The rate of misses in the pool. The higher this is the more allocations incurred.

Client Metrics#

Metrics specific to client management.

Reported name format:

Metric Name

org.apache.cassandra.metrics.Client.<MetricName>
JMX MBean

org.apache.cassandra.metrics:type=Client name=<MetricName>

Name	Type	Description
connectedNativeClients	Counter	Number of clients connected to this node's native protocol server.
connectedThriftClients	Counter	Number of clients connected to this node's thrift protocol server.

JVM Metrics#

JVM metrics such as memory and garbage collection statistics can either be accessed by connecting to the JVM using JMX or can be exported using Metric Reporters.

BufferPool#

Metric Name

jvm.buffers.<direct|mapped>.<MetricName>
JMX MBean

java.nio:type=BufferPool name=<direct|mapped>

Name	Type	Description
Capacity	Gauge	Estimated total capacity of the buffers in this pool
Count	Gauge	Estimated number of buffers in the pool
Used	Gauge	Estimated memory that the Java virtual machine is using for this buffer pool

FileDescriptorRatio#

Metric Name

jvm.fd.<MetricName>
JMX MBean

java.lang:type=OperatingSystem name=<OpenFileDescriptorCount|MaxFileDescriptorCount>

Name	Type	Description
Usage	Ratio	Ratio of used to total file descriptors

GarbageCollector#

Metric Name

jvm.gc.<gc_type>.<MetricName>
JMX MBean

java.lang:type=GarbageCollector name=<gc_type>

Name	Type	Description
Count	Gauge	Total number of collections that have occurred
Time	Gauge	Approximate accumulated collection elapsed time in milliseconds
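
Count and Time are cumulative, so a single reading says little. The difference between two samples is what shows how much time the JVM recently spent collecting. The following sketch uses hypothetical sample values and a hypothetical function name, and assumes both samples come from the same JVM run.

```python
def gc_overhead(prev, curr, interval_ms):
    """Compare two cumulative GC samples taken interval_ms apart.

    Each sample is a dict with 'count' (collections) and 'time_ms'
    (accumulated collection time in milliseconds).
    """
    collections = curr["count"] - prev["count"]
    gc_time_ms = curr["time_ms"] - prev["time_ms"]
    if collections < 0 or gc_time_ms < 0:
        # Cumulative values only go backwards when the JVM was restarted.
        raise ValueError("counters went backwards; was the JVM restarted?")
    return {
        "collections": collections,
        "gc_time_ms": gc_time_ms,
        "fraction_in_gc": gc_time_ms / interval_ms,
    }


# Hypothetical samples taken 60 seconds apart.
previous = {"count": 100, "time_ms": 5000}
current = {"count": 104, "time_ms": 5300}
result = gc_overhead(previous, current, interval_ms=60_000)
print(result)
```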

Memory#

Metric Name

jvm.memory.<heap/non-heap/total>.<MetricName>
JMX MBean

java.lang:type=Memory

Name	Type	Description
Committed	Gauge	Amount of memory in bytes that is committed for the JVM to use
Init	Gauge	Amount of memory in bytes that the JVM initially requests from the OS
Max	Gauge	Maximum amount of memory in bytes that can be used for memory management
Usage	Ratio	Ratio of used to maximum memory
Used	Gauge	Amount of used memory in bytes

MemoryPool#

Metric Name

jvm.memory.pools.<memory_pool>.<MetricName>
JMX MBean

java.lang:type=MemoryPool name=<memory_pool>

Name	Type	Description
Committed	Gauge	Amount of memory in bytes that is committed for the JVM to use
Init	Gauge	Amount of memory in bytes that the JVM initially requests from the OS
Max	Gauge	Maximum amount of memory in bytes that can be used for memory management
Usage	Ratio	Ratio of used to maximum memory
Used	Gauge	Amount of used memory in bytes
