List tracepoints#
by perf#
root@worknode5:/home/labile# perf list tracepoint
List of pre-defined events (to be used in -e):
tcp:tcp_destroy_sock [Tracepoint event]
tcp:tcp_probe [Tracepoint event]
tcp:tcp_rcv_space_adjust [Tracepoint event]
tcp:tcp_receive_reset [Tracepoint event]
tcp:tcp_retransmit_skb [Tracepoint event]
tcp:tcp_retransmit_synack [Tracepoint event]
tcp:tcp_send_reset [Tracepoint event]
Metric Groups:
sock:inet_sock_set_state [Tracepoint event]
Metric Groups:
By BCC tplist#
sudo tplist-bpfcc -v 'tcp:*'
bpftrace#
bpftrace -l 't:sock:*'
By kernel filesystem#
root@worknode5:/home/labile# sudo ls /sys/kernel/debug/tracing/events
alarmtimer btrfs compaction devlink exceptions filelock ftrace huge_memory initcall irq kmem mce module neigh page_isolation power ras regulator rtc skb swiotlb tcp tlb wbt xdp
block cgroup cpuhp dma_fence ext4 filemap gpio hwmon intel_iommu irq_matrix kvm mdio mpx net page_pool printk raw_syscalls resctrl sched smbus sync_trace thermal udp workqueue xen
bpf_test_run clk cros_ec drm fib fs header_event hyperv iocost irq_vectors kvmmmu migrate msr nmi pagemap qdisc rcu rpm scsi sock syscalls thermal_power_allocator vmscan writeback xhci-hcd
bridge cma devfreq enable fib6 fs_dax header_page i2c iommu jbd2 libata mmc napi oom percpu random regmap rseq signal spi task timer vsyscall x86_fpu
root@worknode5:/home/labile# sudo ls /sys/kernel/debug/tracing/events/tcp/
enable filter tcp_destroy_sock tcp_probe tcp_rcv_space_adjust tcp_receive_reset tcp_retransmit_skb tcp_retransmit_synack tcp_send_reset
root@worknode5:/home/labile# sudo cat /sys/kernel/debug/tracing/events/tcp/tcp_retransmit_skb/format
name: tcp_retransmit_skb
ID: 1400
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:const void * skbaddr; offset:8; size:8; signed:0;
field:const void * skaddr; offset:16; size:8; signed:0;
field:int state; offset:24; size:4; signed:1;
field:__u16 sport; offset:28; size:2; signed:0;
field:__u16 dport; offset:30; size:2; signed:0;
field:__u8 saddr[4]; offset:32; size:4; signed:0;
field:__u8 daddr[4]; offset:36; size:4; signed:0;
field:__u8 saddr_v6[16]; offset:40; size:16; signed:0;
field:__u8 daddr_v6[16]; offset:56; size:16; signed:0;
print fmt: "sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c state=%s", REC->sport, REC->dport, REC->saddr, REC->daddr, REC->saddr_v6, REC->daddr_v6, __print_symbolic(REC->state, { TCP_ESTABLISHED, "TCP_ESTABLISHED" }, { TCP_SYN_SENT, "TCP_SYN_SENT" }, { TCP_SYN_RECV, "TCP_SYN_RECV" }, { TCP_FIN_WAIT1, "TCP_FIN_WAIT1" }, { TCP_FIN_WAIT2, "TCP_FIN_WAIT2" }, { TCP_TIME_WAIT, "TCP_TIME_WAIT" }, { TCP_CLOSE, "TCP_CLOSE" }, { TCP_CLOSE_WAIT, "TCP_CLOSE_WAIT" }, { TCP_LAST_ACK, "TCP_LAST_ACK" }, { TCP_LISTEN, "TCP_LISTEN" }, { TCP_CLOSING, "TCP_CLOSING" }, { TCP_NEW_SYN_RECV, "TCP_NEW_SYN_RECV" })
Tracepoint definition#
https://blogs.oracle.com/linux/post/taming-tracepoints-in-the-linux-kernel
Tracepoints are defined in header files under include/trace/events.
Our netif_rx tracepoint is defined in include/trace/events/net.h.
Each tracepoint definition consists of a description of
TP_PROTO, the function prototype used in calling the tracepoint. In the case of netif_rx(), that’s simply struct sk_buff *skb
TP_ARGS, the argument names; (skb)
TP_STRUCT__entry field definitions; these correspond to the fields which are assigned when the tracepoint is triggered
TP_fast_assign statements which take the raw argument to the tracepoint (the skb) and set the associated field values (skb len, skb pointer etc)
TP_printk which is responsible for using those field values to display a relevant tracing message
Note: tracepoints defined via TRACE_EVENT specify all of the above, whereas we can also define an event class which shares fields, assignments and messages. In fact netif_rx is of event class net_dev_template, so our field assignments and message come from that event class.
Tracepoint arguments and format string#
cat /sys/kernel/debug/tracing/events/block/block_rq_issue/format
cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_accept4/format
name: sys_enter_accept4
ID: 1369
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:int __syscall_nr; offset:8; size:4; signed:1;
field:int fd; offset:16; size:8; signed:0;
field:struct sockaddr * upeer_sockaddr; offset:24; size:8; signed:0;
field:int * upeer_addrlen; offset:32; size:8; signed:0;
field:int flags; offset:40; size:8; signed:0;
print fmt: "fd: 0x%08lx, upeer_sockaddr: 0x%08lx, upeer_addrlen: 0x%08lx, flags: 0x%08lx", ((unsigned long)(REC->fd)), ((unsigned long)(REC->upeer_sockaddr)), ((unsigned long)(REC->upeer_addrlen)), ((unsigned long)(REC->flags))
cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_connect/format
name: sys_enter_connect
ID: 1365
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:int __syscall_nr; offset:8; size:4; signed:1;
field:int fd; offset:16; size:8; signed:0;
field:struct sockaddr * uservaddr; offset:24; size:8; signed:0;
field:int addrlen; offset:32; size:8; signed:0;
print fmt: "fd: 0x%08lx, uservaddr: 0x%08lx, addrlen: 0x%08lx", ((unsigned long)(REC->fd)), ((unsigned long)(REC->uservaddr)), ((unsigned long)(REC->addrlen))
cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_bind/format
name: sys_enter_bind
ID: 1373
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:int __syscall_nr; offset:8; size:4; signed:1;
field:int fd; offset:16; size:8; signed:0;
field:struct sockaddr * umyaddr; offset:24; size:8; signed:0;
field:int addrlen; offset:32; size:8; signed:0;
print fmt: "fd: 0x%08lx, umyaddr: 0x%08lx, addrlen: 0x%08lx", ((unsigned long)(REC->fd)), ((unsigned long)(REC->umyaddr)), ((unsigned long)(REC->addrlen))
Tracepoint interface#
Kernel File System: /sys/kernel/debug/tracing
perf_event_open(2) syscall
Tracepoint for TCP#
https://www.brendangregg.com/blog/2018-03-22/tcp-tracepoints.html
During development I talked to Song (and Alexei Starovoitov) about the recent additions, so I already have an idea about why these exist and their use. Some rough notes for the current TCP tracepoints:
tcp:tcp_retransmit_skb: Traces retransmits. Useful for understanding network issues including congestion. Will be used by my tcpretrans tools instead of kprobes.
tcp:tcp_retransmit_synack: Tracing SYN/ACK retransmits. I imagine this could be used for a type of DoS detection (SYN flood triggering SYN/ACKs and then retransmits). This is separate from tcp:tcp_retransmit_skb because this event doesn’t have an skb.
tcp:tcp_destroy_sock: Needed by any program that summarizes details in-memory for a TCP session, which would be keyed by the sock address. This probe can be used to know that the session has ended, so that sock address is about to be reused and any summarized data so far should be consumed and then deleted.
tcp:tcp_send_reset: This traces a RST send during a valid socket, to diagnose those type of issues.
tcp:tcp_receive_reset: Trace a RST receive.
tcp:tcp_probe: for TCP congestion window tracing, which also allowed an older TCP probe module to be deprecated and removed. This was added by Masami Hiramatsu, and merged in Linux 4.16.
sock:inet_sock_set_state: Can be used for many things. The tcplife tool is one, but also my tcpconnect and tcpaccept bcc tools can be converted to use this tracepoint. We could add separate
tcp:tcp_connectandtcp:tcp_accepttracepoints (ortcp:tcp_active_openandtcp:tcp_passive_open), butsock:inet_sock_set_statecan be used for this.
Ref.#
https://www.kernel.org/doc/Documentation/trace/tracepoints.txt
https://blogs.oracle.com/linux/taming-tracepoints-in-the-linux-kernel
http://www.brendangregg.com/blog/2018-03-22/tcp-tracepoints.html
[BPF Performance Tools]
https://blogs.oracle.com/linux/post/taming-tracepoints-in-the-linux-kernel