TCP Reset 的原因#
TCP RST and unread socket recv buffer#
如果应用没有完全读完一个 socket 的 recv buffer 内的数据,就 close socket。那么 Kernel 会以 TCP RST 结束连接,而不是 FIN.
一个 Envoy 下的猜想场景是:
一些场景下,Envoy 只要读完 downstream HTTP Header 就已经确认请求有问题,直接 socket write HTTP Response,且 close(fd) 了,完全不理 downstream 发过来的 HTTP Body。 kernel 看到 Envoy 连在 recv buffer 中的 HTTP Body 都没读完就 close,于是发了 RST 给 downstream。
The implementation of tcp_close
on the kernel, in the file net/ipv4/tcp.c
.
The kernel is explained as follows:
/* As outlined in RFC 2525, section 2.17, we send a RST here because
* data was lost. To witness the awful effects of the old behavior of
* always doing a FIN, run an older 2.1.x kernel or 2.0.x, start a bulk
* GET in an FTP client, suspend the process, wait for the client to
* advertise a zero window, then kill -9 the FTP client, wheee...
* Note: timeout is always zero in such a case.
*/
if (unlikely(tcp_sk(sk)->repair)) {
sk->sk_prot->disconnect(sk, 0);
} else if (data_was_unread) {
/* Unread data was tossed, zap the connection. */
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONCLOSE);
tcp_set_state(sk, TCP_CLOSE);
tcp_send_active_reset(sk, sk->sk_allocation);
}
Known TCP Implementation Problems(rfc2525)
2.17. Name of Problem Failure to RST on close with data pending Classification Resource management Description When an application closes a connection in such a way that it can no longer read any received data, the TCP SHOULD, per section 4.2.2.13 of RFC 1122, send a RST if there is any unread received data, or if any new data is received. A TCP that fails to do so exhibits "Failure to RST on close with data pending". Note that, for some TCPs, this situation can be caused by an application "crashing" while a peer is sending data. We have observed a number of TCPs that exhibit this problem. The problem is less serious if any subsequent data sent to the now- closed connection endpoint elicits a RST (see illustration below). Significance This problem is most significant for endpoints that engage in large numbers of connections, as their ability to do so will be curtailed as they leak away resources. Implications Failure to reset the connection can lead to permanently hung connections, in which the remote endpoint takes no further action to tear down the connection because it is waiting on the local TCP to first take some action. This is particularly the case if the local TCP also allows the advertised window to go to zero, and fails to tear down the connection when the remote TCP engages in "persist" probes (see example below). Relevant RFCs RFC 1122 section 4.2.2.13. Also, 4.2.2.17 for the zero-window probing discussion below. Trace file demonstrating it Made using tcpdump. No drop information available. ... How to detect The problem can often be detected by inspecting packet traces of a transfer in which the receiving application terminates abnormally. When doing so, there can be an ambiguity (if only looking at the trace) as to whether the receiving TCP did indeed have unread data that it could now no longer deliver. To provoke this to happen, it may help to suspend the receiving application so that it fails to consume any data, eventually exhausting the advertised window. At this point, since the advertised window is zero, we know that the receiving TCP has undelivered data buffered up. Terminating the application process then should suffice to test the correctness of the TCP's behavior.
参考:
TCP RST: Calling close() on a socket with data in the receive queue
Consider two peers, A and B, communicating via TCP. If B closes a socket and there is any data in B’s receive queue, B sends a TCP RST to A instead of following the standard TCP closing protocol, resulting in an error return value from recv( ).
A B send() data → data → data → recv()→ERROR ← RST close( )When might this situation occur? Consider a simple protocol where A sends 5 prime numbers to B, and B responds with “OK” and closes the connection. If B receives a nonprime number, it assumes A is confused, responds with “ERROR”, and closes the connection. If A sends 5 numbers and the second number is not prime, then B will send “ERROR” and call close( ) with data (3 more numbers) in its receive queue. This will cause a TCP RST to be sent to A. Meanwhile, A is blocked on recv( ) awaiting word from B. The RST from B causes A’s recv( ) to return an error (-1) so A never receives the “ERROR” message from B.
So why not just have B read all 5 prime numbers before closing? Consider the case where A sends 6 prime numbers. Here B reads the first 5 prime numbers and closes the socket with the 6th prime still in its receive queue, resulting in a RST.
TCP RST and SO_LINGER#
/kernel/network/kernel-tcp/socket-opt-linger/ref/tcprst-linger.md
TCP RST and close(fd)#
如果程序对 socket fd 执行了 close(fd)
后, kernel 会:
发出 FIN
关闭 fd 如果同时对端发来数据。 kernel 将丢弃数据,且以 RST 回应。因为 fd 已经关闭,程序不能再读写 socket 了。