307 lines
12 KiB
Markdown
307 lines
12 KiB
Markdown
---
|
|
created_at: '2015-09-22T16:48:38.000Z'
|
|
title: TCP in 30 instructions (1993)
|
|
url: http://www.pdl.cmu.edu/mailinglists/ips/mail/msg00133.html?
|
|
author: majke
|
|
points: 108
|
|
story_text:
|
|
comment_text:
|
|
num_comments: 7
|
|
story_id:
|
|
story_title:
|
|
story_url:
|
|
parent_id:
|
|
created_at_i: 1442940518
|
|
_tags:
|
|
- story
|
|
- author_majke
|
|
- story_10259805
|
|
objectID: '10259805'
|
|
|
|
---
|
|
[Source](http://www.pdl.cmu.edu/mailinglists/ips/mail/msg00133.html? "Permalink to Fwd: TCP in 30 instructions (Was Re: Karl Auerbach: Re: Storage over Eth")
|
|
|
|
# Fwd: TCP in 30 instructions (Was Re: Karl Auerbach: Re: Storage over Eth
|
|
|
|
![][1]
|
|
|
|
| ----- |
|
|
|
|
|
|
|
| **SORT BY:**
|
|
|
|
[** LIST ORDER**][2]
|
|
[** THREAD**][3]
|
|
[**AUTHOR**][4]
|
|
[**SUBJECT**][5] |
|
|
|
|
|
|
|
|
|
|
|
**SEARCH ** | |
|
|
|
|
|
|
|
[** IPS HOME **][6] |
|
|
* * *
|
|
|
|
[[Date Prev][7]][[Date Next][8]][[Thread Prev][9]][[Thread Next][10]][[Date Index][11]][[Thread Index][12]]
|
|
|
|
# Fwd: TCP in 30 instructions (Was Re: Karl Auerbach: Re: Storage over Ethernet/IP)
|
|
|
|
* * *
|
|
|
|
* **To**: **Dave Nagle <[bassoon@yogi.ece.cmu.edu][13]>**
|
|
* **Subject**: **Fwd: TCP in 30 instructions (Was Re: Karl Auerbach: Re: Storage over Ethernet/IP)**
|
|
* **From**: **Zachary Amsden <[zamsden@cthulhu.engr.sgi.com][14]>**
|
|
* Date: Fri, 26 May 2000 15:16:14 -0700
|
|
* Cc: [ips@ece.cmu.edu][15]
|
|
* Delivery-Date: Fri May 26 18:02:33 2000
|
|
* In-Reply-To: Your message of "Fri, 26 May 2000 12:52:46 EDT." <[200005261652.MAA06069@yogi.ece.cmu.edu][9]>
|
|
* Sender: [owner-ips@ece.cmu.edu][16]
|
|
* * *
|
|
|
|
Received: from rx7.ee.lbl.gov by uu2.psi.com
|
|
(5.65b/4.0.071791-PSI/PSINet) via SMTP;
|
|
id AA12583 for popbbn; Wed, 8 Sep 93 01:29:46 -0400
|
|
Received: by rx7.ee.lbl.gov for craig@aland.bbn.com (5.65/1.44r)
|
|
id AA05271; Tue, 7 Sep 93 22:30:15 -0700
|
|
Message-Id: <9309080530.AA05271@rx7.ee.lbl.gov>
|
|
To: Craig Partridge <craig@aland.bbn.com>
|
|
Cc: David Clark <ddc@lcs.mit.edu>
|
|
Subject: Re: query about TCP header on tcp-ip
|
|
In-Reply-To: Your message of Tue, 07 Sep 93 09:48:00 PDT.
|
|
Date: Tue, 07 Sep 93 22:30:14 PDT
|
|
From: Van Jacobson <van@ee.lbl.gov>
|
|
|
|
Craig,
|
|
|
|
As you probably remember from the "High Speed TCP" CNRI meeting,
|
|
my kernel looks nothing at all like any version of BSD. Mbufs
|
|
no longer exist, for example, and `netipl' and all the protocol
|
|
processing that used to be done at netipl interrupt level are
|
|
gone. TCP receive packet processing in the new kernel really is
|
|
about 30 instructions on a RISC (33 on a sparc but three of
|
|
those are compiler braindamage). Attached is the C code & the
|
|
associated sparc assembler.
|
|
|
|
A brief recap of the architecture: Packets go in 'pbufs' which
|
|
are, in general, the property of a particular device. There is
|
|
exactly one, contiguous, packet per pbuf (none of that mbuf
|
|
chain stupidity). On the packet input interrupt, the device
|
|
driver upcalls through the protocol stack (no more of that queue
|
|
packet on the netipl software interrupt bs). The upcalls
|
|
percolate up the stack until the packet is completely serviced
|
|
(e.g., NFS requests that can be satisfied from in-memory data &
|
|
data structures) or they reach a routine that knows the rest of
|
|
the processing must be done in some process's context in which
|
|
case the packet is laid on some appropriate queue and the
|
|
process is unblocked. In the case of TCP, the upcalls almost
|
|
always go two levels: IP finds the datagram is for this host &
|
|
it upcalls a TCP demuxer which hashes the ports + SYN to find a
|
|
PCB, lays the packet on the tail of the PCB's queue and wakes up
|
|
any process sleeping on the PCB. The IP processing is about 25
|
|
instructions & the demuxer is about 10.
|
|
|
|
As Dave noted, the two processing paths that need the most
|
|
tuning are the data packet send & receive (since at most every
|
|
other packet is acked, there will be at least twice as many data
|
|
packets as ack packets). In the new system, the receiving
|
|
process calls 'tcp_usrrecv' (the protocol specific part of the
|
|
'recv' syscall) or is already blocked there waiting for new
|
|
data. So the following code is embedded in a loop at the start of
|
|
tcp_usrrecv that spins taking packets off the pcb queue until
|
|
there's no room for the next packet in the user buffer or the
|
|
queue is empty. The TCP protocol processing is done as we
|
|
remove packets from the queue & copy their data to user space
|
|
(and since we're in process context, it's possible to do a
|
|
checksum-and-copy).
|
|
|
|
Throughout this code, 'tp' points to the pcb and 'ti' points to
|
|
the tcp header of the first packet on the queue (the ip header was
|
|
stripped as part of interrupt level ip processing). The header info
|
|
(excluding the ports which are implicit in the pcb) are sucked out
|
|
of the packet into registers [this is to minimize cache thrashing and
|
|
possibly to take advantage of 64 bit or longer loads]. Then the
|
|
header checksum is computed (tp->ph_sum is the precomputed pseudo-header
|
|
checksum + src & dst ports).
|
|
|
|
int tcp_usrrecv(struct uio* uio, struct socket* so)
|
|
{
|
|
struct tcpcb *tp = (struct tcpcb *)so->so_pcb;
|
|
register struct pbuf* pb;
|
|
|
|
while ((pb = tp->tp_inq) != 0) {
|
|
register int len = pb->len;
|
|
struct tcphdr *ti = (struct tcphdr *)pb->dat;
|
|
|
|
u_long seq = ((u_long*)ti)[1];
|
|
u_long ack = ((u_long*)ti)[2];
|
|
u_long flg = ((u_long*)ti)[3];
|
|
u_long sum = ((u_long*)ti)[4];
|
|
u_long cksum = tp->ph_sum;
|
|
|
|
/* NB - ocadd is an inline gcc assembler function */
|
|
cksum = ocadd(ocadd(ocadd(ocadd(cksum, seq), ack), flg),
|
|
sum);
|
|
|
|
Next is the header prediction check which is probably the most
|
|
opaque part of the code. tp->pred_flags contains snd_wnd (the
|
|
window we expect in incoming packets) in the bottom 16 bits and
|
|
0x4x10 in the top 16 bits. The 'x' is normally 0 but will be
|
|
set non-zero if header prediction shouldn't be done (e.g., if
|
|
not in established state, if retransmitting, if hole in seq
|
|
space, etc.). So, the first term of the 'if' checks four
|
|
different things simultaneously:
|
|
- that the window is what we expect
|
|
- that there are no tcp options
|
|
- that the packet has ACK set & doesn't have SYN, FIN, RST or URG set
|
|
- that the connection is in the right state
|
|
and the 2nd term of the if checks that the packet is in sequence:
|
|
|
|
#define FMASK (((0xf000 | TH_SYN|TH_FIN|TH_RST|TH_URG|TH_ACK) << 16) |
|
|
0xffff)
|
|
|
|
if ((flg & FMASK) == tp->pred_flags && seq == tp->rcv_nxt) {
|
|
|
|
The next few lines are pretty obvious -- we subtract the header
|
|
length from the total length and if it's less than zero the packet
|
|
was malformed, if it's zero we must have a pure ack packet & we
|
|
do the ack stuff otherwise if the ack field didn't move we have
|
|
a pure data packet which we copy to the user's buffer, checksumming
|
|
as we go, then update the pcb state if everything checks:
|
|
|
|
len -= 20;
|
|
if (len <= 0) {
|
|
if (len < 0) {
|
|
/* packet malformed */
|
|
} else {
|
|
/* do pure ack things */
|
|
}
|
|
} else if (ack == tp->snd_una) {
|
|
cksum = in_uiomove((u_char*)ti + 20, len, uio,
|
|
cksum);
|
|
if (cksum != 0) {
|
|
/* packet or user buffer errors */
|
|
}
|
|
seq += len;
|
|
tp->rcv_nxt = seq;
|
|
if ((int)(seq - tp->rcv_acked) >= 0) {
|
|
/* send ack */
|
|
} else {
|
|
/* free pbuf */
|
|
}
|
|
continue;
|
|
}
|
|
}
|
|
/* header prediction failed -- take long path */
|
|
...
|
|
|
|
That's it. On the normal receive data path we execute 16 lines of
|
|
C which turn into 33 instructions on a sparc (it would be 30 if I
|
|
could trick gcc into generating double word loads for the header
|
|
& my carefully aligned pcb fields). I think you could get it down
|
|
around 25 on a cray or big-endian alpha since the loads, checksum calc
|
|
and most of the compares can be done on 64 bit quantities (e.g.,
|
|
you can combine the seq & ack tests into one).
|
|
|
|
Attached is the sparc assembler for the above receive path. Hope
|
|
this explains Dave's '30 instruction' assertion. Feel free to
|
|
forward this to tcp-ip or anyone that might be interested.
|
|
|
|
- Van
|
|
|
|
----------------
|
|
ld [%i0+4],%l3 ! load packet tcp header fields
|
|
ld [%i0+8],%l4
|
|
ld [%i0+12],%l2
|
|
ld [%i0+16],%l0
|
|
|
|
ld [%i1+72],%o0 ! compute header checksum
|
|
addcc %l3,%o0,%o3
|
|
addxcc %l4,%o3,%o3
|
|
addxcc %l2,%o3,%o3
|
|
addxcc %l0,%o3,%o3
|
|
|
|
sethi %hi(268369920),%o1 ! check if hdr. pred possible
|
|
andn %l2,%o1,%o1
|
|
ld [%i1+60],%o2
|
|
cmp %o1,%o2
|
|
bne L1
|
|
ld [%i1+68],%o0
|
|
cmp %l3,%o0
|
|
bne L1
|
|
addcc %i2,-20,%i2
|
|
bne,a L3
|
|
ld [%i1+36],%o0
|
|
! packet error or ack processing
|
|
...
|
|
L3:
|
|
cmp %l4,%o0
|
|
bne L1
|
|
add %i0,20,%o0
|
|
mov %i2,%o1
|
|
call _in_uiomove,0
|
|
mov %i3,%o2
|
|
cmp %o0,0
|
|
be L6
|
|
add %l3,%i2,%l3
|
|
! checksum error or user buffer error
|
|
...
|
|
L6:
|
|
ld [%i1+96],%o0
|
|
subcc %l3,%o0,%g0
|
|
bneg L7
|
|
st %l3,[%i1+68]
|
|
! send ack
|
|
...
|
|
br L8
|
|
L7:
|
|
! free pbuf
|
|
...
|
|
L8: ! done with this packet - continue
|
|
...
|
|
|
|
L1: ! hdr pred. failed - do it the hard way
|
|
|
|
|
|
|
|
|
|
|
|
* * *
|
|
|
|
* **References**:
|
|
* [**Karl Auerbach: Re: Storage over Ethernet/IP][9]**
|
|
* _From:_ Dave Nagle <bassoon@yogi.ece.cmu.edu>
|
|
* Prev by Date: [**Peter Johansson: Re: IETF mailing list question on Storage over Ethernet/IP][7]**
|
|
* Next by Date: [**RE: Storage over Ethernet/IP][8]**
|
|
* Prev by thread: [**Karl Auerbach: Re: Storage over Ethernet/IP][9]**
|
|
* Next by thread: [**Valdis.Kletnieks@vt.edu: Re: Storage over Ethernet/IP][10]**
|
|
* Index(es):
|
|
* [**Date**][11]
|
|
* [**Thread**][12]
|
|
* * *
|
|
|
|
** [Home][17] **
|
|
|
|
Last updated: Tue Sep 04 01:08:15 2001
|
|
6315 messages in chronological order
|
|
|
|
[1]: http://www.pdl.cmu.edu/mailinglists/ips/images/redbar_flag4.gif
|
|
[2]: http://www.pdl.cmu.edu/maillist.html
|
|
[3]: http://www.pdl.cmu.edu/threads.html
|
|
[4]: http://www.pdl.cmu.edu/author.html
|
|
[5]: http://www.pdl.cmu.edu/subject.html
|
|
[6]: http://www.ece.cmu.edu/~ips
|
|
[7]: http://www.pdl.cmu.edu/msg00132.html
|
|
[8]: http://www.pdl.cmu.edu/msg00134.html
|
|
[9]: http://www.pdl.cmu.edu/msg00126.html
|
|
[10]: http://www.pdl.cmu.edu/msg00129.html
|
|
[11]: http://www.pdl.cmu.edu/mail62.html#00133
|
|
[12]: http://www.pdl.cmu.edu/thrd2.html#00133
|
|
[13]: mailto:bassoon%40yogi.ece.cmu.edu
|
|
[14]: mailto:zamsden%40cthulhu.engr.sgi.com
|
|
[15]: mailto:ips%40ece.cmu.edu
|
|
[16]: mailto:owner-ips%40ece.cmu.edu
|
|
[17]: http://sip.pdl.cs.cmu.edu
|
|
|