Technology behind the Killer NIC

We will not spend several pages and numerous charts trying to explain in exhaustive detail how this networking architecture and technology operate. Instead we will provide a high-level technology overview, which should supply the basic information needed to show why there are advantages in offloading data packet processing from the CPU to a dedicated processing unit. Other technologies such as RDMA and onloading are available, but in the interest of space and keeping our readers awake we will not detail those options.

The basic technology the Killer NIC utilizes has been in the corporate server market for a few years. The most prevalent of these technologies, and the one the Killer NIC is based upon, is the TCP/IP Offload Engine (TOE). TOE technology (okay, that phrase deserves a laugh) is designed to offload all tasks associated with protocol processing from the main system processor and move them to the TOE network interface card (TNIC). TOE technology also consists of software extensions to existing TCP/IP stacks within the operating system that enable the use of these dedicated hardware data planes for packet processing.
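To make that division of labor concrete, here is a minimal sketch in C of how such a stack extension might route a connection's transmit path either through the normal software stack or out to the card. Every name and structure here is ours, invented for illustration; none of it comes from a real TOE implementation.

    /* Minimal sketch of the "software extension" idea: the stack keeps its
       normal software transmit path, but when a TOE-capable card is present
       it can hand a connection's protocol work to the card instead. */
    #include <stdio.h>
    #include <stdbool.h>
    #include <stddef.h>

    typedef int (*xmit_fn)(const void *data, size_t len);

    static int software_stack_xmit(const void *data, size_t len)
    {
        (void)data;
        printf("host CPU: segmenting and checksumming %zu bytes\n", len);
        return 0;
    }

    static int toe_offload_xmit(const void *data, size_t len)
    {
        (void)data;
        printf("TNIC: card performs all protocol processing for %zu bytes\n", len);
        return 0;
    }

    struct connection {
        bool offloaded;     /* set when the stack hands this flow to the card */
        xmit_fn xmit;
    };

    int main(void)
    {
        struct connection c = { .offloaded = true };
        c.xmit = c.offloaded ? toe_offload_xmit : software_stack_xmit;
        return c.xmit("hello", 5);
    }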

The process of encapsulating data into TCP/IP packets can consume a significant number of CPU cycles depending upon the size of the packets and the amount of traffic. Dedicated cards have proven very effective in relieving the CPU of TCP/IP packet processing, resulting in greater system performance from the server. Offloading allows the system's CPU to recover the lost cycles, so applications that are CPU bound are no longer affected by TCP/IP processing. This technology is very beneficial in a corporate server or datacenter environment where a heavy volume of traffic usually consists of large blocks of data being transferred, but does it really belong on your desktop, where the actual CPU overhead is generally minimal? Before we address this question we need to take a further look at how the typical NIC operates.

The standard NIC available today usually processes TCP/IP operations in software, which can create substantial system overhead depending upon the network traffic on the host machine. The areas that typically create increased system overhead are data copies along with protocol and interrupt processing. When a NIC receives a typical data packet, a series of interactions with the CPU begins in order to handle the data and route it to the appropriate application. The CPU is first notified that a data packet is waiting; the processor then generally reads the packet header to determine the contents of the data payload. It then requests the data payload and, after verifying it, delivers it to the waiting application.
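As a rough illustration of how often the host CPU is involved in that sequence, the following sketch models each step listed above as a counted host event. The structures, sizes, and the toy "checksum" are ours, for illustration only; this is not driver code.

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    struct packet {
        uint16_t payload_len;
        uint16_t checksum;          /* simplified stand-in for the real sums */
        uint8_t  payload[1500];
    };

    static unsigned cpu_events;     /* every host interaction we count */

    static void on_receive(const struct packet *p)
    {
        cpu_events++;                         /* 1: interrupt: packet waiting  */
        uint16_t len = p->payload_len;
        cpu_events++;                         /* 2: CPU reads the header       */
        cpu_events++;                         /* 3: CPU requests the payload   */
        uint32_t sum = 0;
        for (uint16_t i = 0; i < len; i++)    /* 4: CPU verifies the payload   */
            sum += p->payload[i];
        cpu_events++;
        if (sum == p->checksum)
            cpu_events++;                     /* 5: copy to application buffer */
    }

    int main(void)
    {
        struct packet p = { .payload_len = 4, .checksum = 'a' + 'b' + 'c' + 'd' };
        memcpy(p.payload, "abcd", 4);
        on_receive(&p);
        printf("host CPU events for one small packet: %u\n", cpu_events);
        return 0;
    }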

These data packets are buffered or queued on the host system. Depending upon the size and volume of the packets, this constant fetching of information can create additional delays due to memory latencies and/or poor buffer management. The majority of standard desktop NICs also incorporate hardware checksum support and additional software enhancements to help eliminate transmit-data copies. This is advantageous when combined with packet prioritization techniques that use intelligent queuing algorithms to control and enhance outbound traffic.
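For reference, the arithmetic that "hardware checksum support" takes off the CPU is the standard one's-complement Internet checksum (RFC 1071). A plain C version follows; computing this in software for every packet is exactly the kind of per-byte work a busy host would rather avoid.

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    static uint16_t inet_checksum(const void *buf, size_t len)
    {
        const uint8_t *p = buf;
        uint32_t sum = 0;

        while (len > 1) {                     /* sum successive 16-bit words */
            sum += (uint32_t)p[0] << 8 | p[1];
            p += 2;
            len -= 2;
        }
        if (len)                              /* pad an odd trailing byte    */
            sum += (uint32_t)p[0] << 8;
        while (sum >> 16)                     /* fold the carries back in    */
            sum = (sum & 0xFFFF) + (sum >> 16);
        return (uint16_t)~sum;                /* one's complement of the sum */
    }

    int main(void)
    {
        uint8_t header[] = { 0x45, 0x00, 0x00, 0x1c };  /* sample bytes */
        printf("checksum: 0x%04x\n", inet_checksum(header, sizeof header));
        return 0;
    }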

However, these same NICs cannot eliminate the receive-data copy routines that consume the majority of processor cycles in this process. A TNIC performs protocol processing on its own dedicated processor before placing the data on the host system, and it will generally use zero-copy algorithms to place the packet data directly into the application's buffers or memory. This bypasses the normal round of handshakes between the processor, NIC, memory, and application, greatly reducing system overhead depending upon the packet size.
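The toy comparison below shows why that receive-data copy matters: the standard path pays a CPU-driven copy from a driver buffer into the application buffer for every packet, while the zero-copy path has the card's DMA engine place data straight into application memory. All sizes and names here are illustrative assumptions, not measurements.

    #include <stdio.h>
    #include <string.h>

    #define PAYLOAD 1460                      /* one full TCP segment */

    static unsigned long cpu_bytes_copied;

    /* standard path: wire -> driver buffer (DMA) -> application (CPU copy) */
    static void standard_receive(char *app_buf, const char *wire, size_t n)
    {
        static char driver_buf[PAYLOAD];
        memcpy(driver_buf, wire, n);          /* stand-in for the DMA write  */
        memcpy(app_buf, driver_buf, n);       /* the receive-data copy the   */
        cpu_bytes_copied += n;                /* host CPU actually pays for  */
    }

    /* zero-copy path: the card places the data directly into app_buf */
    static void zero_copy_receive(char *app_buf, const char *wire, size_t n)
    {
        memcpy(app_buf, wire, n);             /* done by the card's DMA, not the CPU */
    }

    int main(void)
    {
        char wire[PAYLOAD] = {0}, app[PAYLOAD];
        for (int i = 0; i < 1000; i++)
            standard_receive(app, wire, PAYLOAD);
        printf("CPU bytes copied, standard path:  %lu\n", cpu_bytes_copied);

        cpu_bytes_copied = 0;
        for (int i = 0; i < 1000; i++)
            zero_copy_receive(app, wire, PAYLOAD);
        printf("CPU bytes copied, zero-copy path: %lu\n", cpu_bytes_copied);
        return 0;
    }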

Most corporate or data center networks deal with large data payloads, typically 8 KB up to 64 KB in size (though we fully understand this can vary greatly). Our example involves the receipt of a 32 KB application data block, which usually results in thirty or more interrupt-generating events between the host system and a typical NIC. Each of these events is required to buffer the information, assemble the data into Ethernet packets, process the incoming acknowledgements, and send the data to the waiting application. This process basically reverses itself if a reply is generated by the application and returned to the sender. The result can be significant protocol-processing overhead, memory latencies, and interrupt delays on the host system. We need to reiterate that our comments about "significant" system overhead are geared towards a corporate server or datacenter environment and not the typical desktop.
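The "thirty or more events" figure is easy to sanity-check: a 32 KB block split at the standard Ethernet TCP maximum segment size of 1460 bytes yields 23 segments, and the host must also process the acknowledgement traffic. A quick back-of-the-envelope calculation (our own arithmetic, with a typical delayed-ACK assumption of roughly one ACK per two segments):

    #include <stdio.h>

    int main(void)
    {
        const int payload = 32 * 1024;                 /* 32 KB block        */
        const int mss     = 1460;                      /* 1500-byte MTU MSS  */

        int segments = (payload + mss - 1) / mss;      /* ceiling division   */
        int acks     = (segments + 1) / 2;

        printf("segments: %d, ACK events: %d, total host events: %d\n",
               segments, acks, segments + acks);       /* 23 + 12 = 35       */
        return 0;
    }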

Depending upon the application and network traffic, a TNIC can greatly reduce the network transaction load on the host system by changing the transaction process from one event per Ethernet packet to one event per application network I/O. The 32 KB application data transfer now becomes a single data-path offload process that moves all data packet processing to the TNIC. This eliminates the thirty or so interrupts along with the majority of the system overhead required to process this single block. In a data center or corporate server environment with large content delivery requirements across multiple users, the savings in system overhead due to network transactions can have a significant impact. In some instances replacing a standard NIC in the server with a TNIC has almost the same effect as adding another CPU. That's an impressive savings in cost and power requirements, but once again, is this technology needed on the desktop?
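In code terms, the contract changes as sketched below: the host posts one application-sized buffer and receives exactly one completion, no matter how many Ethernet frames the card handled internally. The interface names are hypothetical; real TOE APIs differ.

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    struct io_completion {
        uint32_t bytes;        /* total payload delivered to the buffer */
        uint32_t wire_frames;  /* frames the card handled on our behalf */
    };

    /* Pretend card: absorbs all per-frame protocol work internally and
       reports a single completion to the host. */
    static struct io_completion tnic_receive(uint8_t *app_buf, uint32_t want)
    {
        (void)app_buf;                      /* card DMAs the data in directly */
        struct io_completion c = { want, (want + 1459) / 1460 };
        return c;
    }

    int main(void)
    {
        uint8_t *buf = malloc(32 * 1024);   /* the application's buffer */
        struct io_completion c = tnic_receive(buf, 32 * 1024);
        printf("1 host event delivered %u bytes (%u frames handled on-card)\n",
               (unsigned)c.bytes, (unsigned)c.wire_frames);
        free(buf);
        return 0;
    }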

BigFoot Networks believes it is, and we will see what they have to say about it and their technology next.


87 Comments


  • Gary Key - Tuesday, October 31, 2006 - link

    quote:

    I don't mean to be a jerk, and I appreciate any sincere and fact-finding test/review article.


    I fully agree the article was probably too long. It was a case of trying to cover all the bases and then some. If we had left out the technology sections and reduced the commentary, it would have read better as a basic hardware item. We looked at this as not being your basic NIC review. However, I am sure there would have been comments that we did not properly review the card or provide this same information. Thanks for the comments.
  • Crassus - Wednesday, November 1, 2006 - link

    I agree with the comment above. I would have liked an even more expanded page detailing the technology and its roots in the corporate sector. What I didn't really care about was the endless description of the pains it took to benchmark the card.
    Two things about that:
    1. If it was easy, everyone could do it. You (and Anandtech) stand above the crowd for going the extra mile and giving us some added (useful) information. This is usually self-evident and doesn't require elaboration.
    2. My firm expects me to get the job done, as, I suppose, it is the same with yours. No one gives a hoot as to all the steps I had to go through to get the job done, unless they offer some added value. Thinking about throwing something out of the window (if you're blessed with having one in your office) occurs to everyone at some point and certainly doesn't hold any additional value - in other words: it comes with the job. If it was otherwise, see (1) above. There's really no need to mention it a couple of times - unless you're reviewing your work instead of the product.
  • Gary Key - Wednesday, November 1, 2006 - link

    quote:

    What I didn't really care about was the endless description of the pains it took to benchmark the card.


    I appreciate your comments. I am always open to other viewpoints and opinions. What paragraphs contained endless descriptions that in your opinion could have been cut? Please email me if you can.

    quote:

    Thinking about throwing something out of the window (if you're blessed with having one in your office) occurs to everyone at some point and certainly doesn't hold any additional value - in other words: it comes with the job.


    I agree it comes with the job. The message I was trying to convey was one of total frustration with the product after six weeks of almost non-stop testing. There were several choice words I wanted to use but felt like that statement would be universally understood. ;-)
  • Sunrise089 - Tuesday, October 31, 2006 - link

    I really liked reading the article. When G80 comes out, we can cut straight to the benches, because I'm going to want to know whether or not to buy the card. None of us are going to buy this thing, but we're all enthusiasts, so reading about it can still be fun. With performance changes so minor, however, adding a little commentary to spice up the review makes it a lot more entertaining for this reader.
  • Frumious1 - Tuesday, October 31, 2006 - link

    I'm in agreement with Sunrise - liked the article and the sarcasm. I can only imagine your pain during the review. Can't believe how many people apparently lack the ability to read and need pictures. "Just give us two paragraphs saying whether or not to buy the card!" Bah! That's what the conclusion page is for, where it's pretty clear the card "works as advertised" which means fractional gains in a few games.
  • Zaitsev - Tuesday, October 31, 2006 - link

    "Just give us two paragraphs saying whether or not to buy the card!"

    The only reason I still read Anandtech is because they do exactly the opposite. In articles like this one and the Conroe review, I think the pages discussing the technology are more interesting than the results. I can't talk from experience, but it also seems that it would get boring for the authors if they just punched out cookie cutter articles for every review.

    As for the card, I wouldn't be able to sleep at night if I bought this instead of a Conroe.
  • michal1980 - Tuesday, October 31, 2006 - link

    I can sum it up for you in one line.

    "In most cases the Killer-Nic Does Nothing"


    As for Windows Vista: it has a totally new audio stack that is separate from the kernel, so in theory it could run on a core other than the one running the main OS kernel.
  • Googer - Tuesday, October 31, 2006 - link

    FNA is the only thing that makes a Killer NIC really worthwhile.

    http://www.extremetech.com/article2/0,1697,2037279...
  • cryptonomicon - Tuesday, October 31, 2006 - link

    Assuming the review quantified "ping measurements" correctly, this thing has a long way to go. If it consistently gave even 10% faster pings it would be very appealing for pro gaming. But from those ping charts, the results were truly inconclusive. The side effect of increased FPS was even more significant than any ping reduction.

    Looking forward to revisions or later models from Bigfoot though!
  • floffe - Tuesday, October 31, 2006 - link

    That's because in most cases 98% of the ping is not on the local computer, but from your internet connection point (DSL/cable modem or whatever) to the server. This means even cutting 5% off that will be very hard (in general; WoW seems to be an exception).
