Testing:

As described in my blog post here, I experienced an issue with certain Intel ethernet controllers. Here's how to see if your controllers are affected. For this simplified test you'll need two machines (one to replay the packet and one to receive it), both on the same ethernet segment. No routers or VLAN-aware switches should be in the mix (but dumb switches/hubs should be fine).

On the replay machine, install tcpreplay. Connect the receiving machine to the network and bring the interface up (the IP address doesn't matter). Replay one (or all) of the packets attached to this post from the replay machine:

sudo tcpreplay -v -i [transmitting interface] [pcap name]

Example:

sudo tcpreplay -v -i eth1 pod-icmp-ping.pcap

If your controllers are affected, the ethernet interface on the receiving machine will lose link. In many circumstances the only way to get the controller working again is to physically power off the machine and power it back on.
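If you'd rather not rely on the link LED, the link state can also be checked from a shell on the receiving machine. This is a minimal sketch, not part of the original test: it assumes a Linux receiver that exposes /sys/class/net/[interface]/carrier, and the function name link_state and its optional second argument (an alternate sysfs root) are made up for illustration.

```shell
# Hypothetical helper: report link state for an interface using sysfs.
# $1 = interface name, $2 = sysfs root (assumed default: /sys/class/net)
link_state() {
    ls_iface="$1"
    ls_root="${2:-/sys/class/net}"
    # The read fails if the interface is missing or administratively down.
    ls_carrier="$(cat "$ls_root/$ls_iface/carrier" 2>/dev/null)" || ls_carrier=""
    case "$ls_carrier" in
        1) echo up ;;
        0) echo down ;;
        *) echo unknown ;;
    esac
}

# Usage on the receiving machine, before and after the replay:
#   link_state eth1
```

An interface that reported "up" before the replay and "down" afterwards is a strong hint that it lost link. "unknown" is less conclusive, since it is also what you get when the interface was never brought up.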

NOTE: These packets will be sent to the ethernet broadcast address (to simplify testing). If you are affected by this issue, the replay will take down every affected ethernet interface on the connected network. If that is of concern, you should use tcpreplay-edit to set a specific destination ethernet address:

sudo tcpreplay-edit --enet-dmac=00:11:22:33:44:55 -v -i eth1 pod-icmp-ping.pcap

Where "00:11:22:33:44:55" is the MAC address of the machine you'd like to test.
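Because a mistyped address here sends the packet somewhere other than the machine you meant to test, it may be worth checking the address before replaying. The sketch below is only an illustration: is_mac is a hypothetical helper name, and it checks the colon-separated format only, not whether the address belongs to your target machine.

```shell
# Hypothetical helper: succeed only if $1 looks like aa:bb:cc:dd:ee:ff.
is_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# Usage on the replay machine (interface and pcap names as in the example above):
#   target_mac=00:11:22:33:44:55
#   is_mac "$target_mac" && \
#       sudo tcpreplay-edit --enet-dmac="$target_mac" -v -i eth1 pod-icmp-ping.pcap
#
# On a Linux receiving machine, the address to use can be read with:
#   cat /sys/class/net/eth1/address
```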

Finding other examples (findpod):

I've had various people report similar (if not identical) behavior with various other ethernet controllers and traffic types. If you're experiencing sporadic failures of your ethernet controller and you think they may be related to network traffic you're receiving, I've created a tool called "findpod" that can help you narrow your search. The script is "findpod.sh" and there is a download link below. If you're using a Debian-based system you can install it like so:



sudo bash ./findpod.sh install



It will install three software dependencies: ifplugd, screen, and tcpdump. Run it like this:



sudo findpod <interface> start



Example:



sudo findpod eth1 start



This will start the ifplugd daemon. Once link is detected on the provided interface it will start an automatically rotating packet capture up to 100MB in size (can be changed in the script). When the interface loses link it will stop the packet capture and move it to a meaningful file name. You can then review this packet capture and find the last packets sent or received on the suspect interface. Suggestions and comments are welcome!
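To review the tail end of a saved capture, tcpdump can read the file back and print only the last few packets. This is a rough sketch rather than anything findpod does itself: last_packets is a hypothetical helper, and capture.pcap stands in for whatever file name findpod actually produced on your system.

```shell
# Hypothetical helper: print the last N packets of a capture file.
# $1 = capture file, $2 = number of trailing packets (assumed default: 20)
last_packets() {
    # -nn: no name/port resolution, -e: show ethernet headers, -r: read from file
    tcpdump -nn -e -r "$1" 2>/dev/null | tail -n "${2:-20}"
}

# Usage (replace capture.pcap with the file findpod saved):
#   last_packets capture.pcap 10
```

The packets closest to the end of the file are the ones seen just before link was lost, so they are the first candidates to replay against a test machine.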



Fixing:

As news of this issue spreads further, reports show that some controllers are affected and some aren't. That's more or less what I expected. Here's what I know about fixing this.

It has been my understanding that Intel provides at least two EEPROM versions for this chip: one with BMC enabled and one without. My controllers do not have BMC enabled, therefore my fix only applies to non-BMC-enabled controllers. This is unfortunate because the BMC-enabled controllers seem to be much more widely used. Even so, other than the very basics (MAC address and checksum), I don't know the meaning of these values. That's another reason not to reprogram the EEPROM on your NIC based on what some guy on the internet told you.



With that being said, here is a diff between an affected EEPROM and a good EEPROM:

Offset Values



-0x0010: ff ff ff ff 6b 02 00 00 86 80 d3 10 ff ff 5a c0

+0x0010: 01 01 ff ff 6b 02 d3 10 d9 15 d3 10 ff ff 58 85



-0x0030: c9 6c 50 31 3e 07 0b 46 84 2d 40 01 00 f0 06 07

+0x0030: c9 6c 50 21 3e 07 0b 46 84 2d 40 01 00 f0 06 07



-0x0060: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

+0x0060: 20 01 00 40 16 13 ff ff ff ff ff ff ff ff ff ff



The "-" lines are from the bad EEPROM and the "+" lines are from the good EEPROM.

Under Linux you can view these values with ethtool:

# ethtool -e [interface]
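To compare your own dump against the rows in the diff above, it helps to pull out a single offset at a time. The following is a minimal sketch under a couple of assumptions: it expects the usual ethtool -e layout (an offset label such as "0x0010:" followed by hex bytes), and eeprom_row is a hypothetical helper name. It only reads the dump and never writes to the EEPROM.

```shell
# Hypothetical helper: print the bytes at one offset from "ethtool -e" output.
# Reads the dump on stdin; $1 = offset label without the colon, e.g. 0x0010
eeprom_row() {
    awk -v off="$1:" '$1 == off { $1 = ""; sub(/^ +/, ""); print }'
}

# Usage (as root, interface name is an example):
#   ethtool -e eth1 | eeprom_row 0x0010
#   ethtool -e eth1 | eeprom_row 0x0030
#   ethtool -e eth1 | eeprom_row 0x0060
```

A row that matches one of the "-" lines is only a hint, not a verdict: as noted above, these values come from non-BMC controllers, and the meaning of most of them is unknown.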

