The venerable ESP8266 has rocked the Internet of Things world. Originally little more than a curious $3 WiFi-to-serial bridge, the ESP has revealed its true power bit by bit: it is fully programmable and has a treasure trove of peripherals, so the list of things it couldn’t do seemed short. On that list, at least until today, was Ethernet.

No, despite the misleading title, the ESP does not have an Ethernet MAC or PHY. What it does have is an incredible 80 MHz DMA-capable shift register, which a new project, espthernet, uses to speak 10BASE-T Ethernet. Join me after the break for video proof and a deep dive into how this is possible.

Why Does the ESP8266 Need Ethernet?

The ESP8266 has all sorts of communication interfaces and peripherals, but one I found myself wanting was Ethernet. I wanted a way to do bizarre things with the ESP’s wireless, since it supports promiscuous and mesh modes, that would disrupt its connection with the host AP, so it needed a separate wired link. I started out to see where I would end up. Every step of the way I had no idea if this would work at all. I couldn’t find anyone who was using the I2S interface in full duplex. No one knew how fast it could go. No one knew if it would mangle data. What would the performance be like? Was it possible to send and receive 1.5 kB frames? Over the course of several months, the answers to all of these questions turned out much better than I had expected!

DISCLAIMER: This project does not explicitly comply with the IEEE 802.3 standard. It will not work as well as properly engineered devices. Don’t be confused: this is a party trick, not a legitimate engineering solution.

No MAC? No PHY? No problem.

10BASE-T uses differential signalling, normal link pulses (NLPs) to announce link presence, preambles, Manchester encoding for the bits, a CRC32 for the frame check sequence (FCS), various protocol checksums, and bursts of data at up to ten megabits per second. It’s no wonder engineers use dedicated hardware. Some of the better-known Ethernet controllers used by hobbyists are the ENC28J60 and its big 100BASE-T brother, the ENC424J600. One thing’s for sure: with the exception of some novelty projects, like simplex Ethernet on an ATmega168 or half-duplex on an ATtiny85, people use purpose-built Ethernet hardware.
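To make the link-pulse part concrete, here is a rough sketch of what an NLP costs in I2S terms. It assumes the nominal 10BASE-T figures (a pulse about 100 ns wide, repeated every 16 ms) and the 40 MHz sample clock described in this article. `idle_word` and the other names are hypothetical, and the sketch also assumes that a 0 sample leaves the line idle, which really depends on the line driver.

```c
#include <stdint.h>

/* Assumed figures: 40 MHz sample clock, ~100 ns link pulse every 16 ms. */
#define SAMPLE_HZ        40000000u
#define NLP_WIDTH_NS     100u
#define NLP_PERIOD_MS    16u
#define SAMPLES_PER_WORD 32u

/* Samples needed to hold the pulse high: 40e6 * 100e-9 = 4. */
static uint32_t nlp_pulse_samples(void) {
    return (uint32_t)((uint64_t)SAMPLE_HZ * NLP_WIDTH_NS / 1000000000u);
}

/* 32-bit DMA words per pulse period: 40e6 * 16e-3 / 32 = 20000. */
static uint32_t nlp_period_words(void) {
    return (uint32_t)((uint64_t)SAMPLE_HZ * NLP_PERIOD_MS / 1000u / SAMPLES_PER_WORD);
}

/* Hypothetical idle-stream generator: returns word n of the TX stream.
 * The first word of each period carries the pulse in its top samples,
 * assuming the earliest sample is bit 31; every other word is all 0s. */
static uint32_t idle_word(uint32_t n) {
    uint32_t k = nlp_pulse_samples();
    if (k == 0 || n % nlp_period_words() != 0) return 0;
    return ~0u << (SAMPLES_PER_WORD - k);   /* k high samples at the top */
}
```

With these numbers a pulse is one word of 0xF0000000 followed by 19,999 words of zeros.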

The ESP’s I2S bus is home to a variety of projects, including an MP3 player, a WS2812 controller, a CNC driver, and a color NTSC broadcaster. By operating the ESP8266’s I2S bus at 40 MHz, we can capture everything that happens on the wire and shift it into 32-bit words. In software, we can look at what was received one 32-bit word at a time, decoding packets as they roll in. Between the DMA engine attached to the I2S bus and the 160 MHz 32-bit Xtensa core, several layers of decoding can be done on the fly, in software.
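As a sketch of what those 32-bit words hold: at 40 MHz, each 10 Mbit/s bit spans 4 samples, so one word covers 8 bit times. The helpers below are hypothetical and assume that the earliest sample sits in bit 31 and that the capture is perfectly bit-aligned. Real captures are not aligned, and the rest of the decoding problem is about dropping that assumption.

```c
#include <stdint.h>

#define SAMPLES_PER_BIT 4                      /* 40 MHz / 10 Mbit/s */
#define BITS_PER_WORD   (32 / SAMPLES_PER_BIT) /* 8 bit times per word */

/* Return the 4-sample cell for bit time i (0 = earliest) within a word. */
static inline uint8_t bit_cell(uint32_t word, int i) {
    return (uint8_t)((word >> (28 - SAMPLES_PER_BIT * i)) & 0xF);
}

/* Ideal Manchester cells: high-then-low (1100) is a 0,
 * low-then-high (0011) is a 1. Anything else returns -1. */
static int decode_aligned_cell(uint8_t cell) {
    if (cell == 0xC) return 0;
    if (cell == 0x3) return 1;
    return -1;
}

/* Decode a perfectly aligned word into 8 bits. The first bit on the wire
 * goes to bit 0 (Ethernet is LSB first). Returns -1 on any bad cell. */
static int decode_aligned_word(uint32_t word) {
    int out = 0;
    for (int i = 0; i < BITS_PER_WORD; i++) {
        int b = decode_aligned_cell(bit_cell(word, i));
        if (b < 0) return -1;
        out |= b << i;
    }
    return out;
}
```

For example, 0x3CCCCCCC decodes to 0x01, while an idle word of all 1s is rejected.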

There’s still the little problem of electrical compatibility. Though the ESP can send and receive frames with just a resistor and two capacitors, it’s best to use an actual line driver such as the $1.50 ISL3177. It has a much more sensitive receiver, stronger drive, better protection, and slew-rate limiting. With it, the error rate drops from ~5-10% to <0.1%. It’s still in violation of the IEEE 802.3 standard, just not as badly.

Ethernet Frames

Ethernet frames are what hold packets; they exist to transport packets over the dark and dangerous physical layer. On the wire, a frame is preceded by a preamble and contains the MAC addresses of the receiver and sender, followed by the actual packet, which carries things like TCP, UDP, etc. At the end is the FCS, which makes sure the frame didn’t get corrupted in flight.
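For reference, the standard 802.3 frame layout can be written down as a handful of constants plus a header parser. This is an illustrative sketch. `struct eth_header` and `parse_eth_header` are hypothetical names, and VLAN tags are ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard 802.3 framing; sizes in bytes. */
#define PREAMBLE_LEN    7     /* 0x55 repeated, precedes the frame */
#define SFD_BYTE        0xD5  /* start-of-frame delimiter */
#define MAC_LEN         6
#define ETH_HDR_LEN     14    /* dest MAC + src MAC + EtherType */
#define ETH_MIN_PAYLOAD 46
#define ETH_MAX_PAYLOAD 1500
#define FCS_LEN         4     /* CRC32 at the end */

/* Hypothetical parsed view of the 14-byte header. */
struct eth_header {
    uint8_t  dest[MAC_LEN];
    uint8_t  src[MAC_LEN];
    uint16_t ethertype;   /* e.g. 0x0800 = IPv4, 0x0806 = ARP */
};

/* Parse a frame that starts right after the SFD. Returns 0 on success, or
 * -1 if it is shorter than the 64-byte minimum (header + payload + FCS). */
static int parse_eth_header(const uint8_t *frame, size_t len, struct eth_header *h) {
    if (len < ETH_HDR_LEN + ETH_MIN_PAYLOAD + FCS_LEN) return -1;
    for (int i = 0; i < MAC_LEN; i++) {
        h->dest[i] = frame[i];
        h->src[i]  = frame[MAC_LEN + i];
    }
    /* EtherType is big-endian on the wire. */
    h->ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    return 0;
}
```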

These frames are not simply sent out on the wire as a serial stream. 10BASE-T Ethernet transmits differential signals over copper wires with magnetic isolation. Together, these mean we can’t send too many 1’s or 0’s in a row: a long run builds up a DC level that the isolation transformers can’t pass, which wrecks the signal.

Other communications technologies solve this with techniques such as 4B5B (on 100BASE-TX Ethernet) or EFM (on CDs), but 10BASE-T chose Manchester encoding.

Manchester encoding describes all 1’s and 0’s in terms of upward or downward transitions in the middle of each bit: a 1 is represented by a low-to-high transition, while a 0 is represented by a high-to-low transition. The receiver must stay carefully in sync with these transitions; otherwise there is a trainwreck of confusion. If the wrong transitions are checked, it’s easy to get the bits backwards and lose data.
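A minimal sketch of the rule, with one level per half-bit, also shows how easily bits get turned around. The function names are hypothetical. If the decoder pairs half-bits starting one half-bit late, a run of 1’s reads as a run of 0’s, and an alternating pattern produces pairs with no transition at all.

```c
#include <stdint.h>

/* One level per half-bit: a 1 is low-then-high (0,1), a 0 is
 * high-then-low (1,0). Bytes go out LSB first, as in Ethernet. */
static void manch_encode_byte(uint8_t byte, uint8_t half[16]) {
    for (int i = 0; i < 8; i++) {
        int bit = (byte >> i) & 1;
        half[2 * i]     = bit ? 0 : 1;
        half[2 * i + 1] = bit ? 1 : 0;
    }
}

/* Pair up half-bits starting at `offset` and decode up to `max_bits` bits.
 * A pair with no mid-bit transition (00 or 11) is a coding error: -1. */
static int manch_decode(const uint8_t *half, int n, int offset, int max_bits) {
    int out = 0, bits = 0;
    for (int i = offset; i + 1 < n && bits < max_bits; i += 2, bits++) {
        if (half[i] == half[i + 1]) return -1;
        out |= (half[i + 1] ? 1 : 0) << bits;  /* rising edge = 1 */
    }
    return out;
}
```

Aligned, 0xA7 round-trips. Shifted by one half-bit, 0xFF decodes as seven 0 bits, and 0x55 is rejected outright.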

As a software engineer, I originally found it mind-boggling that anyone would use such a sensitive coding scheme. The answer lies in the hardware. A PLL can lock onto the stream during the preamble, find the end of the preamble, and from then on recover a clean stream of data. We, however, have to do the decoding in software, and with that approach it becomes evident that Manchester is less than convenient.
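Once bits are coming out, finding the end of the preamble means watching for the SFD. Seven 0x55 bytes followed by 0xD5, sent LSB first, form an alternating 1,0,1,0,… pattern that ends in 1,1. The sketch below works on an already-decoded bit array. `find_frame_start` and its `min_alt` threshold are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* bits[] holds one decoded bit (0 or 1) per byte. Returns the index of the
 * first bit after the SFD, or -1 if no valid end of preamble is found.
 * `min_alt` is how many alternating bits must precede the final 1,1
 * (an assumed noise filter). */
static long find_frame_start(const uint8_t *bits, size_t n, size_t min_alt) {
    size_t alt = 1;   /* length of the alternating run ending at bits[i-1] */
    for (size_t i = 1; i < n; i++) {
        if (bits[i] != bits[i - 1]) {
            alt++;                          /* pattern still alternating */
        } else if (bits[i] == 1 && alt >= min_alt) {
            return (long)(i + 1);           /* bits[i-1], bits[i] = final 1,1 */
        } else {
            alt = 1;                        /* broken pattern: start over */
        }
    }
    return -1;
}
```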

The System

The goal is to send and receive Ethernet frames, and that takes several steps. The I2S DMA engine provides us raw 32-bit values of the bits on the wire, but we still need to find the frames, decode them, check the FCS, and pass them off to the user. Moving the Manchester decoding into the main thread would mean buffering the raw samples. Since every bit on the wire occupies four samples, that would take a staggering 6208 bytes per buffered packet! We need to do the decoding inside the I2S interrupt.

Once a frame is passed off to the main thread, the main thread can check the FCS and do whatever else is needed at the user layer. The FCS is critical in this application, since any number of things can corrupt our packet. Normal systems only need to contend with electrical noise, while we also need to worry about interrupts taking too long, buffer overflows, and anything else that could corrupt our packet.
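A minimal sketch of the FCS check uses the standard Ethernet CRC-32, with reflected polynomial 0xEDB88320 and an initial value and final XOR of 0xFFFFFFFF. It works one bit at a time for brevity. The function names are hypothetical, and a real driver would likely use a faster table-driven CRC.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 as used for the Ethernet FCS. */
static uint32_t crc32_eth(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int k = 0; k < 8; k++)   /* XOR in the polynomial if LSB set */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Check a received frame (dest MAC through FCS, no preamble or SFD).
 * The FCS is stored least-significant byte first. Returns 1 on match. */
static int fcs_ok(const uint8_t *frame, size_t len) {
    if (len < 4) return 0;
    uint32_t want = (uint32_t)frame[len - 4]
                  | (uint32_t)frame[len - 3] << 8
                  | (uint32_t)frame[len - 2] << 16
                  | (uint32_t)frame[len - 1] << 24;
    return crc32_eth(frame, len - 4) == want;
}
```

The usual check value holds: the CRC of the ASCII string "123456789" is 0xCBF43926.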

If the user layer wants to respond or send any packets back, it can frame up the message, append the CRC, encode the packet, and pass it back to the interrupt, which will transmit the packet at the next available opportunity.
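The hand-off back to the interrupt can be as simple as a single-slot mailbox. This is a hypothetical sketch, not espthernet’s actual interface. `tx_submit`, `tx_take`, `tx_release` and the 1600-word buffer are all assumptions; the buffer allows one DMA word per encoded byte, which is enough for a full-sized frame plus preamble. A C11 release/acquire flag ensures the interrupt never sees the flag set before the packet data is in place.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TX_MAX_WORDS 1600   /* assumed: one encoded DMA word per byte */

static uint32_t   tx_words[TX_MAX_WORDS];
static size_t     tx_len;        /* words ready to send */
static atomic_int tx_pending;    /* 1 = buffer owned by the interrupt */

/* Main thread: queue an encoded packet. Returns 0 if queued,
 * -1 if the slot is busy or the packet is too large. */
static int tx_submit(const uint32_t *words, size_t n) {
    if (atomic_load_explicit(&tx_pending, memory_order_acquire) || n > TX_MAX_WORDS)
        return -1;
    for (size_t i = 0; i < n; i++) tx_words[i] = words[i];
    tx_len = n;
    atomic_store_explicit(&tx_pending, 1, memory_order_release); /* publish */
    return 0;
}

/* Interrupt: returns the number of words to link into the DMA chain,
 * or 0 if nothing is pending. */
static size_t tx_take(const uint32_t **out) {
    if (!atomic_load_explicit(&tx_pending, memory_order_acquire)) return 0;
    *out = tx_words;
    return tx_len;
}

/* Interrupt: call once the DMA engine has finished with the buffer. */
static void tx_release(void) {
    atomic_store_explicit(&tx_pending, 0, memory_order_release);
}
```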

The Decode

So, now, the ESP is perpetually capturing this stream of 1’s and 0’s. Each time a new chunk of data comes in, an interrupt is called, and our code can begin searching the data for a packet. To check whether a packet is present, it simply searches the stream for 32-bit words that aren’t all 1’s or all 0’s. Once we find three such words in a row, our confidence that we have a packet is high.
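The activity check is cheap enough for an interrupt handler. In this sketch the names are hypothetical, and the three consecutive units are assumed to be 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* An idle line samples as all 0s (or all 1s, depending on polarity), so a
 * word containing both levels means something is happening on the wire. */
static inline int word_active(uint32_t w) {
    return w != 0x00000000u && w != 0xFFFFFFFFu;
}

/* Return the index of the first of `need` consecutive active words,
 * or -1 if the buffer contains no such run. */
static long find_packet_start(const uint32_t *buf, size_t n, int need) {
    int run = 0;
    for (size_t i = 0; i < n; i++) {
        run = word_active(buf[i]) ? run + 1 : 0;
        if (run == need) return (long)(i - (size_t)need + 1);
    }
    return -1;
}
```

A single noisy word resets nothing permanent, but it also does not trigger a decode on its own.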

Manchester decoding is a little trickier from software land. Because we have a digitized signal, our samples are not always in perfect lock-step with the signal. To make matters more frustrating, if we are sampling at exactly 40 MHz and there is an equal chance of seeing a high versus a low level, there can be ambiguous cases. To resolve them, we must add a bias so that we see more 0’s than 1’s. There’s a lot that goes into interpreting each bit: long and short detection, finding error states, finding the end of the preamble, and interpreting bit states. This was complicated, so an HTML5 GUI was made. It’s available on the web and lets you play with a sample packet, flip bits, etc. When running on the ESP, it can be used to capture problematic packets and examine the raw bits.
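A half-bit lasts 50 ns, or 2 samples at 40 MHz, so Manchester runs are nominally 2 samples (short) or 4 (long). Phase offset and jitter smear these to roughly 1 to 3 and 3 to 5 samples, which collide at 3. The sketch below shows one way a bias toward 0’s can split that collision. It assumes that high runs come out about one sample shorter and low runs about one sample longer. The cutoffs and names are assumptions for illustration, not espthernet’s.

```c
typedef enum { RUN_ERR = 0, RUN_SHORT = 1, RUN_LONG = 2 } run_kind;

/* Classify a run of `samples` identical samples at `level` (0 or 1).
 * Nominal: short = 2 samples (one half-bit), long = 4 (two half-bits).
 * Assumed bias: high runs shrink and low runs stretch by about a sample,
 * so a 3-sample high run counts as long and a 3-sample low run as short. */
static run_kind classify_run(int level, int samples) {
    if (samples < 1 || samples > 5) return RUN_ERR;  /* fits no half-bit count */
    int long_min = level ? 3 : 4;
    return samples >= long_min ? RUN_LONG : RUN_SHORT;
}
```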

Because the sampled bits arrive at 40 MHz, we have at most 4 processor clock cycles (at 160 MHz) per sampled bit to figure out what to do with our input stream. The naive algorithm is 140 lines of code and processes one bit at a time. Even after heavy optimization, it takes about 10 times too long to run in the interrupt handler.
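Spelled out, the timing budget looks like this. The naive figure is only a rough reading of “about 10 times too long”, and the last term assumes one table lookup per 4-sample nibble:

$$
\frac{160\ \text{MHz}}{40\ \text{MHz}} = 4\ \frac{\text{cycles}}{\text{sample}}, \qquad
4 \times 32 = 128\ \frac{\text{cycles}}{\text{DMA word}}, \qquad
10 \times 4 \approx 40\ \frac{\text{cycles}}{\text{sample}}\ (\text{naive}), \qquad
4 \times 4 = 16\ \frac{\text{cycles}}{\text{nibble}}
$$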

To achieve the speed-up needed to run in the interrupt, we use a lookup table. For each possible combination of decoder state and input bits, we precompute the output and the next state. The state has to track polarity, whether the last bit before the nibble was a 1 or a 0, the number of bits that have been the same in a row, and any unmatched short pairs, and each lookup also takes 4 bits of new data. Each entry holds 10 bits of output data stored in 2 bytes, which works out to a 1024×2-byte table.
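The table technique itself can be shown on a much smaller toy decoder. This sketch is hypothetical and far simpler than espthernet’s 1024-entry table. Its input is a stream of half-bit levels packed four to a nibble, and its only state is whether it is holding the first half of a bit. A bit-at-a-time reference step is run once over every (state, nibble) pair to fill a 64-entry table, and after that decoding is one lookup per nibble.

```c
#include <stdint.h>

/* Reference step, one half-bit at a time. state: bit0 = holding a first
 * half, bit1 = its level. Emits the second half as the bit value and flags
 * *err if the pair had no mid-bit transition. Returns the new state. */
static unsigned step_half(unsigned state, unsigned h,
                          unsigned *bits, unsigned *nbits, unsigned *err) {
    if (!(state & 1u)) return 1u | (h << 1);     /* hold the first half */
    if (((state >> 1) & 1u) == h) *err = 1;      /* 00 or 11: bad pair */
    *bits |= h << (*nbits)++;
    return 0;
}

/* Entry layout: bits 0-1 = two decoded bits, bit 2 = error, bits 3-4 = state. */
static uint8_t decode_table[64];

static void build_table(void) {
    for (unsigned state = 0; state < 4; state++)
        for (unsigned nib = 0; nib < 16; nib++) {
            unsigned s = state, bits = 0, nbits = 0, err = 0;
            for (int i = 3; i >= 0; i--)         /* earliest half in bit 3 */
                s = step_half(s, (nib >> i) & 1u, &bits, &nbits, &err);
            decode_table[(state << 4) | nib] = (uint8_t)(bits | (err << 2) | (s << 3));
        }
}

/* Table-driven decode of n nibbles (up to 16, so 32 output bits),
 * LSB first into *out. Returns -1 on any coding error. */
static int decode_nibbles(const uint8_t *nibs, int n, unsigned state, uint32_t *out) {
    uint32_t acc = 0;
    for (int i = 0; i < n && i < 16; i++) {
        uint8_t e = decode_table[(state << 4) | (nibs[i] & 0xFu)];
        if (e & 4) return -1;
        acc |= (uint32_t)(e & 3) << (2 * i);
        state = e >> 3;
    }
    *out = acc;
    return 0;
}
```

Call `build_table()` once at startup. The nibbles {5, 6, 9, 9} are the half-bits of 0xA7 and decode back to 0xA7.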

The results are staggering. By going nibble-at-a-time and using a table instead of code, we received the 10x performance boost we needed and some change!

The “user” layer



Because it is not yet possible to use the TCP/IP stack that comes with the ESP, we needed our own. Since I modelled this Ethernet stack on the ENC424J600 driver, it was very easy to port the avrcraft IP+ARP+UDP+TCP stack to this project. Conveniently, the HTTP server used in this project was borrowed from there too, so porting even the web server to our TCP/IP stack was trivial.

Encoding

Ethernet isn’t very interesting if all you can do is receive; we need to transmit packets, too. The IP stack can build a frame with the MAC addresses and payload, but we still have to add the preamble and FCS, then Manchester-encode the data. Manchester encoding could be painful if we output one bit at a time, but we can leverage a table here as well! Encoding the data on the wire can be done a byte at a time using the function below.

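```c
#include <stdint.h>

/* Write pointer into the current TX DMA buffer; defined elsewhere
 * (the uint32_t * type is assumed here). */
extern uint32_t *sDMA;

/* Each entry encodes one nibble as 4 Manchester bits of 4 samples each. */
static const uint16_t ManchesterTable[16] __attribute__ ((aligned (16))) = {
    0b1100110011001100, 0b0011110011001100, 0b1100001111001100, 0b0011001111001100,
    0b1100110000111100, 0b0011110000111100, 0b1100001100111100, 0b0011001100111100,
    0b1100110011000011, 0b0011110011000011, 0b1100001111000011, 0b0011001111000011,
    0b1100110000110011, 0b0011110000110011, 0b1100001100110011, 0b0011001100110011,
};

/* Encode one byte into one 32-bit DMA word. The low nibble goes in the
 * upper 16 bits, which the I2S engine shifts out first, so the byte leaves
 * LSB first. The casts keep the <<16 from overflowing a signed int. */
void PushManch( unsigned char k )
{
    *(sDMA++) = (uint32_t)ManchesterTable[ k >> 4 ]
              | ( (uint32_t)ManchesterTable[ k & 0x0f ] << 16 );
}
```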

Seriously. Tables are awesome.
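To see where those constants come from, the table can be regenerated from the encoding rule. In this sketch, `manch_nibble` and `manch_byte` are hypothetical helpers. They reproduce the entries in `ManchesterTable` and the packing used by `PushManch`. Bit i of each nibble lands in the i-th 4-sample field from the top, as 1100 for a 0 and 0011 for a 1.

```c
#include <stdint.h>

/* Build one 16-bit table entry: nibble bit 0 (sent first) occupies the top
 * 4 samples; 0 -> 1100 (high-to-low), 1 -> 0011 (low-to-high). */
static uint16_t manch_nibble(unsigned nib) {
    uint16_t v = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t cell = ((nib >> i) & 1u) ? 0x3 : 0xC;
        v |= (uint16_t)(cell << (12 - 4 * i));
    }
    return v;
}

/* Same packing as PushManch: low nibble in the upper, first-shifted half. */
static uint32_t manch_byte(uint8_t k) {
    return (uint32_t)manch_nibble(k >> 4) | ((uint32_t)manch_nibble(k & 0x0F) << 16);
}
```

For instance, `manch_nibble(6)` gives 0xC33C, which is 0b1100001100111100, the seventh entry of the table.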

Transmitting Raises Hardware Problems

The ESP’s I2S engine cannot receive packets unless the transmitter is also running. That means that even when we’re only transmitting 0’s, we still have to feed the DMA engine valid descriptors. Each descriptor links to a “next” descriptor, which is streamed out after the current one. There isn’t a way of changing the active DMA descriptor once it’s started. Additionally, stopping the DMA subsystem on the TX side, even briefly, will cause the I2S bus to lock up.

There’s no way of emitting just a packet here and there. To make matters worse, interrupt calls can be missed, so we can’t rely on them to switch chains immediately. Every state must be stable on its own. We have several descriptors that just send 0’s ([0] through [3]) and the possibility of linking in one or more data descriptors. To send a single packet, we transition from ping-ponging between [0] and [1], to sending the packet, to ping-ponging between [2] and [3].

What this means is:

Normally, [0] must point to [1] and [1] to [0].

When sending a packet, [1] points to [Packet] and [Packet] points to [3]. The system will stay bouncing between [3] and [2] until…

You clear it out by hooking [1] to [0] and [3] to [0].

Then, the system will go back to idling between [0] and [1].
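The descriptor dance is easier to follow as a tiny simulation of the `next` links. This toy model is hypothetical. It tracks only the links, not buffers or lengths, and it sets [Packet]’s successor before hooking [Packet] in, so the hardware never follows a half-built chain.

```c
/* Descriptor indices: D0..D3 send zeros, PKT carries the packet. */
enum { D0, D1, D2, D3, PKT, NDESC };
static int next_of[NDESC];

/* Idle: [0] <-> [1] is live, [2] <-> [3] is a parked loop. */
static void set_idle(void) {
    next_of[D0] = D1; next_of[D1] = D0;
    next_of[D2] = D3; next_of[D3] = D2;
}

/* Arm a send: finish the tail first, then link [Packet] in after [1]. */
static void arm_send(void) {
    next_of[PKT] = D3;
    next_of[D1]  = PKT;
}

/* After the packet has gone out, steer everything back to [0] <-> [1]. */
static void clear_send(void) {
    next_of[D1] = D0;
    next_of[D3] = D0;
}

/* Follow the chain for `steps` descriptors from `start`, recording the
 * path; returns the descriptor that would play next. */
static int walk(int start, int steps, int *path) {
    int d = start;
    for (int i = 0; i < steps; i++) { path[i] = d; d = next_of[d]; }
    return d;
}
```

After `arm_send()`, a walk from [0] visits [0], [1], [Packet], [3], [2], [3]. After `clear_send()`, a walk from [2] returns to the [0] and [1] loop.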

Voilà! We can now receive and send packets on 10BASE-T Ethernet!

Where to from here?

There are still many potential improvements: lots of maintenance, algorithm improvements, room for development, integration with the existing (or new) TCP/IP stacks, and much more. Maybe this could even be ported to an ARM without Ethernet? None of that is critical, though. With Ethernet unlocked, the WiFi interface is free to do all sorts of unusual things. Now it is possible to monitor packets from all sorts of sources and to inject packets. Groups of ESPs in mesh mode could be bridged to Ethernet. ESPs in monitor mode can report their findings back, or even inject packets remotely. But I’m looking forward to seeing the projects you will create that I couldn’t have even imagined!