Debugging Embedded Linux Ethernet

Ethernet support is a vital tool during board bring-up on a new embedded Linux project. Access to the network shortens your build and test turn-around time by letting you deploy new code quickly to the microprocessor.

A typical script I might use to build and deploy Linux on an iMX6 may look something like this:

cd ~/project/linux
make -j8 zImage
scp arch/arm/boot/zImage root@10.0.27.44:/boot/
ssh root@10.0.27.44 '/sbin/reboot'

This allows me to quickly test changes to the kernel by leveraging ssh over the ethernet port.
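A slightly more defensive variant of that script is sketched below. It wraps the same four steps in a shell function so that a failed build is never copied to the board. The function name deploy_kernel and its two parameters are my own invention for illustration; the paths and address are the ones from the example above.

```sh
# Sketch: build, copy and reboot, stopping at the first step that fails.
# deploy_kernel is a hypothetical helper name, not part of any tool.
#   $1 = kernel source directory, $2 = board login (user@host)
deploy_kernel() {
    # Run in a subshell so the caller's working directory is untouched.
    (
        cd "$1" || exit 1
        make -j8 zImage || exit 1
        scp arch/arm/boot/zImage "$2":/boot/ || exit 1
        ssh "$2" /sbin/reboot
    )
}
```

It would be called as deploy_kernel ~/project/linux root@10.0.27.44. Note that ssh may report a failure simply because the board drops the connection while rebooting, so treat the final exit status with some suspicion.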

Without a working ethernet port, the alternative might require removing power from the board, taking out the boot SD card, re-programming it on the PC, then returning the card to the board and applying power.

This is a laborious process, which means getting the ethernet working is an early priority. This article will explore my typical process for debugging faulty ethernet drivers or hardware. I am going to limit myself to discussing external PHYs connected to the processor via the typical “Media Independent Interface” (MII) or one of its variants, such as RMII or RGMII.

Stage 1: Link Lights and MDIO comms

The first step (assuming no obvious errors in dmesg) is to check whether the ethernet PHY is negotiating a link with the other end. Your switch should have link LEDs, which give a clear indication of the link status of your board.
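If the switch is out of sight, the kernel’s own view of the link can be read from sysfs. The sketch below is a minimal example under a few assumptions: link_state is a hypothetical helper name, the interface is called eth0 on your board, and the optional second argument exists only so the sysfs root can be overridden.

```sh
# Sketch: report the link state as seen by the kernel.
# link_state is a hypothetical helper; $2 optionally overrides the
# sysfs root, which defaults to /sys/class/net.
link_state() {
    ls_dir="${2:-/sys/class/net}/$1"
    if [ ! -d "$ls_dir" ]; then
        echo "no-such-interface"
        return 1
    fi
    # carrier reads 1 when the PHY reports link and 0 when it does not.
    # The read typically fails while the interface is administratively down.
    if ls_carrier=$(cat "$ls_dir/carrier" 2>/dev/null); then
        if [ "$ls_carrier" = 1 ]; then
            echo "link-up"
        else
            echo "link-down"
        fi
    else
        echo "interface-down"
    fi
}
```

Running link_state eth0 after bringing the interface up should agree with the LEDs on the switch. If the two disagree, that points at the MDIO path or the driver rather than at the cable.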

If you haven’t got any link it’s an early indication of something wrong with the PHY. Many PHYs will auto-negotiate by default out of reset. If you think your PHY should be doing so as well then there are typically three things to check:

Power supplies

Reset/powerdown signals

Oscillator input or reference clocks

Now’s a good time to check the reset signals and power supplies are in the correct state. Generally, the power supplies should be stable when the PHY is brought out of reset. The datasheet for the PHY will contain timing diagrams for how the power supplies and reset signals should be timed with respect to each other.

If everything looks good with respect to power-on and reset, then it’s a good time to check the MDIO bus. This is the management bus for ethernet PHYs. You want to make sure your software is correctly configuring the MDIO bus, and that it’s searching for the PHY at the correct address (again see the PHY datasheet).

A handy tool for poking around on the PHY bus is called “phytool”. It comes as part of the “meta-networking” layer in Yocto/OpenEmbedded. Whether it works depends on your ethernet driver supporting the MII ioctls that the tool relies on.

To read the auto-negotiation advertisement register (ANAR; offset 0x04) for the PHY at address 1 on the MDIO bus associated with eth0 you would write:

phytool read eth0/1/0x4
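The value that comes back is a bit field, so it helps to decode it. The sketch below maps the standard IEEE 802.3 advertisement bits to names. The helper name anar_decode is hypothetical, and it assumes the value is passed as a number the shell can parse, such as 0x01e1.

```sh
# Sketch: decode an auto-negotiation advertisement value (register 0x04).
# anar_decode is a hypothetical helper; the bit masks follow the standard
# IEEE 802.3 register layout for 10/100 PHYs.
anar_decode() {
    anar_val=$(($1))
    anar_out=""
    for anar_pair in 0x0020:10baseT-half 0x0040:10baseT-full \
                     0x0080:100baseTX-half 0x0100:100baseTX-full \
                     0x0200:100baseT4 0x0400:pause 0x0800:asym-pause; do
        anar_mask=${anar_pair%%:*}
        anar_name=${anar_pair#*:}
        if [ $((anar_val & $anar_mask)) -ne 0 ]; then
            anar_out="$anar_out $anar_name"
        fi
    done
    # Strip the leading space left by the first append.
    echo "${anar_out# }"
}
```

For example, anar_decode 0x01e1 prints the four 10/100 half and full duplex modes. Assuming your phytool build prints the register as a hex number, the two can be combined as anar_decode "$(phytool read eth0/1/0x4)".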

To write 0xffff to the same register:

phytool write eth0/1/0x4 0xffff
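The same tool can help confirm which MDIO address the PHY actually answers at. The sketch below walks all 32 possible addresses and reads the two PHY identifier registers (2 and 3). It is only a sketch: mdio_scan is a hypothetical helper, it assumes phytool prints the value as a hex number, and it assumes your driver lets you address PHYs other than the one it has attached.

```sh
# Sketch: probe every MDIO address for a responding PHY.
# mdio_scan is a hypothetical helper; $1 is the interface name.
mdio_scan() {
    scan_addr=0
    while [ "$scan_addr" -le 31 ]; do
        # Registers 2 and 3 hold the PHY identifier.
        scan_id1=$(phytool read "$1/$scan_addr/2" 2>/dev/null) || scan_id1=0xffff
        scan_id2=$(phytool read "$1/$scan_addr/3" 2>/dev/null) || scan_id2=0xffff
        scan_n1=$((${scan_id1:-0xffff}))
        scan_n2=$((${scan_id2:-0xffff}))
        # An empty address usually reads back as all ones; some MACs return 0.
        if [ "$scan_n1" -ne 65535 ] && [ "$scan_n1" -ne 0 ]; then
            printf 'addr %d: id %04x:%04x\n' "$scan_addr" "$scan_n1" "$scan_n2"
        fi
        scan_addr=$((scan_addr + 1))
    done
}
```

Running mdio_scan eth0 and comparing the printed identifier against the PHY datasheet is a quick way to catch a wrong address in the device tree or a strapping error on the board.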

Some PHYs require extra configuration via the MDIO bus before the link will come up. For example, the KSZ8031RNA defaults to expecting a 25MHz oscillator clock input on XI (Pin 8). However, if the board is wired up to use the RMII 50MHz reference clock from the microprocessor as its internal reference, then bit 7 of register 0x1f must be set before a link can be established.
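To try a setting like this by hand before touching any driver code, a read-modify-write through phytool is enough. The helper below is a sketch: phy_set_bit is a hypothetical name, and it assumes phytool prints the register as a hex number.

```sh
# Sketch: set a single bit in a PHY register, leaving the others alone.
# phy_set_bit is a hypothetical helper.
#   $1 = phytool path (iface/addr/reg), $2 = bit number
phy_set_bit() {
    psb_old=$(phytool read "$1") || return 1
    psb_new=$(( $psb_old | (1 << $2) ))
    phytool write "$1" "$(printf '0x%04x' "$psb_new")"
}
```

For the example above, with the PHY at address 1, that would be phy_set_bit eth0/1/0x1f 7. This is an experiment only: the change is lost the next time the PHY is reset, so once it proves the point the setting belongs in the device tree or the driver.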

The Linux kernel should do the above for you if you have set up your device tree correctly and you have CONFIG_MICREL_PHY set in your kernel configuration. However, some silicon vendors’ ethernet drivers deviate in idiosyncratic ways from the “proper Linux” way of doing things. You might find yourself having to edit the ethernet or PHY drivers to set the registers correctly for your project.
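It is worth confirming that the PHY driver really is in the kernel you booted before digging any deeper. The sketch below checks a plain-text kernel config file for a symbol set to y or m. The name kconfig_enabled is hypothetical, and it uses the Micrel option as it is named in mainline kernels, CONFIG_MICREL_PHY.

```sh
# Sketch: succeed if a kernel config symbol is built in (=y) or a module (=m).
# kconfig_enabled is a hypothetical helper.
#   $1 = symbol name, $2 = plain-text config file (e.g. the build tree's .config)
kconfig_enabled() {
    grep -Eq "^$1=(y|m)\$" "$2"
}
```

On the build machine this might be used as kconfig_enabled CONFIG_MICREL_PHY ~/project/linux/.config && echo enabled. On the target, if the kernel was built with CONFIG_IKCONFIG_PROC, the running configuration can be unpacked first with zcat /proc/config.gz > /tmp/config.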

If the PHY seems like it should be advertising and negotiating a link with its link partner, it’s worth checking that the PHY is wired up correctly to your (RJ45 or otherwise) connector, checking the signals on a scope, and ensuring that the correct hardware design constraints have been followed for your transmission medium.
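Before reaching for the scope, the PHY’s basic status register (register 1) can tell you how far negotiation got. The sketch below decodes the three bits I find most useful, using the standard IEEE 802.3 bit positions. The name bmsr_decode is hypothetical.

```sh
# Sketch: decode the basic mode status register (register 0x01).
# bmsr_decode is a hypothetical helper; masks follow the standard layout.
bmsr_decode() {
    bmsr_val=$(($1))
    if [ $((bmsr_val & 0x0004)) -ne 0 ]; then
        echo "link: up"
    else
        echo "link: down"
    fi
    if [ $((bmsr_val & 0x0020)) -ne 0 ]; then
        echo "autoneg: complete"
    else
        echo "autoneg: not complete"
    fi
    if [ $((bmsr_val & 0x0010)) -ne 0 ]; then
        echo "remote fault: yes"
    else
        echo "remote fault: no"
    fi
}
```

The link status bit latches low, so read the register twice and decode the second value, for example phytool read eth0/1/1 followed by bmsr_decode "$(phytool read eth0/1/1)". A PHY that never completes auto-negotiation while the cable is known to be good points towards the analogue side of the design.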

Stage 2: Linux ethernet integration

Once you think you have a PHY you can talk to, it’s time to give the port you’re on an IP address, and try talking to the outside world. If you’ve got busybox on the board you can try running the following to get an IP address via DHCP:

udhcpc -i eth0

Alternatively, you can set up a static IP address using ifconfig and ping a known host on the network:

ifconfig eth0 192.168.0.45 netmask 255.255.255.0 up
ping 192.168.0.1
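If your image ships the iproute2 style ip command instead of ifconfig, the same static setup can be sketched as below. The names set_static_ip and ssi_run are hypothetical, and the DRY_RUN variable is my own addition so the commands can be printed rather than executed.

```sh
# Sketch: static address setup with "ip" instead of ifconfig.
# set_static_ip and ssi_run are hypothetical helpers.
# Set DRY_RUN=1 to print the commands instead of running them.
ssi_run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

#   $1 = interface, $2 = address/prefix, $3 = host to ping
set_static_ip() {
    ssi_run ip addr add "$2" dev "$1" &&
    ssi_run ip link set "$1" up &&
    ssi_run ping -c 3 "$3"
}
```

The equivalent of the ifconfig example is set_static_ip eth0 192.168.0.45/24 192.168.0.1, where /24 corresponds to the 255.255.255.0 netmask.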

If you aren’t getting a response back then run the following command:

ifconfig eth0

This will show you the TX/RX packet counters. Most likely your device will report that it is sending out packets (non-zero TX packet counter) but it isn’t getting any in response (zero RX packet counter).
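The same counters are exposed as plain files in sysfs, which is handy if you want to check them from a script. The sketch below reads them and flags the common case described above. The name net_counters is hypothetical, and the optional second argument only exists to override the sysfs root.

```sh
# Sketch: read packet counters from sysfs and flag a one-way link.
# net_counters is a hypothetical helper; $2 optionally overrides the
# sysfs root, which defaults to /sys/class/net.
net_counters() {
    nc_dir="${2:-/sys/class/net}/$1/statistics"
    nc_tx=$(cat "$nc_dir/tx_packets") || return 1
    nc_rx=$(cat "$nc_dir/rx_packets") || return 1
    nc_err=$(cat "$nc_dir/rx_errors") || return 1
    echo "tx=$nc_tx rx=$nc_rx rx_errors=$nc_err"
    if [ "$nc_tx" -gt 0 ] && [ "$nc_rx" -eq 0 ]; then
        echo "transmitting but receiving nothing"
    fi
}
```

Run net_counters eth0 before and after a ping. A climbing rx_errors count alongside a zero RX packet count suggests frames are arriving at the MAC but being rejected, which is a different problem from nothing arriving at all.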

Great tools to use here are Wireshark and tcpdump. Running either on a host PC on the same network as the device, you can inspect the network to check for packets originating from the board. You can also see whether packets are being sent back or not. Depending on what you see from these packet inspection tools, you can decide what to do in the next section...
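On the host side, a capture filtered down to the board keeps the output readable. The sketch below wraps a tcpdump invocation. The name watch_board is hypothetical, and the host interface name in the usage example (enp3s0) is an assumption that will differ on your PC.

```sh
# Sketch: watch for ARP traffic and anything to or from the board.
# watch_board is a hypothetical helper.
#   $1 = host capture interface, $2 = board IP address
watch_board() {
    # -n: no name lookups, -e: show MAC addresses, -i: capture interface
    tcpdump -n -e -i "$1" "arp or host $2"
}
```

For example, sudo sh -c with the function defined, or simply the raw command tcpdump -n -e -i enp3s0 "arp or host 192.168.0.45". On a switched network the host only sees broadcasts and traffic addressed to itself, so ping the host PC from the board. If ARP requests from the board show up and the host replies, but the board never sees the reply, the transmit path is working and the receive path is the suspect.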

Stage 3: MII signals

If the link is up and the MDIO communication is working correctly between the Linux kernel and the PHY, then the final possible source of error is the MII signals between the MAC (in the microprocessor) and the PHY.

Check the datasheet carefully and ensure the signals are correct. Datasheets for PHYs typically list the inputs as the TX signals and the outputs as the RX signals. This is because the TX and RX pins are described from the point of view of the MAC. It’s not uncommon for these to be swapped over accidentally in the hardware design stage.

If the pins are correctly wired up then it’s worth probing the signals to see that data is being transmitted and received as expected. Do the signals match what is required in the datasheet? Check all of the following:

Pin-muxing

Drive strength

Signal amplitude

Rise/fall times

Track length

RX interrupts

Cross-talk

Low-level driver

The issue may be one of the above or a combination of several. The hardest thing to debug is something obscure in the low-level driver. It may be, for example, that the silicon vendor has tested on RGMII but not properly on RMII. I had one project where a register bit that enables full-duplex operation wasn't being set, and the low-level driver had completely overlooked it. Only by checking every single hardware register for the ethernet controller could I track down the problem.
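A cheap first check for this kind of mismatch is to ask the kernel what speed and duplex it believes were negotiated. The sketch below reads both from sysfs. The name link_mode is hypothetical, and the optional second argument only overrides the sysfs root.

```sh
# Sketch: print the speed (in Mb/s) and duplex the kernel reports.
# link_mode is a hypothetical helper; $2 optionally overrides the
# sysfs root, which defaults to /sys/class/net.
link_mode() {
    lm_dir="${2:-/sys/class/net}/$1"
    # Both reads can fail while the link is down, hence the fallbacks.
    lm_speed=$(cat "$lm_dir/speed" 2>/dev/null) || lm_speed=unknown
    lm_duplex=$(cat "$lm_dir/duplex" 2>/dev/null) || lm_duplex=unknown
    echo "speed=$lm_speed duplex=$lm_duplex"
}
```

Bear in mind that link_mode eth0 only shows what the PHY layer negotiated. It says nothing about whether the MAC registers were programmed to match, so a register-level inspection is still needed. Where the driver supports it, ethtool -d eth0 will dump the controller registers for comparison against the reference manual.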

Whatever the issue is, divide-and-conquer is the best way to isolate the area where the problem might be, which is probably true of all low-level embedded bugs...

Good luck!

Copyright © 2020