The VDX runs a U-Boot-based loader, and from the on-screen messages I had already noticed that the boot environment variables were all set to defaults:

=> printenv

autoset_mac=true

baudrate=9600

bootdelay=10

eth3addr=ac:de:48:xx:xx:xx

eth4addr=ac:de:48:xx:xx:xx

goscmd=true

pcidelay=1000

Scrolling up in the terminal's scrollback buffer, I also saw the RTC complaining about lost time:

MMK configuring pit regs

In: serial

Out: serial

Err: serial

SRIO1: disabled

SRIO2: disabled

turning on NVRAM/RTC oscillator for the first time

be sure to saveenv and perhaps set the RTC time

Net: Fman: Uploading microcode version 101.6.0.

Ignoring these warnings at first, I tried to set up the environment from the loader prompt so that the switch would boot from the internal flash memory. This is done with the setenv command.

=> setenv OSRootPartition=sda1;sda2

=> setenv bootargs root=/dev/sda1 rootfstype=ext4 quiet S

The VDX host system is Linux-based and the sda device is linked to the corresponding flash memory card. Afterwards I tried to save the environment and reset the hardware.

=> saveenv

Saving Environment to NVRAM… Readback of environment miscompared

Readback of redundant environment miscompared

Second readback of environment miscompared

Second readback of redundant environment miscompared

Oh well, that thing again. I reset the board once more, only to find myself back at the loader prompt with the default boot environment variables. No way forward from here.

If you are an Apple user, you certainly know about NVRAM, where Apple saves information about the hardware, boot settings and other more or less useful things. The first remedy for hardware issues on Apple systems has always been “reset NVRAM” by hitting some mystical key combination during boot. PC users may know the same idea as CMOS RAM, which holds the hard disk information, the time and the boot device.

So it seemed that this Brocade VDX had trouble committing to the NVRAM and had therefore lost its long-term basic “identity” memory. I almost gave up at this point, but thought that a completely fresh install of the operating system might repair the boot environment or maybe unlock a blocked NVRAM.

Reinstall VDX from scratch

For a complete reinstallation of the VDX6740 from scratch you need to create a USB drive that contains the loadable boot image for the specific VDX series and also the expanded Brocade NOS, all on an ext2 filesystem. Be careful with this step, because the loader can't boot from ext3 or ext4 (officially?).

On my USB stick I created a new partition table with a single primary partition and formatted it as ext2 from a Linux machine, where the stick was attached as device /dev/sdb.

# partition the usb disk, my linux /dev/sdb:

fdisk /dev/sdb

# list partitions

fdisk -l /dev/sdb

Disk /dev/sdb: 1,9 GiB, 2002780160 bytes, 3911680 sectors

Units: sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disklabel type: dos

Disk identifier: 0x961d9655

Device Boot Start End Sectors Size Id Type

/dev/sdb1 2048 3903487 3901440 1,9G 83 Linux

# format the first primary partition

mkfs.ext2 /dev/sdb1

# mount the disk to /mnt/usb

mount /dev/sdb1 /mnt/usb

From Windows or Mac you can also do this with the right set of tools. For example, Paragon ExtFS for Windows and Mac would be one way to format and mount the filesystem. Just make sure that the partition table is in MBR format and not GPT.
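The MBR requirement can be checked before the stick ever reaches the switch. The following is only a minimal sketch, assuming a Linux machine with fdisk installed. The function name is_mbr_label is made up for this example and is not part of any Brocade tooling. It reads the output of fdisk -l and succeeds only when the disklabel type is "dos", which is how fdisk names an MBR partition table in the listing shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: reads `fdisk -l` output on stdin and succeeds only
# if the reported disklabel type is "dos" (fdisk's name for MBR).
is_mbr_label() {
    # Extract the value after "Disklabel type:" from the first matching line.
    label=$(sed -n 's/^Disklabel type:[[:space:]]*//p' | head -n 1)
    [ "$label" = "dos" ]
}

# Usage (the device name /dev/sdb is an assumption, adjust it to your system;
# LC_ALL=C keeps the field labels in English):
#   LC_ALL=C fdisk -l /dev/sdb | is_mbr_label && echo "MBR partition table"
```

A GPT-labelled stick makes the function return a non-zero status, so it can be used as a guard in front of the mkfs.ext2 step.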

On the mounted USB stick you need to create the directory castorXX and expand the NOS zip directly under the root of the stick:

/castorXX

/nos5.0.2b

Inside the castorXX directory you need to place the startup images for the VDX6740:

./castorXX/ramdisk.image

./castorXX/silkworm_bd131.dtb

./castorXX/silkworm_bd137.dtb

./castorXX/silkworm_bd153_nopci.dtb

./castorXX/uImage

The NOS directory needs to contain the fully expanded NOS version:

/nos5.0.2b/install

/nos5.0.2b/SWBD1004

/nos5.0.2b/SWBD164

/nos5.0.2b/…

…
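A typo in this layout only shows up later at the loader prompt, so it can be worth checking the stick while it is still mounted on the Linux machine. The sketch below is an illustration, not an official procedure. The function name check_stick_layout and its three arguments are assumptions. It only tests for the files that the boot command and the installer step in this article actually reference.

```shell
#!/bin/sh
# Hypothetical helper: report which of the files referenced in this article
# are missing below a mounted stick. All three arguments are supplied by you:
#   $1 = mount point, $2 = boot image directory, $3 = NOS directory
check_stick_layout() {
    mnt=$1; bootdir=$2; nosdir=$3; missing=0
    for f in \
        "$bootdir/uImage" \
        "$bootdir/ramdisk.image" \
        "$bootdir/silkworm_bd131.dtb" \
        "$nosdir/install"
    do
        if [ ! -e "$mnt/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    # Non-zero exit status if at least one file was not found.
    return "$missing"
}

# Usage, with the names used in this article:
#   check_stick_layout /mnt/usb castorXX nos5.0.2b
```

The check is about file presence only. It says nothing about whether the images match your hardware revision.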

Before removing the USB stick, this is also a good moment to fix missing executable permissions on the installer or the directories. If you are working on Windows, you may need to do this step after booting the VDX from USB and before running the installer command.

chmod 0755 /mnt/usb/nos5.0.2b/*

# switch to root directory and unmount the usb stick

cd / && umount /mnt/usb

Now it is time to plug the stick into the VDX and power cycle the unit. You will land right back at the loader prompt, where you need to enter some magic commands to boot back from the dead:

# If necessary, reset the usb to find the plugged stick

=> usb reset 1

scanning bus for storage devices… 1 Storage Device(s) found

# List usb stick contents

=> ext2ls usb 0:1

<DIR> 1024 lost+found

<DIR> 1024 castorXX

<DIR> 1024 nos5.0.2b

# Enter this magic boot command for VDX6740

=> makesinrec 0x1000000; ext2load usb 0:1 2000000 castorXX/uImage;ext2load usb 0:1 3000000 castorXX/ramdisk.image;ext2load usb 0:1 4000000 castorXX/silkworm_bd131.dtb; bootm 2000000 3000000 4000000

## Booting kernel from Legacy Image at 02000000 …

After this long boot process you will be dropped into a normal bash shell, where you can search the Linux kernel messages to identify the USB device and its partition, in my case /dev/sdb1:

bash-2.04# dmesg | grep sd[abc]:

sda: sda1 sda2 sda3 sda4

sdb: sdb1

Now it is time to mount the filesystem and change the working directory right into the NOS firmware directory:

bash-2.04# mount -t ext2 /dev/sdb1 /load

bash-2.04# cd /load/nos5.0.2b

Start the installer and reboot the switch afterwards:

bash-2.04# ./install release; sync; reboot -f

INSTALL26: Installing Linux 2.6 …

In a perfect world, the installer does its work without any further interactive prompts and the switch boots into a fresh NOS.

Network OS (sw0)

sw0 console login: admin

But not this time — do you still remember the NVRAM error messages from the beginning?

During the installation, the process simply bailed out while installing a package called bootenv, restarted without a full install and brought me right back to the U-Boot loader prompt with defaults loaded. Whoops, I did it again!

INSTALL26: Warning - failed rpm --root /mnt -ivU --force --nodeps SWBD131/bootenv-1.0.3-8.ppc.rpm

INSTALL26: install PASSED with 3 warnings.

Restarting system.
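Note that the installer reports "PASSED" even though a package failed, so the summary line alone is not a reliable signal. If the serial console session is logged to a file (an assumption; most terminal programs can do this), a small filter can pull out the warning lines. The function name install_warnings is made up for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: read captured installer console output on stdin,
# print every "INSTALL26: Warning" line, and fail if at least one exists.
install_warnings() {
    # grep prints the matching lines and returns 0 when it found any.
    if grep 'INSTALL26: Warning'; then
        return 1
    fi
    return 0
}

# Usage (console.log is a placeholder for your own capture file):
#   install_warnings < console.log || echo "installer reported warnings"
```

The summary line "install PASSED with 3 warnings." is deliberately not matched, because it only states the count and not which step failed.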

Back on my Linux machine, I took a deeper look at the bootenv-1.0.3-8.ppc.rpm package. I unpacked the RPM file and listed its contents and install scripts. No surprises here: after installation it runs the bootenv program to list and alter the boot environment for the next startup.

And on this VDX it failed hard.
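For readers who want to repeat that inspection, here is a rough sketch using the standard rpm query options. It does not install anything. The wrapper function inspect_rpm is a made-up name, and the exact commands I ran at the time may have differed.

```shell
#!/bin/sh
# Sketch: show the file list and the install scriptlets of a package file
# without installing it. inspect_rpm is a hypothetical wrapper name.
inspect_rpm() {
    pkg=$1
    if [ ! -f "$pkg" ]; then
        echo "no such package file: $pkg" >&2
        return 1
    fi
    rpm -qpl "$pkg"            # list the files inside the package
    rpm -qp --scripts "$pkg"   # print the pre/post install scriptlets
}

# Usage, with the package copied from the stick:
#   inspect_rpm bootenv-1.0.3-8.ppc.rpm
# To unpack the payload into the current directory instead:
#   rpm2cpio bootenv-1.0.3-8.ppc.rpm | cpio -idmv
```

The scriptlet output is the interesting part here, since that is where a post-install call to a program such as bootenv would show up.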

Getting closer to the root cause

At this point I was convinced that the NVRAM had some issue. The initial messages about the RTC (real-time clock) also gave me reasonable cause.

I searched the Brocade VDX6740 hardware installation guides for anything related to NVRAM or RTC and found these important lines:

The VDX 6740 and VDX 6740T have the following features:

— A real-time clock (RTC) with battery

— Batteries used for RTC/NVRAM backup are not located in operator-access areas.

And in the VDX 6940 manual I found this:

The Brocade VDX 6940 contains non-volatile random access memory (NVRAM) with integrated realtime clock (RTC) function.

Could this IC be defective? Back in the Extreme knowledge base I did not find any clues about NVRAM on the VDX6740, but I got some hits for the original Diamond-series switch:

NVRAM has the information about the boot image, boot count, configuration, clock info etc. NVRAM is non-volatile because it has battery which is used when the switch is in power-off status. But if this battery becomes discharged or is removed, then the switch cannot retrieve the correct information from NVRAM.

Oh well, this sounded reasonably similar to my current VDX issue. I decided to open the case and grabbed the nearest screwdriver.

VDX 6740 with open enclosure

DS9034PCXI PowerCap

After the device was open, I handed the board over to my wife, who holds a master's degree in electronic engineering. I asked her to look for an RTC/NVRAM part or any similar SMD, and to check whether anything looked broken and could be replaced easily.

My wife quickly found this under a black cover on the right side:

Dallas DS9034I-PCX

A quick Google search told me that the Dallas DS9034PCX is a lithium battery source for keeping NVRAM alive. It comes with a module (the PowerCap module) that is soldered onto the board. The battery itself is mounted on top of that module and can easily be detached with a flat-head screwdriver. Its lifetime is specified as around ten years.

Dallas battery from behind

I went ahead and detached the DS9034PCX cover and found the battery fixed to its back side. Now I was pretty excited to have found a possible source of my problems: either a broken NVRAM IC or a failed power source.

In a rush I got a spare VDX6740, opened it, detached its Dallas battery unit and mounted it into the failed unit. I reconnected the power supply and fired up the previously failed VDX.

Monitoring the boot messages on the serial connection, I saw the RTC complain about the initial startup again, but no “environment” or CRC errors like before. The switch dropped to the => loader prompt as before. That made sense, because the earlier NOS install had not succeeded either.

I listed the environment with printenv again and saw all the defaults over again. I wanted to commit the current environment back to the NVRAM and typed saveenv again:

=> saveenv

Saving Environment to NVRAM…

=>

No error was given this time. I entered the command again to be sure:

=> saveenv

Saving Environment to NVRAM…

=>

Great! I went back to the step “Reinstall VDX from scratch” and repeated the install routine. This time everything passed and my installation succeeded with:

INSTALL26: Unmounting file system…

INSTALL26: Setup of Boot Environment

INSTALL26: Setting boot environment parameters…

The VDX booted into the fresh NOS installation. I ran various health checks on the system and restored the initial configuration to bring it back to life. Afterwards I ordered some of the Dallas units as replacement parts; the original part currently costs around $10 per unit.

The soldered SMD socket from Dallas

TLDR

A small, easily replaceable part can break a big switch in the field :-)

References

VDX6740 netinstall from GTAC

Dallas PowerCap datasheet from Mouser