One advantage of using a microcontroller rather than, say, a computer's CPU is that we have very tight control over the timing of the instructions we program into it. To show how precise that control can be, we'll use assembly instructions instead of the typical high-level functions such as digitalWrite. Using assembly lets us know exactly how many clock cycles each instruction takes to execute.



Since the Arduino Uno R3 development board supplies a 16MHz external clock signal to the onboard ATmega328P, the microcontroller executes a 1 clock-cycle instruction in exactly 62.5ns (1/16MHz = 62.5ns). Because we know how many clock cycles each instruction takes, we can choose exactly which instructions, and how many of them, we need to generate our signal.
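As a quick check of that arithmetic, here is a minimal host-side C sketch (meant to run on a PC, not on the Arduino). The helper name cycles_to_ns is hypothetical and not part of any Arduino or AVR library; it only converts a cycle count into nanoseconds for a given clock frequency.

```c
#include <assert.h>

/* Hypothetical helper: duration, in nanoseconds, of `cycles` clock cycles
   on a CPU clocked at `f_cpu_hz` hertz. */
double cycles_to_ns(unsigned long cycles, unsigned long f_cpu_hz)
{
    assert(f_cpu_hz > 0);   /* guard against division by zero */
    return (double)cycles * 1e9 / (double)f_cpu_hz;
}
```

With the Uno's 16MHz clock, cycles_to_ns(1, 16000000UL) evaluates to 62.5, matching the figure above.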



As we saw previously, in order to transmit a 1 to the WS281X chip we need to transmit a signal that stays at a maximum (HIGH) value for 0.8μs, and then stays at a minimum (LOW) value for 0.45μs. Thus, we want to write a list of instructions that:



- Sets the digital pin to HIGH

- Waits 0.8μs

- Sets the digital pin to LOW

- Waits 0.45μs
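The two waits above have to be expressed as whole clock cycles. A minimal host-side sketch of that rounding follows; the helper name ns_to_cycles is an assumption made for illustration, and rounding to the nearest cycle is one reasonable choice rather than a rule stated by the WS281X datasheet.

```c
#include <assert.h>

/* Hypothetical helper: number of whole clock cycles closest to a duration
   of `ns` nanoseconds on a CPU clocked at `f_cpu_hz` hertz. */
unsigned long ns_to_cycles(double ns, unsigned long f_cpu_hz)
{
    assert(ns >= 0.0);   /* negative durations make no sense here */
    /* Adding 0.5 before truncating rounds to the nearest whole cycle. */
    return (unsigned long)(ns * (double)f_cpu_hz / 1e9 + 0.5);
}
```

At 16MHz, 800ns (0.8μs) rounds to 13 cycles and 450ns (0.45μs) rounds to 7 cycles, for a total of 20 cycles per bit.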



In assembly language, this can be achieved by the following code:



asm volatile(
  // Instruction     Clock  Description  Phase      Bit Transmitted
  "sbi %0, %1\n\t"   // 2   PIN HIGH     (T =  2)
  "rjmp .+0\n\t"     // 2   nop nop      (T =  4)
  "rjmp .+0\n\t"     // 2   nop nop      (T =  6)
  "rjmp .+0\n\t"     // 2   nop nop      (T =  8)
  "rjmp .+0\n\t"     // 2   nop nop      (T = 10)
  "rjmp .+0\n\t"     // 2   nop nop      (T = 12)
  "nop\n\t"          // 1   nop          (T = 13)
  "cbi %0, %1\n\t"   // 2   PIN LOW      (T = 15)
  "rjmp .+0\n\t"     // 2   nop nop      (T = 17)
  "rjmp .+0\n\t"     // 2   nop nop      (T = 19)
  "nop\n\t"          // 1   nop          (T = 20)   1
  ::
  // Input operands
  "I" (_SFR_IO_ADDR(PORT)), // %0
  "I" (PORT_PIN)            // %1
);



Instruction

The first column includes the assembly instruction followed by a linefeed and tab characters, which make the final assembler listing generated by the compiler more readable.



Clock

The second column shows the number of clock cycles each instruction takes. Each of these simple instructions always takes the same number of cycles; we'll see later that some instructions (e.g., conditional ones) can take 1, 2, or 3 cycles depending on the outcome. Remember that each clock cycle on the 16MHz Arduino Uno takes 62.5ns.



Description

The third column shows a very brief description of what each operation does.



Phase

We use the term a bit loosely here: it indicates the cumulative number of clock cycles taken by the instructions that have been executed so far.
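The Phase column is just a running total of the Clock column. A small host-side sketch of that bookkeeping, using the cycle counts from the listing above, could look like this; the names ONE_BIT_CYCLES and running_phase are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Cycle counts of the eleven instructions that send a 1, in listing order:
   sbi, rjmp x5, nop, cbi, rjmp x2, nop. */
static const unsigned ONE_BIT_CYCLES[11] = {2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 1};

/* Hypothetical helper: stores the running total after each instruction in
   out[] and returns the final total. */
unsigned running_phase(const unsigned *cycles, size_t n, unsigned *out)
{
    assert(cycles != NULL && out != NULL);
    unsigned total = 0;
    for (size_t i = 0; i < n; i++) {
        total += cycles[i];
        out[i] = total;
    }
    return total;
}
```

For ONE_BIT_CYCLES the running total reaches 13 just before cbi pulls the pin LOW, and 20 at the end of the bit.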



In order to send a single 255 value—11111111 in binary—to the WS281X we need to repeat this set of instructions 8 times. In addition, if we insert a 50μs (or greater) pause between transmissions of the 8-bit sequence, the WS281X latches the transmitted data to its output register. Once the data are latched, the first LED (green) of the WS281X should turn on to a maximum brightness level. The Arduino sketch inside bitbang_255.zip demonstrates this operation.



To send a 0 we need to change the code that produces a 1 by shortening the time the signal stays HIGH (maximum) and lengthening the time it stays LOW (minimum). We should also note that the value for each LED must always be specified using 8 bits. For instance, to send a value of 105 (1101001 in binary) we need to send the 8 bits 01101001, including the leading 0. The code that produces a 0 looks like:



asm volatile(
  // Instruction     Clock  Description  Phase      Bit Transmitted
  "sbi %0, %1\n\t"   // 2   PIN HIGH     (T =  2)
  "rjmp .+0\n\t"     // 2   nop nop      (T =  4)
  "rjmp .+0\n\t"     // 2   nop nop      (T =  6)
  "cbi %0, %1\n\t"   // 2   PIN LOW      (T =  8)
  "rjmp .+0\n\t"     // 2   nop nop      (T = 10)
  "rjmp .+0\n\t"     // 2   nop nop      (T = 12)
  "rjmp .+0\n\t"     // 2   nop nop      (T = 14)
  "rjmp .+0\n\t"     // 2   nop nop      (T = 16)
  "rjmp .+0\n\t"     // 2   nop nop      (T = 18)
  "rjmp .+0\n\t"     // 2   nop nop      (T = 20)   0
  ::
  // Input operands
  "I" (_SFR_IO_ADDR(PORT)), // %0
  "I" (PORT_PIN)            // %1
);



We can use the Arduino sketch inside bitbang_105.zip to generate the signal whose image can be seen on the oscilloscope screen captures that are attached to this step.
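To make the 8-bit, most-significant-bit-first framing concrete, here is a minimal host-side sketch that expands a byte into the sequence of bits that would go out on the wire. The function name byte_to_bits is hypothetical and is not taken from the sketches in the zip files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the 8 bits of `value`, most significant bit
   first, as '0'/'1' characters into out[], followed by a terminating NUL. */
void byte_to_bits(uint8_t value, char out[9])
{
    assert(out != NULL);
    for (int i = 0; i < 8; i++) {
        out[i] = (value & 0x80) ? '1' : '0';  /* look at the current MSB  */
        value = (uint8_t)(value << 1);        /* move the next bit to MSB */
    }
    out[8] = '\0';
}
```

Calling byte_to_bits(105, buf) fills buf with "01101001", leading 0 included, so the first bit sent uses the 0 timing and the second uses the 1 timing.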



Now, for the WS281X to display the whitish color we want, we need to send not one but three 255 values, so that our signal consists of 24 ones, before waiting the 50μs for the data to latch. We could do this by copy-pasting the eleven assembly instructions that send a 1 another 23 times (you can give it a try by modifying the bitbang_255.ino sketch). But that code would be impractical for sending values to more than one WS281X chip. A better solution is to write a loop that iterates through the 8-bit values until all three of them have been sent.
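To put a rough number on why the copy-paste approach does not scale, the sketch below estimates how much code a fully unrolled transmission would need. It assumes every bit is a 1 (eleven instructions per bit, as in the listing for a 1) and relies on sbi, cbi, rjmp and nop each being a single 16-bit instruction word on AVR; the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

enum {
    BITS_PER_LED = 24,              /* three 8-bit values per WS281X      */
    INSTRUCTIONS_PER_ONE_BIT = 11,  /* from the listing that sends a 1    */
    BYTES_PER_INSTRUCTION = 2       /* each instruction used is one word  */
};

/* Hypothetical helper: instructions needed to send all-ones data to
   `num_leds` chips with no loop at all. */
unsigned long unrolled_instructions(unsigned long num_leds)
{
    assert(num_leds <= ULONG_MAX / (BITS_PER_LED * INSTRUCTIONS_PER_ONE_BIT
                                    * BYTES_PER_INSTRUCTION));
    return num_leds * BITS_PER_LED * INSTRUCTIONS_PER_ONE_BIT;
}

/* Hypothetical helper: the same estimate expressed in bytes of flash. */
unsigned long unrolled_bytes(unsigned long num_leds)
{
    return unrolled_instructions(num_leds) * BYTES_PER_INSTRUCTION;
}
```

Under these assumptions a single chip already needs 264 instructions (528 bytes), and the cost grows linearly with every chip added, whereas a loop keeps the code size fixed.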



The sketch inside bitbang_whitish.zip includes a clear description of the steps taken to achieve the desired outcome. The main section, written in assembly following the logic described above, looks as follows:



asm volatile(
  // Instruction        Clock  Description                      Phase
  "nextbit:\n\t"        // -   label                            (T =  0)
  "sbi %0, %1\n\t"      // 2   signal HIGH                      (T =  2)
  "sbrc %4, 7\n\t"      // 1-2 if MSB set                       (T =  ?)
  "mov %6, %3\n\t"      // 0-1 tmp'll set signal high           (T =  4)
  "dec %5\n\t"          // 1   decrease bitcount                (T =  5)
  "nop\n\t"             // 1   nop (idle 1 clock cycle)         (T =  6)
  "st %a2, %6\n\t"      // 2   set PORT to tmp                  (T =  8)
  "mov %6, %7\n\t"      // 1   reset tmp to low (default)       (T =  9)
  "breq nextbyte\n\t"   // 1-2 if bitcount ==0 -> nextbyte      (T =  ?)
  "rol %4\n\t"          // 1   shift MSB leftwards              (T = 11)
  "rjmp .+0\n\t"        // 2   nop nop                          (T = 13)
  "cbi %0, %1\n\t"      // 2   signal LOW                       (T = 15)
  "rjmp .+0\n\t"        // 2   nop nop                          (T = 17)
  "nop\n\t"             // 1   nop                              (T = 18)
  "rjmp nextbit\n\t"    // 2   bitcount !=0 -> nextbit          (T = 20)
  "nextbyte:\n\t"       // -   label                            -
  "ldi %5, 8\n\t"       // 1   reset bitcount                   (T = 11)
  "ld %4, %a8+\n\t"     // 2   val = *p++                       (T = 13)
  "cbi %0, %1\n\t"      // 2   signal LOW                       (T = 15)
  "rjmp .+0\n\t"        // 2   nop nop                          (T = 17)
  "nop\n\t"             // 1   nop                              (T = 18)
  "dec %9\n\t"          // 1   decrease bytecount               (T = 19)
  "brne nextbit\n\t"    // 2   if bytecount !=0 -> nextbit      (T = 20)
  ::
);



The best way to understand how this section works is to consider different scenarios and follow the assembly code line by line. For instance, we know that in order to send a value of 255 we need to send 8 bits, each with the timing of a 1. In other words, the digital pin connected to the WS281X should remain HIGH for 13 cycles (0.8125μs) and LOW for 7 (0.4375μs). Does the code above achieve this? Let's see what happens when we first start transmitting:



asm volatile(
  "nextbit:\n\t"        // This is only a label for directing the jumps below.
  "sbi %0, %1\n\t"      // The signal is set to HIGH, instruction uses 2 cycles.
  "sbrc %4, 7\n\t"      // True. Sending 255 implies current MSB is 'set' (=1).
  "mov %6, %3\n\t"      // This is executed. "tmp" is set to HIGH.
  "dec %5\n\t"          // Bit is being transmitted, decrease bit counter.
  "nop\n\t"             // Need to idle for getting to the 13 clock cycles.
  "st %a2, %6\n\t"      // Write the "tmp" value to the PORT (pin still HIGH).
  "mov %6, %7\n\t"      // Set "tmp" to low for the next pass through the loop.
  "breq nextbyte\n\t"   // False. Bit counter isn't 0, use 1 cycle and continue.
  "rol %4\n\t"          // Shift the byte value MSB leftwards.
  "rjmp .+0\n\t"        // Idle for 2 clock cycles. Phase reached T = 13.
  "cbi %0, %1\n\t"      // Set signal to LOW.
  "rjmp .+0\n\t"        // Idle for 2 clock cycles.
  "nop\n\t"             // Idle for 1 clock cycle.
  "rjmp nextbit\n\t"    // Bit counter wasn't 0 so jump to next bit. T = 20.
);



So the instructions that actually get executed generate a signal on the data pin that is 13 cycles HIGH (0.8125μs) and 7 LOW (0.4375μs), thus sending a bit with a value of 1 to the WS281X. If we continue to study what the code does when the rest of the bits are sent, and what it does when values other than 255 are used, we'll get a deeper understanding of this particular implementation of bitbanging.
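For trying out values other than 255 without an oscilloscope, the loop's logic can be restated in plain C and run on a PC. The sketch below is only a model under stated assumptions: it mirrors the val, bitcount and bytecount bookkeeping suggested by the comments in the listing, it records the nominal HIGH width per bit from the annotated phases (13 cycles for a 1, and 6 cycles for a 0, taken as the gap between sbi finishing at T = 2 and the pin going LOW at T = 8), and it ignores the instruction-level details of the nextbyte path. The function name model_send is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { HIGH_CYCLES_ONE = 13, HIGH_CYCLES_ZERO = 6 };  /* nominal values */

/* Hypothetical model of the loop: for every bit of every byte in p[],
   MSB first, stores the nominal number of cycles the pin stays HIGH.
   high_cycles[] must have room for 8 * nbytes entries.
   Returns the number of bits written. */
size_t model_send(const uint8_t *p, size_t nbytes, uint8_t *high_cycles)
{
    assert(p != NULL && high_cycles != NULL);
    size_t n = 0;
    while (nbytes-- > 0) {                  /* bytecount, as in dec/brne  */
        uint8_t val = *p++;                 /* val = *p++                 */
        for (uint8_t bitcount = 8; bitcount > 0; bitcount--) {
            /* sbrc val, 7: the MSB decides whether the pin stays HIGH.   */
            high_cycles[n++] = (val & 0x80) ? HIGH_CYCLES_ONE
                                            : HIGH_CYCLES_ZERO;
            val = (uint8_t)(val << 1);      /* rol val: next bit to MSB   */
        }
    }
    return n;
}
```

Feeding it the bytes 255 and 105 yields sixteen entries: eight 13s for the first byte, then 6, 13, 13, 6, 13, 6, 6, 13 for the second, which is the pattern 01101001 discussed earlier.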



I personally hope that you find this tutorial useful for getting started with bitbanging your own communication protocols whenever it's necessary!