Introduction

Some weeks ago, I published an article on how we used the PRU to implement a power supply control loop with hard real-time constraints:

https://www.embeddedrelated.com/showarticle/586.php

Writing this kind of logic in assembly language is not easy. First, the assembly language itself may be difficult to learn, depending on your background. Then, fixed and floating point arithmetic requires a lot of code. While macros help to handle the complexity, they are still error prone, as you must be very careful to enforce rules regarding register usage, for instance. This is especially difficult in projects involving several people who are not used to working together and are not full time on the project. Even if the rules are well established, people tend to forget them when they come back to the project from time to time, several weeks apart.

Someone pointed out that TI introduced C support in the latest Code Composer Studio version. For me, using the C language clearly helps with the above issues, so I started to investigate this feature. Several people contacted me to ask whether I had made any progress on using the C language to program the PRU. My answer is that the toolchain proposed by TI, called CGT (Code Generation Tools), is very usable once the different tools and some details have been understood.

This article describes how to set up and use the PRU C toolchain, and includes a simple example.





Prerequisites

I have a GitHub repository containing all the materials related to this article. Please clone it from here:

https://github.com/texane/pru_sdk

The build system assumes that the following two variables are correctly set (for instance, using the export shell command):

CROSS_COMPILE: points to your ARM toolchain,

PRU_SDK_DIR: points to the directory where you cloned the repo.

Refer to the previous article for more information on how to use it.

Installing the PRU C toolchain

The PRU C toolchain is too big to be included in the pru_sdk repo, so you have to install it manually. The following describes how to do so.

First, you have to download the installer from here:

http://processors.wiki.ti.com/index.php/Download_CCS

Registration is required, but the download is free, and CCS runs on both Linux and Windows. During the installation, you must enable PRU support. Please refer to the following link for more info:

http://www.element14.com/community/community/knode/single-board_computers/next-gen_beaglebone/blog/2014/04/30/bbb--pru-c-compiler

The toolchain is located in a directory called pru_2.0.0B2 inside the CCS installation directory. If you do not find it, you may have to start CCS and install missing components using the GUI menus. The pru_2.0.0B2 directory is self-contained and can be moved anywhere and used as is. Please put it in the pru_sdk directory you cloned previously.

pru_2.0.0B2/bin contains the usual tools:

clpru: a C compiler,

asmpru: an assembler,

dispru: a disassembler,

lnkpru: a linker,

hexpru: an output file generation tool,

other tools for inspecting binary (ELF) files.

TI also provides a C standard library and a runtime, with the sources included. They can be found in the pru_2.0.0B2/lib directory. More information can be found in the README.txt file and in the man pages in pru_2.0.0B2/man.





Installing the PRU loader

To load applications and interact with the PRU from Linux, I use the open source library found here:

https://github.com/beagleboard/am335x_pru_package

One important feature is the loading and execution of PRU binary files. In our case, it must be noted that the starting address will not be 0. Rather, it is located at a symbol called _c_int00, which can be anywhere in the text section. Thus, I had to modify the loading library so that an address can be specified in the routine:

prussdrv_exec_program

This was done by adding a new routine, without breaking existing interfaces:

prussdrv_exec_program_at

It writes the address field of the PRU control register, as specified in am335xPruReferenceGuide section 5.4. Refer to example/pruss_c/host_main.c for an example.
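As a rough illustration of what such a routine has to write, here is a minimal sketch of how a control register value could be composed. It assumes the layout given in the AM335x PRU reference guide, with the enable bit at bit 1 and the program counter reset value in bits 31:16, and it assumes the start address is expressed in 32-bit instruction words. The names PRU_CTRL_ENABLE, PRU_CTRL_PCRESET_SHIFT and pru_ctrl_start_value are hypothetical and are not part of the loader library.

```c
#include <stdint.h>

/* Assumed PRU control register layout (AM335x PRU reference guide):
 * bit 1 = ENABLE, bits 31:16 = program counter reset value. */
#define PRU_CTRL_ENABLE        (1u << 1)
#define PRU_CTRL_PCRESET_SHIFT 16

/* Hypothetical helper: build the register value that enables the PRU
 * and makes it start at the given instruction word address, for
 * instance the address of _c_int00. */
uint32_t pru_ctrl_start_value(uint16_t start_word_addr)
{
    return ((uint32_t)start_word_addr << PRU_CTRL_PCRESET_SHIFT)
           | PRU_CTRL_ENABLE;
}
```

With a start address of 0, this reduces to the value that simply enables the PRU, which matches the behaviour expected from the original routine.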

I put the new library in the pru_sdk repo, and I will submit a patch to the official repository soon.

Compiling and running a program

I added an example for this article in the pru_sdk repo, directory example/pruss_c.

The code itself simply does a floating point multiplication on the PRU and puts the result into the shared memory; refer to pru_main.c. The CPU reads and displays the result; refer to host_main.c.
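The exchange can be pictured with the following sketch. It is not the content of pru_main.c or host_main.c: the function names are hypothetical, and a plain 32-bit word stands in for the shared memory, which on the real system would be reached through the mapping provided by the loader library. It also assumes that both sides use the same 32-bit float representation.

```c
#include <stdint.h>
#include <string.h>

/* PRU side (sketch): multiply two floats and store the raw 32-bit
 * pattern of the result into a word of shared memory. */
void pru_store_product(volatile uint32_t *shared, float a, float b)
{
    float product = a * b;
    uint32_t bits;

    memcpy(&bits, &product, sizeof(bits)); /* copy the bit pattern */
    *shared = bits;
}

/* CPU side (sketch): read the word back and reinterpret it as a float. */
float host_read_result(const volatile uint32_t *shared)
{
    uint32_t bits = *shared;
    float result;

    memcpy(&result, &bits, sizeof(result));
    return result;
}
```

Passing the value as a raw word keeps the shared memory layout explicit, which is convenient when two different compilers build the two sides.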

I made a simple script that compiles everything; refer to example/pruss_c/build.sh.

I will integrate it into the pru_sdk build system later. The script works as follows:

invoke the compiler to produce object files from PRU C files,

link them to produce an ELF file,

extract code and data binary from the ELF file,

retrieve the start address,

compile the CPU program, to be run on the BeagleBone Black.

At the end of the process, the following files are produced:

pru_enable-00A0.dtbo: the device tree overlay enabling the PRU,

main: the ELF program to be run on the CPU,

text.bin: the binary file to initialize the PRU code,

data.bin: the binary file to initialize the PRU data.

You must copy the following files to your BBB board:

pru_enable-00A0.dtbo into /lib/firmware,

run.sh,

main, text.bin and data.bin.

On the BBB, the program is then run using run.sh. It is a small wrapper taking care of loading the uio_pruss driver and enabling the PRU.

Using inline assembly

Accessing some parts of the hardware still requires assembly code. Previously, we took care of wrapping this code inside macros. I am in the process of porting these macros into functions usable from C code.

The functions are kept minimal and implemented using inline assembly. The inline assembly support is not as advanced as that of GCC. In particular, there is no way to describe register usage.

One important thing is to know the calling convention and rules used by the PRU C compiler. They are described in the README.txt. To summarize:

r2 contains the stack pointer,

r3 contains the return address,

r14 to r29 are used for argument passing,

r14 is used for the return value,

r3 to r13 must be saved by the callee.

You can look at pru_hal.c for examples. It is still a work in progress, but it gives the idea.

More to come

I am now in the process of rewriting the low level hardware related routines in C. I started with inline assembly, but it may be possible to use intrinsics instead. This is to be investigated.

Another thing to investigate is the generated code size, as the PRU program memory is limited to 8 KB. From what I have seen, clpru does a good job and provides different optimization options to reduce the generated code size. I will soon see if it is enough.
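While investigating, a small check on the size of the generated code image can catch an overflow early. The sketch below only encodes the 8 KB limit mentioned above. The names PRU_IRAM_SIZE, pru_text_fits and pru_text_headroom are hypothetical, and the size passed in is assumed to be the size in bytes of the code binary, for instance text.bin.

```c
#include <stddef.h>

/* PRU program memory size, as stated above (8 KB). */
#define PRU_IRAM_SIZE 8192u

/* Hypothetical helper: nonzero if a code image of nbytes fits. */
int pru_text_fits(size_t nbytes)
{
    return nbytes <= PRU_IRAM_SIZE;
}

/* Hypothetical helper: bytes left in program memory, 0 if too large. */
size_t pru_text_headroom(size_t nbytes)
{
    return pru_text_fits(nbytes) ? PRU_IRAM_SIZE - nbytes : 0;
}
```

Such a check could be run by the host program before loading the code, so that an oversized image is reported instead of being silently truncated or rejected later.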

One may argue against using the PRU C toolchain directly from the command line instead of using the CCS software. It is a valid point, as CCS is freely available. Personally, I prefer the command line, especially as I do not always have a usable X connection to the machine hosting the build system we use for the PRU.

Updates

A reader gave a link to download the PRU C compiler without the whole CCS software:

http://software-dl.ti.com/codegen/non-esd/downloads/beta.htm