Many of you have heard about one of the oldest programming languages, COBOL, and you have also heard that COBOL programmers are in high demand nowadays to maintain legacy code. There is another old-timer that few know about, one that is still in use and will remain in use for quite a while in specific fields such as finance and banking. Its name is IBM RPG.

Once upon a time...

RPG is a high-level programming language for business applications. It is an IBM proprietary programming language and its later versions are available only on IBM i- or OS/400-based systems.

RPG has been around for more than half a century. By the end of the 1950s, IBM had built a huge number of electromechanical devices called tabulating machines. The last mass-produced tabulator model was the IBM 407 Accounting Machine:

Figure 1. IBM 407 at U.S. Army's Redstone Arsenal, 1961.

By the way, this tabulator rented for $800 to $920 per month ($8100 to $9300 per month in 2016 dollars), depending on the model. It was withdrawn from marketing on December 17, 1976.

Scientific and technological progress was striding forward at the time, and the end of the 1950s saw the birth of the first transistor computers. One of IBM's first computers of this type was the IBM 1401, introduced in 1959:

Figure 2. IBM 1401 at the Endicott History and Heritage Center.

The Computer History Museum has one working unit. Take a look at how it all worked - that's really interesting (I especially like the phrase, "We need a little bit of cooling," at 4:41):

To ease their customers' transition to the new transistor computers, IBM developed two tools in 1959 that replicated punched-card processing on computers: FARGO (Fourteen-o-one Automatic Report Generation Operation) and RPG (Report Program Generator). The syntax of both languages resembled the instruction language of electromechanical tabulators. FARGO and RPG also replicated an important feature of tabulators: the cyclic processing mode (known as the program cycle), in which a tabulator read punched cards, summarized their contents, and printed a result. IBM still maintains backward compatibility with the program cycle even in the latest dialect, RPG IV.
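To make the idea of the program cycle more concrete, here is a minimal sketch in Python (not RPG). It is a simplified model rather than IBM's actual cycle logic: the function name `run_cycle`, the card layout (a control field plus an amount), and the report format are all assumptions made for illustration. The loop reads one "card" at a time, prints a detail line, accumulates totals, and prints a subtotal whenever the control field changes.

```python
def run_cycle(cards):
    """Simplified model of a read / summarize / print cycle.

    `cards` is an iterable of (control_field, amount) pairs, already
    sorted by control field, the way a card deck would be.
    """
    lines = []
    group_total = 0
    grand_total = 0
    previous = None
    for control, amount in cards:               # read the next card
        if previous is not None and control != previous:
            # control break: print the total of the group that just ended
            lines.append(f"TOTAL {previous}: {group_total}")
            group_total = 0
        lines.append(f"  {control} {amount}")   # detail line for this card
        group_total += amount                   # summarize
        grand_total += amount
        previous = control
    if previous is not None:                    # last card has been read
        lines.append(f"TOTAL {previous}: {group_total}")
    lines.append(f"GRAND TOTAL: {grand_total}")
    return lines


if __name__ == "__main__":
    for line in run_cycle([("A", 10), ("A", 5), ("B", 7)]):
        print(line)
```

In a real RPG program the programmer does not write this loop at all: the cycle is implicit, and the specifications only describe what to do at each stage.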

Even though RPG was developed under the strong influence of FARGO, it took RPG just a few years to replace its predecessor as the superior language. The next system to receive RPG support was the IBM System/360 Model 20 mainframe:

Figure 3. IBM System/360 Model 20 at the Deutsches Museum.

A couple of years later (at the end of the 1960s), IBM released the next dialect, RPG II, for the System/3 midrange computer:

Figure 4. Midrange computer IBM System/3 Model 10 and the operator at work.

RPG has since evolved into a high-level language (HLL) comparable to COBOL and PL/I. Its distinctive feature (in contrast to modern languages) was the so-called fixed-format syntax: the meaning of each character depended on the column it occupied, so programs were difficult to read without a special debugging template.

Figure 5. Debugging template.

An RPG program typically started off with File Specifications, listing all files being written to, read from, or updated, followed by Data Definition Specifications containing program elements such as data structures and dimensional arrays, much like the "Working-Storage" section of a COBOL program or "var" statements in Pascal. A variable is defined with a fixed-format Definition Specification, denoted by the letter D in column 6 of a source line, with the data type character (see the table below) encoded in column 40. If the data type character is omitted, that is, left blank, the default is A when no decimal positions are specified and P otherwise.
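The defaulting rule above is easy to get wrong when reading fixed-format source by eye, so here is a small Python sketch of it. The helper names `d_spec` and `data_type_of` are hypothetical, and the sketch assumes that the decimal positions sit in columns 41-42 of the Definition Specification, a detail the text above does not state. Only the rule as described here is modelled.

```python
def d_spec(name, type_char=" ", decimals=""):
    """Build an 80-column Definition Specification line (illustrative helper)."""
    line = [" "] * 80
    line[5] = "D"                        # column 6: specification type
    line[6:6 + len(name)] = name         # field name starts in column 7
    line[39] = type_char                 # column 40: data type character
    line[40:42] = decimals.rjust(2)      # columns 41-42: decimal positions (assumed)
    return "".join(line)


def data_type_of(line):
    """Return the effective data type character of a D-spec source line."""
    line = line.ljust(80)
    if line[5].upper() != "D":
        raise ValueError("not a Definition Specification")
    type_char = line[39]
    if type_char != " ":
        return type_char.upper()         # type given explicitly
    # Type left blank: A without decimal positions, P with them
    return "P" if line[40:42].strip() else "A"


if __name__ == "__main__":
    print(data_type_of(d_spec("COUNT", "I")))         # I
    print(data_type_of(d_spec("NAME")))               # A
    print(data_type_of(d_spec("PRICE", decimals="2")))  # P
```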

| Data type | Name | Length | Description |
|---|---|---|---|
| * | Basing-Pointer; Procedure-Pointer; System-Pointer | 16 bytes | Address to Data; Address to Activated Procedure; Address to Object |
| A | Alphanumeric character | 1 to 16,773,104 bytes (fixed); 1 to 16,773,100 bytes (varying-length) | Alphanumeric character |
| B | Binary numeric | 1 byte (8-bit); 2 bytes (16-bit); 4 bytes (32-bit); 8 bytes (64-bit) | Signed binary integer |
| C | UCS-2 character | 1 to 8,386,552 characters (fixed); 1 to 8,386,550 characters (varying) | 16-bit UCS-2 character (DBCS or EGCS) |
| D | Date | 10 bytes | Date: year, month, day |
| F | Floating point numeric | 4 bytes (32-bit); 8 bytes (64-bit) | Signed binary floating-point real |
| G | Graphic character | 1 to 8,386,552 characters (fixed); 1 to 8,386,550 characters (varying) | 16-bit graphic character (DBCS or EGCS) |
| I | Integer numeric | 1 byte (8-bit); 2 bytes (16-bit); 4 bytes (32-bit); 8 bytes (64-bit) | Signed binary integer |
| N | Character indicator | 1 byte | '1' = TRUE; '0' = FALSE |
| O | Object | Size undisclosed | Object reference |
| P | Packed decimal numeric | 1 to 63 digits, 2 digits per byte plus sign | Signed fixed-point decimal number with integer and fraction digits |
| S | Zoned decimal numeric | 1 to 63 digits, 1 digit per byte | Signed fixed-point decimal number with integer and fraction digits |
| T | Time | 8 bytes | Time: hour, minute, second |
| U | Integer numeric | 1 byte (8-bit); 2 bytes (16-bit); 4 bytes (32-bit); 8 bytes (64-bit) | Unsigned binary integer |
| Z | Timestamp | 26 bytes | Date and time: year, month, day, hour, minute, second, microseconds |
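The "2 digits per byte plus sign" entry for packed decimals deserves a quick illustration. The Python sketch below computes storage sizes for the P and S types and packs an integer into the packed layout. The function names are hypothetical, and the sign nibbles used (F for positive, D for negative) are an assumption about the usual convention, not something taken from the table.

```python
def packed_size(digits):
    """Bytes needed for a packed decimal: two digits per byte, plus a sign nibble."""
    if not 1 <= digits <= 63:
        raise ValueError("packed decimals hold 1 to 63 digits")
    return digits // 2 + 1


def zoned_size(digits):
    """Bytes needed for a zoned decimal: one digit per byte."""
    if not 1 <= digits <= 63:
        raise ValueError("zoned decimals hold 1 to 63 digits")
    return digits


def pack(value, digits):
    """Pack an integer into `digits` packed-decimal digits (sign nibbles assumed)."""
    sign = 0xD if value < 0 else 0xF
    text = str(abs(value))
    if len(text) > digits:
        raise ValueError("value does not fit")
    digit_nibbles = packed_size(digits) * 2 - 1      # every nibble except the sign
    nibbles = [int(c) for c in text.rjust(digit_nibbles, "0")] + [sign]
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))


if __name__ == "__main__":
    print(packed_size(7), zoned_size(7))   # 4 7
    print(pack(12345, 5).hex())            # 12345f
    print(pack(-42, 4).hex())              # 00042d
```

A 7-digit amount therefore takes 4 bytes packed but 7 bytes zoned, which is why packed fields were the usual choice for numeric data on disk.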

Next came the Calculation Specifications, which list the operations to be executed. Output Specifications could follow; they determined the layout of other files or reports.

The following describes all the specification types:

The U or Auto Report spec is only required for Auto Report programs.

The H or Header spec is at the top of the program and describes compiler options such as maximum compile size, whether the program is an MRT (Multiple Requestor Terminal) program, and what type of listing is generated when the program is compiled. The object name of the program created is located in columns 75-80; if a source does not have an H spec, the name RPGOBJ is used.

The F or File spec(s) are next, and describe the files used in the program. Files may be disk files (DISK) or may be devices such as a printer (PRINTER), the workstation (WORKSTN), keyboard (KEYBORD), unformatted display (CRT or DISPLAY), or user-defined (SPECIAL). Record size, block size, overflow indicators, and external indicators are described. It is possible that an RPG program will not use any F specs.

The E or Extension spec(s) are next, and describe arrays and tables, which may be prefetched from disk files (an Input table), drawn from constants placed at the end of the source between ** and /* symbols, or built from calculations.

The L or Line Counter spec(s) are next, and if present, describe the form to be printed. They define the number of lines in a page and the positions where printing begins and ends.

The I or Input specs are next, and describe the data areas within files. RPG II permits redefinition of data areas so that a field named FLDA might occupy the same area as an array AR that contains 8 elements of 1 character each. Non-record areas such as data structures can be described. Depending on the values of the input record, indicators may be conditioned.
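Redefinition of a data area may be unfamiliar to readers of modern languages, so here is a rough Python analogy (Python has no overlaid fields; this only models the idea with views over one shared buffer). The names `record`, `flda`, and `ar` mirror FLDA and AR from the paragraph above; the sample contents are made up.

```python
record = bytearray(b"ABCDEFGH")          # the 8-byte data area

flda = memoryview(record)[0:8]           # FLDA: the whole area as one field
ar = [memoryview(record)[i:i + 1]        # AR: 8 elements of 1 character each
      for i in range(8)]

ar[2][:] = b"x"                          # change the third element of AR...
print(bytes(flda).decode())              # ...and FLDA shows it: ABxDEFGH
```

Both names refer to the same storage, so a change made through one is immediately visible through the other.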

The C or Calculation spec(s) are next. Total fields may be described and accumulated. Complex computations and string manipulations are possible. Indicators may be conditioned.

The last specification(s) are O or Output specifications, which describe the output record in terms of fields and output positions.

Now, here's a small example for you to practice reading RPG programs (F and D specifications are omitted):

      * Asterisk (*) in column 7 defines a comment line
      * In column 6, you write a character denoting
      * the specification type to be used. The type of specification
      * defines what the source line does (i.e. definitions section,
      * calculations section, etc.).
      * "C" spec (calculation) describes calculations to be done.
     C           HOURS     IFLE 40
     C           HOURS     MULT RATE      PAY
     C                     ELSE
     C           RATE      MULT 40        PAY     72
     C           HOURS     SUB  40        OTIME   30
     C           RATE      MULT 1.5       OTRATE  94
     C           OTRATE    MULT OTIME     OTPAY   72
     C                     ADD  OTPAY     PAY
     C                     END

This code looks complicated, doesn't it? And that's just a simple payroll calculation for an hourly employee (employees get time and a half for each hour worked beyond the first 40).

In 1978, IBM introduced midrange computer System/38 and a new RPG dialect, RPG III, for it:

Figure 6. Midrange computer IBM System/38

From then on, IBM gradually lifted the language's limitations; programmers were eventually allowed to write calculations in free form:

    /free
      If Hours <= 40;
        Pay = Hours * Rate;
      Else;
        Pay = (40 * Rate) + ((Hours - 40) * (Rate * 1.5));
      EndIf;
    /end-free
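To check the overtime rule with actual numbers, the same calculation can be sketched in Python. The function name `pay` and the sample figures are made up for illustration, and the sketch does not model the fixed field lengths that RPG result fields have.

```python
from decimal import Decimal


def pay(hours, rate):
    """Weekly pay with time and a half beyond the first 40 hours."""
    hours = Decimal(str(hours))
    rate = Decimal(str(rate))
    if hours <= 40:
        return hours * rate
    return 40 * rate + (hours - 40) * (rate * Decimal("1.5"))


if __name__ == "__main__":
    print(pay(38, "12.50"))   # regular hours only
    print(pay(45, 20))        # 40 regular hours plus 5 overtime hours
```

For 45 hours at a rate of 20, the regular part is 40 × 20 = 800 and the overtime part is 5 × 30 = 150, so the total is 950.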

Finally, the latest version, RPG IV (aka RPGLE or ILE RPG), was released in 1994. Its three prominent features are built-in functions, procedures, and free-form programming. Until November 2013, the free format applied exclusively to the calculation specifications. With the V7R1 TR7 upgrade to the language, the "/free" and "/end-free" directives are no longer necessary, and the language has finally broken its ties to punched cards.

I also recommend looking at this document where IBM demonstrates the drastic changes made to the language. RPG remains a popular programming language on the IBM i operating system, which runs on IBM Power i platform hardware:

Figure 7. IBM Power i server series

All in all, the RPG programming language has been used on 1401, /360, /3, /32, /34, /36, /38, AS/400, and System i systems. There have also been implementations for the Digital VAX, Sperry Univac BC/7, Univac System 80, Siemens BS2000, Burroughs B700, B1700, Hewlett Packard HP3000, ICL 2900 series, Honeywell 6220 and 2020, Four-Phase IV/70 and IV/90 series, Singer System 10 and WANG VS, as well as miscellaneous compilers and runtime environments for Unix-based systems, such as Infinite36 (formerly Unibol 36), and PCs (Baby/400, Lattice-RPG).

Today, RPG IV is a more robust language. Editing can still be done via SEU, the simple green-screen editor:

Figure 8. Green-screen editor.

However, a long progression of tools has been developed over time. These have included VisualAge for RPG, which was developed by IBM and promoted by Jon Paris and others. Currently the preferred editing platform is IBM's WebSphere Development Studio Client, now named RDi (Rational Developer for i), which is a customized implementation of Eclipse:

Figure 9. Rational Developer for i.

There is also an RPG compiler for Microsoft .NET. This version contains extensions to RPG IV beyond those of the base IBM compiler. These extensions provide Microsoft Windows and .NET hooks in the Native and System/36 environments, as well as the ability to port DB2 files to Microsoft Access and Microsoft SQL Server databases via ODBC.

Conclusion

Although RPG is quite an exotic language, a large amount of code written in it exists in specific fields such as finance and banking. The language is quite diverse: it includes four dialects and supports both fixed- and free-form programming. The need to maintain legacy code and write new code forces companies to seek RPG developers, who seem to be on the Red List of threatened species. Given the circumstances, additional code-quality control tools such as static analyzers are very much needed.

You must have already guessed what the reason for writing this article is and what we are driving at. You're right: we are now considering the idea of creating a static code analyzer for the RPG language. I believe such an analyzer would be a great aid to those who still maintain and develop RPG programs. But we haven't come to a decision yet. Needless to say, it's quite a specific niche.

That's why we need you to tell us whether you want the PVS-Studio analyzer to become a tool that can detect bugs in RPG programs. Dear potential users of PVS-Studio RPG, we are looking for you. Don't hesitate to email us!