This paper covers the history and use of literals (or constants) in programming languages, from the beginning of programming to the present day. It discusses literals in many languages, including modern languages such as C and Java, scripting languages, and older languages such as Ada, COBOL, and FORTRAN. Design issues, types of literals, and problems with literals are illustrated. Literals vary across languages much more than most programmers would expect.

Literals

Integer Literals

Design Issues for Integer Constants

Ada Integers

Size of Integer

C Family

Arbitrarily Long Integers

Visual BASIC 6.0 and QBasic

Visual Basic .NET Type Designations

Base or Radix of Integers

Questions

Real Literals

Design Issues for Floating Point Constants

Decimal Point Placement

Precision of Reals

Complex Numbers

What is doubled?

FORTRAN 90 Kind Numbers

Questions

Questions

Boolean Literals

Design Issues for Boolean

Character String Literals

Design Issues for Character Strings

String Delimiters

String Escape Sequences

Perl and UNIX Shell Character Strings

Perl Alternative Quotes

Perl Additional Escape Sequences

UNIX Backquotes

Special Literals

C# Verbatim String Literals

Python Triple-Quoted Strings

Here Documents

Date Literals

Array Literals

String Comparison

Repeating Literals

Conclusion

Questions

New FORTRAN Declarations

Copyright Dennie Van Tassel 2004.

Please send suggestions and comments to dvantassel@gavilan.edu

Literals or constants are values written in a conventional form whose value is obvious. In contrast to variables, literals (123, 4.3, "hi") do not change in value. These are also called explicit constants or manifest constants. I have also seen these called pure constants, but I am not sure if that terminology is agreed on. At first glance one may think that all programming languages write their literals the same way. While there are a lot of common conventions in different languages, there are some interesting differences.

| Literal | Explanation |
|---|---|
| 285 | Typical integer |
| 34.67 | Typical real |
| 4.23E-4 | Typical scientific |
| 140_345 | Integer in Perl or Ada |
| true | Typical boolean |
| 0x1b or Z"1B" | Hexadecimal literal |
| 'B' | Typical character |
| "Hello" or 'Hello' | Typical character string |
| 5HHello | Old FORTRAN Hollerith string |
| null ZERO | Special literals |

Various Literals in Different Languages

Table x.1

Literals represent the possible values of the primitive types of a language. The usual kinds of literals are integers, floating point numbers, Booleans, and character strings. Each of these will be discussed in this chapter.

Integers are commonly described as numbers without a decimal point or exponent. Another description of an integer literal is a string of decimal digits without a decimal point. Thus the following are valid integers in nearly all languages:

123 0 -14 21345

Integers may or may not have a sign and must fall within some restricted range. Negative values need to be preceded by a minus sign. If integers use 32 bits, then the maximum value is 2^31 - 1 (since one bit is needed for the sign).

There are two more integer constants available in some languages:

+45 5e2

Early C did not allow +45. Integers without a sign, such as 45, are positive by default, so early C had a unary minus operator but no unary plus operator. ANSI C and Java allow the unneeded plus sign on constants. Few other languages actually forbid unary plus signs.

The last constant, 5e2, which evaluates to 500, is a floating point value in C and FORTRAN. Their rule is that a floating point constant has a decimal point or an exponent, or both. Thus 5.0, 5e0, and 5.0e0 are all the same floating point value 5.0. But in Ada integers can have non-negative exponents, so 5e2 (or 5e+2) is the integer 500. Negative exponents are not allowed for Ada integers. Thus 5e-3 is an error in Ada, but 5.0e-3 is a floating-point constant.

There are a few design issues for integers. They are:

What sizes of integer constants are available? For example, do we have short integers, regular integers, and long integers?

How do we indicate the particular type of integer constant wanted?

What bases of integers are available? Examples that may be available besides decimal could be octal, hexadecimal, or any base.

Is there any separator available like the comma used for thousands?

Some language offers each of these features, and different languages answer these questions differently.

Most languages have one or more default sizes for integers. On a machine with a 16-bit word, integers range from -32,768 to +32,767, that is, from -2^15 to 2^15 - 1. On a machine with a 32-bit word, integers range from -2,147,483,648 to +2,147,483,647, that is, from -2^31 to 2^31 - 1. Today 64-bit integers are common. Unfortunately, computer integers cannot have those useful commas to mark thousands.

But describing integers as strings of decimal digits is an oversimplification, since we can have hexadecimal integers and they use letters. We may also want octal values and some way to indicate the desired size of our integers. Also, the definition of integer given earlier is not true for all languages.

Ada Integers

For example, in Ada both integer and real literals can have an exponent. Thus in Ada the integer literal 2100 could also be written as:

21e2 210e+1 2100e+0

But in many other languages the exponent would indicate that the above are floating point literals. For Ada integers, the exponent must not be negative. Ada also allows us to use the underscore to improve readability. The underscore is often used to separate a number into groups of three digits, just as commas are used outside of programming. Here are some examples:

1_234.56 408_847_1400 1_000_000 12_27_05 4_345e2

In most of the above numbers the underscore is placed where a comma would normally be, but the underscore can be placed in any convenient place. Perl and Ruby also allow underscores in their integers.
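Python (version 3.6 and later) adopted the same underscore convention, so the grouping idea can be tried directly. The short sketch below is an illustration in Python rather than Ada, and the variable names are made up for the example.

```python
# Underscores inside numeric literals are ignored (Python 3.6 or later).
million = 1_000_000
phone = 408_847_1400      # grouping does not have to be in threes
price = 1_234.56          # underscores are also allowed in real literals

print(million == 1000000)   # True
print(phone)                # 4088471400
print(price)                # 1234.56
```

The underscore changes only how the literal reads in the source file. The stored value is the same as if the digits had been typed without it.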

If we have more than one size of integer, we need some way to indicate the size of an integer constant. The C family uses an L or l (ell) after an integer to indicate a long integer. Thus 12L is a long integer. We can use the lower case l, but few can tell the difference between 12l (12 and ell) and 121 (12 and one), so we always use an upper case L. These suffixes are useful to force arithmetic into a particular precision.

Besides long integers, we have unsigned integers in C, which use the suffix u or U. Thus we could write 15u or 15U to get the unsigned integer fifteen. Long unsigned integers are indicated with the terminating ul or UL, so 23ul or 23UL will get an unsigned long integer twenty-three. For regular integers one bit must be saved to store the sign of the integer. If a variable or constant is unsigned, then that bit can be used for the value. Thus a 16-bit signed integer ranges from -32,768 to +32,767 (that is, from -2^15 to 2^15 - 1), but an unsigned integer stored in the same amount of storage can go from 0 to +65,535, which is 2^16 - 1.
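The arithmetic behind these ranges is easy to check. The sketch below assumes two's-complement storage, which is what nearly all current hardware uses; the function names are invented for this illustration.

```python
def signed_range(bits):
    """Return (lowest, highest) for a two's-complement integer of this width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def unsigned_range(bits):
    """Return (lowest, highest) when the sign bit is used for the value."""
    return 0, 2 ** bits - 1


# Print the ranges for the common integer widths.
for bits in (8, 16, 32, 64):
    print(bits, signed_range(bits), unsigned_range(bits))
```

For 16 bits this gives -32768 to 32767 signed and 0 to 65535 unsigned, matching the figures in the text.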

If we are in a language that has long integers, then how do we use them? For example, if we write 123456789012, we do not want to end up with an integer overflow or truncation. A good compiler would automatically store this integer as a long integer, but we may want to help it (or us) with 123456789012L.

In most languages long integers are restricted to some large size. Python 2 uses the same L to indicate a long integer, as in 12345678901234567890L, but Python long integers can be arbitrarily big. (Python 3 dropped the suffix, and every Python 3 integer can be arbitrarily big.) Other languages such as Ruby and the Lisp dialects also have arbitrarily long integers, which are often called bignums.
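Arbitrarily long integers are easy to try out. The sketch below is written for Python 3, where no suffix is needed or allowed; the variable names are only for illustration.

```python
# Python 3 integers grow as needed, so no L suffix is written.
googol = 10 ** 100
big = 12345678901234567890 * 12345678901234567890

print(len(str(googol)))   # 101 digits: a one followed by 100 zeros
print(big > 2 ** 64)      # True: too large for any 64-bit integer
print(big)
```

Nothing overflows here. The cost is that arithmetic on very large values is slower than arithmetic on machine-sized integers.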

Visual BASIC 6.0 and QBasic have two types of integers: integer and long integer. Early BASIC did not have types for numbers, so there was no distinction between integers and floating point. But now there are several numeric types. For numeric constants a suffix is used on the number to indicate the type. Here are the suffixes:

| Numeric Type | Suffix | Bytes of Storage |
|---|---|---|
| Integer | % | 2 |
| Long integer | & | 4 |
| Single precision | none or ! | 4 |
| Double precision | # | 8 |

Types in BASIC

Table x.2

Thus 15% is an integer, while 15& is a long integer, and 15 (or 15!) is a single precision floating point value. By default all numbers are single precision real (floating point). If we want a double precision 15, then we type 15#.

VB .NET has broken from its BASIC parents and changed the type-designation characters that are appended to numeric literals. Whole numbers (no decimal point) are type Integer and numbers with a decimal point are type Double. Otherwise, VB .NET uses a method similar to previous dialects of BASIC, but with different codes to change the default type. The VB .NET codes are as follows:

S Short integer

I Integer

L Long integer

F Single-precision floating point

R Double-precision floating point

D Decimal

So VB .NET has three types of integers and two types of floating point. It uses Decimal for decimal fractions such as dollars and cents. Thus 45S is a Short integer, 45I (or 45) is an Integer, and 45L is a Long integer. And 234.5F is a Single-precision floating point literal and 234.5R (or 234.5) is a Double-precision floating point literal. Finally, 780.23D is a Decimal currency-type literal.

The range of values in VB .NET is much larger than in previous dialects. For example, long integers range over about ±9 x 10^18. C# .NET has similar types and value ranges.

C Family

Sometimes we want a different base or radix of our constants besides base 10. Base 8 and base 16 are useful for storage addresses. The C family allows us to indicate octal constants by preceding the number with a zero. So 012 is octal 12, not decimal twelve. For octal values the range of digits is 0-7.

So putting this together with what we learned in the previous section we can use the terminating L to make the constant Long and the U to make it unsigned. Thus 012UL is the unsigned long octal value 12 or the equivalent of the decimal value 10.

For hexadecimal values we need to precede the number with 0x or 0X. Thus 0x12 is hexadecimal 12 (decimal 18), not decimal 12. Now the range of acceptable "digits" is 0 1 2 3 ... 9 A B ... E F, and the letters can be upper or lower case. Again we can use the long integer indicator L on these too. Thus 07L is a long octal seven, and 0x7L is a long hexadecimal seven. We can also use the terminating U to make it unsigned. Thus 0XFUL is the unsigned long hexadecimal value F, which is equivalent to the decimal value 15.

Ruby does the same for octal and hexadecimal literals as C does, but Ruby has added 0b for binary numbers. So in Ruby we can have hexadecimal values like 0x12, octal values like 012, and binary values like 0b1001.
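Python offers a similar set of prefixes, which makes it convenient for checking conversions between bases. Note one difference that this sketch relies on: Python 3 writes octal with the prefix 0o rather than a bare leading zero, so C's 012 becomes 0o12.

```python
octal = 0o12        # C and Ruby would write this as 012
hexa = 0x12
binary = 0b1001

print(octal, hexa, binary)             # 10 18 9
print(int("012", 8), int("C8", 16))    # 10 200
print(oct(200), hex(200), bin(200))    # 0o310 0xc8 0b11001000
```

The int function with an explicit base converts digit strings in either direction of the comparison, so it is a quick way to confirm that two literals in different bases name the same value.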

FORTRAN 90

FORTRAN 90 does this a little differently. It allows radix (number base) 2, 8, or 16. The value starts with the letter B for binary (radix 2), the letter O (oh) for octal, or the letter Z for hexadecimal. The letter is followed by a string of digits enclosed in double or single quotes. The digits must be acceptable for the desired base (no 8 or 9 in octals). The integer value 200 would be B"11001000" for base two, O"310" for base eight, and Z"C8" for base 16. I try very hard not to be chauvinistic, but I sure like the C method better in this case.

This FORTRAN 90 solution illustrates the problem of adding a feature to an existing language. They cannot just decide to use the C solution, that all numbers starting with a zero are octal values. Millions of old FORTRAN programs would no longer work correctly when compiled on new FORTRAN 90 compilers, since 012 would be octal 12 instead of decimal 12. On the positive side of this change, thousands of old FORTRAN programmers would suddenly have employment.

Ada

Ada, being a language with always a little more, does what C and FORTRAN do, but has added more bases and uses a different syntax. An integer can be expressed in any base from 2 to 16 by prefixing the number by its base and then bracketing the number within # symbols. Thus the decimal value 35 can be expressed in various bases as follows:

2#100011# 4#203# 8#43# 10#35# 16#23#

While this is kind of interesting, I do not see much use for base 7 or 11, but obviously someone did. In addition, C and FORTRAN 90 allow other bases only for integer constants, while Ada allows floating point constants in these different bases. Thus 23.45 could be expressed in base 16 or in any other base from 2 to 16.
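To see that the five Ada literals above really name the same value, a small converter can be sketched. The function parse_ada_based below is a hypothetical helper written for this illustration; it handles only based integers without exponents and is not part of any Ada tool.

```python
def parse_ada_based(literal):
    """Convert an Ada based integer literal such as '16#23#' to an int.

    Hypothetical helper: integers only, no exponent part.
    """
    base_text, digits, tail = literal.split("#")
    base = int(base_text)
    if not 2 <= base <= 16:
        raise ValueError("Ada bases run from 2 to 16")
    if tail != "":
        raise ValueError("exponents are not handled in this sketch")
    # Ada allows underscores between digits, so drop them before converting.
    return int(digits.replace("_", ""), base)


for text in ("2#100011#", "4#203#", "8#43#", "10#35#", "16#23#"):
    print(text, "=", parse_ada_based(text))
```

Each line of output ends in 35, which confirms the table of equivalent literals.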

1. Suppose you wanted to add more bases to Java or C++. Presently, those languages can only handle decimal, octal, and hexadecimal. The Ada designers built their method in at the beginning, but the FORTRAN designers had to add theirs to an existing language. Try to figure out how you could add more bases to C++ or Java without breaking millions of old programs.

Reals are numbers with a decimal point, thus 4.3 is a real literal. Real numbers are called floats or floating point in some languages. Another description of a real is a number with a decimal point or an exponent (or both), thus 2e2 would be a real literal using this definition. Like integer literals, a positive or negative sign can precede the number and no commas are allowed. Thus some real literals are:

0.0 -4.302 7. 3.2e-4 4.9678E+3 4e-3

If the language accepts both lower and upper case, the "e" for exponent can be lower case or upper case. Whether 4e-3 is acceptable varies by language; we may need 4.0e-3 (with a decimal point). The "e" stands for exponent and means multiply by 10 raised to the power that follows. Thus

4.3e2 = 4.3 x 10^2 = 4.3 x 100 = 430.0

Scientific notation is useful for expressing very small numbers or very large values (such as your chances to win the lottery or the national debt).
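The exponent rule can be checked in any language that accepts this notation. The sketch below uses Python, which follows the C and FORTRAN convention that an exponent alone makes a literal floating point; the variable names are only for illustration.

```python
value = 4.3e2
five_hundred = 5e2       # no decimal point, but still floating point
small = 4e-3

print(value)                  # 430.0
print(type(five_hundred))     # <class 'float'>
print(small == 0.004)         # True
print(7. == 7.0, .5 == 0.5)   # Python accepts a missing digit on either side
```

In Ada the literal 5e2 would instead be the integer 500, and forms such as 7. and .5 would be rejected.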

There are a few design issues for floating point constants. Here are some:

What sizes of floating point constants are available? For example, do we have float, double, and long double?

How do we indicate the particular type of floating point constant we want?

What bases of floating point are available? Examples that may be available besides decimal could be octal, hexadecimal, and maybe others.

Is there any separator available like the comma used for thousands?

Some language has an interesting answer to each of these questions, and different languages have different answers.

Earlier in this chapter, when we discussed integer literals, we noted that Ada integer literals can also have exponents. So in Ada, real literals must have a decimal point. Another Ada rule is that real literals must have a digit on each side of the decimal point. Thus 4. and .05 are not legal Ada real literals, but 4.0 is acceptable. COBOL has similar restrictions on its literals. In COBOL the literal .25 is OK, but 25. is not OK and must be changed to 25.0, since the period terminates statements when followed by a space. In Pascal .04 is not legal, since we need a digit before the decimal point, as in 0.04.

Precision of Reals

C Family

The C family has three types of reals: float, double, and long double, and it allows us to indicate the type of a real literal. Real constants such as 3.4, 2.0, and 4.564e-2 are all stored as double by default. If we want 3.4 to be stored as a float (instead of a double) we can add an f or F after the constant, like this: 3.4F or 3.4f. If we want 3.4 to be stored as a long double, then we use l (lower case L) or L like we do with integers. Thus 3.4 as a long double would be 3.4L or 3.4l, but the last one looks a lot like three point forty one instead of 3.4L. All these suffixes are useful to control the amount of storage used and the precision of the result.

1.0/3.0 // uses double precision.

1.0F/3.0F // uses float precision.

1.0L/3.0L // uses long double precision.

For the float example, both constants need to be float; otherwise the arithmetic would be done in the higher type, that is, double. For the long double example, just one long double constant would be enough to force the arithmetic to use long double. This is explained more in the section on Coercion in the Arithmetic chapter.
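The effect of the smaller type on 1.0/3.0 can be observed even in a language with only one floating point type. The sketch below uses Python, whose float is a C double, and imitates single precision by rounding the result through a 32-bit float with the standard struct module. The helper to_single is a name invented for this illustration, and the approach is an approximation of what 1.0F/3.0F does in C, not the C computation itself.

```python
import struct
import sys


def to_single(x):
    """Round a Python float (a C double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


third_double = 1.0 / 3.0
third_single = to_single(third_double)

print(sys.float_info.dig)      # decimal digits a double holds reliably
print(third_double)
print(third_single)
print(abs(third_single - third_double))
```

The last line shows an error in roughly the eighth decimal place, which is why single precision is usually described as good for about 7 significant digits.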

FORTRAN

In FORTRAN the default type is single precision (like float in C). We may type 4.3 which is a single precision real but we may want it stored as a double precision real. FORTRAN uses the suffix D or d to indicate double precision. Thus we can write 4.3D0 or .43d1 to indicate this is a double precision real value. This is an easy way to force arithmetic into double precision. For example:

x = 1/3d0

will get us a double precision division because 3d0 is double precision.

FORTRAN IV has complex numbers. Data of complex type is represented by two numbers in parentheses separated by a comma. The number left of the comma is the real part, while the number to the right of the comma is the imaginary part of the complex number. Thus the complex constant 3 + 2i can be assigned to the complex variable x as follows:

x = (3, 2)

Fortran has all the necessary operations and functions to handle complex values. It is interesting how early in computing history complex values were handled by Fortran. Ruby uses a similar syntax for its complex constants.

In Python, complex numbers are composed of two floating-point numbers – the real part and the imaginary part – and are coded by adding a J or j to the imaginary part. Thus we can write 3.0 + 4.3J for a complex number. A few other languages have built in complex numbers and the necessary arithmetic operators and functions.
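A few lines of Python show the complex literal and the built-in arithmetic in action. The variable names z and w are chosen only for this example.

```python
z = 3.0 + 4.3j          # the literal form described above (j or J)
w = complex(3, 4)       # the same kind of value built from two numbers

print(z.real, z.imag)   # 3.0 4.3
print(abs(w))           # 5.0, the magnitude of 3 + 4i
print(w * w)            # (-7+24j)
print(w.conjugate())    # (3-4j)
```

Strictly speaking, 4.3j is the literal (an imaginary number), and 3.0 + 4.3j is an expression that adds a real to it.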

When we talk about single or double precision of integers and reals we need to figure out what is doubled. Integers are the easiest to understand since we do not have to worry about an exponent or decimal point. The smallest integer can be stored in one byte (8 bits), with one bit for the sign. That leaves 7 bits, or 2^7, which gives us a range of -128 to +127. The next size of integer may use two bytes, which leaves 15 bits, or 2^15, for a range of -32768 to +32767. The next largest integer uses four bytes, which leaves 31 bits, or 2^31, for a range of -2147483648 to +2147483647. As you have seen, the largest value, the smallest value, and the number of integer types are language and machine dependent. But this pattern holds fairly well across many languages.


The situation gets much more machine and language dependent for floating point values. For reals, there are two parts besides the sign: the exponent and the mantissa. Thus for 3.45e-2, 3.45 is the mantissa and -2 is the exponent. The mantissa commonly holds about 7 decimal places for the smallest float, 15 places for the next largest float, and 31 places for the largest float. Not all languages have three sizes. Early languages only had one size. Newer languages tend to have three sizes, especially when the language is used for scientific programming.

There are two ways for programs to get the precision of integers and floating-point values. The way covered so far, and the most common, is that programmers get what the language or hardware gives them. For example, smaller floating-point values have 7 place accuracy and larger floating point values have 15 place accuracy. These defaults are based on the size of words in the hardware. This loss of control is mostly accepted without question. But when we expand the variety and size of machines available, the defaults change. So both FORTRAN and Ada have means for us programmers to select the exact precision needed.

FORTRAN 90 has a method similar to how C marks precision of their real numbers, but the FORTRAN method is more powerful and flexible. But first we need to discuss the need for FORTRAN variations for default number precision. FORTRAN has been around for decades and is available on very small computers and very large computers. A single-precision real number might have seven significant digits and a double-precision number might have 15 places on many computers. But a small computer may not have that range and a large super computer may have twice the range. So if a FORTRAN program is written on one computer a means is needed to indicate the needed precision when the program is taken to a new computer.

So FORTRAN 90 provides a kind number that is used to indicate the kind of precision needed for real and integer values. For real numbers there are at least two default kind numbers and for integer values there are 3 or 4 kind numbers.

For real numbers they use the kind number 1 to indicate single (7 significant digits) or the kind number 2 to indicate double (15 significant digits). The actual kind numbers are chosen by each compiler, and some FORTRAN compilers may offer more significant digits under another kind number. To specify the kind of a constant, an underscore followed by a kind number is appended to the constant. Thus 3.14159_1 has a single precision kind and 3.14159_2 has a double precision kind, because it has a "_2" after it. (Notice the underscore in FORTRAN has a different meaning than it has in Perl or Ada.)

Integer values have kind 1 for values up to about 2^7, kind 2 for values up to about 2^15, kind 3 for 2^31, and maybe kind 4 for 2^63. Thus 123456789_3 has integer kind number 3. If a kind value is not supported by a compiler, the compiler reports an error when compiling the program.

There is a great deal more to this in FORTRAN. We can use the intrinsic functions SELECTED_INT_KIND and SELECTED_REAL_KIND to request the minimum precision needed for integer and real values, and named constants can hold the resulting kind values. Here are a couple of brief examples:

INTEGER, PARAMETER :: Range18 = SELECTED_INT_KIND(18)

INTEGER, PARAMETER :: Prec20 = SELECTED_REAL_KIND(20, 40)

First, we need to set up kind indicators. A kind value is itself an integer, so both named constants are declared INTEGER. In the above two lines, Range18 names an integer kind with a range of at least 18 decimal digits, and Prec20 names a real kind with at least 20 significant digits and a decimal exponent range of at least 40. Now we can use these like we did the kind constants 1, 2, or 3.

12345_Range18

3.14159_Prec20

This was a very brief description of FORTRAN kind numbers. If you want more information you will need to find a FORTRAN 90 textbook. These kind numbers are also available for variables.
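The idea behind SELECTED_INT_KIND can be sketched in a few lines: given the number of decimal digits wanted, pick the smallest storage size that can hold every value of that many digits, or report failure. The function selected_int_bytes below is a hypothetical Python imitation. It returns a size in bytes rather than a compiler's kind number, and the default list of sizes is an assumption about typical hardware.

```python
def selected_int_bytes(digits, sizes=(1, 2, 4, 8)):
    """Return the smallest size in bytes holding all integers n with
    -10**digits < n < 10**digits, or -1 if no listed size is big enough.

    Hypothetical imitation of FORTRAN 90's SELECTED_INT_KIND; assumes
    two's-complement storage.
    """
    for size in sizes:
        if 10 ** digits <= 2 ** (8 * size - 1):
            return size
    return -1


for digits in (2, 4, 9, 18, 19):
    print(digits, selected_int_bytes(digits))
```

A request for 18 digits needs 8 bytes, and a request for 19 digits fails with -1 because 10^19 is larger than 2^63. Returning -1 mirrors what the FORTRAN function does when no suitable kind exists.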

1. Your PhD thesis is to indicate how to expand C++ or Java so these languages can indicate desired precision for constants or variables. Read the previous section on FORTRAN 90 kind numbers. If your method breaks all previous C++ programs you will not obtain your PhD.

2. Perl and early BASIC do not distinguish between integers and floats. These two languages just have numbers. Do you think this is a good approach? Should we do this in OPL? Why or why not?

1. We have seen several types of numeric literals or constants. What numeric constants do you think we should have in OPL?

2. Do we want to allow for different integer literals? For example, short or long integers? Do we need one, two, three, or more types of integers? And how shall we indicate what is desired when we type an integer?

3. Do we want to allow for different real literals? For example, float, double, or long double. Do we need one, two, three or more types of real literals? How shall we write these different forms in OPL?

4. What base or radix of integers will we allow: binary, octal, hexadecimal, others? The C family has one way, FORTRAN 90 has another way, and Ada has a third method. And how shall we write the different numbers in OPL?

5. Most or all languages do not allow commas in numeric literals, like 1,234. Is this restriction still necessary? Do you think we should allow commas in numbers for OPL? Notice how Ada handles this.

We need or want a literal for true and false. These are called Booleans or logicals, depending on the language. Some languages have a reserved word or keyword for these values. Booleans are ordinal values and usually false is less than true. The normal operations are and, or, and not.

There are a few decisions and differences for Boolean values. Here are some questions:

Are there special reserved words or keywords for the Boolean values?

Are the Boolean values ordered? That is, is false < true or vice versa?

Are Booleans ordinal values? Can they be used for choices in a switch statement?

When talking about booleans do we use the capitalized Booleans or the lower case booleans? Both versions are common in books. This is probably the most difficult problem, since the difficulty of a problem is often inversely related to its importance.

FORTRAN

All versions of FORTRAN use .TRUE. and .FALSE. for their logical constants. And in FORTRAN they are called logicals instead of Booleans. Since FORTRAN does not have reserved words, FORTRAN uses a period before and after these logical literals to differentiate them from the variables TRUE and FALSE. If we print a FORTRAN Boolean variable, it will print either T or F, and those are what we need for input if reading Boolean data into a program.

Ada, Pascal, ALGOL, and Java use true and false for their Boolean literals.

The inputting and outputting of logical values is messy in most languages. If we want to use the integer 123, we can use it exactly that way as a literal constant in the program, read in the integer, or print the integer and it is all the same. It is not as simple with logical literals.

Some languages do not have a nice way to input or output Boolean values. For example, in FORTRAN, logicals print as an F or T. But when we want to read in a value for true, we can use the letter t, or period and t (i.e., .t), or period and the word true (i.e., .true), or any string that starts with the letter t, or a period, letter t, then anything. So input, output, and inside the program are all potentially different for FORTRAN logicals.

C Family

The C family of languages traditionally does not use named constants for logical values. Instead they use 0 (zero) for false and 1 (one) for true as the result of relational or logical expressions. Thus if we tried to print 4 < 3 we would get a zero, and 3 < 4 would get us a 1. The comparisons need parentheses because << binds more tightly than <.

cout << "true=" << (3<4) << endl;

cout << "false=" << (4<3) << endl;

While this works in C++, something similar could be done in other languages to see if and what it prints. The situation is a little more complicated since a value of zero is equivalent to false, and any other value is equivalent to true. So

if (x) . . .

will be false when x is equal to zero and true otherwise. While this can be a blessing when we know what we are doing, it is also a common source of bad program errors when we are not careful. Java broke away from its C background and does not allow this.
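The same experiment can be tried in Python, which has the named constants True and False but also follows the C convention that zero counts as false and any other number counts as true. This is a small sketch in Python rather than C++, so it illustrates the idea and not C's exact rules.

```python
print(int(3 < 4))      # 1
print(int(4 < 3))      # 0
print(False < True)    # True: the Boolean values are ordered

# Zero is treated as false; every other number is treated as true.
for x in (0, 1, -7, 0.0, 2.5):
    if x:
        print(x, "is treated as true")
    else:
        print(x, "is treated as false")
```

Only 0 and 0.0 take the else branch. A test such as if (x) in Java, by contrast, is rejected at compile time unless x is a boolean.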

Character strings are the next type of literal. A character string is a group of characters glued together. An example is our now famous:

"Hello world."

But things did not start out this easily. In fact, there was little support for characters of any kind on very early computers. FORTRAN introduced the Hollerith string, which was named after Herman Hollerith, who invented the punched card equipment for the U.S. Bureau of the Census. It was used as follows:

13H HELLO WORLD.

The 13H indicates thirteen characters follow. These were used in output format statements. Thus we could have something like the following:

WRITE (5, 10)

10 FORMAT(13H HELLO WORLD.)

There were no character variables or any way to manipulate character strings.
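The counting rule is the whole trick of a Hollerith constant, and it is simple to mimic. The two functions below are hypothetical helpers written in Python for this illustration: one builds a Hollerith constant from text and the other reads one back.

```python
def hollerith(text):
    """Build a Hollerith constant: a character count, the letter H, the text."""
    return f"{len(text)}H{text}"


def read_hollerith(constant):
    """Recover the text from a Hollerith constant such as '5HHello'."""
    count_text, rest = constant.split("H", 1)
    count = int(count_text)
    if len(rest) < count:
        raise ValueError("fewer characters than the count promises")
    return rest[:count]


print(hollerith(" HELLO WORLD."))   # 13H HELLO WORLD.
print(hollerith("Hello"))           # 5HHello
print(read_hollerith("5HHello"))    # Hello
```

Because the length is stated up front, no delimiter is needed and any character, including a quote, can appear in the text. The price is that the programmer has to count characters by hand, and a miscount silently changes the string.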

There are a few design issues for character string constants. They are:

How are character strings delimited? Examples are double quotes or single quotes.

How do we use the delimiter inside the character string?

Do character strings have escape sequences and variable interpolation?

Thus character strings are a sequence of symbols enclosed in matched single or double quotes. Character strings are also called just strings, character constants, or non-numeric literals. Traditionally each character uses one byte of storage.

One interesting consideration is how to delimit strings, that is, indicate the start and end of the string. Next, we need to know how to use the delimiter inside the string itself. For example, if we use quotes to enclose a string, then how do we put a quote in the string? Also, can we have strings that extend over one line, or are we restricted to one line only? Finally, we need string operators and functions to search, compare, and construct strings. There are almost as many ways to concatenate character strings as there are language groups.

Quotation marks or apostrophes are the common delimiters for character strings. Quotation marks are often called double quotes, while apostrophes are often called single quotes. Some languages such as Java and C use only quotation marks ("). Other languages such as Pascal and FORTRAN use only apostrophes ('). And still other languages such as HTML allow either quotation marks or apostrophes as long as the same mark is used at both ends of a particular string. Thus we can use "Hello world." (quotation marks) or 'Hello world.' (apostrophes) but we cannot use "Hello World' (quotation mark to start and apostrophe to end). I don't know of any language that allows that, but I will watch my e-mail for someone that knows of one.

If we use quotes (single or double) to enclose strings, then we need some way to insert the same quote inside the string without ending the string early. There are two common solutions to this problem. One solution is to use two quote characters to indicate one. The other is to use an escape character, which voids the special meaning of the next character, to protect the quote.

FORTRAN uses apostrophes for strings and uses the first solution. So in FORTRAN if we want DON'T DO IT, then we would do something similar to this:

PRINT *, 'DON''T DO IT' ! uses two apostrophes

which turns the DON''T into DON'T when processing it. COBOL, Pascal, Ada, and BASIC use this doubling method. The repetition of the enclosed quote is sometimes called quote stuffing.
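Quote stuffing is mechanical enough to express in two short functions. The names stuff and unstuff are invented for this Python sketch, which assumes the apostrophe as the default delimiter, as in FORTRAN.

```python
def stuff(text, quote="'"):
    """Turn plain text into a quoted literal, doubling any enclosed quote."""
    return quote + text.replace(quote, quote * 2) + quote


def unstuff(literal, quote="'"):
    """Recover the plain text from a quote-stuffed literal."""
    body = literal[1:-1]                 # drop the outer delimiters
    return body.replace(quote * 2, quote)


print(stuff("DON'T DO IT"))        # 'DON''T DO IT'
print(unstuff("'DON''T DO IT'"))   # DON'T DO IT
```

The compiler performs the unstuff step when it reads the source, so the doubled apostrophe never appears in the stored string.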

Ada is unusual since it does not require quote stuffing or an escape character to protect the quotation character when it is a single character. Ada has character literals that are enclosed within single quotes, and any character can be enclosed, including the single quote. Thus 'x' is fine, but ''' is also fine. Most languages would require '''' or '\'' to protect the enclosed single quote. If you think about this, it seems quite feasible to do it the Ada way, but no other language seems to do it this way.

Java and C use the escape character method. The backslash is used to "escape" the special meaning of the next character. Thus we could print a similar phrase as follows:

cout << "Use \" for strings." << endl;

Here we have a quotation mark inside a character string that is delimited by quotation marks. This use of an escape character also lets us put other characters inside a character string that would normally cause problems, such as \n for new line. The escape character can also be followed by a hexadecimal or octal value to generate any character, even non-printable characters. C languages and other UNIX programs use this last method extensively. Table x.x includes a list of escape characters that work with the C family languages and UNIX programs. C# mostly uses the same escape characters as C.

| Escape sequence | Description | Dec | Hex | Oct |
|---|---|---|---|---|
| \a | Alarm/bell (BEL) | 7 | \x07 | \007 |
| \b | Backspace (BS) | 8 | \x08 | \010 |
| \f | Form feed (FF) | 12 | \x0C | \014 |
| \n | New line (LF) | 10 | \x0A | \012 |
| \r | Carriage return (CR) | 13 | \x0D | \015 |
| \t | Horizontal tab (TAB) | 9 | \x09 | \011 |
| \v | Vertical tab (VT) | 11 | \x0B | \013 |
| \" | " double quote | 34 | \x22 | \042 |
| \' | ' single quote | 39 | \x27 | \047 |
| \\ | \ backslash | 92 | \x5C | \134 |
| \032 | Octal character | | | |
| \xff | Hexadecimal character | | | |
| \ (Enter key) | Newline continuation | | | |

Short Table of Escape Characters

Table x.1
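Python accepts the same single-letter, octal, and hexadecimal escapes as C, so the character codes in the table can be checked with a short script. This is only a sketch in Python, and the dictionary name codes is chosen for the example.

```python
pairs = [("bell", "\a"), ("backspace", "\b"), ("form feed", "\f"),
         ("new line", "\n"), ("carriage return", "\r"), ("tab", "\t"),
         ("vertical tab", "\v"), ("double quote", "\""),
         ("single quote", "\'"), ("backslash", "\\")]

# Map each description to the numeric code of the escaped character.
codes = {name: ord(char) for name, char in pairs}

for name, code in codes.items():
    # Show the code in decimal, hexadecimal, and octal.
    print(f"{name:16} {code:3} {code:#04x} {code:#05o}")

# Octal and hexadecimal escapes can name any character, here the letter A.
print("\101", "\x41")
```

Each escape sequence occupies two or more characters in the source file but produces exactly one character in the string.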

HTML and XML use angle brackets, as in "<html>", for their commands, so they need a way to insert angle brackets as ordinary text. XML uses references that start with the & symbol and end with a semicolon, with either a name or a character reference number in between. Thus we can get < by using either &lt; (which stands for less than) or &#60;, which is the character reference number for <. Not only can we get characters that otherwise would cause problems such as <, >, and &, but we can get characters from other languages not on our keyboard.
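Character references can be explored with the html module from the Python standard library, which converts in both directions. The sketch below uses only html.escape and html.unescape; the sample strings are made up for the illustration.

```python
import html

named = html.unescape("&lt;")        # named reference for the less-than sign
numbered = html.unescape("&#60;")    # numeric reference for the same character
accent = html.unescape("&eacute;")   # a character not on every keyboard

print(named, numbered, accent)       # < < é
print(html.escape("<html> & more"))  # &lt;html&gt; &amp; more
```

The ampersand itself has to be escaped as well, since it is the character that introduces every reference.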

A third, partial solution to the need for a quote inside a character string is to allow either type of quotation mark as the delimiter. This method is used in versions of BASIC, SQL, COBOL, and HTML. This allows us to do the following:

"Don't" <-- apostrophe inside quotation marks

'Use " for quotes' <-- quotation mark inside apostrophes

We still have the problem of what to do when we really need both quote marks inside a single quoted string, as in the sentence: Use " or ' for quotes. In that case one of the two earlier methods, doubling the quote or escaping it, is still needed.
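Python is not in the list above, but it also accepts either delimiter, so it can serve as a small sketch of both the convenience and the leftover problem. The variable names are arbitrary.

```python
a = "Don't"                     # apostrophe inside quotation marks
b = 'Use " for quotes'          # quotation mark inside apostrophes

# When both marks are needed, whichever one matches the delimiter
# still has to be escaped.
c = "Use \" or ' for quotes."
d = 'Use " or \' for quotes.'

print(c)
print(c == d)                   # the two spellings denote the same string
```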

From the previous discussion, you can see that there are levels of activity within character strings and that these vary by language. At the lowest level there is the problem of using the quotation delimiter inside the character string. Two methods are used: either 'don''t' or 'don\'t'. As we move along from mostly dead character strings (no activity inside the string), we find languages that have a long or short list of escape characters like the table above, and variable interpretation in the UNIX languages and their friends. I will label these latter character strings live character strings, since a lot of activity can happen within them.

Both Perl and the UNIX shell programming languages handle character strings, but with a few important differences from other languages. Strings can be delimited by either matching quotation marks or matching apostrophes. When character strings are enclosed in apostrophes, all characters are treated as literals. When character strings are enclosed in quotation marks, almost all characters are treated as literals, with the exception of variable substitution and special escape sequences. For example, in Perl variables are interpreted:

$x = 45;
print 'x = $x'; # printed output x = $x
print "x = $x"; # printed output x = 45

So $x is a Perl variable, and it is interpreted when enclosed in double quotes, but not interpreted when enclosed in single quotes. Likewise, escape sequences are processed within quotation marks but not apostrophes:

print 'hi\nbye'; # printed output hi\nbye
print "hi\nbye"; # interprets the newline
                 # and prints two lines.

There are many named control characters and any character can be processed by decimal, octal, or hexadecimal value. See xxx in zzzz.


If the UNIX methods of quoting prove inadequate or too messy, Perl provides an alternate form of quoting as follows:

q represents single quotes.

qq represents double quotes.

qx represents back quotes.

The string to be quoted needs to be enclosed in matching delimiters. We use a forward slash here, but other matching characters could be used:

print 'He said, "Don\'t do it."',"\n"; # very complicated
print qq/He said, "Don't do it."\n/; # less complicated

The dollar sign is a special symbol in Perl, so a literal dollar sign needs to be protected with single quotes:

print 'Give me $10.00.',"\n";
print q/Give me $10.00./,"\n"; # q// does not process \n, so the newline is printed separately

Almost any punctuation character can be used as the quote delimiter. The examples above use a slash, but other characters work as well. Perl adds a little more magic: if the opening quote character is an opening bracket (angle, square, curly, or round), the closing quote is the matching closing bracket, and bracket pairs inside the string nest:

qq<Use <br> for break line.>

gets converted to:

Use <br> for break line.

leaving the internal < > alone. This removes the possibility of the dangling bracket problem. And remember you saw the first dangling bracket in my book.

Besides all the escape sequences used by C and UNIX, Perl has additional escape sequences. Here are some of them:

Escape sequence   Description
\c[               Control character
\l                Next character is converted to lower case.
\u                Next character is converted to upper case.
\L                Next characters are converted to lower case until \E is found.
\U                Next characters are converted to upper case until \E is found.

Additional Perl Escape Sequences

Table x.2

These escape sequences come in useful for character matching, which Perl is used for a lot.

UNIX Backquotes

The UNIX shells and Perl, but not C/C++, use backquotes to indicate a command to execute. The output of the command is assigned to a variable or used in an output statement.

FIND WHAT ?? should be

$today = `date +%??`; # places the date in variable today
$today = qx/date +%??/; # Alternative quoting method.
print "The hour is ", `date +%H`; # prints the hour; Perl does not run backquotes inside double quotes.

This method works with a variety of UNIX shells and commands.
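Languages without a backquote literal usually offer the same service through a library call. The following is a minimal sketch in Python using the standard subprocess module; the helper name backquote is made up for this illustration, and the Python interpreter itself stands in for a UNIX command so that the example does not depend on any particular system.

```python
import subprocess
import sys


def backquote(argv):
    """Run a command and return its standard output, roughly like `...`."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


# A portable stand-in for a UNIX command: ask the interpreter to print a word.
greeting = backquote([sys.executable, "-c", "print('hello')"])
print("The command said:", greeting.strip())
```

On a UNIX system the same helper could be called as backquote(["date", "+%H"]) to get the hour.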

Many languages have their own special literals. Examples are EOF and NULL in C, ZERO and SPACE in COBOL, and __LINE__ and __FILE__ in Perl. These special literals are discussed in the Named Constant section of the Variables chapter.

Character string literals that do not use escape sequences are useful for file addresses, inserting special characters such as line feeds and tabs, and other messy situations. For example, we would normally have something similar to this:

string strFileName = "c:\\Mydocs\\Graphics\\dennie.gif";

to point to a great picture of me. But we need to escape all the backslashes. With a C# verbatim string literal, we can avoid the double backslashes as follows:

string strFileName = @"c:\Mydocs\Graphics\dennie.gif";

Verbatim string literals start with an @ character, followed by a double-quoted character string.

In a verbatim string literal, the characters between the delimiters are interpreted verbatim, the only exception being a quote-escape-sequence. If we need a quotation mark in a verbatim string literal, we need to do the familiar doubled quotation marks.

string a = "She said \"Hello\" to me. "; // regular string

string b = @"She said ""Hello"" to me. "; // verbatim string, same value as a

Almost every quoting method, then, requires some exception.

Python has two additional kinds of string literals. The first is the triple-quoted string, where everything between the matching groups of three single or double quotes is included, including other quotes and line breaks. This is an easy way to define a string that contains both single and double quotes, much like qq/.../ in Perl.

"""A quote is used to start a string,
either ' or " can be used."""

In this example, unescaped newlines and quotes are allowed (and are retained). The backslash (\) character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character. Either triple double or triple single quotes can be used.
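A short sketch shows that the line break and both kinds of quote really are retained in the stored string. The variable names are arbitrary.

```python
text = """A quote is used to start a string,
either ' or " can be used."""

# The line break typed inside the literal is part of the string.
lines = text.splitlines()
print(len(lines))      # 2
print(repr(text))      # repr() shows the embedded newline as an escape
```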

The second Python string type is the raw string. These strings start with an r or R and use different rules for processing escape sequences: backslashes are left in the string. Thus r"\n" gets stored as the two characters \ and n, and not as the newline character. Otherwise Python, like the C family of languages, processes escape sequences such as \n (newline) and \t (tab) in character strings. There are situations (file paths or regular expressions) where you do not want escape sequences processed. For example:

path = 'c:\nowhere'

gets processed as:

c:
owhere

since the \n is turned into a new line.

Raw strings prevent this. We can change the above to:

path = r'c:\nowhere'

The "r" before the apostrophe indicates a raw string.
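The difference is easy to measure. In this small sketch (variable names are arbitrary), the ordinary string loses two typed characters to one newline, while the raw string keeps the backslash and the n as they were typed.

```python
plain = 'c:\nowhere'     # \n becomes a single newline character
raw = r'c:\nowhere'      # the backslash and the n are both kept

print(len(plain))        # 9
print(len(raw))          # 10
print(raw)               # c:\nowhere

# A raw string is only a different spelling; the same value can be
# written in an ordinary string by doubling the backslash.
print(raw == 'c:\\nowhere')   # True
```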

If you are interested in how character strings can be handled, Python is one of the languages to look at.

UNIX, PHP, and Perl have here documents (or heredocs), which are similar to Python triple-quoted or raw strings. The name here document comes from the fact that the document is right here. We could do the previous Python example in Perl as follows:

print <<EOF; # semicolon here.
A quote is used to start a string,
either ' or " can be used.
EOF

The << symbols are used to indicate the start of the here document. The word immediately following (in this case EOF) is used to indicate what to look for at the end of the here document. The closing indicator (e.g. EOF) must be on a line by itself with no spaces before it. A here document is similar to a double-quoted string, since normal escape sequences will be interpolated. Notice the semicolon at the end of the first line in the Perl example above. In the PHP example below, there is a semicolon on the last line but not the first line.

There is quite a bit more to here documents, and the languages process the document differently depending on whether the terminator string is enclosed in double quotes ("EOF"), single quotes ('EOF'), or back quotes. If you are interested, look at how Perl and the UNIX shells handle these. PHP also has here documents, but they differ a little. Here is the same example in PHP:

print <<<EOF
A quote is used to start a string,
either ' or " can be used.
EOF; # semicolon here.

PHP uses three left angle brackets instead of two and the last line ends with a semicolon. The different UNIX shells also use here documents and that is where they came from.

A few languages have date literals. The first problem is how do we type a date so it is recognized as a date. Visual Basic .NET uses # signs to enclose the date as follows:

#07/04/1776#

A date literal can then be used in an assignment statement much like other literals:

Dim objMyBirthday As Date = #12/15/1981#

Visual Basic .NET Date variables always carry a time as well, even if none is given, so the value stored above would be

12/15/1981 12:00:00 AM

VB provides a wide variety of functions for manipulating dates, such as adding an interval to a date, subtracting two dates, and formatting date output. If you are interested in what operations can be done with dates, you might look at VB or at applications such as spreadsheets or database programs to see what is possible.
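For comparison, here is a sketch in a language with no date literal at all. Python builds dates by calling a constructor from its standard datetime module, but the same two observations hold: a date with no time given defaults to midnight, and dates support interval arithmetic. The variable names are arbitrary.

```python
from datetime import date, datetime, timedelta

# No date literal: the value is built from year, month, and day.
birthday = datetime(1981, 12, 15)
print(birthday.isoformat())          # 1981-12-15T00:00:00 (midnight by default)

# Adding an interval to a date.
later = birthday + timedelta(days=30)
print(later.date())                  # 1982-01-14

# Subtracting two dates gives an interval, here reported in days.
elapsed = date(1981, 12, 15) - date(1776, 7, 4)
print(elapsed.days)
```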

Several languages have special literals for initializing arrays. These will be briefly covered here and covered in more detail in the Array Chapter. Perl has list literals which are used for arrays. Examples are:

(1, 2, 3, 4) # array of four values 1, 2, 3, and 4.

(1 .. 4) # array of four values 1, 2, 3, and 4.

(1.5 .. 4.5) # list of four values 1, 2, 3, and 4; the range operator truncates its operands to integers.

Perl has a lot more available, so skip to the Array chapter if interested. FORTRAN and Ada also have extensive array methods.

When comparing character strings, what constitutes equality is of interest. The main question is what extra blanks on the right side do to the comparison. If we compare the strings "hi" and "hi ", are they equal? There are no spaces after the first "hi", but there are blank spaces after the second.

In QBasic two string expressions are considered equal if they are the same length and contain an identical sequence of characters. In COBOL strings are equal if all the characters are equal and any longer field has just blank spaces on the right. Thus in COBOL, for the purpose of comparison, the shorter field is padded with blank spaces on the right. So in QBasic the character strings "Hi " and "Hi" would be not equal, but in COBOL they would be evaluated as equal. We may be able to find a pattern where business programming languages ignore extra right-most blank spaces, but non-business languages do not.
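The two rules can be sketched side by side. This is only an illustration in Python of the comparison rules as described above, not code from either language; the function names equal_exact and equal_padded are made up.

```python
def equal_exact(a, b):
    """QBasic-style rule: same length and identical characters."""
    return a == b


def equal_padded(a, b):
    """COBOL-style rule: pad the shorter string with blanks on the right."""
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)


print(equal_exact("Hi ", "Hi"))     # False
print(equal_padded("Hi ", "Hi"))    # True
print(equal_padded(" Hi", "Hi"))    # False: leading blanks still count
```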

Some languages have an operator to repeat string constants . For example, PL/I has the following method:

(2)'Walla ' /* Walla Walla */

(35)' ' /* 35 spaces */

repeat('Walla ', 2)

which will get you the city 'Walla Walla ', which you probably type a lot. (Quick, name other cities with identical words.) This repetition could be used to assign a value to a PL/I character string as follows:

DECLARE CITY CHAR(20);

CITY = (2)'Walla ';

Perl uses the "x" operator to repeat strings, and Python and Ruby use the "*" operator to repeat strings. If we wanted to generate the string hahaha, we could do it as follows:

PL/I       Perl        Python/Ruby
(3)'ha'    'ha' x 3    'ha' * 3

Few languages besides these have any way to do this, but I suspect someone will e-mail me about another language I missed.
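The Python form can be tried directly. In this small sketch the variable names are arbitrary, and the repetition count goes on the right of the string.

```python
city = "Walla " * 2       # 'Walla Walla ' including the trailing blank
blanks = " " * 35         # 35 spaces
laugh = "ha" * 3          # 'hahaha'

print(repr(city))
print(len(blanks))
print(laugh)
```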

I went over literals in many languages in this chapter. At the simplest level literals are very similar in many languages, but when we look in more detail we find a lot of interesting differences. Some languages allow an underscore in numbers, much as a comma is used to indicate thousands and millions. And when we get to character strings there are a lot of differences. Quoting and interpreting character strings varies a lot, especially in the scripting languages. And there are several ways to handle long character strings and the special characters in them, including newlines.

1. Two ways of handling booleans have been discussed. One method was where we use reserved words like true and false in FORTRAN and Pascal. The second way was using zero for false and one (or anything not zero) for true in C. What should we do in OPL?

2. An interesting problem with booleans is how we print and input boolean values. No present solution seems very elegant. Can you come up with an elegant solution as your PhD thesis? Well, maybe your A.S. thesis?

3. How should we delimit our character strings in OPL? Shall we use apostrophes or quotation marks or either, or do you have a better idea?

4. What method do you think we should use in OPL to use the string delimiter within the character string? Two present methods were presented: doubling the quote, or an escape sequence.

5. Perl and the UNIX shell have an interesting quoting mechanism. Variables and escape sequences are evaluated inside double quotes, but not inside single quotes. Can we use this in OPL and if so how? Would this feature be more interesting in some languages than others, such as business, scientific, or system languages?

6. Perl and the UNIX shell have an interesting way to execute system commands with the back quotes. Can we use this in OPL and if so how? Would this feature be more interesting in some languages than others: business language, scientific languages, and system languages?

7. Are boolean values ordered? If so is false < true?

8. Python has the triple-quoted character string. This is a character string that starts with three quotes (single or double), and then almost anything can be inside the character string, including new lines and other quotes. Look this up and compare it to other methods such as qq/ in Perl. Do you think this method would be a good addition to OPL?

9. Each language has a different way to indicate octal or hexadecimal constants. Develop a chart by language indicating how each language does this. What method do you recommend for OPL?