Date Sun, March 01, 2020

Update Mar 14, 2020: I’m working on an update to the article based on all the feedback I’ve received so far. Stay tuned!





I was reading Computer Systems: A Programmer’s Perspective the other day and in the chapter on Unix I/O the authors mention that there is no explicit “EOF character” at the end of a file.

If you’ve spent some time reading and/or playing with Unix I/O and have written some C programs that read text files and run on Unix/Linux, that statement is probably obvious. But let’s take a closer look at the following two points related to the statement in the book:

1. EOF is not a character
2. EOF is not a character you find at the end of a file



1. Why would anyone say or think that EOF is a character? I think it may be because in some C programs you can find code that explicitly checks for EOF using getchar() and getc() routines:

    #include <stdio.h>
    ...
    while ((c = getchar()) != EOF)
        putchar(c);

OR

    FILE *fp;
    int c;
    ...
    while ((c = getc(fp)) != EOF)
        putc(c, stdout);

And if you check the man page for getchar() or getc(), you’ll read that both routines get the next character from the input stream. The same man page also says that they return the character read as an unsigned char cast to an int, or EOF on end of file or error. So that could be what leads to confusion about the nature of EOF, but that’s just me speculating. Let’s get back to the point that EOF is not a character.

What is a character anyway? A character is the smallest component of a text. ‘A’, ‘a’, ‘B’, ‘b’ are all different characters. A character has a numeric value that is called a code point in the Unicode standard. For example, the English character ‘A’ has a numeric value of 65 in decimal. You can check this quickly in a Python shell:

    $ python
    >>> ord('A')
    65
    >>> chr(65)
    'A'



Or you could look it up in the ASCII table on your Unix/Linux box:

$ man ascii





Let’s check the value of EOF by writing a little C program. In ANSI C, EOF is a macro defined in <stdio.h> as part of the standard library. The standard only requires it to be a negative integer constant of type int; in practice its value is usually -1. Save the following code in a file named printeof.c, compile it, and run it:

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        printf("EOF value on my system: %d\n", EOF);
        return 0;
    }

    $ gcc -o printeof printeof.c
    $ ./printeof
    EOF value on my system: -1

Okay, so on my system the value is -1 (I tested it both on Mac OS and Ubuntu Linux). Is there a character with a numerical value of -1? Again, you could check the available numeric values in the ASCII table or check the official Unicode page to find the legitimate range of numeric values for representing characters. But let’s fire up a Python shell and use the built-in chr() function to return a character for -1:

    $ python
    >>> chr(-1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: chr() arg not in range(0x110000)

As expected, there is no character with a numeric value of -1. Okay, so EOF (as seen in C programs) is not a character.
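To see why a value like -1 can serve as an end-of-file signal without colliding with real data, here’s a minimal Python sketch that mimics the getc() return convention: each call returns a byte as an int in the range 0-255, or -1 once the input is exhausted. The names getc_like and read_all and the EOF = -1 constant are assumptions made up for illustration; they are not part of any Python API.

```python
import io

EOF = -1  # assumption: mirrors the usual C value; C only requires EOF to be negative


def getc_like(stream) -> int:
    """Return the next byte as an int in 0..255, or EOF once nothing is left."""
    b = stream.read(1)
    if b == b"":  # empty result: the end-of-file condition, not a byte in the data
        return EOF
    return b[0]  # indexing a bytes object gives an int in 0..255


def read_all(stream) -> list:
    """Collect byte values until getc_like() reports EOF."""
    values = []
    while (c := getc_like(stream)) != EOF:
        values.append(c)
    return values


# Every possible byte value, 0x00 through 0xFF, comes back as a distinct
# non-negative int, so -1 never collides with real data.
values = read_all(io.BytesIO(bytes(range(256))))
```

With a real file, open it in binary mode (for example `with open('helloworld.txt', 'rb') as f: print(read_all(f))`); the list ends with 10, the newline, and no extra "EOF byte".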

Onto the second point.



2. Is EOF a character that you can find at the end of a file? I think at this point you already know the answer, but let’s double check our assumption.

Let’s take a simple text file helloworld.txt and get a hexdump of the contents of the file. We can use xxd for that:

    $ cat helloworld.txt
    Hello world!
    $ xxd helloworld.txt
    00000000: 4865 6c6c 6f20 776f 726c 6421 0a         Hello world!.

As you can see, the last byte in the file is the hex value 0a. You can find in the ASCII table that 0a represents nl, the newline character. Or you can check it in a Python shell:

    $ python
    >>> chr(0x0a)
    '\n'
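If xxd isn’t available, a few lines of Python produce a similar dump. The sketch below only approximates xxd’s default layout (16 bytes per line, hex digits grouped two bytes at a time, non-printable bytes shown as '.'); the function name hexdump is hypothetical and this is not a reimplementation of the tool. Dumping the bytes of helloworld.txt this way shows the same thing: the file simply ends with 0a.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format data roughly the way `xxd` does by default (width should be even)."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Two bytes (four hex digits) per group, separated by spaces.
        groups = [chunk[i:i + 2].hex() for i in range(0, len(chunk), 2)]
        hex_part = " ".join(groups)
        # Width of a full line's hex column, so a short last line stays aligned.
        hex_width = width * 2 + (width // 2 - 1)
        # Printable ASCII as-is; everything else (such as 0x0a) as '.'.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}: {hex_part:<{hex_width}}  {text}")
    return "\n".join(lines)


dump = hexdump(b"Hello world!\n")
```

To dump an actual file, pass it the file’s bytes: `print(hexdump(open('helloworld.txt', 'rb').read()))`.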



Okay. If EOF is not a character and it’s not a character that you find at the end of a file, what is it then?

EOF (end-of-file) is a condition provided by the kernel that can be detected by an application.
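One way to convince yourself that EOF is a condition rather than something stored in the file: reach the end of a file, append more data through another file descriptor, and read again. Here’s a minimal Python sketch using os.read() on a temporary file; the helper name demo_eof_is_a_condition is just for illustration, and the behavior shown is what POSIX systems do for regular files.

```python
import os
import tempfile


def demo_eof_is_a_condition() -> list:
    """Return the results of successive os.read() calls around a file append."""
    results = []
    fd, path = tempfile.mkstemp()  # fd is opened for reading and writing
    try:
        os.write(fd, b"Hello")
        reader = os.open(path, os.O_RDONLY)  # separate descriptor, own file position
        try:
            results.append(os.read(reader, 100))  # b'Hello': fewer bytes than asked for
            results.append(os.read(reader, 100))  # b'': end-of-file condition
            results.append(os.read(reader, 100))  # b'' again: nothing was "consumed"
            os.write(fd, b", world!")             # the file grows past the old end
            results.append(os.read(reader, 100))  # b', world!': reading simply continues
        finally:
            os.close(reader)
    finally:
        os.close(fd)
        os.remove(path)
    return results


results = demo_eof_is_a_condition()
```

If the end of the file were marked by a stored byte, the appended data would sit after that marker; instead, the reader just sees more bytes once the file is bigger than its current position.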

Let’s see how we can detect the EOF condition in various programming languages when reading a text file using high-level I/O routines provided by the languages. For this purpose, we’ll write a very simple cat version called mcat that reads an ASCII-encoded text file byte by byte (character by character) and explicitly checks for EOF. Let’s write our cat version in the following programming languages:

ANSI C

Python 3

Go

JavaScript (node.js)

You can find source code for all of the examples in this article on GitHub. Okay, let’s get started with the venerable C programming language.

ANSI C (a modified cat version from The C Programming Language book)

    /* mcat.c */
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        FILE *fp;
        int c;  /* int, not char, so it can hold every byte value plus EOF */

        if (argc < 2) {
            fprintf(stderr, "usage: mcat FILE\n");
            return 1;
        }

        if ((fp = fopen(*++argv, "r")) == NULL) {
            fprintf(stderr, "mcat: can't open %s\n", *argv);
            return 1;
        }

        /* copy one byte at a time until getc() reports end-of-file */
        while ((c = getc(fp)) != EOF)
            putc(c, stdout);

        fclose(fp);
        return 0;
    }

Compile

    $ gcc -o mcat mcat.c

Run

    $ ./mcat helloworld.txt
    Hello world!

Quick explanation of the code above:

- The program opens a file passed as a command line argument
- The while loop copies data from the file to the standard output one byte at a time until it reaches the end of the file
- On reaching EOF, the program closes the file and terminates

Python 3

Python doesn’t have a mechanism to explicitly check for EOF like in ANSI C, but if you read a text file one character at a time, you can detect the end-of-file condition by checking whether read() returned an empty string:

    # mcat.py
    import sys

    with open(sys.argv[1]) as fin:
        while True:
            c = fin.read(1)  # read max 1 char
            if c == '':      # EOF
                break
            print(c, end='')

    $ python mcat.py helloworld.txt
    Hello world!

Python 3.8+ (a shorter version of the above using the walrus operator):

    # mcat38.py
    import sys

    with open(sys.argv[1]) as fin:
        while (c := fin.read(1)) != '':  # read max 1 char at a time until EOF
            print(c, end='')

    $ python3.8 mcat38.py helloworld.txt
    Hello world!

Go

In Go we can explicitly check if the error returned by Read() is io.EOF.

    // mcat.go
    package main

    import (
        "fmt"
        "io"
        "os"
    )

    func main() {
        file, err := os.Open(os.Args[1])
        if err != nil {
            fmt.Fprintf(os.Stderr, "mcat: %v\n", err)
            os.Exit(1)
        }
        defer file.Close()

        buffer := make([]byte, 1) // 1-byte buffer
        for {
            bytesread, err := file.Read(buffer)
            if err == io.EOF {
                break
            }
            if err != nil { // any other error: report it instead of looping forever
                fmt.Fprintf(os.Stderr, "mcat: %v\n", err)
                os.Exit(1)
            }
            fmt.Print(string(buffer[:bytesread]))
        }
    }

    $ go run mcat.go helloworld.txt
    Hello world!

JavaScript (node.js)

There is no explicit check for EOF, but the 'end' event on a stream is emitted once the end of the file is reached and there is no more data to be consumed.

    /* mcat.js */
    const fs = require('fs');
    const process = require('process');

    const fileName = process.argv[2];

    const readable = fs.createReadStream(fileName, {
        encoding: 'utf8',
        fd: null,
    });

    readable.on('readable', function () {
        let chunk;
        while ((chunk = readable.read(1)) !== null) {
            process.stdout.write(chunk); /* chunk is one character (one byte for ASCII text) */
        }
    });

    readable.on('end', () => {
        console.log('\nEOF: There will be no more data.');
    });

    $ node mcat.js helloworld.txt
    Hello world!

    EOF: There will be no more data.



How do the high-level I/O routines in the examples above determine the end-of-file condition? On Linux systems the routines either directly or indirectly use the read() system call provided by the kernel. The getc() function (or macro) in C, for example, takes bytes from a buffer that the standard I/O library fills by calling read(); when read() indicates the end-of-file condition, getc() returns EOF. The read() system call returns 0 to indicate the EOF condition.
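To make that concrete, here’s a rough Python model of how a buffered getc() can sit on top of read(): it refills an internal buffer with os.read() whenever the buffer runs dry and reports EOF only when os.read() returns zero bytes. The class name ByteReader, its getc() method, and the 4096-byte default buffer size are hypothetical choices for this sketch; real C libraries differ in the details (they also track error and end-of-file indicators, for example).

```python
import os

EOF = -1  # assumption: the usual C value of EOF


class ByteReader:
    """A toy stdio-like reader: hands out one byte at a time from a read()-filled buffer."""

    def __init__(self, fd: int, bufsize: int = 4096):
        self.fd = fd
        self.bufsize = bufsize
        self.buf = b""
        self.pos = 0
        self.refills = 0  # how many times os.read() was called

    def getc(self) -> int:
        if self.pos == len(self.buf):
            # Buffer exhausted: ask the kernel for more data.
            self.buf = os.read(self.fd, self.bufsize)
            self.pos = 0
            self.refills += 1
            if not self.buf:  # read() returned 0 bytes: end-of-file condition
                return EOF
        c = self.buf[self.pos]
        self.pos += 1
        return c
```

Wrapping a descriptor from os.open('helloworld.txt', os.O_RDONLY) in ByteReader and looping until getc() returns EOF reproduces mcat’s behavior while issuing far fewer system calls than one read() per byte.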

Let’s write a cat version called syscat using Unix system calls only, both for fun and potentially some profit. Let’s do that in C first:

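    /* syscat.c */
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        int fd;
        char c;
        ssize_t n;

        if (argc < 2) {
            fprintf(stderr, "usage: syscat FILE\n");
            return 1;
        }
        if ((fd = open(argv[1], O_RDONLY, 0)) == -1) {
            perror("syscat: open");
            return 1;
        }

        /* read() returns 0 at EOF and -1 on error */
        while ((n = read(fd, &c, 1)) != 0) {
            if (n == -1) {
                perror("syscat: read");
                return 1;
            }
            write(STDOUT_FILENO, &c, 1);
        }

        close(fd);
        return 0;
    }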





    $ gcc -o syscat syscat.c
    $ ./syscat helloworld.txt
    Hello world!

In the code above, you can see that we use the fact that the read() function returns 0 to indicate EOF.

And the same in Python 3:

    # syscat.py
    import sys
    import os

    fd = os.open(sys.argv[1], os.O_RDONLY)
    while True:
        c = os.read(fd, 1)
        if not c:  # EOF
            break
        os.write(sys.stdout.fileno(), c)





    $ python syscat.py helloworld.txt
    Hello world!

And in Python 3.8+ using the walrus operator:

    # syscat38.py
    import sys
    import os

    fd = os.open(sys.argv[1], os.O_RDONLY)
    while c := os.read(fd, 1):
        os.write(sys.stdout.fileno(), c)





    $ python3.8 syscat38.py helloworld.txt
    Hello world!
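All of the syscat versions above make one read() call per byte, which keeps them simple but costs a system call for every byte. A common variation, sketched here in Python, reads in larger chunks; the function name copy_fd and the 4096-byte default chunk size are assumptions for illustration. The point to notice is that a read returning fewer bytes than requested is not the end of the file; only a read that returns zero bytes is.

```python
import os


def copy_fd(src: int, dst: int, chunk_size: int = 4096) -> list:
    """Copy src to dst until read() signals EOF; return the size of each read."""
    sizes = []
    while True:
        data = os.read(src, chunk_size)
        if not data:  # 0 bytes: EOF. A short, non-empty read is not EOF.
            break
        sizes.append(len(data))
        # os.write() may write fewer bytes than given, so loop until all are written.
        view = memoryview(data)
        while view:
            written = os.write(dst, view)
            view = view[written:]
    return sizes
```

For example, `copy_fd(os.open('helloworld.txt', os.O_RDONLY), sys.stdout.fileno())` behaves like syscat, needing a single non-empty read for a 13-byte file plus the final zero-byte read.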



Let’s recap the main points about EOF again:

- EOF is not a character
- EOF is not a character that you find at the end of a file
- EOF is a condition provided by the kernel that can be detected by an application when a read operation reaches the end of a file

Update Mar 3, 2020: Let’s recap the main points about EOF with added details for more clarity:

- EOF in ANSI C is not a character. It’s a constant defined in <stdio.h> and its value is usually -1
- EOF is not a character in the ASCII or Unicode character set
- EOF is not a character that you find at the end of a file on Unix/Linux systems
- There is no explicit “EOF character” at the end of a file on Unix/Linux systems
- EOF (end-of-file) is a condition provided by the kernel that can be detected by an application when a read operation reaches the end of a file (if k is the current file position and m is the size of a file, performing a read() when k >= m triggers the condition)
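The k >= m rule is easy to check for a regular file with os.lseek(), which sets the file position k, and os.fstat(), which reports the size m. Here’s a small Python sketch on a temporary file; the helper name read_at is hypothetical.

```python
import os
import tempfile


def read_at(fd: int, k: int, n: int = 16) -> bytes:
    """Move the file position to k, then attempt to read up to n bytes."""
    os.lseek(fd, k, os.SEEK_SET)
    return os.read(fd, n)


fd, path = tempfile.mkstemp()  # opened for reading and writing
try:
    os.write(fd, b"Hello world!\n")
    m = os.fstat(fd).st_size            # file size: 13 bytes
    before_end = read_at(fd, m - 1)     # k < m: the last byte, b'\n'
    at_end = read_at(fd, m)             # k == m: b'', the EOF condition
    past_end = read_at(fd, m + 100)     # k > m: also b''; seeking past the end is allowed
finally:
    os.close(fd)
    os.remove(path)
```

Note that reading at k >= m never returns a special byte; it just returns nothing.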




Happy learning and have a great day!







