Network Working Group                                         A. Bhushan
Request for Comments: 114                                MIT Project MAC
NIC: 5823                                                  16 April 1971


                        A FILE TRANSFER PROTOCOL

I. Introduction

RFC 97) for teletypewriter communications, NETRJS (RFC 88) for remote job entry). You, however, have to know the different conventions of remote systems in order to use them.

Indirect usage, by contrast, does not require that you explicitly log into a remote system or even know how to "use" the remote system. An intermediate process makes most of the differences in commands and conventions invisible to you. For example, you need only know a standard set of network file transfer commands for your local system in order to utilize a remote file system. This assumes the existence of a network file transfer process at each host, cooperating via a common protocol.

Indirect use is not limited to file transfers. It may include execution of programs in remote hosts and the transfer of core images. The extended file transfer protocol would facilitate the exchange of programs and data between computers, the use of storage and file handling capabilities of other computers (possibly including the trillion-bit store data computer), and the ability to have programs in remote hosts operate on your input and return an output.

The protocol described herein has been developed for immediate implementation on two hosts at MIT, the GE645/Multics and the PDP-10/DM/CG-ITS (and possibly Harvard's PDP-10). An interim version with limited capabilities is currently in the debugging stage. [1] Since our implementation involves two dissimilar systems (Multics is a "service" system, ITS is not) with different file systems (Multics provides elaborate access controls, ITS provides none), we feel that the file transfer mechanisms proposed are generalizable. In addition, our specification reflects a consideration of other file systems on the network. We conducted a survey [2] of network host

Bhushan                                                         [Page 1]

RFC 114                 A FILE TRANSFER PROTOCOL           16 April 1971

3]. A named file is uniquely identified in a system by its file name and directory name. The directory name may be the name of a physical directory or it may be the name of a physical device. An example of a physical directory name is the owner's project-programmer number, and an example of a physical device name is a tape number.

A file may or may not have access controls associated with it. The access controls designate the users' access privileges. In the absence of access controls, the files cannot be protected from accidental or unauthorized usage.

A principal objective of the protocol is to promote the indirect use of computers on the network. Therefore, the user or his program should have a simple and uniform interface to the file systems on the network and be shielded from the variations in file and storage systems of different host computers. This is achieved by the existence of a standard protocol in each host.

Criteria by which a user-level protocol may be judged were described by Mealy in RFC 91, as involving the notion of logical records, ability to access files without program modifications, and implementability. I would add to these efficiency, extendibility, adaptability, and provision of error-recovery mechanisms.

The attempt in this specification has been to enable the reliable transfer of network ASCII (7-bit ASCII in 8-bit field with leftmost bit zero) as well as "binary" data files with relative ease. The use of other character codes, such as EBCDIC, and variously formatted data (decimal, octal, ASCII characters packed differently) is facilitated by inclusion of data type in descriptor headings. An alternative mechanism for defining data is also available in the form of attributes in file headings.

The format control characters reserved for the syntax of this protocol have identical code representation in ASCII and EBCDIC. (These characters are SOH, STX, ETX, DC1, DC2, DC3, US, RS, GS, and FS.)

4], and using CLS to achieve synchronization when necessary (a CLS is not transmitted until a RFNM is received).

The user may be identified by having the using process send at the start of the connection the user's name information (either passed on by user or known to the using system) [5]. This user name information (a sequence of standard ASCII characters), along with the host number (known to the NCP), positively identifies the user to the serving process.

At present, more elaborate access control mechanisms, such as passwords, are not suggested. The user, however, will have the security and protection provided by the serving system. The serving host, if it has access controls, can prevent unprivileged access by users from other host sites. It is up to the using host to prevent its own users from violating access rules.

The files in a file system are identified by a pathname, similar to the labels described in RFC 76 (Bouknight, Madden, and Grossman). The pathname contains the essential information regarding the storage and retrieval of data.

In order to facilitate use, default options should be provided. For example, the main file directory on disk would be the default on the PDP-10/ITS, and a pool directory would be the default on Multics.

The file to be transferred may be a complete file or may consist of smaller records. It may or may not have a heading. A heading should contain ASCII or EBCDIC characters defining file attributes. The file attributes could be some simple agreed-upon types or they could be described in a data reconfiguration or interpretation language similar to that described in RFC 83 (Anderson, Harslem, and Heafner), or a combination.

6].

III. SPECIFICATIONS

1. Transactions

7]. As the length of the filler count field is 8 bits, the number of filler bits shall not exceed 255. The data count is a binary count of the number of data (i.e., information) bits in the data field, not including filler bits. The number of data bits is limited to (2^24-1), as there are 24 bits in the data count field.
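The two limits above can be illustrated with a short Python sketch. It computes how many filler bits would pad a data field out to a whole number of bytes and checks both counts against their field widths. The function name, the `byte_size` parameter, and the policy of padding up to the next byte boundary are assumptions made for illustration; the specification itself fixes only the 8-bit and 24-bit field widths.

```python
MAX_FILLER_BITS = 2**8 - 1   # the filler count field is 8 bits wide
MAX_DATA_BITS = 2**24 - 1    # the data count field is 24 bits wide


def filler_for(data_bits: int, byte_size: int) -> int:
    """Return the filler bits needed to pad data_bits to a multiple of byte_size.

    Hypothetical helper: padding to the next byte boundary is an assumed
    policy, not something the protocol text mandates.
    """
    if not 0 <= data_bits <= MAX_DATA_BITS:
        raise ValueError("data count does not fit in 24 bits")
    if byte_size <= 0:
        raise ValueError("byte size must be positive")
    filler = (-data_bits) % byte_size
    if filler > MAX_FILLER_BITS:
        raise ValueError("filler count does not fit in 8 bits")
    return filler


if __name__ == "__main__":
    # 100 data bits sent through an NCP that accepts only 8-bit bytes.
    print(filler_for(100, 8))
```

With byte sizes such as 8 or 36 the filler never approaches the 255-bit ceiling; the check matters only for unusually large byte sizes.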

8].

2. Transaction Types

9]                                         E      105    45

   Response     ready-to-receive (rr)      <      074    3C
                ready-to-send (rs)         >      076    3E

   Transfer     complete_file              *      052    2A
                heading                    #      043    23
                part_of_file               ,      054    2C
                last_part                  .      056    2E

   Terminate    successful (pos.)          +      053    2B
                unsuccessful (neg.)        -      055    2D
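The octal and hexadecimal columns are the ASCII values of the code characters, so they can be derived rather than tabulated. The following minimal Python sketch covers only the response, transfer, and terminate codes listed above. The dictionary keys and the function name are illustrative and are not part of the protocol.

```python
# Code characters for the response, transfer, and terminate transactions.
# The key names are illustrative; only the characters come from the table.
TRANSACTION_CODES = {
    "ready-to-receive": "<",
    "ready-to-send": ">",
    "complete_file": "*",
    "heading": "#",
    "part_of_file": ",",   # octal 054 is the ASCII comma
    "last_part": ".",
    "terminate-successful": "+",
    "terminate-unsuccessful": "-",
}


def code_row(name: str) -> tuple[str, str, str]:
    """Return (character, octal, hexadecimal) for a transaction type name."""
    char = TRANSACTION_CODES[name]
    value = ord(char)
    return char, format(value, "03o"), format(value, "02X")


if __name__ == "__main__":
    for name in TRANSACTION_CODES:
        print(name, *code_row(name))
```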

10], name and filenames have the following syntax (expressed in BNF, the metalanguage of the ALGOL 60 report):

   <pathname>    ::= <device name> | <name> | <pathname> US <name>
   <device name> ::= DC1 <name>
   <name>        ::= <char> | <name> <char>
   <char>        ::= All 8-bit ASCII or EBCDIC characters except
                     US, RS, GS, FS, DC1, DC2, and DC3.
   <filenames>   ::= <name> | <filenames> RS <name>

The data type for the request transaction shall be either A (octal 101) for ASCII, or E (octal 105) for EBCDIC [11].

Some examples of pathnames are:

   DC1 MT08
   DC1 DSK 1.2 US Net<3> US J.Doe US Foo
   udd US proj. US h,n/x US user US file
   filename 1 filename 2
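The pathname grammar can be exercised with a small Python sketch that builds and parses pathnames. It assumes ASCII code points for the control characters (DC1 = hex 11, RS = hex 1E, US = hex 1F) and represents names as Python strings; the function names are hypothetical, and EBCDIC names are not handled.

```python
# ASCII control characters reserved by the pathname syntax.
DC1 = "\x11"  # introduces a device name
RS = "\x1e"   # separates names in <filenames>
US = "\x1f"   # separates names in <pathname>
RESERVED = set("\x11\x12\x13\x1c\x1d\x1e\x1f")  # DC1 DC2 DC3 FS GS RS US


def check_name(name: str) -> str:
    """Validate a <name>: non-empty and free of reserved characters."""
    if not name or any(ch in RESERVED for ch in name):
        raise ValueError(f"invalid name: {name!r}")
    return name


def build_pathname(names, device=None) -> str:
    """Join an optional device name and a list of names with US."""
    parts = [check_name(n) for n in names]
    if device is not None:
        parts.insert(0, DC1 + check_name(device))
    if not parts:
        raise ValueError("a pathname needs a device name or at least one name")
    return US.join(parts)


def parse_pathname(pathname: str):
    """Split a pathname into (device or None, list of names)."""
    parts = pathname.split(US)
    device = None
    if parts[0].startswith(DC1):
        device = check_name(parts[0][1:])
        parts = parts[1:]
    return device, [check_name(p) for p in parts]


if __name__ == "__main__":
    path = build_pathname(["udd", "proj.", "h,n/x", "user", "file"])
    print(parse_pathname(path))
```

A device name is accepted only as the first component, which mirrors the grammar: `<device name>` can appear only where a `<pathname>` begins.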

section 4 will govern the syntax of the data field in transfer transactions. No other syntactical restrictions exist.

2B.4 Terminates

The successful terminate shall normally have an empty data field. The unsuccessful terminate may have a data field defined by the data types A (octal 101) for ASCII, E (octal 105) for EBCDIC, or S (octal 123) for status. A data type code of 'S' would imply byte oriented error return status codes in the data field. The following error return status codes are defined tentatively:

   Error Code Meaning                       Error Code
                                    ASCII   Octal   Hexadecimal

   Undefined error                    U      125        55
   Transaction type error             T      124        54
   Syntax error                       S      123        53
   File search failed                 F      106        46
   Data type error                    D      104        44
   Access denied                      A      101        41
   Improper transaction sequence      I      111        49
   Time-out error                     O      117        4F
   Error condition by system          E      105        45

2C. Semantics

2C.1 Requests

Requests are always sent by the using host. In the absence of a device name or complete pathname, default options should be provided for all types of requests.

_Identify_ request identifies the user, as indicated by <pathname>, to the serving host.

_Retrieve_ request achieves the transfer of the file specified in <pathname> from serving to using host.
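The tentative status codes of section 2B.4 lend themselves to a simple lookup. The Python sketch below decodes the data field of an unsuccessful terminate whose data type is 'S'. It assumes that each byte of the data field carries one status code and that the codes are ASCII; the function name and the treatment of unknown bytes are illustrative choices, not part of the specification.

```python
# Tentative error return status codes, keyed by their ASCII character.
ERROR_CODES = {
    "U": "Undefined error",
    "T": "Transaction type error",
    "S": "Syntax error",
    "F": "File search failed",
    "D": "Data type error",
    "A": "Access denied",
    "I": "Improper transaction sequence",
    "O": "Time-out error",
    "E": "Error condition by system",
}


def describe_status(data: bytes) -> list[str]:
    """Map each status byte to its meaning (assumes one code per byte)."""
    meanings = []
    for byte in data:
        fallback = f"unrecognized code (octal {byte:03o})"
        meanings.append(ERROR_CODES.get(chr(byte), fallback))
    return meanings


if __name__ == "__main__":
    # Octal 106 is 'F', octal 101 is 'A'.
    print(describe_status(bytes([0o106, 0o101])))
```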

12].

2C.2 Response

Responses are always sent by the serving host. The rr response indicates that the serving host is ready to receive the file indicated in the preceding request. The rs response indicates that the next transaction from the serving host will be the transfer of the file indicated in the preceding request.

3. Transaction Sequence

13]. The exact sequence in which transactions occur depends on the type of request. A transaction sequence may be aborted at any time by either host, as explained in Section 3C.

3B. Examples

The identify request doesn't require a response or terminate and constitutes a transaction sequence by itself.

4. Data Types

14]. Although a large number of data types are defined, specific implementations may handle only a limited subset of data types. It is recommended that all host sites accept the



15]. Specifically, when no conversion is to be performed, the data type used will be binary.

The implicit or explicit byte size is useful as it facilitates storing of data. For example, if a PDP-10 receives data types A, A1, AE, or A7, it can store the ASCII characters five to a word (DEC-packed ASCII). If the data type is A8 or A9, it would store the characters four to a word. Sixbit characters would be stored six to a word.

If conversion routines are available on a system, a system program could convert the data from one form to another (such as EBCDIC to ASCII, IBM floating point to DEC floating point, decimal ASCII to integers, etc.).

5. Initial Connection, CLS, and Identifying Users

16] for the cooperating process on the serving host. The connection establishment will be in accordance with the initial connection protocol of RFC 66 as modified by RFC 80. The NCP dialog would be:

   user to server:                 RTS<us><3><p>
   if accepted, server to user:    STR<3><us><CLS><3><us>
   server to user on link p:       <ss>
   server to user:                 STR<ss+1><us>RTS<ss><us+1><q>
   user to server:                 STR<us+1><ss>RTS<us><ss+1><r>

This sets up a full-duplex connection between user and server processes, with the server receiving through local socket ss from remote socket us+1 via link q, and sending to remote socket us through local socket ss+1 via link r.

5B. The connection will be broken by trading a CLS between the NCP's for each of the two connections. Normally the user will initiate the CLS.

CLS may also be used by either the user or the server to abort a data transmission in the middle. If a CLS is received in the middle of a transaction sequence, the whole transaction sequence will be aborted. The using host will then reopen the connection.

5C. The first transaction from the user to server will be the identify transaction. The users will be identified by the pathname in the data field of the transaction, which should be a


17].

The present specification of the protocol does not allow the simultaneous transfer and processing of multiple requests over the same pair of connections. If such a capability is desired, there is an easy way to implement it which only involves a minor change. A transaction sequence identification number (TSid) could replace a NUL field in the descriptor of transactions. The TSid would facilitate the coordination of transactions related to a particular transaction sequence. The 256 code combinations permitted by the TSid would be used in a round-robin manner (I can't see more than 256 outstanding requests between two user-processes in any practical implementation). An alternate way of simultaneous processing of requests is to open new pairs of connections. I am not sure as to how useful simultaneous processing of requests is, and which of the two is a more reasonable approach.

V. Conclusions

1] The interim version of the protocol, limited to transfer of ASCII files, was developed by Chander Ramchandani and Howard Brodie of Project MAC. The ideas of transactions, descriptors, error recovery, aborts, file headings and attributes, execution of programs, and use of data types, pathnames, and default mechanisms are new here. Howard Brodie and Neal Ryan have coded the interim protocol in the PDP-10 and the 645, respectively.

[2] The network system survey was conducted last fall by Howard Brodie of Project MAC, primarily by telephone.

[3] PDP-10 Reference Handbook, page 306.

4] We considered using two full-duplex links, one for control information, the other for data. The use of a separate control link between the cooperating processes would simplify aborts, error recoveries and synchronization. The synchronization function may alternatively be performed by closing the connection (in the middle of a transaction sequence) and reopening it with an abort message. (The use of INR and INS transmitted via the NCP control link has problems as mentioned by Kalin in RFC 103.) We prefer the latter approach.

[5] Identifying users through use of socket numbers is not practical, as unique user identification numbers have not been implemented, and file systems identify users by name, not number.

[6] This subject is considered in detail by Bob Metcalfe in a forthcoming paper.

[7] Filler bits may be necessary as particular implementations of NCP's may not allow the free communication of bits. Instead the NCP's may only accept bytes, as suggested in RFC 102. The filler count is needed to determine the boundary between transactions.

[8] 72 bits in the descriptor field are convenient as 72 is the least common multiple of 6, 8, 9, 18, 24 and 36, the commonly encountered byte sizes on the ARPA network host computers.

[9] The execute request is intended to facilitate the indirect execution of programs and subroutines. However, this request in its present form may have only limited use. A subroutine or program mediation protocol would be required for broader use of the execute feature. Metcalfe considers this problem in a forthcoming paper.

[10] The pathname idea used in Multics is similar to that of labels in RFC 76 by Bouknight, Madden and Grossman.

[11] We, however, urge the use of standard network ASCII.

[12] The exact manner in which the input and output are transmitted would depend on specific mediation conventions. Names of input and output files may be transmitted instead of data itself.

[13] The transactions (including terminate) are not "echoed", as echoing does not solve any "hung" conditions. Instead time-out mechanisms are recommended for avoiding hang-ups.

[14] The data type mechanism suggested here does not replace the data reconfiguration service suggested by Harslem and Heafner in RFC 83 and NIC 5772. In fact, it complements the reconfiguration. For

15] The internal character representation in the hosts may be different even in ASCII. For example, the PDP-10 stores 7-bit characters, five per word with the 36th bit as don't care, while Multics stores them four per word, right-justified in 9-bit fields.

[16] It seems that socket 1 has been assigned to logger and socket 5 to NETRJS. Socket 3 seems a reasonable choice for the file transfer process.

[17] The term program mediation was suggested by Bob Metcalfe who is intending to write a paper on this subject.

[ This RFC was put into machine readable form for entry ]
[ into the online RFC archives by Ryan Kato 6/01 ]
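The word-packing arithmetic described in section 4 and in note 15 (five 7-bit characters, four 8-bit or 9-bit characters, or six 6-bit characters per 36-bit word) can be checked with a short Python sketch. The function names are hypothetical. Packing the characters left-justified and leaving unused low-order bits as zero is an assumption made for illustration; the text states only how many characters fit in a word.

```python
WORD_BITS = 36  # word size of the PDP-10 and of Multics


def chars_per_word(byte_size: int) -> int:
    """Number of whole characters of byte_size bits that fit in one word."""
    return WORD_BITS // byte_size


def pack_words(codes, byte_size: int) -> list[int]:
    """Pack character codes into 36-bit words, left-justified (assumed layout)."""
    per_word = chars_per_word(byte_size)
    words = []
    for start in range(0, len(codes), per_word):
        chunk = codes[start:start + per_word]
        word = 0
        for code in chunk:
            if not 0 <= code < (1 << byte_size):
                raise ValueError(f"code {code} does not fit in {byte_size} bits")
            word = (word << byte_size) | code
        # Pad a short final chunk, then shift past the unused low-order bits.
        word <<= byte_size * (per_word - len(chunk))
        word <<= WORD_BITS - per_word * byte_size
        words.append(word)
    return words


if __name__ == "__main__":
    for size in (6, 7, 8, 9):
        print(size, chars_per_word(size))
    print(len(pack_words(list(b"HELLO!"), 7)))
```

With 7-bit characters the five codes occupy 35 bits and the final shift leaves the lowest bit of the word unused, which corresponds to the "don't care" bit mentioned in note 15.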