To answer your first question, mathematical functions have often been described as "pure" in terms of some specified variables. For example:

the first term is a pure function of x and the second term is a pure function of y

Because of this, I don't think you'll find a true "first" occurrence.

For programming languages, a little searching shows that Ada 95 (pragma Pure), High Performance Fortran (1993) (PURE) and VHDL-93 (pure) all contain formal notions of 'pure functions'.

Haskell (1990) is the obvious example, although purity there is implicit in the language rather than marked explicitly. GCC's C dialect has several function attributes (such as pure and const) for differing levels of 'pure'.
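As a rough illustration of the GCC attributes, here is a minimal sketch. The function names square and count_char are made up for this example; only the attribute names const and pure come from GCC. As I understand it, const is the stricter level (the result depends only on the argument values), while pure also permits reading memory, as long as there are no side effects.

```c
#include <stddef.h>

/* const: the result depends only on the argument values.
 * The function reads no global memory and has no side effects.
 * (square is a hypothetical name used for illustration.) */
__attribute__((const))
int square(int x)
{
    return x * x;
}

/* pure: no side effects, but the function may read memory,
 * here through the pointer s.
 * (count_char is a hypothetical name used for illustration.) */
__attribute__((pure))
size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (*s == c) {
            ++n;
        }
    }
    return n;
}
```

With either attribute the compiler is allowed to merge repeated calls that have the same arguments, which is close to what the word REDUCIBLE suggests in PL/I.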

A couple of books: Rationale for the C programming language (1990) uses the term, as does Programming Languages and their Definitions (1984). However, both apparently only use it once! Programming the IBM Personal Computer, Pascal (also 1984) uses the term, but it isn't clear from Google's restricted view whether or not the Pascal compiler had support for it. (I suspect not.)

An interesting note is that Green, the predecessor to Ada, actually had a fairly strict 'function' definition - even memory allocation was disallowed. However, this was dropped before it became Ada, where functions can have side-effects (I/O or global variables), but can't modify their arguments.

C28-6571-3 (the first PL/I reference manual, written before the compiler) shows that PL/I had support for pure functions, in the form of the REDUCIBLE (= pure) attribute, as far back as 1966 - when the compiler was first released. (This also answers your third question.)

This last document specifically notes that it includes REDUCIBLE as a new change since document C28-6571-2. So REDUCIBLE, which is possibly the first incarnation of formal pure functions in programming languages, appeared somewhere between January and July 1966.

Update: The earliest instance of "pure function" on Google Groups in this sense is from 1988, which easily postdates the book references.