Subject: Re: long longs in c
From: mash@mash.engr.sgi.com (John R. Mashey)
Date: Aug 16 1995
Newsgroups: comp.lang.c

In article <40g7uj$c74@hpuerci.atl.hp.com>, swm@atl.hp.com (Sandy Morton) writes:
|> Organization: Hewlett-Packard Company, Technology Solutions Lab
|>
|> In article <danpop.808147613@rscernix>, Dan.Pop@mail.cern.ch (Dan Pop) writes:
|> |> long long is not a C feature. Compiler specific questions belong to
|> |> system specific newsgroups, in this case comp.sys.hp.hpux.
|>
|> Okay. I'm not arguing with you and I plan to post my problem there as well.
|> But ... are you sure about long longs not being a C feature? I was under
|> the impression they were being added to the ansi standard (thus the reason
|> HP is implementing them). They are still very new, and until HP's 10.0 ...

1) long longs are not part of ANSI C ... but probably will be, since:

2) They are implemented by many vendors. 3 years ago, there was an informal
working group that included many vendors (addressing 64-bit C programming
models for machines that also had 32-bit models), and the general consensus
was that as much as we despised the syntax, it was:
   a) Already in CONVEX & Amdahl, at least
   b) Already in Gnu C
   c) And various other hardware vendors either already had it in or were
      planning to.
Somebody in this group was also on the ANSI C committee, and observed that
the fact of long long not being in ANSI C was no reason not to agree on
doing it, since standards generally codify existing practice, rather than
inventing new things, when reasonably possible.

3) On SGI, printf of a long long uses %lld. I don't know what others do.

-john mashey
DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: mash@sgi.com  DDD: 415-390-3090  FAX: 415-967-8496
USPS: Silicon Graphics 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94039-7311

Subject: Re: long longs in c
From: mash@mash.engr.sgi.com (John R. Mashey)
Date: Aug 17 1995
Newsgroups: comp.lang.c,comp.std.c

In article <danpop.808659017@rscernix>, Dan.Pop@mail.cern.ch (Dan Pop) writes:
|> In <40tdmr$j8k@murrow.corp.sgi.com> mash@mash.engr.sgi.com (John R. Mashey) writes:
|>
|> >1) long longs are not part of ANSI C ... but probably will be, since:
(lots of people have implemented it, if not previously, as instigated by the
64-bit working group in 1992).
|> Well, you'd better have a look at comp.std.c. None of the committee
|> people posting there seems to be favouring the addition of long long
|> in C9X. They're considering other schemes. long long seems to be
|> doomed to be a vendor extension.

I believe this conclusion to be unwarranted ....

a) Some features are random extensions by individual vendors.

b) Some extensions get widely implemented in advance of the standard,
because they solve some problem that cannot wait until the next standard
... after all, standards have no business changing overnight.

c) Standards committees may well need to sometimes invent new things
[like, when volatile was added years ago].

d) However, if an extension is widely implemented, it is incumbent on an
open standards committee to give that extension serious consideration ...
because otherwise, there is a strong tendency for the de facto standards
to evolve away from the de jure standard ... which is probably not a good
idea.

e) Again, as I said before, the 1-2 members of the 1992 group who were also
in the ANSI C group are where I got the opinion above from, i.e., don't let
the non-existence of long long in the standard stop you from making
progress - it is better to do something consistent.

IF long long has definitively been ruled out (as opposed to being disliked
by a few committee members), it would be interesting to hear more ... as it
seems inconsistent with past behavior, which has at least sometimes
ratified existing practices that were less than elegant ... and was
appropriate in doing so.

-john mashey
DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: mash@sgi.com  DDD: 415-390-3090  FAX: 415-967-8496
USPS: Silicon Graphics 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94039-7311

Subject: Re: long longs in c
From: mash@mash.engr.sgi.com (John R. Mashey)
Date: Aug 21 1995
Newsgroups: comp.lang.c,comp.std.c

In article <412dkr$7dm@newsbf02.news.aol.com>, ffarance@aol.com (FFarance) writes:
|> > In <40tdmr$j8k@murrow.corp.sgi.com> mash@mash.engr.sgi.com (John R.
|> > Mashey) writes:
|> >
|> > 1) long longs are not part of ANSI C ... but probably will be, since:
|> > (lots of people have implemented it, if not previously, as instigated
|> > by the 64-bit working group in 1992).
|>
|> The "long long" type is unlikely to be included in C9X. Although the
|> problem has been discussed in the EIR (extended integer range) working
|> group of NCEG (numeric C extensions group -- X3J11.1) for several years,
|> over the past two years it has been recognized as a faulty solution.

It is informative to hear that this has been recognized over the last two
years as faulty ... but there is a *serious* problem here...

SO, WHEN DO WE GET A NON-FAULTY SOLUTION?

I.e., there are proposals. When does one get accepted *enough* that vendors
dare go implement it and expect it will actually persist (or close enough)
in the final standard? (For example, years ago, "volatile" was clearly
known to be coming soon enough (1985/1986) to save those of us worried
about serious optimization, even though the standard didn't get approved
until later.)

=========

I'd like to go through some background, facts, and then a few opinions, to
observe that in the effort to get a "perfect" solution, we are now in the
awkward position of lacking a seriously-necessary feature / direction in
the standard, with the usual result: individual vendors go implement
extensions, not necessarily compatible. This particular one is *very*
frustrating, since it is not rocket science, but rather predictable.

Note: none of this is meant as criticism of people involved in the
standards process, which is inherently a frustrating, and often thankless,
task.
It is meant as a plea to balance perfectionism versus pragmatism, both of
which are needed to make progress. It is also a plea that people involved
in this *must* have a good feel for likely hardware progress, especially
for a language like C that has always been meant to make reasonably
efficient use of hardware.

While I have no special love for "long long", especially its syntax, and
while there are plenty of issues that need to be dealt with, and while I
hardly believe C's type system is perfect ... I believe that we have a
*serious* problem in 1995, to *not* have multiple implementations of
compilers accepting a 64-bit integer data type, such that it was already
well lined up to become part of the standard.

The situation we are in is like where we would have been ~1978, had we not
already had "long" in the language for several years. That is:

a) PDP-11 UNIX would have been really ugly in the area of file pointers,
since int hadn't been big enough for a long time; that is, the 16-bit
systems needed first-class 32-bit data, regardless of anything else.
Structs with 2 ints really wouldn't have been very pleasant. Likewise,
limiting files to 64KB wasn't either.

b) Preparing cross-compilers and tools for 32-bit systems would have been
more painful; that is, it was good to have first-class data on the 16-bit
system to prepare the tools, and when done, to get code that made sense on
both 16- and 32-bit systems.

c) It would have been far more difficult to maintain portability and
interoperability between 16- and 32-bit systems; that is, one could write
portable code if one was careful, and especially, one *could* provide
structs for external data that looked the same, since both 16- and 32-bit
systems could describe 8-, 16-, and 32-bit data. Of course, this was in
the days before people had converted as much code to using typedefs, which
made it pretty painful.

Deja vu ...
in 1995:

a) There is a great desire by many to have a UNIX API for 64-bit files on
32-bit systems (called the Large File Summit), since 2GB file limits are
behind the power curve of disks these days. This is no problem on 64-bit
systems, and it's not really too unclean on 32-bit systems (if you've added
a first-class 64-bit type and can typedef onto it; yes, some older code
breaks ... but well-typedeffed code is OK). Some people implemented this in
1994.

b) Every major RISC microprocessor family used in systems either already
has a 64-bit version on the market [1992: MIPS & DEC], or has one coming
soon [1995: Sun UltraSPARC, IBM/Moto PPC620; 1996: HP PA-8000]. Hence, some
people have already done compilers that have to run on 32-bit machines, to
produce code for 64-bit machines ... just like running PDP-11 compilers to
produce VAX code.

c) Right now, without a 64-bit integer datatype usable in 32-bit C, there
is the same awkwardness we would have had, had we not had long back in the
16->32 days.

But what about 128 bits? I'd be pleased to have a 128-bit type as well ...
however, a pragmatic view says: we have the 64-bit problem right now, and
we've had it for several years; we won't have the 128-bit problem for quite
a few years. Based on the typical 2 bits every 3 years increase in
addressing, a gross estimate is: 32 bits / 2 bits * 3 years = 48 years, or
1992 + 48 = 2040. Personally, I'm aggressive, so might say year 2020 for
wanting 128-bit computers ... but on the other hand, there are some fairly
serious penalties for building 128-bit-wide integer datapaths, and there
are some other impediments to making the 64->128-bit transition as smooth
as the 32->64 one. In any case, I will be very surprised to see any
widespread use, in general-purpose systems, of 128-bit-wide integer
datapaths, in 10 years (2005). I wouldn't be surprised to see 128-bit
floating-point in some micros, but 128-bit integers would indeed surprise
me.
Hence, I'd much rather have a simple solution for 64-bit right now. Of
course, a plan that allows something sensible for the bigger integers over
time is goodness.

BACKGROUND

If 64-bit microprocessors are unfamiliar, consider reading: John R. Mashey,
"64-bit Computing", BYTE, Sept 1991, 135-142. This explained what 64-bit
micros were, the hardware trends leading to this, and that there would be
widespread use of them by 1995 (there is). While a little old, most of what
I said there still seems OK.

SOME FACTS

1) Vector supercomputers have been 64-bit systems for years. One may argue
that these are low-volume, and for several reasons (word-addressing on
CRAYs, non-existence of 32-bit family members, stronger importance of
FORTRAN, etc.), some people might argue that these are not very relevant to
C ... but still, there are several $B of hardware installed, and, for
example, CONVEX has supported long long as a 64-bit integer for years. CRAY
made int 64 bits, and short 32 bits.

2) In 1992, 64-bit microprocessors became available from MIPS (R4000) and
DEC (Alpha), and started shipping in systems. For {SPARC, PPC, HP PA}, the
same thing happens in 1995 or 1996 - the chips have all been announced;
some people guess the Intel/HP effort appears in 1998.

3) From 1992 through now, I estimate there must be about $10B installed
base of 64-bit-capable microprocessor hardware already sold. Most of it is
still running 32-bit OSs, although some of the 32-bit OS code uses 64-bit
integer manipulations for speed. I *think* >$1B worth is already running
64-bit UNIX + programming environments, i.e., DEC UNIX and SGI IRIX 6
(shipped 12 months ago). While some 64-bit hardware will stay running
32-bit software for a while, new OS releases may well convert some of the
existing hardware to 64-bit OSs, and an increasing percentage of newer
systems will run the 64-bit OSs, especially in larger servers.
[2GB/4GB main memory limits do not make the grade for big servers these
days; while one can get above this on 32-bit hardware, it starts to get
painful.]

4) DEC UNIX is a "pure" 64-bit system; that is, there is no 32-bit
programming model, since there was no such installed base of Alpha
software, i.e., that was a plausible choice for DEC. SGI's IRIX 6 is a
"mixed 64/32" model, i.e., it is a 64-bit OS that supports both 32- and
64-bit models, and will continue to; i.e., that is not a transitional
choice, as we believe that many applications will stick with 32-bit for a
long time. IRIX 5 & 6 both support a 64-bit interface to 64-bit file
systems in 32-bit user programs, i.e., somewhere underneath is a long long,
although carefully typedeffed to avoid direct references in user code. DEC
UNIX proves you can port a lot of software to 64-bit; IRIX proves you can
make code reasonably portable between 32- and 64-bit. Both of these systems
use the so-called LP64 model; i.e., at this instant, the total installed
base of 64-bit software (with the possible exception of CRAY T3D) uses
LP64:

                      sizes in bits
    Name   char  short  int  long  ptr  long long   Notes
    ILP32    8     16    32    32   32      64      many
    LLP64    8     16    32    32   64      64      long long needed
    LP64     8     16    32    64   64      64      DEC, SGI
    ILP64    8     16    64    64   64      64      (needs 32-bit type)

(The comments mean: in LLP64 (long long + pointer are 64), you need
*something* to describe 64-bit integers; in ILP64 (int, long, pointer are
64) you'll want to add some other type to describe 32-bit integers. I
didn't invent this nomenclature, which is less than elegant :-)

5) In 1992, there was a 6-month effort among {a whole bunch of vendors} to
see if we could agree on a choice of {LLP64, LP64, ILP64}. There was
*serious* talent involved from around the industry, but at that time, we
could not agree. As it turns out, it probably doesn't matter much for
well-typedeffed code, i.e., newer applications.
Some older code breaks no matter what you choose, and some older code works
on 1-2 of the models and breaks on the other(s), with the breakage
depending on the specific application.

What we did agree on was:

(1) Supply some standard typedef names that application vendors could use,
if they wanted, and if they didn't already have their own set of typedefs.
Some vendors have done this.

(2) Do long long as a 64-bit integer datatype (NOT as a
might-be-any-size >= long), so we'd at least have one. NOTE: this is more
for the necessities of ILP32; LP64 and ILP64 could get away without it, but
the problem is in dealing with 64-bit integers from ILP32 programs ...
similar to the 16/32-bit days.

As noted, there were several people in this group also involved in ANSI C,
and we asked them about the wisdom of doing this, and were told,
unambiguously, that we might as well go ahead and do it. Whether it was
good or not, it was not for lack of communication ...

6) Now, there is a new 64-bit initiative to get some 64-bit API and data
representation issues settled. The first part (the API) is crucial, and
ISVs really want it badly; that is, some vendors have already done 64-bit
ports, but a lot more are getting there, and we're starting to get into the
"big" applications that have masses of software, and not surprisingly, the
ISVs do not want to have to redo things any more than they need to.

OPINIONS:

1) There is a right time, and two wrong times, to standardize something.
If it is standardized too early, before some relevant experience has
accumulated, bad mistakes can be made. If it is standardized too late, a
whole bunch of people will have already done it, likely in more-or-less
incompatible ways, especially in the subtle cases, or will have gotten into
a less-than-elegant solution, basically out of desperation to get something
done.
I'd distinguish between two cases:

a) Add an extension because it is cool, because customers have asked for
it, because it helps performance, etc., etc. ... or because some competitor
puts it in :-)

b) Add an extension because fundamental external industry trends make it
*excruciatingly painful* to do without the extension or some equivalent.

I think "long long" fits b) better than a); people aren't doing this for
fun; they are doing it to fit the needs of straightforward, predictable
hardware trends that mostly look like straight lines on semi-log charts,
with a transition coming very similar to that which occurred going from
PDP-11 to VAX, i.e., not rocket science, not needing brilliant innovations.

2) So, when was the right time to have at least gotten a simple data type
available to represent 64-bit integers (in a 32-bit environment, i.e.,
assuming that long was unavailable)?

1989: nobody would even admit to working on 64-bit micros.
1990: MIPS R4000 announced (late in the year).
1991: Various vendors admit to 64-bit plans; 2GB (31-bit) SCSI disks starting.
1992: 64-bit micros (MIPS, Alpha) ship in systems from multiple vendors.
1992/1993: DEC ships OSF/1 (I can't recall whether late in 1992, or in 1993).
1994: SGI ships IRIX 6 (64/32-bit).
1996: IBM/Motorola PPC620, Sun UltraSPARC, HP PA8000 out in systems
      (PPC620 & UltraSPARC might be out in 1995, but for sure by 1996).
1996: ?
1998: ? Intel/HP 64-bit chip ??

From the above, it sure looks to me like we really needed to get
*something* for a 64-bit datatype in C (again, general agreement, not a
formal standard), usable in ILP32 environments:

1991: would have been wonderful, but too much to expect.
1992: more likely, and there were some people with experience, and there
      were several real chips to help check speed assumptions.
1993: starting to get a little late.
1995: too late to catch most of the effort.
Without going through the sequences in detail, the usual realities of
adding this kind of extension mean that somebody is adding the extension to
C a year before it would ship (on a 32-bit system), and probably 2 years
before there's a 64-bit UNIX shipped. This means there were several
companies with committed development efforts in 1991/1992.

So, in summary: it would have been really nice if we could have gotten
something agreeable (not blessed as a standard - that always takes
longer - but with some agreement of intent) in 1991, or at least in 1992:
late enough for people to have some experience, but early enough to get
something consistent to customers that could still have a chance of being
blessed later on. Proposals in 1995 ... are late enough that many vendors
will have already done the compiler work that they need to ship 64-bit
products in 1996 or 1997 ... Of course, this is hindsight, and I do feel a
little bad for not pushing harder on this in 1991.

|> - After much analysis, the problem is not ``can I standardize
|>   "long long", or how do I get a 64-bit type, or what is the name
|>   of a 64-bit type'', but ``loss of intent information causes
|>   portability problems''. This isn't an obvious conclusion. You
|>   should read the paper to understand the *real* problem.

Hmmm. Having started with C in 1973, from Dennis' 25-pager (that's all
there was), and having gone through the 16- to 32-bit transitions, and
Little-Endian -> Big-Endian transitions, and being old enough to be at
least acknowledged in the first K&R C book, and having managed various UNIX
ports, and worked on C compilers, and having moved applications around, and
having helped design RISC micros with some strong input from what real C
code looked like, and having helped design the first 64-bit RISC micro ...
I think I understand "loss of intent", which was certainly a major topic of
the 1992 series of meetings. (We just couldn't agree on which intents were
more common or more important.)
One more time: I claim "how I get a 64-bit type" IS a problem; I don't
think it's the only problem, and there may well be more general ways to
handle these issues (and as soon as I dig up gnu zip so I can look at the
files, I'll look at the SBEIR files). BUT, I CLAIM THAT IT IS A *REAL*
PROBLEM WHEN $9B OF COMPUTERS CAN'T EVEN USE A SIMPLE C INTEGER DATATYPE TO
DESCRIBE THEIR OWN INTEGER REGISTERS. ($9B = $10B - $1B running 64-bit
OSs.)

|> - The use of "long long" causes more harm (really!) because it
|>   creates more porting problems. As a simple example, while we

Causes more harm than what? Remember, some of us had no choice but to
figure out something to do in 1991 or 1992, to get 32-bitters ready to deal
with 64-bit. In any case, whether it causes more harm or not, a whole bunch
of us found some 64-bit integer data type *necessary*.

|>   might believe that "long long" is a 64-bit type, what happens
|>   when you move the code to 128-bit machines? The "long long"

It is *very* unlikely that there will be any 128-bit-integer CPUs used in
general-purpose systems in the next few years; it would be nice if we could
handle them, and handle the 64->128 transition earlier in the sequence than
we did this time. I'd be delighted if a better type system were in place
well before the time somebody has to worry about it. [I expect to have
retired long before that, but I do have colleagues young enough that they
will have to worry about it :-)]

We tell everybody to use typedefs anyway, and some do; we do our best to
typedef all of the APIs so people use the right things; would it have made
people happier to have called this int64_t or __int64_t? But, in any case,
as far as I can tell, anyone who is using this is just treating it as a
64-bit integer. If somebody is doing something else, I'd be interested in
hearing about it.

|>   type will probably map into the X or Y typedef above.
|>   This will cause porting problems because whatever "long long" is
|>   mapped into, some will believe it is (and use it as) ``the
|>   fastest type of at least 64 bits'' and others will believe it
|>   is ``exactly 64 bits''. Thus, in the port to 128-bit machines,
|>   we have to track down these implicit assumptions because programmers
|>   *rarely* document their intent (e.g., ``I want the fastest 32-bit
|>   type'') and, mostly, they believe the type itself documents
|>   intent (!). This is how porting problems are created.

Like I say, I am *seriously* worried about supporting 64-bit integers on
32-bit machines, and *seriously* worried about source compatibility between
32- and 64-bit machines ... 128-bit machines are far away, and citing them
as a big concern isn't a big help right now, although any major change to
the type scheme should certainly be played off against the realities,
especially before all of us old fogies who've actually gone through 2
factor-of-2-up-bits changes are out of this business :-)

|> > c) Standards committees may well need to sometimes invent new
|> >    things [like, when volatile was added years ago].
|>
|> This solution wasn't ``just invented'', but developed over years by
|> analyzing what the *real* problem is. The nature of the solution matches
|> the nature of the problem. BTW, bit/byte ordering/alignment and
|> representation (e.g., two's complement) will be addressed in separate
|> proposals. The SBEIR proposal only addresses range extensions.

Sorry, I didn't mean to imply that committees invented random features on
the spur of the moment, but rather that they sometimes had to create
features found in few, if any, existing implementations. I.e., "invent" was
not a pejorative in any way.

|> > e) Again, as I said before, the 1-2 members of the 1992 group were also
|> >    in the ANSI C group ...
|> >    which is where I got the opinion above from, i.e., don't let the
|> >    non-existence of long long in the standard stop you from making
|> >    progress - it is better to do something consistent.
|>
|> In 1992, that was probably a reasonable opinion. Since then we understand
|> the problem and have solutions being worked now.

Again ... if the solutions are being worked on now ... they are too late,
I'm afraid.

|> I think if we could have fixed "long long", even with a 90% solution,
|> we would have done it. Among the reasons for not including "long long"
|> are: we'd have to solve this problem again 10 years from now when people
|> were asking for "long long long" for their 128-bit machines; "long long"
|> causes more portability problems *across different architectures* than
|> it helps. Years ago, many people wondered aloud if we could find
|> a ``right'' solution that solved the problem once and for all. The
|> SBEIR proposal is one solution.

We agree on lots of things; I don't think long long solves all the
problems. I'd hope there's something better for 128-bit than long long
long ... but I am really concerned that the old saying "the best is the
enemy of the very good" is in operation here. I think I have good reason to
believe that 128-bit-integer machines are 25 years away, i.e., further away
than C has existed ... Meanwhile, there is $9B (and growing fast) worth of
computers ... and having long long *demonstrably* has helped a bunch of
porting efforts already.

-john mashey
DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: mash@sgi.com  DDD: 415-390-3090  FAX: 415-967-8496
USPS: Silicon Graphics 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94039-7311

Subject: Re: long longs in c
From: mash@mash.engr.sgi.com (John R. Mashey)
Date: Aug 22 1995
Newsgroups: comp.lang.c,comp.std.c

In article <danpop.809087008@rscernix>, Dan.Pop@mail.cern.ch (Dan Pop)
writes (Hmm, some rather strong and pejorative statements about many
people):

|> In <41b0qq$juj@murrow.corp.sgi.com> mash@mash.engr.sgi.com (John R. Mashey) writes:
|>
|> >One more time: I claim "how I get a 64-bit type" IS a problem; I don't
|>
|> And the solution is straightforward: have a 64-bit long. C has 4 basic
|> integral types and each of them can have a different size: 8, 16, 32 and
|> 64 bits. Only brain dead software, making unwarranted assumptions about
|> the relative sizes of int's, long's and pointers will be affected.

In the real world, any vendor who, in 1991, declared that in their 32-bit
environment the sizeof long would now be 8 bytes would have been lynched by
their ISVs. Worse, such vendors would immediately have dropped to the
bottom of the port lists, incurring serious financial damage.

These things may be irrelevant to someone in a research environment, some
of which place the highest priorities on 1) their own code and 2) free
software, and relatively little on software from ISVs. But these things are
*not* irrelevant to many of the rest of us. Those in research environments,
paid for with research funding, may not consider these things important ...
but a vendor that ignores such issues usually gets hurt badly, in many
cases going out of business. This effect is most commonly seen in high-end
technical computing, where mean time to bankruptcy is an important
parameter for purchase, and where environments difficult to program have
died pretty badly.

|> Because of DEC OSF/1, most free software has been already fixed.
|> It's high time to stop looking at the model imposed by the VAX as being
|> the holy grail.

Nobody that I know involved in such decisions thinks the VAX model is the
holy grail ...
|> The "long long" pseudo-solution wasn't needed in the first place; it was
|> a mistake made by vendors who didn't have the balls to do the right thing,
|> then other vendors followed like lemmings. There comes a time when the
|> mistakes of the past have to be admitted and corrected.

These are fairly strong, and unnecessarily impolite, words that cast
aspersions upon people with whom you may not agree, but who may well have
to deal with sets of requirements different from yours.

-john mashey
DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: mash@sgi.com  DDD: 415-390-3090  FAX: 415-967-8496
USPS: Silicon Graphics 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94039-7311

Subject: Re: long longs in c
From: mash@mash.engr.sgi.com (John R. Mashey)
Date: Aug 22 1995
Newsgroups: comp.lang.c,comp.std.c

In article <41cv0r$pej@newsbf02.news.aol.com>, ffarance@aol.com (FFarance) writes:
|> nightmares. Thus, "long long" is attractive *now*, but will cause problems
|> with 128-bit architectures. 32-bit machines started to arrive around 1978
|> and 64-bit machines around 1991 (my dates are approximate). 128-bit
|> machines will become available around 2004.

I agree with many of the comments before this, but we still have the
problem upon us already. Again, I make no representation that long long, or
whatever it's called, is a panacea ... Again, we are where we would have
been if we hadn't had long, years ago, in doing the 16->32 transition.

Re: 128-bit machines available in 2004: let me explain why I seriously
doubt that this is going to happen in any widespread way. You apparently
got 13 = 1991-1978, then 2004 = 1991+13.

a) 32-bit machines, of course, got popular in the 1960s with the S/360 ...
however, the time between generations is more closely related to the number
of bits of addressing, i.e., proportional to the number of bits added.

b) But in any case, using DRAM curves, and microprocessor history, and
sizes of physical memory actually shipped by vendors (I've plotted all
these things at one time or another; some of this is in the BYTE article I
noted):

1) DRAM sizes get 4X larger every 3 years; this has been consistent for
years; if anything it might slow down a little after the next generation,
or maybe not.

2) 4X larger = 2 more bits of physical addressing.

3) Of course, virtual addressing can consume address bits faster. For a
program actually using the data, a reasonable rule of thumb is that there
are practical programs 4X larger than physical memory that are still
usable, i.e., whose reference patterns don't make them page too much.
Hennessy disagrees with me some, claiming that memory-mapped file usage can
burn virtual memory faster, and I somewhat agree, but I also think it takes
a while for such techniques to become widely used. In any case, even this
tends to be at least somewhat bounded by the actual size of physical disks.

4) So, assume that large microprocessor servers started hitting 4GB (32
bits) in 1994 (and that's a reasonable date: SGI sold some 4GB systems in
1994, and some 8GB systems either at the end of 1994 or the beginning of
1995; so, if I pick a date for 4GB, knowing there are always some bigger
systems, it's 1994).

1994: 32 bits (4GB)
1997: 34 bits (16GB)
2000: 36 bits (64GB)
2003: 38 bits (256GB)
....
2042: 64 bits (16 billion GB) (hmmm, seems unlikely :-)

On the other hand, my 4:1 rule claims that the virtual memory pressure is
at least 2 bits ahead of the physical, or 3 years earlier; and then there's
the increasing use of mapped files; and then allowing for somebody being
more aggressive than the rest of the crowd ... and I come back to my 2020
estimate.

5) Note that IRIX already has a 64-bit file system and files; the largest
single file we've seen is 370GB. Assume disks somehow maintain their
current progress of 2X every 2 years, and that 4GB 3.5" SCSI disks are
around in force. Right now, a 64-bit file pointer can address all the data
in 4 billion 4GB disks ... which not everyone can afford :-) By 2020,
assuming straight-line progression, suppose we've gotten 13 doublings:
you'd want a single disk of 32,000 GB (!), and now a 64-bit file pointer
can only address 2**19, or 512,000, of such disks - still likely to be
adequate for most uses.

6) Finally, while the first 64-bit micro came out in 1991/1992, and the
second in 1992, it is 1998 (?)
before all of the major microprocessor families get there; and whereas
there were at least some 64-bit systems many years ago in the supercomputer
world, offering some useful experience, I haven't noticed *any* 128-bit
systems anywhere.

7) Bottom line: something very strange would have to happen for us to start
seeing serious use of 128-bitters in 2004. While your comments on C have
serious credibility ... I'd like to see some reasoning to justify 2004,
because every bit of trend analysis I've done or seen says much later ...

|> From the perspective of WG14, we expect to complete C9X around 1999. It
|> seems silly to solve the same problem 10 years from now. Also, the
|> portability problems *greatly* increase with the use of "long long" even
|> if you restrict yourself to 16-bit, 32-bit, and 64-bit architectures.

But again, I've got some serious portability problems, right now, that
don't seem to get solved except with an integral 64-bit type that is useful
in 32-bit environments and, of course, must persist into 64-bit ones. While
I used to care about non-power-of-2-sized architectures, that seems less of
an issue than it was in the old days.

Oh well, it looks like we're doomed to a sad state of affairs over the next
few years: a whole lot of people will write code with an extension that
won't be part of the standard; the extension won't benefit from the
standards process, but it will get used, perhaps with yet more flags (like
-ANSI and -ANSI+longlong, i.e., use ANSI, but don't flag long longs.
Sigh.)

-john mashey
DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: mash@sgi.com  DDD: 415-390-3090  FAX: 415-967-8496
USPS: Silicon Graphics 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94039-7311

Subject: Re: long longs in c From: mash@mash.engr.sgi.com (John R. Mashey) Date: Aug 25 1995 Newsgroups: comp.lang.c,comp.std.c In article <41je47$20h@newsbf02.news.aol.com>, ffarance@aol.com (FFarance) writes: |> > From: Dan.Pop@mail.cern.ch (Dan Pop) |> > I must be missing something. Yes. Frank answered it well: |> Changing the size, alignment, effective precision, etc., of a "long" or |> any other data type will break binaries. You'll be forced to recompile |> and port everything. For example, your library routine uses a structure But I'd add a few more: a) People use shared libraries; you need to double those to support both cases, since all the binaries a customer has don't magically disappear and get replaced when you ship a new system. b) Even more, ISVs, especially some rather important ones, create complex applications that do things like dynamically loading binaries, that may well have come from 3rd parties, and again, they don't all magically get recompiled at the same time. c) Strangely enough, not every program is self-contained; some read/write data to disk. If they ever wrote data structures containing longs to disk, and the compiler then decides that longs changed size, then even a simple, single program breaks. You can't just recompile it, you've got to go through a serious cleanup. [This, of course, is where "exact" descriptors are good things, since they'd be the same under any model.] DEC changed going from Ultrix to OSF/1 and this was sensible. I note they didn't change VMS... I make no claim that every vendor that makes a difficult transition will die, just that many who've made it hard to program have, and that even the survivors have suffered. The "reasoned decision" comment was for 32-bit machines, i.e., why everybody followed ILP32 in the 1980s. You comment on ISVs ... I don't know which ones you talk to, but I talk to some pretty serious ones fairly often ... which is where I get the opinions.
-- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: mash@sgi.com DDD: 415-390-3090 FAX: 415-967-8496 USPS: Silicon Graphics 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94039-7311

From: mash@mash.engr.sgi.com (John R. Mashey) Newsgroups: comp.std.c Subject: Re: long long -- part of what ANSI standard? Date: 11 Apr 1997 00:55:49 GMT In article <5ij0f6$b7b@solutions.solon.com>, seebs@solutions.solon.com (Peter Seebach) writes: |> Exactly... For this reason, the correct thing to do is use 64 bit |> longs if you need a 64 bit integer type. Then, all existing correct |> code remains correct. "long long" *breaks existing code*. (Because |> existing code has been given an iron clad guarantee by the standard |> that long is the largest type, and yes, real code breaks mysteriously |> when this is not true.) As noted elsewhere, every choice breaks some existing code. I posted more on this, but it is *not* a correct choice, on a system that has forever used ILP32 (integer, long, pointer = 32 bits), to change long to 64-bits; software vendors will definitely kill you. |> <inttypes.h> is also in the standard, and provides (on machines |> capable of the support) 8/16/32/64 bit integral types without breaking |> the type system. <inttypes.h> came from the 1991 work mentioned in an |> earlier posting. |> A real solution, one which lets the user specify integral sizes, would have |> been preferable. If you doubt this, wait a couple of years and see what |> monstrosities are invented as all of the vendors scramble to provide the |> 128-bit type, which will probably get called "long long long", except that |> some vendors will make it "long long", and some will spell it int128_t. Given that DRAM expands at 4X (2 bits)/3 years, and that virtual memory more-or-less expands ~ memory sizes, and that we're in middle of 32-64-bit transition now (call that 1992 start), and we just added ~ 32-bits, 32 bits / (2/3 bits/year) = 48 years + 1992 = 2040, *assuming* DRAM keeps growing at same rate. *Assuming* heavier use of memory-mapping/ sparse-file techniques, maybe it gets relevant by 2020. 
Of course, 4X/3 years (or 2X / 1.5 years = Moore's Law) is guaranteed not to run forever, or even until 2040, so it may be that we do not see 128-bit (integer) processors in any way like we saw 32-, and now 64-bit CPUs. I wouldn't be surprised to see 128-bit floating-point sometime. ====== 16->32, 32->64: we've done this 2X thing twice; the first one was relatively easy: Dennis just added long, well before the 16->32 move, and that was that. 32->64 has been more painful, for various reasons: - There aren't enough people around who went through the previous time. - It has more constraints that didn't exist 20 years ago, such as CPUs that run both sizes of code together. So: *maybe*, if we're lucky, it will go like this: - By 2000, every microprocessor family used in general-purpose systems will have at least 1 64-bit member delivered in systems. - By 2002, 32/64-bit portability will be as well-understood as 16/32-bit portability got to be ~1980 inside Bell Labs. - If not already in C, surely the scars will be fresh enough that people may adopt extensions that will cover 128-bit, and the difference between type-sizes that want to float and ones that do not (or at least, people will settle into well-accepted #ifdefs that achieve this result). Hopefully, this could actually be in the standard by 2010. If it's not, then the problem will be forgotten; everyone will assume that chips are 64-bit, and somebody (else) will get to do this again. In any case: the *right* solution, regardless of syntax, is that first-class support for 128-bit ints will be in place in compilers for 64- and 32-bit CPUs 2-3 years before the first 128-bitter appears, and hopefully earlier. -- -john mashey DISCLAIMER: <generic disclaimer: I speak for me only...> EMAIL: mash@sgi.com DDD: 415-933-3090 FAX: 415-967-8496 USPS: Silicon Graphics/Cray Research 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94043-1389

From: mash@mash.engr.sgi.com (John R. Mashey) Newsgroups: comp.std.c Subject: Re: extended integers Date: 13 Jun 1997 22:27:54 GMT In article <5nfjad$15d@dfw-ixnews10.ix.netcom.com>, Douglas Gwyn <gwyn@ix.netcom.com> writes: |> I just circulated a proposal to formally allow implementations to use |> extended integers in their standard headers, which provides a way to |> resolve ptrdiff_t issues etc. as well as to sanction what the Committee |> consensus was with regard to Kwan's <inttypes.h>. This is not as good |> as defining a parameterized integer type along the lines of Frank |> Farance's proposal, but it should suffice for the time being. |> |> Also, "long long int" was adopted for C9x. These have to have at least |> 64 bits. This sounds like a rational and pragmatic outcome: standards work is often maddening & usually thankless, so, thanks to the Committee for doing something reasonable in the real world, even if, in a perfect world, things might have been different. This at least means that something sensible will happen to cover the processors that we'll have around for the next decade or so, leaving some time to figure out if something better need be done. -- -john mashey DISCLAIMER: <generic disclaimer: I speak for me only...> EMAIL: mash@sgi.com DDD: 415-933-3090 FAX: 415-932-3090 [415=>650 August!] USPS: Silicon Graphics/Cray Research 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94043-1389

From: mash@mash.engr.sgi.com (John R. Mashey) Newsgroups: comp.std.c Subject: Re: Dealing with long long Date: 20 Jun 1997 23:03:31 GMT In article <5oeq4m$de1$1@eskinews.eskimo.com>, scs@eskimo.com (Steve Summit) writes: |> In article <5o9pbq$abs$1@murrow.corp.sgi.com>, |> mash@mash.engr.sgi.com (John R. Mashey) wrote: |> some day; there's clearly a lot of good information in it. |> Apologies if the point I'm about to make was in there somewhere, |> and I missed it.] Thanx; no, this wasn't covered, except indirectly under the category of "It's harder than it looks, and people looked at it hard 5 years ago..." |> I think it's worth pointing out one issue which is different |> today than back during the first, 16-to-32-bit crisis: function |> prototypes. These don't solve the binary I/O problem, or the They help, for sure, and I wish they had existed in C earlier ... but ... |> It seems to me that preparing new header files, containing |> new prototypes for functions in precompiled object files and |> libraries, *is* a tractable problem. (But again, I don't |> claim that this approach solves all the problems.) ... unfortunately, this tends to break most things where the parameter passed is a *pointer* to an integer whose size is changed, or to a structure containing such an integer. Suppose, using Steve's example, code compiled as

	int f(x) long int x;

got a new declaration:

	extern int f(int);

Had the code started, instead, as

	int f(x) long int *x;

it does not work to provide a prototype:

	extern int f(int *);

That is, you'd have (64-bit, in this example) longs in your code, which worked fine, but when you passed a pointer to one, the receiving code would think it had a pointer to a 32-bit item.
Little-endian & big-endian machines happen to differ in which of these you might accidentally get away with. Suppose we'd started with:

	int f(x) long int *x; { (*x)++; }

and the calling code looked like:

	long int y = 0;
	f(&y);

value of y: L.E.: 0x0000000000000001; B.E.: 0x0000000100000000

This appears to work on a Little-Endian machine, but not on a Big-Endian one, which in fact occasionally came up years ago, letting certain unportable code sneak by on Little-Endian systems. If the example were:

	long int y = -1;
	f(&y);

value of y: L.E.: 0xffffffff00000000; B.E.: 0x00000000ffffffff

-- -john mashey DISCLAIMER: <generic disclaimer: I speak for me only...> EMAIL: mash@sgi.com DDD: 415-933-3090 FAX: 415-932-3090 [415=>650 August!] USPS: Silicon Graphics/Cray Research 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94043-1389

From: mash@mash.engr.sgi.com (John R. Mashey) Newsgroups: comp.std.c Subject: Re: Dealing with long long Date: 23 Jun 1997 02:03:02 GMT In article <danpop.866823129@news.cern.ch>, Dan.Pop@cern.ch (Dan Pop) writes: |> But why was long defined as a 32-bit type on 32-bit platforms in the first |> place? Without this blatant proof of short-sightedness, the first real |> "crisis" with the C integral types system would have occurred at the |> transition from 64-bit platforms to 128-bit platforms. |> |> >The reason DEC could move to the "sane" 8/16/32/64 system was that they |> >*were* already requiring their customers and ISVs to recompile (and |> >port) as they *were* requiring their customers to move to a new system. |> |> What was preventing the other vendors from doing the same thing when they |> introduced their 32-bit systems??? At that time, their customers were |> in exactly the same position as DEC's customers when the Alpha was |> introduced. This is where Peter's "idiots" and "kooks" actually fit in. 1) With 20 years of hindsight, one might claim that more thought might have been given to this (and I don't think any of the relevant people are under the misimpression that C is perfect) ... but "blatant short-sightedness" and "idiots" and "kooks"??? Perhaps it isn't clear to the readership that such comments effectively target Dennis Ritchie, Steve Johnson, and their colleagues' decisions of 1974-1976... 2) "long" came about ~1975; it actually *was* 64 bits on the 32-bit XDS Sigma 5, but of course XDS got out of the computer business. So, how did "long" get to be 32 bits on 32-bit systems? 3) While it might have been nice to have anticipated the industry-wide use of C, and the issues to be faced 20 years later with 64-bit micros, these thoughts were not paramount at a time in which: - There were a few hundred systems using C, mostly inside Bell Labs.
- The most common machine using C was a 248KB PDP-11/45, ~.5 VAX-MIP system supporting 10-15 simultaneous users. - People still argued about the wisdom of using languages like C for systems code; there were numerous lower-level languages, and in fact, most systems code was still done in assembler in many places. - People sometimes hoped that some FORTRAN & COBOL code would be somewhat portable across operating systems. - The key efforts were making UNIX itself more portable and doing the Portable C Compiler (1977-1978). - For most people in computing, the idea that a system language like C would be portable ... was considered at best a research topic... - For years, it has been easy enough to buy books that tell you how to write really portable C code, and the standards efforts have supported this well, and many people have the relevant experience and understand most of the issues. Such books were *not* available in 1977... 4) "long" came earlier, ~1975: it was desperately needed on PDP-11s because neither 16-bit file pointers nor int[2]s were really very pleasant; in any case, despite use of C on 36-bit Honeywell 6000s, and a few IBM S/360s, and a few other systems, the PDP-11 was overpoweringly the dominant C platform. (On the XDS Sigma 5, long actually was 64-bit, since it was a 32-bit CPU, but for various reasons, 64-bit integers were relevant. Xerox got out of the computer business, however, and the BTL project got cancelled, although it actually had some good side-effects on C.) 5) After VAXen (and 3Bs, important inside BTL) appeared, there was a period in which code was shifting from PDP-11s to 32-bit machines, but both were still important. The UNIX code base had plenty of longs, because PDP-11s needed them. 6) The VAX could have had 64-bit longs, but: - Code sequences would have been necessary, so there would have been unpleasant performance hits.
- They would have been especially unpleasant, since a VAX-11/780 wasn't *that* much faster than a PDP-11/70; that is, there wasn't some huge leap of technology (the PDP-11 as high-end died prematurely due to lack of address space). - People didn't understand portability so well; there was still a lot of non-typedeffed code around. - So, more code would have broken, whereas with sizeof(long) == 4, lots of stuff just worked... 7) Hence, by the time there were "vendors" of UNIX systems who actually did compilers, there was a large body of code on the VAX & elsewhere that used long as 32-bit. That gave these vendors (often startups) a choice in the 1980-1984 timeframe: A: Start with the portable C compiler and existing UNIX code, retarget it (as onto the MC68000), leave the typesizes alone, expect that if the code works for other people, it will work for you, and get on with it. B: Realize that 10-15 years later, there would be 64-bit micros, where it might be better if sizeof(long) == 8, and that the conversion to 64-bit might be eased, if the company were still in business. Thus: 1. Do a lot more work on pcc. 2. Clean up all of the code in BSD or ATT UNIX. 3. Explain to every ISV that this was a purer approach, and it would be good for them in the long run, even if they had to do some cleanup themselves.... 4. Be prepared to clean up each new release of UNIX that you got. 5. Accept the fact that you'd be later to market, cost more, and have less software ... but be theoretically better. (Now, perhaps some companies made decision B, although I don't know of any offhand; in any case, they appear not to have survived.) 8) Summary: all of this got this way a *long* time ago; vendors had very little choice; in fact, UNIX vendors who made sizeof(int) 2 suffered as well. I was managing a SYS III port to the MC68K inside BTL in 1982, and we already had a 68K Blit compiler, but sizeof(int) == 2 broke a lot of code...
Anyway, in retrospect, it would have been nice if, in 1974-1975, some truly brilliant people had happened to anticipate a few more of these issues, but they didn't, because a lot of other things were more important at the time. Those who've followed had to live within the constraints of the past. -- -john mashey DISCLAIMER: <generic disclaimer: I speak for me only...> EMAIL: mash@sgi.com DDD: 415-933-3090 FAX: 415-932-3090 [415=>650 August!] USPS: Silicon Graphics/Cray Research 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94043-1389

From: mash@mash.engr.sgi.com (John R. Mashey) Newsgroups: comp.std.c Subject: Re: Dealing with long long Date: 24 Jun 1997 21:33:22 GMT In article <RCB2CECHRurzEwuG@on-the-train.demon.co.uk>, "Clive D.W. Feather" <clive@on-the-train.demon.co.uk> writes: |> In article <5olq6h$79j@news.inforamp.net>, Peter Curran |> <pcurran@xacm.org> writes |> |> >But the issue, to the best of my understanding, is that an |> >implementation can introduce "long longs" into code that does not |> >contain any, by defining some of the types required by the standard, |> >such as "size_t," as "long long." If this is not true, then my |> >concerns are greatly alleviated. |> |> Indeed, and they're not (alleviated). To further understand, maybe people can provide some more information, understanding that there is a difference between: 1. What a standard says, where there is often a fine line between a. Specifying insufficiently, and causing implementations to diverge for no good reason. b. Specifying the minimum necessary. c. Overspecifying, and thereby disallowing many kinds of hardware implementations and 2. What people are actually doing, where the standard might have allowed something (that wouldn't work), but nobody is building those systems anyway, so it becomes a moot point. So, the question is: does anybody know of a specific implementation, whose plans are public, in which size_t is (unsigned long long) and sizeof (unsigned long) != sizeof (unsigned long long)? Example: in IRIX 6, code compiled for 32- and 64-bit modes:

	     int   long   long long   ptr   size_t
	32    32    32       64        32     32	ILP32LL
	64    32    64       64        64     64	LP64

I.e., above, size_t is the same size as a long, so the problem appears not to arise. <types.h> size_t doesn't use long long as a base.
As far as I know, the main place that long longs get introduced into typical 32-bit systems is for off64_t's, i.e., to use the new LFS lseek64, tell64, ftruncate64, etc, i.e., if people want to access files >2GB, but do not need larger address space [i.e., identical to the original use of long on 16-bit PDP-11s]. -- -john mashey DISCLAIMER: <generic disclaimer: I speak for me only...> EMAIL: mash@sgi.com DDD: 415-933-3090 FAX: 415-932-3090 [415=>650 August!] USPS: Silicon Graphics/Cray Research 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94043-1389

From: mash@mash.engr.sgi.com (John R. Mashey) Newsgroups: comp.std.c,comp.lang.c Subject: Re: int32_t Date: 1 Apr 1999 01:29:45 GMT In article <7du4ni$e8e$1@eskinews.eskimo.com>, scs@eskimo.com (Steve Summit) writes: |> Organization: better late than never |> |> In article <37015131.32781085@news.pathcom.com>, pcurran@acm.gov writes: |> > HP, for one, said a year ago that 64-bit longs is their standard. |> |> Good for them! This is clearly the right choice, on a processor |> that supports a 64-bit type. There may be some confusion here, since the original statement was so imprecise, so just to be absolutely clear (this is easily findable in public web pages): 1) HP/UX was ILP32, and then got to be ILP32LL64; I don't recall when. 2) By HP/UX 11.0, HP provided an entire 64-bit environment, and it is LP64 (= I32LLLP64 if you like), and 64-bit HP/UX 11.0 runs either flavor of binaries, which do not mix. 3) long is 32 bits in the 32-bit environment, as HP, like everybody else, was not about to change that. I don't know offhand of *anyone* doing IP32L64 at this time. 4) long is 64 bits in the 64-bit environment, meaning that HP made the same choice as {DEC, SGI, IBM, Sun}. -- -john mashey DISCLAIMER: <generic disclaimer: I speak for me only...> EMAIL: mash@sgi.com DDD: 650-933-3090 FAX: 650-933-4392 USPS: Silicon Graphics/Cray Research 40U-005, 2011 N. Shoreline Blvd, Mountain View, CA 94043-1389

From: mash@mash.engr.sgi.com (John R. Mashey) Newsgroups: comp.std.c,comp.lang.c Subject: Re: int32_t Date: 1 Apr 1999 07:15:51 GMT In article <3702F312.ACDF064D@null.net>, "Douglas A. Gwyn" <DAGwyn@null.net> writes: |> John Mashey is not on the committee, and I rather suspect what |> he meant to convey was that initial discussions occurred among |> interested parties via e-mail back then. He seems to think |> that the issue was settled by an entirely non-WG14 meeting |> among UNIX vendors (HP, Sun, SGI) several years ago. But WG14 |> has made its own decisions for its own reasons after considering |> arguments from other sources, including some that originated in |> netnews discussion groups such as this one. Yes, to make sure this is perfectly clear: 1) There were about 6 months of meetings that were NOT WG14, and had no official standing whatsoever, but were just a bunch of Fellows, Chief Scientists & such from major computer companies and software companies getting together physically and by email. This was occasioned by the fact that several vendors already had 64-bit CPUs, and more were designing some, and file size pressures were growing, and anybody who could plan a few years out could see that they'd need 64-bit pointers for some codes by the mid-1990s. Compiler teams industry-wide were planning 64-bit object file formats, compilers, tools, etc. 2) Hence, it was deemed a good idea to see if we could agree on a 64-bit model, and the preceding related technologies, to try to do something that was not irrationally anarchistic. 3) We were unable to get everybody to agree on any one of the 3 choices: ILP64, LP64, or P64, and there were several strong proponents of each. We did agree on inttypes.h (to help some problems), and we did agree that we needed a type that could be 64 bits without breaking existing 32-bit code (i.e., to convert ILP32LL), and that it might as well be long long (amongst the UNIX members anyway), given Amdahl and gcc.
The P64 groups needed a long long (while ILP64 and LP64 didn't), but just about everybody wanted to extend their ILP32 to ILP32LL. Of course, it turned out that the UNIX crowd generally went LP64, at least partially because several of the early implementations were that. 4) There was concern about whether doing long long would cause later trouble with C9X (which of course, being years away, meant nobody could wait), and since there were several attendees who were WG14 members, they were asked for their opinions, which basically came down to: "If you need it, go ahead: it's better that people do something consistently, and while there are no promises about what will happen with the next iteration of the standard, if a big chunk of the industry does it as a common extension that will get serious consideration." This seemed both fair and modestly encouraging, so most of us did it. Had anyone been able to compellingly propose something different, we would have done something different. 5) Hence, when I say it was mostly settled in 1992, I do not mean that anything officially was settled; I mean that a substantial part of the industry agreed to do something, because people couldn't wait any longer. The Committee, while having the clear authority to set the standard, is also rational in considering substantial experience already implemented by many large hardware and software vendors. Maybe it was actually "settled" in 1983, when Amdahl started doing this, or later, when it got into gcc. 6) Anyway, there was every possible consideration given to standards, but subject to the pressing demand that many players felt to have 64-bit integers added into their 32-bit environments. 7) In some sense this shouldn't have had to happen this way, but C89 didn't provide a type available to add 64-bit into an existing 32-bit environment, and the dictates of hardware progress demanded one.
8) Once again, I reiterate that amongst myriads of proposals and approaches, this was viewed as "the least bad". -- -john mashey DISCLAIMER: <generic disclaimer: I speak for me only...> EMAIL: mash@sgi.com DDD: 650-933-3090 FAX: 650-933-4392 USPS: Silicon Graphics/Cray Research 40U-005, 2011 N. Shoreline Blvd, Mountain View, CA 94043-1389

From: "Douglas A. Gwyn" <DAGwyn@null.net> Newsgroups: comp.std.c,comp.lang.c Subject: Re: int32_t Date: Thu, 01 Apr 1999 22:49:06 GMT "John R. Mashey" wrote: > 7) In some sense this shouldn't have had to happen this way, but > C89 didn't provide a type available to add 64-bit into an existing 32-bit > environment, and the dictates of hardware progress demanded one. Yes; the origin of the problem seems to be that C89 lacked the foresight to accommodate such expansion. Something had to give. It turned out to be the promise that all standard typedefs of integer types could be contained in a (possibly unsigned) long.

From: mash@mash.engr.sgi.com (John R. Mashey) Newsgroups: comp.std.c Subject: Re: coding style advice for sizeof (long) < sizeof (size_t)? Date: 5 Apr 1999 22:18:44 GMT In article <t81zhzo186.fsf@discus.anu.edu.au>, Geoffrey KEATING <geoffk@discus.anu.edu.au> writes: |> This is why GNU/linux has off64_t. You can choose between |> |> a) off_t is 'long' (and there is no off64_t) |> b) off_t is 'long', off64_t is 64 bits |> c) off_t is 64 bits (and there is no off64_t) |> d) off_t is 64 bits, off64_t is 64 bits |> |> (depending on the system, the list above may have some overlap). |> |> Don't ask about 128-bit-length files... This is simply implementing |> someone's standard. 1) This is from the Large File Summit, and is widely implemented. The LFS spec is partially derived from SGI's IRIX 5.3 XFS implementation of 1994, and was driven by ISVs (like SAS) and a bunch of systems/OS vendors. 2) To relate the above to the models often discussed:

	                     off_t   off64_t   size_t
	ILP32                 32      none      32
	ILP32LL (most)        32      64        32
	ILP32LL (SGI -n32)    64      64        32
	LP64*                 64      64        64
	IL32LLP64 (P64)**     32      64        32
	ILP64                 64      64        64

	* = what most UNIX folks do, ** = 64-bit NT

In SGI's case, there are actually 3 models:

	-o32 (the old 32-bit ILP32, later converted to ILP32LL ~1992)
	-n64 (LP64, ~1994)
	-n32 (ILP32LL, but implemented with 64-bit registers and more FP registers, hence impossible to make binary-compatible with the others; people decided to go ahead and make off_t bigger so that more programs would recompile and automatically get to big files without having to go to LP64)

Neither -n64 nor -n32 is a "replacement" for -o32, in that many programs are still -o32, and there is still a full set of -o32 libraries.
This brings me to something I still get questions on: WHY PEOPLE DON'T JUST *CHANGE* LONG TO BE 64 BITS IN A 32-BIT ENVIRONMENT: In SGI's case (typical of commercial UNIXes), if you look at a customer's system: a) UNIX kernel b) About 60 SGI-provided dynamic-linked libraries for *each* of the 3 programming models. Of course, once you ship a dynlinked library, you can never take it away, although you can replace it with a 100% upward-compatible superset. [Not everybody seems to understand this ... but customers get seriously irate when a new OS+libraries arrives, and their existing apps suddenly break because a new dynlinked library isn't quite consistent. The rules say that if you change an interface, you use versioning to make sure the old and new versions are distinguishable, and old programs get the right version.] c) Anywhere from a handful to hundreds of dynlinked libraries from ISVs, either for use only in their own app suites, or for others to use. [On NT, there are probably more DLLs; in 5 minutes I found several hundred on my wife's NT system.] d) Executable apps, some with static-linked libraries, but most with dynlinked ones from b) and c), which come from SGI, multiple ISVs, and sometimes the end-customer. A typical user of ISV apps probably has 10-20 apiece, but of course, they are a different 10-20 apiece. e) Think of a giant directed graph, consisting of thousands of apps (at the top) and hundreds of libraries; a customer, to have a working system, picks a handful at the top, and all of the connected pieces underneath have to be consistent. People have proposed using thunks to fix inconsistencies; that doesn't work, especially in the kind of intermingled dynlinked library setups now found out there. For each programming model: 1) SGI made the kernel support it. 2) SGI provided compilers, system libraries, tools. 3) ISV library vendors recompiled their libraries. 4) ISV apps vendors recompiled their libraries, and then their apps.
There are thousands of applications, many of which are clean code, portable enough to compile under any of the models. Nevertheless, binary compatibility constraints forbid flash-cut replacement of one model by another. Commercial customers expect: 1) A new OS+library setup arrives on new hardware. One can install compatible apps and then work. 2) Later, additional apps arrive (ISV schedules differ), and those apps can be installed and work, including data interchange with the existing apps. Major ISVs often have a 1-3-year major release cycle, and if you miss a release cycle, you wait until the next one. 3) Later yet, another new OS+library setup arrives, and it is installed on the old hardware, and people expect the existing apps to continue to work. A vendor who does the following probably gets to go out of business: "There's a new programming model in which long has changed from 32 to 64 bits; to make sure you switch, we've converted to this from our old way, and removed all of the old libraries. Recompile all your code, and rewrite your code to cope with binary data on disk or tape that was written with longs. Here is a list of ISV suppliers who have already converted their code. Please talk to anyone not on this list and urge them to move up their release schedules, since their apps are broken on your system until they do. Also, allow in your budget upgrade fees that you hadn't anticipated." I have had (unnamed) people recommend this approach to me, in the effort to fight off long long... :-) 4) Hence, in practice, sane vendors *keep* the old models around (essentially forever), and add new models alongside, but don't expect to intermix binaries. Most ISVs strongly prefer to provide exactly 1 port of most of their software on any given platform, for rational economic reasons. If there is a good reason to supply two, they might. [For example, Oracle supplies 32-bit apps, and both 32- and 64-bit servers.]
If there are 2-3 alternate forms, they naturally pick whichever one costs them the least to supply, and covers the most machines, and switching to a new model must have some clear benefit; for example, programs that *really* want to be 64-bit are motivated to convert ... but as DEC found out early in Alpha's life, there were lots of apps that didn't need to be 64-bit, and it cost a lot to get the apps moved. 5) On true 32-bit CPUs, converting ILP32 to ILP32LL is upward-compatible, no slower, occasionally faster, and the binaries intermix with no trouble, hence most vendors did this. Simply replacing ILP32 with IP32L64 is not binary-compatible, is never faster, and often slower, and so people generally did not do this, starting with VAX UNIX. 6) On 64-bit CPUs, one can have: 6a) ILP32LL binary-compatible with earlier 32-bit family members. This is the overpoweringly popular choice, especially as augmented with the LFS off64_t, etc stuff, which lets people get to big files in a more gradual fashion. 6b) ILP32LL model, but using 64-bit registers, so that LL is fast. This is popular in the games/embedded arena, where source compatibility with 32-bitters is important, but binary compatibility is less so, and the extra performance is worthwhile. 6c) IP32L64: so far, not a choice that people generally make, because it breaks more programs than ILP32LL, and doesn't have big pointers, and is usually slower than ILP32LL. 6d) P64 or LP64: for programs that care about large data, this is a major plus, so people do use it, for those programs where it matters, and they happen to be important programs, albeit relatively few in number. 7) In summary: source code may be very clean, paranoidly portable, good, etc ... but these days, lots of code doesn't exist in a vacuum, but depends on *binary* interfaces with huge chunks of other code from multiple sources. 
While many data declarations are of interest only inside the programs that use them, others manifest themselves outside, onto disk, tape, or in binary interfaces, and *everybody's* idea of the sizes of such data objects needs to change together, or very carefully.

--
-john mashey DISCLAIMER: <generic disclaimer: I speak for me only...>
EMAIL: mash@sgi.com DDD: 650-933-3090 FAX: 650-933-4392
USPS: Silicon Graphics/Cray Research 40U-005, 2011 N. Shoreline Blvd, Mountain View, CA 94043-1389

From: mash@mash.engr.sgi.com (John R. Mashey)
Newsgroups: comp.std.c
Subject: Re: coding style advice for sizeof (long) < sizeof (size_t)?
Date: 6 Apr 1999 17:04:50 GMT

In article <7eca2t$quu$1@shade.twinsun.com>, eggert@twinsun.com (Paul Eggert) writes:
|> mash@mash.engr.sgi.com (John R. Mashey) writes:
|>
|>                      off_t   off64_t   size_t
|>  ILP32               32      none      32
|>  ILP32LL (most)      32      64        32
|>  ILP32LL (SGI -n32)  64      64        32
|>  LP64 *              64      64        64
|>  IL32LLP64 (P64) **  32      64        32
|>  ILP64               64      64        64
|>
|>  * = what most UNIX folks do, ** = 64-bit NT
|>
|> Is this table correct? If so, then 64-bit NT does not have the problem
|> of long being shorter than size_t.

Sorry, I tried to get too many things into this table, editing the NT thing in later. Let me expand & correct it:

                          off_t   off64_t   size_t
 1) IL32LLP64 (P64) UNIX  32      64        32
 2) IL32LLP64 (P64) UNIX  32      64        64
 3) IL32LLP64 (P64) NT    32      64        64 (sort of, ?)

1) In the first choice, the rationale would have been:
a) Of the existing basic datatypes, only pointer changes size.
b) Since most data structures on disk/exchanged between programs don't have pointers, such structures would stay the same size.
c) Most object sizes fit into 32 bits anyway.
d) Keeping most data small gains efficiency.
e) Yes, it is peculiar that ptrdiff_t must be bigger than size_t.

2) In the second choice, the rationale would have been:
a) As above.
b) As above, except size_t also changes size, but there aren't that many uses of it, and this avoids the surprise of e), and leaves more room for the future, as there get to be more larger objects.

I say "would have been" because, AFAIK, none of the major UNIXes went this way, although 1-2 vendors argued strenuously for it (I don't recall whether they intended to do 1) or 2)); in any case, everybody went LP64 anyway, for consistency.
3) Regarding Microsoft, as I posted earlier, as far as I can tell, you aren't supposed to use size_t; you're supposed to use SIZE_T, which is of Pointer Precision, and therefore 64 bits, and in some sense doesn't belong in the table at all.

So the bottom line, I think, is:
1) The UNIX folks generally went dual-model ILP32LL + (on 64-bit chips) LP64; in both cases size_t has the same size as long.
2) Microsoft made SIZE_T bigger than long.

--
-john mashey DISCLAIMER: <generic disclaimer: I speak for me only...>
EMAIL: mash@sgi.com DDD: 650-933-3090 FAX: 650-933-4392
USPS: Silicon Graphics/Cray Research 40U-005, 2011 N. Shoreline Blvd, Mountain View, CA 94043-1389

From: mash@mash.engr.sgi.com (John R. Mashey)
Newsgroups: comp.std.c,comp.lang.c
Subject: Re: int32_t
Date: 31 Mar 1999 01:10:51 GMT

In article <37014A2C.BB88326E@technologist.com>, David R Tribble <dtribble@technologist.com> writes:
|> If you don't count Microsoft Windows running on 64-bit platforms.
|> As I understand it, they've decided that 'long' is still 32 bits
|> wide on 64-bit platforms. (I suppose they'll call it Win64 instead
|> of Win32.) If that's so, and it's possible to have more than 4 GB
|> of virtual memory on your desktop (in the near future anyway), then,
|> yes, it will be a problem for some mainstream platforms.

Clarification: At the Intel Hardware Developer Conference a month ago, Microsoft went through the same stuff they've been telling developers for a while, so here's what they say (extracted from large numbers of foils):

1) They are indeed using "LLP64", and their goals include:
- "Porting from win32 to win64 should be simple.
- Supporting win64 and win32 with a single source base is our goal.
- No new programming models.
- Require minimal change to existing win32 code data models."

2) They basically recommend the same thing as people often do elsewhere, which is to use typedefs layered on top of the basic types, and absolutely avoid the basic types in most code. They say:

3) "Win64 sample types:
   Name                             What it is
   LONG32, INT32                    32-bit signed
   LONG64, INT64                    64-bit signed
   ULONG32, UINT32, DWORD32         32-bit unsigned
   ULONG64, UINT64, DWORD64         64-bit unsigned
   INT_PTR, LONG_PTR                Signed Int of Pointer Precision
   UINT_PTR, ULONG_PTR, DWORD_PTR   Unsigned Int of Pointer Precision
   SIZE_T                           Unsigned count of Pointer Precision
   SSIZE_T                          Signed count of Pointer Precision

4) Win64 Data Model Rules
- If you need an integral pointer type, use UINT_PTR, ULONG_PTR, or DWORD_PTR. Do not assume that DWORD, LONG or ULONG can hold a pointer.
- Use SIZE_T to specify byte counts that span the range of a pointer.
- Make no assumptions about the length of a pointer or xxxx_PTR, or xSIZE_T. Just assume these are all compatible precision."

5) 64-bit integers map to __int64.

6) I make no value judgements on this, i.e., it is just posted to make sure the facts are clear about what they are doing. Observe, of course, that:

a) Microsoft has zero interest in non-power-of-two-bits words, ever, and hence is happy to embed 32s and 64s into typenames.

b) Microsoft has zero interest in code developed on NT being portable to non-MS environments ... although it actually happens that if you follow their advice about types, and *never* use int, long, etc. directly, you can get code whose non-OS-dependent pieces might port more easily amongst other 32- and 64-bit OSs, because they have attempted to remove the overloaded size assumptions by having more types that people actually use. Hence the following odd effect occurs: Win64 code may be more portable amongst 32- & 64-bit UNIXes than is sloppier old UNIX code... :-)

c) This is the way Win64 is, regardless of anything in C9X.

--
-john mashey DISCLAIMER: <generic disclaimer: I speak for me only...>
EMAIL: mash@sgi.com DDD: 650-933-3090 FAX: 650-933-4392
USPS: Silicon Graphics/Cray Research 40U-005, 2011 N. Shoreline Blvd, Mountain View, CA 94043-1389

From: mash@mash.engr.sgi.com (John R. Mashey)
Newsgroups: comp.std.c,comp.lang.c
Subject: Re: coding style advice for sizeof (long) < sizeof (size_t)?
Date: 22 Apr 1999 20:19:34 GMT

In article <7fno4u$fje@agora.dmz.khoral.com>, "Richard Krehbiel" <rich@kastle.com> writes:
|> (http://msdn.microsoft.com/developer/news/feature/win64/64bitwin.htm) and I
|> saw no mention of size_t anywhere, only SIZE_T (which is new for Win64).
|>
|> Sounds like you're probably right, and I'm probably wrong.
|>
|> It means malloc can't create objects larger than 4G. I suppose that means
|> Win64 programmers will begin migrating to VirtualAlloc and further away from
|> Standard C - and I suspect MS sees that as A Good Thing.

Microsoft's presentation at the Intel Developer Forum includes a foil that says (look carefully at the last line):

Pointer/Length Issues
- Many APIs accept a pointer to data and the length of the data
- In almost all cases, 4GB is more than enough to describe the length of the data.
- In very rare cases, >4GB of length is needed.
- We classify these as Normal objects, or Large objects

Another slide says (I think I posted this before):
- Use SIZE_T to specify byte counts that span the range of a pointer
- Make no assumptions about the length of a pointer or xxxx_PTR or xSIZE_T. Just assume these are all compatible precision.'

And yet another slide says:
- Supporting win64 and win32 with a single source base is our goal

I think Microsoft is giving very clear advice:
- Forget you ever heard of size_t; use SIZE_T (or SSIZE_T) in both win32 and win64 code.

--
-john mashey DISCLAIMER: <generic disclaimer: I speak for me only...>
EMAIL: mash@sgi.com DDD: 650-933-3090 FAX: 650-933-4392
USPS: Silicon Graphics/Cray Research 40U-005, 2011 N. Shoreline Blvd, Mountain View, CA 94043-7311

From: mash@mash.engr.sgi.com (John R. Mashey)
Newsgroups: comp.std.c,comp.lang.c
Subject: Re: coding style advice for sizeof (long) < sizeof (size_t)?
Date: 23 Apr 1999 03:08:12 GMT

In article <371FBBF3.1D1889A5@jps.net>, Dennis Yelle <dennis51@jps.net> writes:
|> "John R. Mashey" wrote:
|> > I think Microsoft is giving very clear advice:
|> > - Forget you ever heard of size_t; use SIZE_T (or SSIZE_T) in
|> > both win32 and win64 code.
|>
|> I don't read it that way.
|> I think they are saying:
|>
|> Use size_t for Normal objects, and SIZE_T for Large objects.
|>
|> If you are correct, why did they introduce the terminology
|> "Large objects"?

Beats me ... it's just that in the info I looked at, I couldn't find any mention of size_t where one might have expected it, and their directions seemed very explicit in preferring SIZE_T ... which would let them have SIZE_T act like the long of ILP32LL and of LP64, while using P64 (= IL32LLP64, to be explicit).

--
-john mashey DISCLAIMER: <generic disclaimer: I speak for me only...>
EMAIL: mash@sgi.com DDD: 650-933-3090 FAX: 650-933-4392
USPS: Silicon Graphics/Cray Research 40U-005, 2011 N. Shoreline Blvd, Mountain View, CA 94043-1389