In reply to this post by Jeff King

On Thu, Jul 14, 2011 at 1:31 PM, Jeff King <[hidden email]> wrote:

> However, I'm not 100% convinced leaving generation numbers out was a

> mistake. The git philosophy seems always to have been to keep the

> minimal required information in the DAG.



Yes.



And until I saw the patches trying to add generation numbers, I didn't

really try to push adding generation numbers to commits (although it

actually came up as early as July 2005, so the "let's use generation

numbers in commits" thing is *really* old).



In other words, I do agree that we should strive for minimal required

information.



But dammit, if you start using generation numbers, then they *are*

required information. The fact that you then hide them in some

unarchitected random file doesn't change anything! It just makes it

ugly and random, for chrissake!



I really don't understand your logic that says that the cache is

somehow cleaner. It's a random hack! It's saying "we don't have it in

the main data structure, so let's add it to some other one instead,

and now we have a consistency and cache generation problem instead".



Just look at the size of the patches in question. Your caching patches

are bigger and more complicated. Sure, part of it is that your series

adds the code to _use_ the generation number, but look purely at the

code to maintain them.



Why do you think the odd separate cache is somehow better than just

doing it right? Seriously? If we require the generation numbers, then

they have *become* that minimal information that we should save!



> And I think that has served us

> well, because we're not saddled with cruft that seemed like a good idea

> early on, but isn't.



Again - we discussed adding generation numbers about 6 years ago. We

clearly *should* have done it. Instead, we went with the hacky "let's

use commit time", that everybody really knew was technically wrong,

and was a hack, but avoided the need.
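
To make the "technically wrong" part concrete, here is a minimal sketch of an ancestry check that prunes the walk with a per-commit cutoff value. It is a deliberately simplified illustration, not git's revision-walking code (which adds extra slack around dates). The commit names, the `is_ancestor` helper, and the timestamps are all hypothetical.

```python
from typing import Dict, List


def is_ancestor(parents: Dict[str, List[str]],
                cutoff: Dict[str, int],
                x: str, y: str) -> bool:
    """Walk back from y looking for x.

    Any commit whose cutoff value is below x's is pruned, on the
    assumption that it is "older" than x and cannot lead to it.
    """
    seen = set()
    stack = [y]
    while stack:
        c = stack.pop()
        if c == x:
            return True
        if c in seen or cutoff[c] < cutoff[x]:
            continue  # already visited, or assumed too old to matter
        seen.add(c)
        stack.extend(parents[c])
    return False


# Hypothetical linear history: a <- b <- c
parents = {"a": [], "b": ["a"], "c": ["b"]}

# Commit "b" was made on a machine with a badly skewed clock.
commit_date = {"a": 100, "b": 50, "c": 200}

# Generation numbers (root = 1 here, an arbitrary convention).
generation = {"a": 1, "b": 2, "c": 3}

by_date = is_ancestor(parents, commit_date, "a", "c")
by_generation = is_ancestor(parents, generation, "a", "c")
print(by_date, by_generation)
```

With the skewed date on "b", the date-based walk prunes "b" before it ever reaches "a" and gets the wrong answer. The generation-based walk cannot be fooled this way, because a child's generation is always strictly greater than that of every one of its parents.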



Now, six years later, you clearly are saying that we need the

generation numbers, but then you go off and try to say that they

should be in some secondary non-architected random collection of data

structures that isn't covered by the security and maintenance

guarantees that the core git objects are.



Dammit, one of the things that makes git special is that the data

structures are NOT random odd ad-hoc files. There is a design to them.



> Generation numbers are _completely_ redundant with the actual structure

> of history represented by the parent pointers.



Not true. That's only true if you add ".. if you parse the whole

history" to that statement.



And we've *never* parsed the whole history, because it's just too

expensive and doesn't scale. So right now we depend on commit dates

with a few hacks.



So no, generation numbers are not at all redundant. They are

fundamental. It's why we had this discussion six years ago.
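
As a rough illustration of the ".. if you parse the whole history" caveat, the sketch below derives generation numbers purely from parent pointers. The `generation_numbers` helper, the toy history, and the "root commit = 1" convention are assumptions made for the example, not anything taken from the patches under discussion.

```python
from typing import Dict, List


def generation_numbers(parents: Dict[str, List[str]],
                       tip: str) -> Dict[str, int]:
    """Compute the generation of every commit reachable from tip.

    A root commit gets 1; any other commit gets 1 + the largest
    generation among its parents. Iterative, so deep histories do
    not hit the recursion limit.
    """
    gen: Dict[str, int] = {}
    stack = [tip]
    while stack:
        c = stack[-1]
        if c in gen:
            stack.pop()
            continue
        missing = [p for p in parents[c] if p not in gen]
        if missing:
            # Parents must be numbered before the child can be.
            stack.extend(missing)
        else:
            gen[c] = 1 + max((gen[p] for p in parents[c]), default=0)
            stack.pop()
    return gen


# Hypothetical history: "d" is a merge of "b" and "c".
history = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"], "e": ["d"]}
gens = generation_numbers(history, "e")
print(gens)
```

Getting the number for the tip alone forces a visit to every one of its ancestors (here `len(gens) == len(history)`). That is the cost that goes away when the number is recorded once, at the time the commit is created, from the already-known generations of its parents.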



> And so that seems a bit hack-ish to me.



Um? If you feel that way, then why the hell are you pushing your EVEN

MORE HACKISH CACHE PATCHES?



That's what this really boils down to. I think that if we have a value

that we need, then it should be recorded. In the data structures. Not

in some random other location that isn't part of the real git data

structures.



We don't do caches in git, because we don't NEED to. Sure, gitk has

its hacky cache, but that's not core functionality.



I think it's a sign of good design that we can do a "find .git" and

explain every single file, and show that it's all core functionality

(again, with the exception of "gitk.cache", and I suspect that's

because gitk is a script, not because of any really fundamental data

issues), and explain it.



I think the *cache* is a hell of a lot more hacky than just doing it right.



> I liken it somewhat to the "don't store renames" debate.



That's total and utter bullshit.



Storing renames is *wrong*. I've explained a million times why it's

wrong. Doing it is a disaster. I know. I've used systems that did it.

It's crap. It's fundamentally information that is actively misleading

and WRONG. It's not even that you can do rename detection at run-time,

it's that you *HAVE* to do rename detection at run-time, because doing

it at commit time is simply utterly and fundamentally *wrong*.



Just look at "git blame -C" to remind yourself why rename information is wrong.



But even more importantly, look at git merges. Look at how git has

gotten merging right since pretty much day #1, and has absolutely no

issues with files that got generated two different ways. Look at every

SCM that tries to do rename detection, and look at how THEY CANNOT DO

MERGES RIGHT.



It's that simple. Rename detection is not about avoiding "redundant

data". It's about doing the right thing.



Linus

--

To unsubscribe from this list: send the line "unsubscribe git" in

the body of a message to [hidden email]

More majordomo info at http://vger.kernel.org/majordomo-info.html
