This morning I got an email about my blog post discussing the history of deep learning which rattled me back into a time of my academic career which I would rather not think about. It was a low point which nearly ended my Master’s studies at the University of Lugano, and it made me feel so bad about blogging that I took two long years to recover. So what happened?

When I started my master’s, I worked on blog posts for NVIDIA which featured introductions to deep learning. Part of this blog post series also discussed the history of deep learning. I hence discussed what I thought to be the historical milestones with the largest impact, but in doing so, I inadvertently assigned credit to researchers that I thought had a good impact on the field. I worked on this blog post and circulated it in my deep learning class’s forums to the dismay of my then advisor, who holds the opposite view to mine.

To evaluate the credit that a research idea deserves, I believe that it is not only important who has the idea first, but also equally important who actually makes it work (the implementation). My ex-advisor believed that it only really matters who was the first to publish the idea.

My advisor scolded me in class for my views since he felt very strongly that the first idea counts and that my view is plain wrong. To redeem myself and to salvage the relationship with him, I felt coerced to change my blog post to his wishes.

This quasi-censorship of my blog post eviscerated me, and in consequence, I lost all desire to blog for two years. Despite my efforts, the relationship with my then advisor deteriorated further, and I had to look for a new advisor.

Looking back at the blog post that I produced, I feel ashamed. It does not express my personal views. I value integrity, and my behavior did not reflect who I want to be.

I write this blog post to discuss my true beliefs about credit assignment and why I believe that the idea, its communication and its implementation are all equally important.

Who Deserves Credit for Deep Learning Ideas?

There has been a lot of discussion about how to assign credit to researchers, or in other words, how to determine whose work had a large impact. Note that I do not discuss here who deserves credit for discovering an idea; I look at who deserves credit for the impact that an idea has. Looking at this, there are two main camps: the first believes that ideas and implementation count equally, and the second believes that what counts is who had the ideas first.

The problem with this discussion is that it is not a scientific topic, but a philosophical one. How do we determine what has how much value? We use the scientific method. What is the scientific method in philosophy? Use reductions to arrive at simple statements, then use logic to derive other factual statements. Failing that, as in this case, we make thought experiments where we isolate variables which we then take to extremes. Let’s do this now to get insight into the issue.

All Ideas, No Communication, No Implementation

Let’s imagine there exists a person who has come up with all ideas in deep learning of the past and all ideas in deep learning of the future. However, this person cannot communicate with either words or writing. This person also cannot write code. How much credit does such a person deserve?

I would argue that such a person deserves zero credit. In fact, I think it is epistemologically correct that this person deserves no credit because nobody can know that he or she deserves credit.

All Ideas, 1 Communication + No Ideas, Full Communication

We have a Person 1 that invented everything in deep learning. Now this person can communicate, but he or she is so unclear that only a single Person 2 can understand these ideas.

Now, Person 2 has no creativity but is a perfect communicator. Person 2 basically just translates what Person 1 said and the entire world understands. Who deserves credit here?

It is tempting to think that Person 2 deserves all the credit because Person 1 is useless without Person 2. But similarly, Person 2 is useless without Person 1.

Both people thus deserve equal credit — no one can achieve anything without the other.

All Ideas, Full Communication, 1 Implementation

Let’s increase the complexity of the problem. Let us say the duo of Person 1 and Person 2 spread the ideas so that the entire world understands deep learning, but let us assume that all people are implementation agnostic. Nobody can make deep learning work. The world knows about all deep learning ideas but cannot solve any problem with them. In such a world, the ideas of deep learning are quickly abandoned by the large majority due to their uselessness (just like the majority of the population does not care much about pure mathematics, e.g., few care whether a^n + b^n = c^n has positive integer solutions for any integer n > 2).

Enter Person 3. Person 3 has no creativity, cannot communicate, but he or she can implement all the deep learning ideas in a practical manner. The world looks at this person’s code and suddenly is able to solve all problems which are solvable with deep learning.

Who deserves the most credit: Person 1, Person 2, or Person 3?

As discussed before, Person 1 and Person 2 deserve equal credit, and here too, I would argue that Person 3 deserves equal credit.

This becomes apparent when we think about the value of ideas. Ideas are useful when they have an effect. If they have no effect or only a small effect, they deserve no recognition or little recognition. If deep learning ideas had no practical value, then they would not deserve more recognition than, say, the idea that there might be something beyond the observable universe: it is a nice idea, but it will never produce anything of much value.

Comparative Individual Value For Collective Contributions

The evaluation changes if we distribute the contributions of ideas, communication, and implementation among many individuals. If we take the three scenarios above, expand Persons 1 to 3 into groups of people, and evaluate them comparatively, that is, ask how much value each individual’s contribution has compared to everyone else’s, we arrive at the following thought experiment.

1 Ideas, 1000 Communication, 1000 Implementation

Suppose we have 1 person who has all the ideas, 1000 people who can understand these ideas and communicate them to the world, and 1000 people who can implement them to yield practical value. How do we assign credit?

As discussed, it is reasonable that each of the areas, (1) ideas, (2) communication, and (3) implementation, deserves equal credit. If the groups of 1000 people now made contributions (communications and implementations) of equal value, it would be fair to say that:

1 Ideas: 1/3 credit

1000 Communication: 1/3000 credit each

1000 Implementation: 1/3000 credit each.

We see that in this case the one person with the ideas should receive the largest amount of credit.

Similarly, if we weight the numbers differently, and if we assume contributions of individuals in groups are equal, then this credit assignment holds for all other combinations like (1000, 1, 1000), or (10000, 1000, 1).
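The arithmetic behind this split can be written down as a small sketch. It assumes, as in the text, that the three areas share the credit equally and that everyone within a group contributes equally; the function name credit_per_person is just a label chosen for this illustration.

```python
from fractions import Fraction


def credit_per_person(group_sizes):
    """Split credit equally across areas, then equally within each group.

    group_sizes lists the number of people in each area, in the order
    (ideas, communication, implementation). Returns the credit each
    individual in the corresponding group receives.
    """
    area_share = Fraction(1, len(group_sizes))  # each area gets 1/3
    return [area_share / size for size in group_sizes]


# The (1, 1000, 1000) scenario from the text.
ideas, communication, implementation = credit_per_person([1, 1000, 1000])
print(ideas, communication, implementation)

# The (10000, 1000, 1) scenario: now the single implementer gets 1/3.
print(credit_per_person([10000, 1000, 1]))
```

Using exact fractions keeps the shares readable and makes it easy to check that all individual shares in a scenario add up to 1.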

Timing and Relational Effects

In the real world, we have timing effects and relational effects. Not all 1000 Ideas, Communication, or Implementation people will publish their work at the same time, but they will have a specific sequence. In this sequence, they will influence and build on each other — they stand on the shoulders of giants. Who are the giants? Who deserves what amount of credit?

If we think about it, it is not much different from our first analysis. Let’s take Person 1, who only has ideas and can communicate his or her ideas to only one other person, Person 2; Person 2, standing on Person 1’s shoulders, is only able to communicate the ideas to one other person, Person 3; Person 3, standing on Person 2’s shoulders, in turn, can communicate the ideas clearly to the entire world.

If we express the ability of each person as numbers representing the fraction of all value contributed in ideas, communication, and implementation, we could weight Person 1, Person 2, and Person 3 in this way:

Person1: [1, 1/10^10, 0]

Person2: [0, 1/10^10, 0]

Person3: [0, 1, 0]

This means that Person 1 has all the ideas (1) and could communicate these ideas to 1 person (we assume a total population of 10 billion people to make the math easier). Person 2 has no ideas, could understand Person 1’s idea but could only communicate this idea to one other person, Person 3. Person 3 has no ideas, understands the idea of Person 2 and can communicate it so that everybody understands. Note that this example is simplified so that all people are implementation agnostic.

From these fractions, we see that Person 2 has almost no fraction of contributions since Person 2 is not creative and also not a good communicator. However, if we look at the relational effects we know Person 3 would have no value without Person 2, and Person 1 would also have no value without Person 2. So how do we solve this credit assignment problem?

We can try to solve this problem by expressing it as a weighted graph which expresses relationships over time and the relationships of the fractions with respect to the world.

How do we weight the contribution of each person in this case? There are many answers to this, but here PageRank would be a good fit. PageRank works as we discussed above: credit is assigned comparatively, that is, if we have a (1, 1000, 1000) distribution, the largest chunk of PageRank goes to the single person. Thus it reflects our evaluation system. PageRank also takes into account the relationships between nodes and their recursive weight (standing on the shoulders of giants).

Using the scenario above, we find the contributions as follows:

Name    PageRank    Relative Contribution
P2      0.3450      0.4319
P1      0.2697      0.3376
P3      0.1841      0.2305

We see that P2 has the largest contribution despite being only the bridge between P1 and P3 who have the largest fractions (all the ideas and full communication abilities). However, P1’s success depends on P2, and P3’s success depends on P2 and as such P2 is the most critical link in the entire system.
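To make the mechanics concrete, here is a minimal sketch of PageRank by power iteration on a tiny graph. It is not the code behind the table above: the edge list, the extra "World" node, the unweighted edges, and the damping factor of 0.85 are all assumptions made for illustration, so the numbers will differ from the table. An edge (A, B) is read here as "A's impact depends on B", so A passes credit to B.

```python
def pagerank(edges, damping=0.85, iterations=200):
    """Plain power-iteration PageRank on a directed, unweighted graph."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping + damping * dangling) / n
                    for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical dependency graph (an assumption, not the original data):
# the world learned from P3, P3 learned from P2, P2 learned from P1,
# and P1's impact depends on P2 passing the ideas on.
edges = [("World", "P3"), ("P3", "P2"), ("P2", "P1"), ("P1", "P2")]
ranks = pagerank(edges)

# Renormalize over the three people to get relative contributions.
people = ["P1", "P2", "P3"]
total = sum(ranks[p] for p in people)
relative = {p: ranks[p] / total for p in people}

for p in sorted(people, key=ranks.get, reverse=True):
    print(p, round(ranks[p], 4), round(relative[p], 4))
```

Even in this simplified graph the bridge P2 ends up with the largest share, followed by P1 and then P3, which is the same ordering as in the table. Adding more people or edges to the list is an easy way to explore how the ranking shifts.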

This is quite insightful. If you understand some obscure research and communicate this to just a few researchers who, in turn, influence many other researchers then you will have made a substantial contribution to the deep learning community.

It would not feel this way because you will probably not experience any fame or recognition here. The recognition will come for P1 (having ideas) and P3 (communicating ideas). But still, the numbers do not lie here.

This experiment was quite interesting, and if you want to experiment a bit by yourself, you can download the code to see what happens if you add more people and more relationships among these people. This exercise can give quite some insight into what is valuable for research.

Response to Criticism on Reddit

There has been some sharp criticism on Reddit concerning ideas expressed in this blog post. The user metacurse makes the point that in science we usually credit those researchers who had the idea first and that communication and implementation are not valued. For example, we credit Albert Einstein for the discovery of general relativity and the photoelectric effect rather than Neil deGrasse Tyson for communicating them; similarly, Cocks is credited for RSA even though he never implemented it in any way that was widely used (and he could not produce public implementations due to the classified status of RSA). However, this entire argument is rather weak and unfair:

I do not discuss who should be credited for an idea or the usage of the idea, I discuss who should be credited for the overall impact of an idea. These are very different questions.

He uses examples to try to prove his own hypothesis when we know that examples cannot prove anything (he uses classical philosophical techniques, which have some value, but they do not generate reliable knowledge the way analytical philosophy does). He mocks me for not using examples myself.

He appeals to the emotion of the readers, by saying that my views endorse unethical ideas like “stealing olds ideas and rebranding them as your own” when it has nothing to do with my argument (reductio ad Hitlerum). He does this quite successfully swaying many emotional readers. I do not think this is helpful.

To show more sharply why metacurse’s argument is not relevant to mine, consider this thought experiment.

We have a super genius who knows about all possible ideas and writes them down so that everybody can understand them easily. Then she locks these notes away in a locker and dies the next second. Over the next billions of years humanity rediscovers all ideas and uses them to build a flourishing society where all living things live in harmony and every being is fulfilled and so forth. One second before the last human dies in heat death, that human discovers the notebook.

Metacurse’s argument would look for the answer to the question: Should our super genius be credited for inventing everything? Metacurse would argue, yes, and I would totally agree.

What I discuss in this blog post: How much did our super genius contribute to the overall impact of all ideas? Very little. She never had any direct or even indirect effect through any of the ideas; the only impact she had was that one other person understood that she had the ideas before others had them. That is the total impact of her ideas. Her impact is almost zero.

Conclusion

Here I discussed how it is best to think about contributions in deep learning. From thought experiments, we could see that ideas, their communication, and their implementation are equally important contributions.

We also discussed how timing effects and dependencies could be modeled in a relational graph. We found that people who link ideas to communicators can make substantial contributions to the research community even if they themselves are not creative or good communicators. Creating the links between influential ideas and influential communicators (or people who implement) is important here.