I saw that Fast.ai is shifting to PyTorch, and that PyTorch is widely considered the most favourable framework for research prototyping. So I decided to implement a research paper in PyTorch. I had already worked on the C-DSSM model at Parallel Dots, but there my implementation was in Keras. In this blog I will emphasize the hacker perspective of porting the code from Keras to PyTorch, rather than the research perspective.

My implementation is at nishnik/Deep-Semantic-Similarity-Model-PyTorch; I have documented the code too.

More about the C-DSSM model here.

1. The C-DSSM model takes multiple inputs: the query, a positive doc, and negative docs. In Keras you can pass a list:

Model(inputs = [query, pos_doc] + neg_docs, outputs = prob)

# where query and pos_doc are numpy arrays and neg_docs is a list of numpy arrays
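For context, here is a minimal sketch of the list-of-inputs pattern in Keras. The shapes, the number of negatives, and the concatenate-plus-Dense head are placeholders of my own, not the actual C-DSSM layers:

from keras.layers import Input, Dense, concatenate
from keras.models import Model

J = 4  # number of negative docs (illustrative)
query = Input(shape=(300,))
pos_doc = Input(shape=(300,))
neg_docs = [Input(shape=(300,)) for _ in range(J)]

# placeholder head, just to show the wiring of a list of inputs
merged = concatenate([query, pos_doc] + neg_docs)
prob = Dense(J + 1, activation='softmax')(merged)

model = Model(inputs=[query, pos_doc] + neg_docs, outputs=prob)
# at training time you likewise pass a list of numpy arrays:
# model.fit([q_arr, pos_arr] + neg_arrs, labels)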

In PyTorch you can do:

def forward(self, q, pos, negs):
    # this is in the class definition itself; negs is a plain Python list,
    # so you can easily access negs[0] for its 0th element. I was surprised
    # to find that forward works with a list.

    # Another way would have been:
    # def forward(self, x):
    #     q = x[0]
    #     pos = x[1]
    #     negs = x[2:]
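To convince yourself that forward really does accept a plain Python list, here is a tiny self-contained toy. The module and its Linear scoring layer are made up for illustration; only the calling convention matters:

import torch
import torch.nn as nn
from torch.autograd import Variable  # pre-0.4 style, as in the rest of this post

class Toy(nn.Module):
    def __init__(self):
        super(Toy, self).__init__()
        self.fc = nn.Linear(8, 1)

    def forward(self, q, pos, negs):
        # negs arrives as an ordinary list; negs[0], iteration, slicing all work
        scores = [self.fc(q * pos)] + [self.fc(q * n) for n in negs]
        return torch.stack(scores)

model = Toy()
q = Variable(torch.randn(1, 8))
pos = Variable(torch.randn(1, 8))
negs = [Variable(torch.randn(1, 8)) for _ in range(4)]
out = model(q, pos, negs)  # size 5x1x1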

2. The first layer in the C-DSSM model is a Conv1d layer, so I compared the Conv1d docs of Keras and PyTorch. In Keras:

keras.layers.convolutional.Conv1D(filters, kernel_size, strides=1, padding='valid', dilation_rate=1, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)

In PyTorch:

class torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)

I had to specify:

- kernel_size: the same as Keras kernel_size
- out_channels: the same as Keras filters
- in_channels: the number of channels in the input

I was confused about in_channels: I had visualized an input of 5*90000, where the kernels would stride along a row and give an output of 5*300. But I was wrong.
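The root of the confusion is a genuine API difference: Keras's Conv1D is channels-last, expecting input of shape (batch, steps, channels), while PyTorch's Conv1d is channels-first, expecting (batch, channels, steps). So when porting you typically transpose, roughly like this (shapes here are made up for illustration):

import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable

x = np.random.rand(3, 10, 64)  # Keras-style layout: (batch, steps=10, channels=64)
x = Variable(torch.from_numpy(x).float()).transpose(1, 2)  # -> (3, 64, 10)

conv = nn.Conv1d(in_channels=64, out_channels=32, kernel_size=3)
y = conv(x)  # (3, 32, 8): the kernel strides along the 10 steps

The experiment below shows exactly this behaviour from first principles.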

PyTorch builds a dynamic graph, and this is why it is awesome for programmers: you can inspect what is happening at the time of graph creation.

So I had the input:

lq
( 0 ,.,.) =
 1.8577e-01  8.3356e-01  8.5541e-01  …  1.1579e-01  6.4221e-01  6.4712e-01
 8.3658e-01  1.4647e-01  2.0220e-01  …  2.2165e-01  2.1841e-01  3.0833e-01
 7.1619e-01  1.8811e-01  1.6903e-01  …  9.2867e-01  5.6902e-01  9.1074e-01
 9.4578e-01  9.4889e-01  8.4584e-01  …  3.7839e-01  3.6997e-01  2.7487e-01
 6.4675e-01  2.5806e-01  3.1640e-01  …  9.8110e-01  8.6193e-01  1.0357e-02
[torch.FloatTensor of size 3x5x90000]

I ran a convolutional kernel over it, with weight:

( 0 ,.,.) =
  3.0515e-01
 -3.3296e-01
 -4.1675e-01
  1.3134e-01
  4.1769e-01

and bias:

 0.1725

And got the output:

qc
 4.5800e-02  5.3139e-01  5.3828e-01  …  2.0578e-01  4.6649e-01 -7.2540e-02
-7.0118e-02 -5.8567e-01 -5.5335e-01  …  3.0976e-01 -3.3483e-02 -1.3638e-01
-8.8448e-02 -4.1210e-02 -8.5710e-02  … -6.0755e-01 -5.5320e-01 -4.0962e-01
     …           ⋱           …
-2.8095e-01  1.1081e-01  7.9184e-02  … -3.7869e-01 -1.7801e-01 -4.1227e-01
-8.7498e-01 -8.1998e-01 -8.2176e-01  … -7.7750e-01 -8.6171e-01 -5.1366e-01
-5.7234e-01 -2.5415e-01 -2.5126e-01  … -5.8857e-01 -4.8405e-01 -2.4303e-01
[torch.FloatTensor of size 3x300x90000]

Let’s do the mathematics manually: (kernel weights · input column) + bias

0.30515 * 0.18577 + (-0.33296) * 0.83658 + (-0.41675) * 0.71619 + 0.13134 * 0.94578 + 0.41769 * 0.645 + 0.1725

This comes out to be 0.045796651399999944, which is close to qc[0][0][0].

Note: I have shown only the 0th element of the first dimension (just the entries relevant to the arithmetic).

So the dynamic graph made my intuition clear about how Conv1d operates in PyTorch: column-wise, over the channel dimension.

Code for the above was simply:

import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable

WORD_DEPTH, K, FILTER_LENGTH = 90000, 300, 1  # values assumed from the shapes printed above

query_len = 5
lq = np.random.rand(3, query_len, WORD_DEPTH)
lq = Variable(torch.from_numpy(lq).float())  # torch.FloatTensor of size 3x5x90000

# in_channels must match dim 1 of the input, which here is query_len (5)
query_conv = nn.Conv1d(query_len, K, FILTER_LENGTH)
# print(lq)
# print(query_conv.weight)
# print(query_conv.bias)
qc = query_conv(lq)  # torch.FloatTensor of size 3x300x90000
# print(qc)
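If you want to check the column-wise arithmetic programmatically instead of by hand, here is a small sketch using the lq, query_conv, and qc defined above (this assumes FILTER_LENGTH = 1, so each output value involves exactly one input column):

# kernel 0's weights, one per input channel, times the first column of
# the first example, plus the bias, should reproduce qc[0][0][0]
w = query_conv.weight.data[0, :, 0]  # shape: (in_channels,)
col = lq.data[0, :, 0]               # first column of example 0
manual = (w * col).sum() + query_conv.bias.data[0]
print(manual, qc.data[0, 0, 0])      # the two values match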

3. After getting the vectors for the query, the positive doc, and the negative docs, we have to compute the cosine similarity between them. In Keras you had to create a new layer for it; in PyTorch:

dots = [q_s.dot(pos_s)]

dots = dots + [q_s.dot(neg_s) for neg_s in neg_ss]

As simple as that :)
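One caveat worth knowing (my note, not part of the original implementation): a raw dot product equals cosine similarity only when the vectors are L2-normalized, so if q_s and the doc vectors are not already unit-length you would divide by the norms:

import torch
from torch.autograd import Variable

q_s = Variable(torch.randn(300))
pos_s = Variable(torch.randn(300))

dot = q_s.dot(pos_s)                                # raw dot product
cos = q_s.dot(pos_s) / (q_s.norm() * pos_s.norm())  # proper cosine similarity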

4. Now dots is a list; to convert it to a PyTorch Variable, we just do:

dots = torch.stack(dots)
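Once stacked, the scores form a single Variable that downstream ops can consume. For instance, a DSSM-style model typically turns the candidate scores into probabilities with a softmax; a hedged sketch with stand-in values:

import torch
import torch.nn.functional as F
from torch.autograd import Variable

dots = [Variable(torch.randn(1)) for _ in range(5)]  # stand-ins for the [pos] + negs scores
stacked = torch.stack(dots)                          # size 5x1, a Variable rather than a list
probs = F.softmax(stacked.view(1, -1), dim=1)        # one probability per candidate doc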

PS: This was my first implementation in PyTorch. If there is any issue in the code or something you can’t understand, get in touch with me.

My Twitter handle is nishantiam and my GitHub handle is nishnik.