# matrix – cosine similarity built-in function in MATLAB

Short version: calculate the similarity with `pdist` (part of the Statistics and Machine Learning Toolbox):

```
S2 = squareform(1-pdist(S1,'cosine')) + eye(size(S1,1));
```

### Explanation:

`pdist(S1,'cosine')` calculates the cosine distance between every pair of rows in `S1`. The similarity between each pair is therefore `1 - pdist(S1,'cosine')`.

We can turn that into a square matrix where element `(i,j)` corresponds to the similarity between rows `i` and `j` with `squareform(1-pdist(S1,'cosine'))`.

Finally, we have to set the main diagonal to 1, because the similarity of a row with itself is 1. `pdist` only computes values for pairs of distinct rows, so `squareform` leaves zeros on the diagonal.
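To make the role of `+ eye(...)` concrete, here is a small sketch on a made-up 3-by-2 matrix. The values and the name `S_offdiag` are only for illustration, and `pdist`/`squareform` need the Statistics and Machine Learning Toolbox:

```matlab
% Made-up example: rows 1 and 2 are orthogonal, row 3 sits at 45 degrees to both
S1 = [1 0; 0 1; 1 1];

D = pdist(S1, 'cosine');            % cosine distances for pairs (1,2), (1,3), (2,3)
S_offdiag = squareform(1 - D);      % square matrix; the diagonal is filled with zeros
S2 = S_offdiag + eye(size(S1, 1));  % set self-similarity on the diagonal to 1

disp(S_offdiag)
disp(S2)
```

The off-diagonal entries for row 3 should come out as `1/sqrt(2)`, and only the second matrix has ones on the diagonal.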

Your code loops over all rows, and for each row loops over (about) half the rows, computing the dot product for each unique combination of rows:

```
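n_row = size(S1,1);
norm_r = sqrt(sum(abs(S1).^2,2)); % row norms; same as norm(S1,2,'rows') in Octave or vecnorm(S1,2,2) in MATLAB R2017b+
S2 = zeros(n_row,n_row);          % preallocate the result
for i = 1:n_row
    for j = i:n_row               % upper triangle only, the result is symmetric
        S2(i,j) = dot(S1(i,:), S1(j,:)) / (norm_r(i) * norm_r(j));
        S2(j,i) = S2(i,j);        % mirror into the lower triangle
    end
end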
```

(I've taken the liberty of completing your code so it actually runs. Note the preallocation of `S2` before the loop; this saves a lot of time!)

If you note that the dot product is a matrix product of a row vector with a column vector, you can see that the above, without the normalization step, is identical to

```
S2 = S1 * S1.';
```
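As a quick sanity check on a small made-up matrix (the values and the name `G` are my own, chosen for illustration), each entry of that matrix product is the dot product of the corresponding pair of rows:

```matlab
% Made-up 3-by-2 example
S1 = [1 2; 3 4; 5 6];

G = S1 * S1.';                  % G(i,j) holds dot(S1(i,:), S1(j,:))

% Compare one entry against an explicit dot product
d23 = dot(S1(2,:), S1(3,:));    % 3*5 + 4*6
disp(G)
disp(d23)
```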

This runs much faster than the explicit loop, even if it is (maybe?) not able to use the symmetry. The normalization simply divides row `i` by `norm_r(i)` and column `j` by `norm_r(j)`. Here I multiply `norm_r` by its transpose to produce a square matrix to normalize with:

```
S2 = (S1 * S1.') ./ (norm_r * norm_r.');
```
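If you want to convince yourself that the loop and the vectorized version agree, a minimal check could look like this. The test matrix `magic(4)` and the names `S2_loop`, `S2_vec` and `max_diff` are my own choices, not part of the question:

```matlab
% Deterministic test data (any real matrix without all-zero rows would do)
S1 = magic(4);
n_row = size(S1, 1);
norm_r = sqrt(sum(abs(S1).^2, 2));   % Euclidean norm of each row

% Loop version: fill the upper triangle and mirror it
S2_loop = zeros(n_row, n_row);
for i = 1:n_row
    for j = i:n_row
        S2_loop(i,j) = dot(S1(i,:), S1(j,:)) / (norm_r(i) * norm_r(j));
        S2_loop(j,i) = S2_loop(i,j);
    end
end

% Vectorized version: all dot products at once, then normalize
S2_vec = (S1 * S1.') ./ (norm_r * norm_r.');

% Largest absolute difference between the two results
max_diff = max(abs(S2_loop(:) - S2_vec(:)));
disp(max_diff)
```

The difference should be at the level of floating-point round-off, and the diagonal of either result should be 1 up to the same round-off.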