
How can I compute the cosine similarity between two Spark Vectors? I am using the new ml package.

Spark 2.1.1

EDIT:

Spark provides RowMatrix, which can be used to compute similarities, but it accepts an mllib.Vector, not an ml.Vector.

Is there a way to convert Vectors between the two packages? Is there an implementation that works with ml.Vector?

    You could create a UDF that takes the two vectors as input and does the calculation there. Commented May 19, 2017 at 15:29
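The UDF approach suggested in this comment could look roughly like the sketch below. It is only an illustration: the names cosineSimilarity and cosineUdf, the column names in the usage line, and the choice to return 0.0 for a zero-norm vector are all assumptions, not something from the question. The dot product is computed by hand over the dense arrays, which is simple but densifies sparse vectors.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.functions.udf

// Cosine similarity of two ml vectors: dot(a, b) / (||a|| * ||b||).
// Assumption: return 0.0 when either vector has zero norm.
def cosineSimilarity(a: Vector, b: Vector): Double = {
  require(a.size == b.size, "vectors must have the same size")
  val x = a.toArray // note: converts sparse vectors to dense arrays
  val y = b.toArray
  var dot = 0.0
  var i = 0
  while (i < x.length) {
    dot += x(i) * y(i)
    i += 1
  }
  val denom = Vectors.norm(a, 2.0) * Vectors.norm(b, 2.0)
  if (denom == 0.0) 0.0 else dot / denom
}

// Wrap the function so it can be applied to two vector columns.
val cosineUdf = udf(cosineSimilarity _)

// Hypothetical usage, assuming a DataFrame df with vector columns "v1" and "v2":
// df.withColumn("similarity", cosineUdf(df("v1"), df("v2")))
```

This stays entirely within the ml package, so no conversion to mllib is needed when only pairwise similarity between two columns is required.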

1 Answer


The easiest way to convert from an ml vector to an mllib vector is to use the Vectors.fromML method in the mllib package (see the Vectors documentation). Example:

// Build a vector from the new ml package
val mlVector = org.apache.spark.ml.linalg.Vectors.dense(Array(1.0, 2.0, 3.0))
println(mlVector.getClass())

// Convert it to the corresponding mllib vector
val mllibVector = org.apache.spark.mllib.linalg.Vectors.fromML(mlVector)
println(mllibVector.getClass())

This prints:

class org.apache.spark.ml.linalg.DenseVector
class org.apache.spark.mllib.linalg.DenseVector

1 Comment

That is what I needed, thanks. To go from mllib.Vector to ml.Vector, just call asML on the mllib.Vector instance directly.
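For completeness, the reverse direction mentioned in this comment might be sketched as follows. The import aliases and value names are illustrative assumptions. As far as I know, asML is declared without a parameter list in Scala, so it is written without parentheses there.

```scala
import org.apache.spark.ml.linalg.{Vector => MLVector}
import org.apache.spark.mllib.linalg.{Vector => MLlibVector, Vectors => MLlibVectors}

// Start from an mllib vector
val mllibVector: MLlibVector = MLlibVectors.dense(1.0, 2.0, 3.0)

// mllib -> ml: call asML on the mllib instance
val mlVector: MLVector = mllibVector.asML

// ml -> mllib: use the fromML factory method
val roundTrip: MLlibVector = MLlibVectors.fromML(mlVector)
```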
