add similarity_search.py in machine_learning by SteveKimSR · Pull Request #3864 · TheAlgorithms/Python

SteveKimSR · 2020-11-05T09:53:48Z

adding similarity_search algorithm in machine_learning

Describe your change:

Add an algorithm?
Fix a bug or typo in an existing algorithm?
Documentation change?

Checklist:


cclauss · 2020-11-05T14:21:16Z

Please format your code with psf/black as discussed in CONTRIBUTING.md.

mrmaxguns · 2020-11-05T14:43:19Z

isort (import sorting) failed. Make sure to install isort (pip install isort) and run it.
Codespell failed. Change datas ==> data


mrmaxguns · 2020-11-05T15:26:35Z

machine_learning/similarity_search.py

+    return None
+
+
+def similarity_search(dataset: np, value: np) -> list:


These type hints seem off. Do the arguments dataset and value really take the numpy module itself as their type? This should probably be changed to:

def similarity_search(dataset: np.ndarray, value: np.ndarray) -> list: ...

isort, codespell changed. applied feedback(np -> np.ndarray)

add type hints to euclidean method


cclauss · 2020-11-06T07:22:33Z

machine_learning/similarity_search.py

+                "Wrong input data's shape... dataset : ",
+                dataset.shape[1],
+                ", value : ",
+                value.shape[1],



cclauss · 2020-11-06T07:23:51Z

machine_learning/similarity_search.py

+            "Input data have different datatype... dataset : ",
+            dataset.dtype,
+            ", value : ",
+            value.dtype,



- changed euclidean's type hints
- changed few TypeError to ValueError
- changed range(len()) to enumerate()
- changed error's strings to f-string
- implemented without type()
- add euclidean's explanation

cclauss

Use iteration of keys, values, and items more and use indexes less.

cclauss · 2020-11-11T07:32:23Z

machine_learning/similarity_search.py

+    dist = 0
+
+    try:
+        for index, v in enumerate(input_a):


We should be zip()ing these lists together.
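
A minimal sketch of what pairing the two inputs with zip() could look like. It is not the code from the PR: the name euclidean_zip is hypothetical, and it takes plain Python sequences and uses math.sqrt so that it runs without NumPy.

```python
import math
from typing import Sequence


def euclidean_zip(input_a: Sequence[float], input_b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors, paired with zip()."""
    if len(input_a) != len(input_b):
        # zip() silently stops at the shorter input, so check lengths first.
        raise ValueError(
            f"Length mismatch: input_a has {len(input_a)}, input_b has {len(input_b)}"
        )
    # Each (a, b) pair is one coordinate from each vector; no index is needed.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(input_a, input_b)))


print(euclidean_zip([0, 0], [3, 4]))  # 5.0
```

The explicit length check is a design choice in this sketch: without it, zip() would quietly ignore the extra coordinates of the longer input.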

cclauss · 2020-11-11T08:10:32Z

machine_learning/similarity_search.py

+        raise TypeError("Euclidean's input types are not right ...")
+
+
+def similarity_search(dataset: np.ndarray, value: np.ndarray) -> list:


Suggested change

- def similarity_search(dataset: np.ndarray, value: np.ndarray) -> list:
+ def similarity_search(dataset: np.ndarray, value_array: np.ndarray) -> list:

This is not a single value but an array of values.

cclauss · 2020-11-11T08:14:54Z

machine_learning/similarity_search.py

+import numpy as np
+
+
+def euclidean(input_a: np.ndarray, input_b: np.ndarray):


Suggested change

- def euclidean(input_a: np.ndarray, input_b: np.ndarray):
+ def euclidean(input_a: np.ndarray, input_b: np.ndarray) -> float:


cclauss · 2020-11-11T08:20:51Z

machine_learning/similarity_search.py

+    >>> a = np.array([[0], [1], [2]])
+    >>> b = np.array([[0]])
+    >>> similarity_search(a, b)


Suggested change

-    >>> a = np.array([[0], [1], [2]])
-    >>> b = np.array([[0]])
-    >>> similarity_search(a, b)
+    >>> dataset = np.array([[0], [1], [2]])
+    >>> value_array = np.array([[0]])
+    >>> similarity_search(dataset, value_array)

Repeat for the other examples below...

Please add tests that raise errors.

cclauss · 2020-11-11T08:26:43Z

machine_learning/similarity_search.py

+    for index, v in enumerate(value):
+        dist = euclidean(value[index], dataset[0])
+        vector = dataset[0].tolist()
+
+        for index2 in range(1, len(dataset)):
+            temp_dist = euclidean(value[index], dataset[index2])
+
+            if dist > temp_dist:
+                dist = temp_dist
+                vector = dataset[index2].tolist()


Suggested change

-    for index, v in enumerate(value):
-        dist = euclidean(value[index], dataset[0])
-        vector = dataset[0].tolist()
-
-        for index2 in range(1, len(dataset)):
-            temp_dist = euclidean(value[index], dataset[index2])
-
-            if dist > temp_dist:
-                dist = temp_dist
-                vector = dataset[index2].tolist()
+    for value in value_array:
+        dist = euclidean(value, dataset[0])
+        vector = dataset[0].tolist()
+
+        for dataset_value in dataset[1:]:
+            temp_dist = euclidean(value, dataset_value)
+
+            if dist > temp_dist:
+                dist = temp_dist
+                vector = dataset_value.tolist()
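
As a rough illustration of the index-free style suggested here, the nearest-vector loop can also be written with min() and a key function. This is only a sketch under assumptions: nearest_vectors and _distance are hypothetical names, and the inputs are plain nested lists rather than np.ndarray so that the example runs without NumPy.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _distance(input_a: Vector, input_b: Vector) -> float:
    """Euclidean distance between two vectors of the same length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(input_a, input_b)))


def nearest_vectors(
    dataset: Sequence[Vector], value_array: Sequence[Vector]
) -> List[list]:
    """For each query vector, return [closest dataset vector, distance]."""
    answer = []
    for value in value_array:
        # min() with a key replaces the manual "keep the smallest so far" loop.
        # On ties it returns the first minimal row, like the `dist > temp_dist` test.
        closest = min(dataset, key=lambda row: _distance(value, row))
        answer.append([list(closest), _distance(value, closest)])
    return answer


print(nearest_vectors([[0], [1], [2]], [[0]]))  # [[[0], 0.0]]
```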

cclauss · 2020-11-13T06:22:54Z

Please add some tests that raise errors like https://github.com/TheAlgorithms/Python/blob/master/arithmetic_analysis/bisection.py does and then I think we are ready to merge this one.
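
For readers unfamiliar with the pattern, a doctest that expects an exception looks roughly like the sketch below. The function check_same_width and the helper run_docstring are hypothetical and exist only for this illustration; they are not part of the PR.

```python
import doctest


def check_same_width(dataset: list, value_array: list) -> None:
    """
    Raise ValueError when the rows of the two inputs differ in length.

    >>> check_same_width([[0, 0]], [[1, 1]])
    >>> check_same_width([[0, 0]], [[1]])
    Traceback (most recent call last):
        ...
    ValueError: row widths differ: dataset 2, value_array 1
    """
    if len(dataset[0]) != len(value_array[0]):
        raise ValueError(
            f"row widths differ: dataset {len(dataset[0])}, "
            f"value_array {len(value_array[0])}"
        )


def run_docstring(func) -> doctest.TestResults:
    """Run the examples in func's docstring and return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    return doctest.DocTestRunner(verbose=False).run(test)


results = run_docstring(check_same_width)
print(results.failed, results.attempted)  # 0 2
```

doctest ignores the traceback stack (the indented `...` line) and compares only the final line, which carries the exception type and message.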

- deleted try/catch in euclidean
- added error tests
- name change(value -> value_array)


SteveKimSR · 2020-11-13T14:04:51Z

@cclauss When adding error examples, one of the examples couldn't pass flake8. Is there any way to avoid this?
(line 91, TypeError: Input data have different datatype... dataset : float32, value_array : int32)
Or should I change the error outputs?
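
One way to keep such a doctest under the line-length limit, and apparently the route taken by the later commit titled "# doctest: +NORMALIZE_WHITESPACE", is to wrap the expected message over two lines and let doctest treat any run of whitespace as equal. The sketch below assumes a hypothetical check_same_dtype function that takes dtype names as strings, so it runs without NumPy.

```python
import doctest


def check_same_dtype(dataset_dtype: str, value_dtype: str) -> None:
    """
    >>> check_same_dtype("float32", "int32")  # doctest: +NORMALIZE_WHITESPACE
    Traceback (most recent call last):
        ...
    TypeError: Input data have different datatype...
    dataset : float32, value_array : int32
    """
    if dataset_dtype != value_dtype:
        # The real message is a single long line; only the doctest wraps it.
        raise TypeError(
            "Input data have different datatype... "
            f"dataset : {dataset_dtype}, value_array : {value_dtype}"
        )


# Run just this docstring's examples and inspect the failure count.
test = doctest.DocTestParser().get_doctest(
    check_same_dtype.__doc__,
    {"check_same_dtype": check_same_dtype},
    "check_same_dtype",
    None,
    0,
)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results.failed)  # 0
```

The error message itself stays unchanged; only the expected output in the docstring is split, so the error output does not have to be reworded to satisfy flake8.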

cclauss

Nice!!!

* add similarity_search.py in machine_learning
  adding similarity_search algorithm in machine_learning
* fix pre-commit test, apply feedback
  isort, codespell changed. applied feedback(np -> np.ndarray)
* apply feedback
  add type hints to euclidean method
* apply feedback
  - changed euclidean's type hints
  - changed few TypeError to ValueError
  - changed range(len()) to enumerate()
  - changed error's strings to f-string
  - implemented without type()
  - add euclidean's explanation
* apply feedback
  - deleted try/catch in euclidean
  - added error tests
  - name change(value -> value_array)
* # doctest: +NORMALIZE_WHITESPACE
* Update machine_learning/similarity_search.py
* placate flake8

Co-authored-by: Christian Clauss <cclauss@me.com>

add similarity_search.py in machine_learning

40a6503

adding similarity_search algorithm in machine_learning

mrmaxguns suggested changes Nov 5, 2020


SteveKimSR added 2 commits November 6, 2020 11:05

fix pre-commit test, apply feedback

09caa3b

isort, codespell changed. applied feedback(np -> np.ndarray)

apply feedback

7ce2cce

add type hints to euclidean method

cclauss requested changes Nov 6, 2020


apply feedback

f38fb3e

- changed euclidean's type hints
- changed few TypeError to ValueError
- changed range(len()) to enumerate()
- changed error's strings to f-string
- implemented without type()
- add euclidean's explanation

SteveKimSR requested a review from cclauss November 11, 2020 06:57

cclauss requested changes Nov 11, 2020


apply feedback

ebfe05a

- deleted try/catch in euclidean
- added error tests
- name change(value -> value_array)

cclauss reviewed Nov 13, 2020


cclauss approved these changes Nov 13, 2020


cclauss added 3 commits November 13, 2020 15:17

# doctest: +NORMALIZE_WHITESPACE

2d2c6b8

Update machine_learning/similarity_search.py

9637de7

placate flake8

6f7c9ce

cclauss merged commit ae4d7d4 into TheAlgorithms:master Nov 13, 2020
