add similarity_search.py in machine_learning #3864
cclauss merged 8 commits into TheAlgorithms:master
adding similarity_search algorithm in machine_learning
Please format your code with psf/black as discussed in CONTRIBUTING.md.
    return None


def similarity_search(dataset: np, value: np) -> list:
These type hints seem off. Do the arguments dataset and value require the numpy module type? This should probably be changed to:
def similarity_search(dataset: np.ndarray, value: np.ndarray) -> list:
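To illustrate the point of this comment: `np` names the imported module, while `np.ndarray` is the class that array values belong to. The short sketch below shows the difference; `count_rows` and `points` are hypothetical names used only for illustration and are not part of the PR.

```python
import types

import numpy as np

# `np` is the imported module object, not a type that values can have.
assert isinstance(np, types.ModuleType)

# `np.ndarray` is the class that array values are instances of.
points = np.array([[0], [1], [2]])
assert isinstance(points, np.ndarray)


def count_rows(dataset: np.ndarray) -> int:
    """Hypothetical function, annotated with the array class."""
    return dataset.shape[0]
```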
...isort, codespell changed. applied feedback(np -> np.ndarray)
add type hints to euclidean method
"Wrong input data's shape... dataset : ",
dataset.shape[1],
", value : ",
value.shape[1],

"Input data have different datatype... dataset : ",
dataset.dtype,
", value : ",
value.dtype,
- changed euclidean's type hints
- changed few TypeError to ValueError
- changed range(len()) to enumerate()
- changed error's strings to f-string
- implemented without type()
- add euclidean's explanation
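A minimal sketch of what input validation with f-string messages might look like after these changes. The helper name `validate_inputs` and the dimension check are hypothetical, and the choice of which check raises ValueError and which raises TypeError is an assumption, since the thread does not say which errors were switched. The shape and datatype messages are adapted from the snippets quoted earlier in the review.

```python
import numpy as np


def validate_inputs(dataset: np.ndarray, value_array: np.ndarray) -> None:
    """Hypothetical helper: raise if the two 2-D arrays cannot be compared."""
    # Assumption: both inputs are expected to be 2-D (rows of vectors).
    if dataset.ndim != 2 or value_array.ndim != 2:
        raise ValueError(
            f"Wrong input data's dimensions... dataset : {dataset.ndim}, "
            f"value_array : {value_array.ndim}"
        )
    # A bad value (mismatched vector length) is reported as ValueError.
    if dataset.shape[1] != value_array.shape[1]:
        raise ValueError(
            f"Wrong input data's shape... dataset : {dataset.shape[1]}, "
            f"value_array : {value_array.shape[1]}"
        )
    # A mismatched dtype is reported as TypeError (assumed split).
    if dataset.dtype != value_array.dtype:
        raise TypeError(
            f"Input data have different datatype... dataset : {dataset.dtype}, "
            f"value_array : {value_array.dtype}"
        )
```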
cclauss left a comment
Use iteration of keys, values, and items more and use indexes less.
dist = 0

try:
    for index, v in enumerate(input_a):
We should be zip()-ing these lists together.
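A sketch of what the zip()-based version could look like, assuming `euclidean` is meant to return the straight-line distance between two equal-length vectors. This is an illustration of the suggestion, not necessarily the code that was merged.

```python
import math

import numpy as np


def euclidean(input_a: np.ndarray, input_b: np.ndarray) -> float:
    """Euclidean distance between two vectors of the same length."""
    # zip() pairs up corresponding elements, so no index variable is needed.
    return math.sqrt(sum(pow(a - b, 2) for a, b in zip(input_a, input_b)))
```

For 1-D arrays of equal length, `np.linalg.norm(input_a - input_b)` computes the same distance in a vectorized way.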
    raise TypeError("Euclidean's input types are not right ...")


def similarity_search(dataset: np.ndarray, value: np.ndarray) -> list:
- def similarity_search(dataset: np.ndarray, value: np.ndarray) -> list:
+ def similarity_search(dataset: np.ndarray, value_array: np.ndarray) -> list:
This is not a single value but an array of values.
import numpy as np


def euclidean(input_a: np.ndarray, input_b: np.ndarray):
- def euclidean(input_a: np.ndarray, input_b: np.ndarray):
+ def euclidean(input_a: np.ndarray, input_b: np.ndarray) -> float:
>>> a = np.array([[0], [1], [2]])
>>> b = np.array([[0]])
>>> similarity_search(a, b)
- >>> a = np.array([[0], [1], [2]])
- >>> b = np.array([[0]])
- >>> similarity_search(a, b)
+ >>> dataset = np.array([[0], [1], [2]])
+ >>> value_array = np.array([[0]])
+ >>> similarity_search(dataset, value_array)
Repeat for the other examples below...
Please add tests that raise errors.
for index, v in enumerate(value):
    dist = euclidean(value[index], dataset[0])
    vector = dataset[0].tolist()

    for index2 in range(1, len(dataset)):
        temp_dist = euclidean(value[index], dataset[index2])

        if dist > temp_dist:
            dist = temp_dist
            vector = dataset[index2].tolist()
- for index, v in enumerate(value):
-     dist = euclidean(value[index], dataset[0])
-     vector = dataset[0].tolist()
-     for index2 in range(1, len(dataset)):
-         temp_dist = euclidean(value[index], dataset[index2])
-         if dist > temp_dist:
-             dist = temp_dist
-             vector = dataset[index2].tolist()
+ for value in value_array.values():
+     dist = euclidean(value, dataset[0])
+     vector = dataset[0].tolist()
+     for dataset_value in dataset[1:].values():
+         temp_dist = euclidean(value, dataset_value)
+         if dist > temp_dist:
+             dist = temp_dist
+             vector = dataset_value.tolist()
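Note that a NumPy array has no `.values()` method (that name belongs to dicts), so the suggested loop would need to iterate over the arrays directly; iterating a 2-D array yields its rows. A minimal sketch of the same nearest-neighbour loop under that assumption follows. `nearest_vectors` is a hypothetical name, and the `[vector, dist]` result layout is assumed rather than taken from the thread.

```python
import math

import numpy as np


def euclidean(input_a: np.ndarray, input_b: np.ndarray) -> float:
    """Euclidean distance between two vectors of the same length."""
    return math.sqrt(sum(pow(a - b, 2) for a, b in zip(input_a, input_b)))


def nearest_vectors(dataset: np.ndarray, value_array: np.ndarray) -> list:
    """Hypothetical sketch: for each query row, find the closest dataset row."""
    answer = []
    for value in value_array:  # each `value` is one row (a 1-D array)
        # Start with the first dataset row as the best candidate so far.
        dist = euclidean(value, dataset[0])
        vector = dataset[0].tolist()
        for dataset_value in dataset[1:]:  # remaining rows, no index needed
            temp_dist = euclidean(value, dataset_value)
            if dist > temp_dist:
                dist = temp_dist
                vector = dataset_value.tolist()
        answer.append([vector, dist])
    return answer
```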
Please add some tests that raise errors like https://github.com/TheAlgorithms/Python/blob/master/arithmetic_analysis/bisection.py does and then I think we are ready to merge this one.
- deleted try/catch in euclidean
- added error tests
- name change(value -> value_array)
@cclauss When adding error examples, one of the examples couldn't pass flake8. Is there any way to avoid this?
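One possible approach, assuming the flake8 complaint was about a long expected error line in a doctest (the thread does not state which check failed): the `NORMALIZE_WHITESPACE` doctest option treats any run of whitespace as equivalent, so the expected message can be wrapped across lines. In the sketch below, `check_shapes` is a hypothetical helper used only to show an error doctest.

```python
import doctest

import numpy as np


def check_shapes(dataset: np.ndarray, value_array: np.ndarray) -> None:
    """
    Hypothetical validation helper used only to illustrate an error doctest.

    >>> check_shapes(np.array([[0, 0]]), np.array([[1, 1]]))

    >>> dataset = np.array([[0, 0]])
    >>> value_array = np.array([[1]])
    >>> check_shapes(dataset, value_array)  # doctest: +NORMALIZE_WHITESPACE
    Traceback (most recent call last):
        ...
    ValueError: Wrong input data's shape... dataset : 2,
        value_array : 1
    """
    if dataset.shape[1] != value_array.shape[1]:
        raise ValueError(
            f"Wrong input data's shape... dataset : {dataset.shape[1]}, "
            f"value_array : {value_array.shape[1]}"
        )


if __name__ == "__main__":
    doctest.testmod()
```

Without the option, the wrapped message would have to match the raised message character for character, which forces the expected line to stay on one long line.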
* add similarity_search.py in machine_learning
  adding similarity_search algorithm in machine_learning
* fix pre-commit test, apply feedback
  isort, codespell changed. applied feedback(np -> np.ndarray)
* apply feedback
  add type hints to euclidean method
* apply feedback
  - changed euclidean's type hints
  - changed few TypeError to ValueError
  - changed range(len()) to enumerate()
  - changed error's strings to f-string
  - implemented without type()
  - add euclidean's explanation
* apply feedback
  - deleted try/catch in euclidean
  - added error tests
  - name change(value -> value_array)
* # doctest: +NORMALIZE_WHITESPACE
* Update machine_learning/similarity_search.py
* placate flake8

Co-authored-by: Christian Clauss <cclauss@me.com>