
I'm writing code to group the points around each point in a curve into concentric regions. I then find the geometric median of each region as a way of determining its "shape".

The point indices for each region come from an np.where statement, and all the np.where results are placed into one NumPy array to make them easier to index and to handle the case where a region contains no points. This master array has dtype object because each region holds a variable number of points.
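As a minimal sketch of that lookup on made-up data: `dist` and `radius` below are hypothetical stand-ins for one slice of `s_dist` and one entry of `j_r`, and only `z` is taken from the question's code. The regions are collected in a plain list here purely for illustration.

```python
import numpy as np

z = np.array([0, 0.04, 0.8, 1.2, 1.6, 2])  # region edges, as in the question
radius = 10.0                               # hypothetical stand-in for r_a
dist = np.array([0.1, 0.3, 5.0, 9.0, 13.0, 15.0, 19.0, 25.0])  # made-up distances

# One index array per region: points whose distance lies between two scaled edges.
# The element-wise & is needed because each comparison yields a boolean array.
regions = [
    np.where((dist >= radius * lo) & (dist <= radius * hi))[0]
    for lo, hi in zip(z[:-1], z[1:])
]

for k, idx in enumerate(regions):
    print(f"region {k}: {idx}")

# Python's `and` asks each array for a single truth value, which NumPy refuses.
try:
    np.where((dist >= 0) and (dist <= 1))
except ValueError as err:
    print("`and` fails:", err)
```

Each entry of `regions` can have a different length, including zero, which is why the outer container ends up ragged.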

However, in the rare case that every region has the same number of points, the array structure doesn't change, but the integers within each np.where array change from NumPy ints into Python ints. That is a problem because when I use those values as indices, I get an IndexError:

IndexError: arrays used as indices must be of integer (or boolean) type
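A minimal reproduction of this behaviour, using small made-up index arrays rather than the real data (`data` is a hypothetical stand-in for `s_n`):

```python
import numpy as np

data = np.arange(20).reshape(10, 2)  # hypothetical stand-in for s_n

# Inner arrays of different lengths: a 1D object array of integer arrays.
ragged = np.array([np.array([1, 2, 3]), np.array([4, 5])], dtype=object)

# Inner arrays of equal length: NumPy builds a 2D object array instead,
# and every element is stored as a plain Python int.
uniform = np.array([np.array([1, 2, 3]), np.array([4, 5, 6])], dtype=object)

print(ragged.shape)    # (2,)
print(uniform.shape)   # (2, 3)
print(type(ragged[0][0]), type(uniform[0][0]))

print(data[ragged[0]])  # works: ragged[0] is an integer array

try:
    data[uniform[0]]    # uniform[0] is an object-dtype array
except IndexError as err:
    print("IndexError:", err)
```

In the equal-length case each row is itself an object-dtype array, and that is what the indexing step rejects.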

This is the code I have and what I've attempted:

import numpy as np

z = np.array([0,0.04,0.8,1.2,1.6,2])

for j in range(1,n1-1): #1 to n1-2, ignoring the initial and final parts
    print(j)

    r_a = j_r[j-1,0]
    r_b = j_r[j-1,1]

    dist_a = s_dist[j][:j]
    dist_b = s_dist[j][j+1:]

    dist_pt_a = np.array([np.where((dist_a <= r_a*z[1]) & (dist_a >= r_a*z[0]))[0], #a: r1
                          np.where((dist_a <= r_a*z[2]) & (dist_a >= r_a*z[1]))[0], #a: r2
                          np.where((dist_a <= r_a*z[3]) & (dist_a >= r_a*z[2]))[0], #a: r3
                          np.where((dist_a <= r_a*z[4]) & (dist_a >= r_a*z[3]))[0], #a: r4
                          np.where((dist_a <= r_a*z[5]) & (dist_a >= r_a*z[4]))[0]], dtype=object) #a: r5

    dist_pt_b = np.array([np.where((dist_b <= r_b*z[1]) & (dist_b >= r_b*z[0]))[0] + (j+1), #b: r1
                          np.where((dist_b <= r_b*z[2]) & (dist_b >= r_b*z[1]))[0] + (j+1), #b: r2
                          np.where((dist_b <= r_b*z[3]) & (dist_b >= r_b*z[2]))[0] + (j+1), #b: r3
                          np.where((dist_b <= r_b*z[4]) & (dist_b >= r_b*z[3]))[0] + (j+1), #b: r4
                          np.where((dist_b <= r_b*z[5]) & (dist_b >= r_b*z[4]))[0] + (j+1)], dtype=object) #b: r5

#What's happening:

#THE GOOD

#the array with 5 regions, each of differing length
dist_pt_b: [array([1081, 1082, 1083], dtype=int64)
array([1084, 1085, 1086], dtype=int64) 
array([1087, 1088], dtype=int64)
array([1089, 1090, 1091, 1092], dtype=int64)
array([1093, 1094, 1095], dtype=int64)]
#the first region
r (0): [1081 1082 1083], <class 'numpy.ndarray'>
#the first index in the first region
r (0): 1081, <class 'numpy.int64'>
#is indexable to that cartesian point
r (0): [[ 0.96494    23.29605851][ 0.97084    23.29796217][ 0.98314    23.30293551]]

#THE BAD

#the array with 5 regions, each of same length

dist_pt_b: [[1082 1083 1084][1085 1086 1087][1088 1089 1090][1091 1092 1093][1094 1095 1096]]
#the first region
r (0): [1082 1083 1084], <class 'numpy.ndarray'>
#the first index in the first region
r (0): 1082, <class 'int'>
#INDEX ERROR
127     print(f'r ({r}): {dist_pt_b[r]}, {type(dist_pt_b[r])}')
128     print(f'r ({r}): {dist_pt_b[r][0]}, {type(dist_pt_b[r][0])}')
--> 129     print(f'r ({r}): {s_n[dist_pt_b[r]]}')
130     dist_pt_b[r] = np.mean(s_n[dist_pt_b[r]], axis = 0)
131     theta_s_b[2*r], theta_s_b[2*r+1] = dist_pt_b[r]
IndexError: arrays used as indices must be of integer (or boolean) type
  • Mhm! That's how I get multiple conditions within the np.where statement. Otherwise I didn't know how to find the indices that were >= "2" and <= "4" Commented Jan 14 at 18:45
  • It looks like you're dealing with a case of numpy trying to be too smart... I can tell you need the output to always be a 1D numpy array containing variable length array objects, but it assumes you wanted a 2D array containing int objects in the edge-case... I'm not sure there is any convenient way around it. Are you certain you need the "outer" container to be a "ragged" numpy array? If I'm reading everything correctly, your code would work as is if you used a simple python list as "outer" container (thus omitting the np.array(..., dtype=object) altogether) Commented Jan 14 at 19:48
  • Not a huge fan of this, but you might also consider forcing the indexer to int when you index: s_n[dist_pt_b[r].astype(int)] Commented Jan 14 at 19:50
  • @Chrysophylaxs Thank you so much! I couldn't nail down what was happening until you said it, "numpy is being too smart." Forcing it into int does solve the problem though. Are you not a fan of your solution because it's just more computationally intensive? Also if you make your comment an answer, I could send some kudos your way. Commented Jan 14 at 20:04
  • @lastchance For whatever reason, trying to use "and" within np.where gives me a ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(), so I settled on the bitwise and. Is there a reason why I should be looking to use "&&" vs "&" in this case? Commented Jan 14 at 20:07
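The two workarounds suggested in the comments can be sketched as follows, together with a third option that is not from the thread: filling a pre-allocated object array one element at a time, which is offered only as an assumption about what might suit this use case. `data` is a hypothetical stand-in for `s_n`, and the index arrays are made up so that every region has the same length.

```python
import numpy as np

data = np.arange(20).reshape(10, 2)  # hypothetical stand-in for s_n
regions = [np.array([0, 1, 2]), np.array([3, 4, 5]), np.array([6, 7, 8])]

# Option 1 (from the comments): keep the outer container a plain Python list,
# so the inner integer arrays are never converted.
means_from_list = [data[idx].mean(axis=0) for idx in regions]

# Option 2 (from the comments): keep np.array(..., dtype=object) and cast
# each row back to an integer array at the point of indexing.
collapsed = np.array(regions, dtype=object)      # 2D object array here
first_rows = data[collapsed[0].astype(int)]

# Option 3 (not from the thread): pre-allocate a 1D object array and assign
# each region individually, which keeps the outer array 1D in every case.
outer = np.empty(len(regions), dtype=object)
for i, idx in enumerate(regions):
    outer[i] = idx
means_from_outer = [data[idx].mean(axis=0) for idx in outer]

print(means_from_list[0], first_rows.shape, outer.shape)
```

Options 1 and 3 avoid the conversion to Python ints entirely, while option 2 undoes it on each lookup.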
