I need to calculate the running sum of np.array. Let's say a is a given np.array of length l and n is an integer step, then I need an array b of length l - n + 1, where
b= [a[0]+a[1]+...+a[n-1], a[1]+a[2]+...+a[n], ..., a[l-n]+a[l-n+1]+...+a[l-1]]
I tried the following:
def run_sum(a, n):
a_cumsum = np.r_[0, np.cumsum(a)]
b = a_cumsum[n:] - a_cumsum[:-n]
return b
It works fine BUT if array a has very different elements (for example some of them are 1e8 and some of them are 1e-8) like in my case then this method gives too much error, which is accumulated in np.cumsum. Do you have any ideas on how to do this directly, without using np.cumsum?
Since a has lengh int(1e8) I don't want to use for cycle. Woud like to have something like:
ind = np.arange(a.size - n + 1)
b[ind] = np.sum(a[ind:ind+n])
but it doesn't work(
Update, example of a loss of precision:
a = np.array([-9999.999966666666, -5.7847991744735696e-08, -2.983181169098924e-08, -2.093995193995538e-08, -1.6894298480290635e-08, -1.4869270858640853e-08, -1.3961300790320832e-08, -1.3834651535607362e-08, -1.4395718046705324e-08, -1.570696320519037e-08])
run_sum(a, 3) =
np.array([-9999.999966754345, -1.0861913324333727e-07, -6.766640581190586e-08, -5.2703398978337646e-08, -4.5723936636932194e-08, -4.2664396460168064e-08, -4.219145921524614e-08, -4.393768904265016e-08])
a[:-2] + a[1:1] + a[2:] =
np.array([-9999.999966754345, -1.0861975537568031e-07, -6.766606211123524e-08, -5.2703521278886865e-08, -4.572487012925232e-08, -4.266522318456905e-08, -4.2191670372633516e-08, -4.393733278750305e-08])
The difference in the 5th figure is important for me!
Update 2: Think this will work. At least this toy example is processed well. The idea is to split initial array into arrays with large and small elements and to count cumsum independently.
def run_sum_new(a, n, max_exp_sep=3, exp_sep_lim=4):
"""
Finds the running sum of n consecutive elements of array a. If exponents of array elements
vary greatly, divides them into different arrays to minimize the final error.
Receives:
a - the original array
n - number of elements to sum
max_exp_sep - the maximum number of arrays the original array can be divided into (regardless for positive and negative elements).
In total, the maximum number of arrays is 2*max_exp_sep+1
exp_sep_lim - the minimum of the difference between the exponents of two elements at which they must be separated.
Returns:
run_sum - array of running sum of the original array
"""
#finds maximal exponents of pos and neg elements and minimal exponent of elements of a
a_pos = a[a>0]
a_neg = a[a<0]
if a_pos.size == 0:
a_pos_max = np.nan
else:
a_pos_max = np.max(a_pos)
if a_neg.size == 0:
a_neg_max = np.nan
else:
a_neg_max = np.max(np.abs(a_neg))
a_min = np.min(np.abs(a))
exp_pos_max = np.ceil(np.log10(a_pos_max))
exp_neg_max = np.ceil(np.log10(a_neg_max))
exp_min = np.floor(np.log10(a_min))
del(a_pos)
del(a_neg)
#finds separation limits list
d_exp_pos = exp_pos_max - exp_min
if np.isnan(d_exp_pos):
exp_pos_sep_list = []
elif d_exp_pos <= exp_sep_lim * (max_exp_sep + 1):
exp_pos_sep_list = [10**(exp_min + exp_sep_lim * i) for i in range(1, max_exp_sep+1) if 10**(exp_min + exp_sep_lim * i) < a_pos_max]
else:
new_exp_sep_lim = np.ceil(d_exp_pos / (max_exp_sep + 1))
print(f"Warning: positive elements of array have too large exponent spread {d_exp_pos} for the given max_exp_sep={max_exp_sep} and exp_sep_lim={exp_sep_lim}.",
f"New exp_sep_lim={new_exp_sep_lim} has been taken.", sep='\n')
exp_pos_sep_list = [10**(exp_min + new_exp_sep_lim * i) for i in range(1, max_exp_sep+1)]
d_exp_neg = exp_neg_max - exp_min
if np.isnan(d_exp_neg):
exp_neg_sep_list = []
elif d_exp_neg <= exp_sep_lim * (max_exp_sep + 1):
exp_neg_sep_list = [-10**(exp_min + exp_sep_lim * i) for i in range(1, max_exp_sep+1) if 10**(exp_min + exp_sep_lim * i) < a_neg_max]
else:
new_exp_sep_lim = np.ceil(d_exp_neg / (max_exp_sep + 1))
print(f"Warning: negative elements of array have too large exponent spread {d_exp_neg} for the given max_exp_sep={max_exp_sep} and exp_sep_lim={exp_sep_lim}.",
f"New exp_sep_lim={new_exp_sep_lim} has been taken.", sep='\n')
exp_neg_sep_list = [10**(exp_min + new_exp_sep_lim * i) for i in range(1, max_exp_sep+1)]
exp_neg_sep_list.reverse()
exp_sep_list = [-np.inf, ] + exp_neg_sep_list + exp_pos_sep_list + [np.inf,]
#separates a
a_sep_list = [np.where((i <= a) & (a < j), a, 0) for (i, j) in zip(exp_sep_list[:-1], exp_sep_list[1:])]
a_sep_arr = np.array(a_sep_list)
#finds run sum
a_sep_cumsum = np.cumsum(a_sep_arr, axis=-1)
a_sep_cumsum = np.hstack(([[0]]*a_sep_cumsum.shape[0], a_sep_cumsum))
run_sum = np.sum(a_sep_cumsum[:,n:] - a_sep_cumsum[:,:-n], axis=0)
return run_sum
run_sum_new(a, 3) = np.array([-9999.999966754345, -1.0861975537568031e-07, -6.766606211123526e-08, -5.270352127888686e-08, -4.5724870129252317e-08, -4.266522318456904e-08, -4.2191670372633516e-08, -4.393733278750304e-08])
a[:-2] + a[1:1] + a[2:] =
np.array([-9999.999966754345, -1.0861975537568031e-07, -6.766606211123524e-08, -5.2703521278886865e-08, -4.572487012925232e-08, -4.266522318456905e-08, -4.2191670372633516e-08, -4.393733278750305e-08])
timeit.timeit('run_sum_new(a, 3)', number=1000, globals=globals()) = 0.12470354699999997
np.cumsum(a, dtype=np.longdouble)?print(b)and see only very large values it doesn't mean that individual items are rounded, it only says thatprintfunction truncates decimals.