Problem description
Is there a faster way to check whether the sign of a data entry differs from the sign of the entry in the previous row of a given pandas column?
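I am trying to find out how many times the data entries in the 'Delta' column change sign (from positive to negative or vice versa) within a fixed number of rows.
Here is the DataFrame I am working with and what I have tried so far: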
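import pandas as pd

df = pd.DataFrame({'id': [0, 1, 2, 3, 4, 5], 'Delta': [1, 2, -3, 4, 5, -7]})
df['Spike'] = 0  # the column must exist before single cells are assigned
j = 1
step = 4
for i in range(1, df.shape[0]):
    if j < df.shape[0] - 3:
        for j in range(j, j + 3):
            # int() is needed: subtracting two NumPy booleans raises TypeError.
            # df.loc[row, col] writes into df itself; chained df['Spike'].iloc[j] = ... may not.
            df.loc[j, 'Spike'] = int(df['Delta'].iloc[i] > 0) - int(df['Delta'].iloc[i - 1] > 0)
        j = j + step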
The purpose of the for j loop is to perform this "sign change check" over a given fixed number of rows.
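The desired output would be 3, because the values in the 'Delta' column change sign 3 times (2 to -3, -3 to 4, and 5 to -7).
Unfortunately, this code takes a lot of time. My guess is that pandas checks each row one by one, which makes the whole code run slowly.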
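If "a fixed number of rows" in the loop means consecutive, non-overlapping blocks of step rows (an interpretation, not something the question states), the loop could be replaced by a grouped, vectorized sketch like the one below. The block labels are a hypothetical helper array, and a sign change across a block boundary is deliberately not counted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 2, 3, 4, 5], 'Delta': [1, 2, -3, 4, 5, -7]})
step = 4

# Hypothetical block label per row: rows 0-3 -> block 0, rows 4-7 -> block 1, ...
block = np.arange(len(df)) // step

# Previous value *within the same block*; the first row of each block gets NaN,
# so changes across a block boundary are never flagged.
prev = df.groupby(block)['Delta'].shift()

# A negative product means the two neighbouring values have opposite signs.
changed = df['Delta'] * prev < 0

# Number of sign changes in each block
per_block = changed.groupby(block).sum()
print(per_block.tolist())  # [2, 1]
```

For this data the two blocks contain 2 and 1 changes, which together give the expected total of 3.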
Reference solutions
Method 1:
To detect whether a row has changed sign relative to the previous row, you can multiply the two values and check whether the product is less than zero. You can use Series.shift to align each entry with the row above it, so that the comparison is a single Series operation:
df['Delta'] * df['Delta'].shift() < 0
This returns a new boolean Series that is True every time the sign has changed.
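A minimal runnable sketch of this step on the question's sample data, showing the intermediate flags and the total number of sign changes:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'id': [0, 1, 2, 3, 4, 5], 'Delta': [1, 2, -3, 4, 5, -7]})

# shift() moves every value down one row; the first row becomes NaN
prev = df['Delta'].shift()

# Negative product -> opposite signs. NaN * x is NaN and NaN < 0 is False,
# so the first row is never flagged.
changed = df['Delta'] * prev < 0

print(changed.tolist())    # [False, False, True, True, False, True]
print(int(changed.sum()))  # 3
```

Note that a row where Delta is 0 gives a product of 0, so this test never flags a move to or from 0 as a sign change.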
Now, to find how many times the sign has changed within a window of 4 rows, you can apply Series.rolling to this derived Series and then simply sum() the entries in each window. In sum(), True counts as 1 and False as 0, so the result of sum() is the number of True values in the window.
df['Spike'] = (df['Delta'] * df['Delta'].shift() < 0).rolling(4).sum()
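A small sketch of the rolling step on the sample data. The explicit astype(int) is an addition to the one-liner above; it only makes the 1/0 counting visible and does not change the sums.

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 2, 3, 4, 5], 'Delta': [1, 2, -3, 4, 5, -7]})

# 1 where the sign differs from the previous row, 0 otherwise
changed = (df['Delta'] * df['Delta'].shift() < 0).astype(int)

# Trailing window: row k sums the flags of rows k-3 .. k.
# The first 3 rows are NaN because min_periods defaults to the window size.
df['Spike'] = changed.rolling(4).sum()
print(df['Spike'].tolist())  # [nan, nan, nan, 2.0, 2.0, 3.0]
```

Each window sums 4 flags, and each flag compares a row with its predecessor, so one window covers the transitions among 5 consecutive Delta values; rolling(3) would cover 4 values instead.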
This seems close to what you are doing now, with one difference: this code looks at past entries, while your code looks at future entries. If you need the forward-looking version, an additional shift(-3) applied to the rolling result should fix that.
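A sketch of the forward-looking variant, assuming "future entries" means the current row plus the next 3 flags. The column name SpikeAhead is hypothetical, chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 2, 3, 4, 5], 'Delta': [1, 2, -3, 4, 5, -7]})

changed = (df['Delta'] * df['Delta'].shift() < 0).astype(int)

# rolling(4).sum() at row k covers flags k-3..k; shift(-3) moves that total
# up to row k-3, so each row now covers its own flag and the next 3.
df['SpikeAhead'] = changed.rolling(4).sum().shift(-3)
print(df['SpikeAhead'].tolist())  # [2.0, 2.0, 3.0, nan, nan, nan]
```

The last 3 rows are NaN because fewer than 4 flags remain after them. pandas also provides pandas.api.indexers.FixedForwardWindowIndexer for forward-looking rolling windows, if a shift-free formulation is preferred.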
Method 2:
Use:
>>> s = df.groupby(df['Delta'].lt(0).ne(df['Delta'].lt(0).shift())).size()
>>> s[s.index == False].item()
2
>>>
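For the sample data, the expression above evaluates to 2, which is the size of the False group: the rows whose sign matches the row above. The True group has size 4 because it also contains row 0, where comparing with the NaN produced by shift() counts as "not equal". A minimal sketch that keeps the same lt(0)/shift() idea but counts the changes directly, which gives the desired 3 for this data:

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 2, 3, 4, 5], 'Delta': [1, 2, -3, 4, 5, -7]})

neg = df['Delta'].lt(0)  # True where the value is negative

# True where negativity differs from the row above; row 0 is dropped
# because its "previous" value is the NaN introduced by shift().
flips = neg.ne(neg.shift()).iloc[1:]

print(int(flips.sum()))  # 3
```

Because lt(0) treats 0 as non-negative, a move between a negative value and 0 counts as a change here, whereas the product test in Method 1 does not flag it.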
(by Masail Guliyev, filbranden, U12-Forward)