問題描述
如果日期較新,則將日期從一個數據幀替換為另一個數據幀 (Replace date from one dataframe to another if it's newer)
所以我在這裡碰壁了,我覺得它很簡單,但無法讓它工作。
我有兩個數據框,我想比較日期並更新一個與另一個日期如果它是新的。
這是第一個數據幀(df1):
Unit Number Date Score
SAH‑459 1/24/2019 0.951
SAH‑450 2/3/2019 0.852
SAH‑399 10/5/2019 0.354
這是第二個(df2):
Unit Number ActualDate
SAH‑459 1/30/2019
SAH‑450 2/3/2019
SAH‑399 9/28/2019
所以我只想從 SAH‑459 更新到 2019 年 1 月 30 日的日期,因為它更新了
這是我到目前為止所做的:
unitlist = df1['Unit Number'].unique() #gets array with unique Unit Numbers
for unit in unitlist:
date1 = df1.loc[df1['Unit Number'] == unit, 'Date'].values
date2 = df2.loc[df2['Unit number'] == unit, 'ActualDate].values
if date2 > date1:
df1.at[[unit],'Date'] = date2
但是它拋出了這個錯誤:
p>"None of [Index(['SAH‑459'], dtype='object')] are in the [index]"
我必須說兩個數據幀的唯一單元列表不一樣。從 df1 到 df2 可能缺少一些單位,反之亦然。
提前致謝!
參考解法
方法 1:
Case 1: same units in both dataframes Let's first create the 2 dataframes with:
df1 = pd.DataFrame(
{
"Unit_Number": ["SAH‑459", "SAH‑450", "SAH‑399"],
"Date": ["1/24/2019", "2/3/2019", "10/5/2019"],
"Score": [0.951, 0.852, 0.354],
}
)
df2 = pd.DataFrame(
{
"Unit_Number": ["SAH‑459", "SAH‑450", "SAH‑399"],
"ActualDate": ["1/30/2019", "2/3/2019", "9/28/2019"],
}
)
In order to do the update you should set Unit_Number
as the index and transform the dates columns to datetime with:
# Set index
df1 = df1.set_index("Unit_Number")
df2 = df2.set_index("Unit_Number")
# Transform to date
df1["Date"] = pd.to_datetime(df1["Date"])
df2["ActualDate"] = pd.to_datetime(df2["ActualDate"])
So now you can create a filter of which units have a date in df2
that is greater than the date in df1
and use it to update the dates of those units in df1
with:
mfilter = df2["ActualDate"] > df1["Date"]
df1.loc[mfilter, "Date"] = df2.loc[mfilter, "ActualDate"]
This will work if both dataframes have the same units and in the same order.
And if you need it you can reset the indexs after doing this update with:
df1 = df1.reset_index()
df2 = df2.reset_index()
Case 2: different units in both dataframes
Let's add more units to test this case:
df1 = pd.DataFrame(
{
"Unit_Number": ["SAH‑459", "SAH‑450", "SAH‑399", "SAH‑777"],
"Date": ["1/24/2019", "2/3/2019", "10/5/2019", "4/29/2019"],
"Score": [0.951, 0.852, 0.354, 0.982],
}
)
df2 = pd.DataFrame(
{
"Unit_Number": ["SAH‑459", "SAH‑450", "SAH‑399", "SAH‑999"],
"ActualDate": ["1/30/2019", "2/3/2019", "9/28/2019", "1/12/2019"],
}
)
You will need to do the same:
df1 = df1.set_index("Unit_Number")
df2 = df2.set_index("Unit_Number")
# Transform to date
df1["Date"] = pd.to_datetime(df1["Date"])
df2["ActualDate"] = pd.to_datetime(df2["ActualDate"])
They key is to reindex mfilter
using the index from df1
and then reindex mfilter
in df2.loc
:
mfilter = df2["ActualDate"].reindex(df1.index) > df1["Date"]
df1.loc[mfilter, "Date"] = df2.loc[mfilter.reindex(df2.index).fillna(False), "ActualDate"]
This will always work