如果日期較新,則將日期從一個數據幀替換為另一個數據幀 (Replace date from one dataframe to another if it's newer)


問題描述

如果日期較新,則將日期從一個數據幀替換為另一個數據幀 (Replace date from one dataframe to another if it's newer)

所以我在這裡碰壁了,我覺得它很簡單,但無法讓它工作。

我有兩個數據框,我想比較日期並更新一個與另一個日期如果它是新的。

這是第一個數據幀(df1):

Unit Number    Date         Score
SAH‑459        1/24/2019    0.951
SAH‑450        2/3/2019     0.852
SAH‑399        10/5/2019    0.354

這是第二個(df2):

Unit Number    ActualDate
SAH‑459        1/30/2019
SAH‑450        2/3/2019
SAH‑399        9/28/2019

所以我只想從 SAH‑459 更新到 2019 年 1 月 30 日的日期,因為它更新了

這是我到目前為止所做的:

unitlist = df1['Unit Number'].unique()    #gets array with unique Unit Numbers

for unit in unitlist:
    date1 = df1.loc[df1['Unit Number'] == unit, 'Date'].values
    date2 = df2.loc[df2['Unit number'] == unit, 'ActualDate].values

    if date2 > date1:
        df1.at[[unit],'Date'] = date2

但是它拋出了這個錯誤:

p>
"None of [Index(['SAH‑459'], dtype='object')] are in the [index]"

我必須說兩個數據幀的唯一單元列表不一樣。從 df1 到 df2 可能缺少一些單位,反之亦然。

提前致謝!


參考解法

方法 1:

Case 1: same units in both dataframes Let's first create the 2 dataframes with:

df1 = pd.DataFrame(
    {
        "Unit_Number": ["SAH‑459", "SAH‑450", "SAH‑399"],
        "Date": ["1/24/2019", "2/3/2019", "10/5/2019"],
        "Score": [0.951, 0.852, 0.354],
    }
)
df2 = pd.DataFrame(
    {
        "Unit_Number": ["SAH‑459", "SAH‑450", "SAH‑399"],
        "ActualDate": ["1/30/2019", "2/3/2019", "9/28/2019"],
    }
)

In order to do the update you should set Unit_Number as the index and transform the dates columns to datetime with:

# Set index
df1 = df1.set_index("Unit_Number")
df2 = df2.set_index("Unit_Number")

# Transform to date
df1["Date"] = pd.to_datetime(df1["Date"])
df2["ActualDate"] = pd.to_datetime(df2["ActualDate"])

So now you can create a filter of which units have a date in df2 that is greater than the date in df1 and use it to update the dates of those units in df1 with:

mfilter = df2["ActualDate"] > df1["Date"]
df1.loc[mfilter, "Date"] = df2.loc[mfilter, "ActualDate"]

This will work if both dataframes have the same units and in the same order.

And if you need it you can reset the indexs after doing this update with:

df1 = df1.reset_index()
df2 = df2.reset_index()

Case 2: different units in both dataframes

Let's add more units to test this case:

df1 = pd.DataFrame(
    {
        "Unit_Number": ["SAH‑459", "SAH‑450", "SAH‑399", "SAH‑777"],
        "Date": ["1/24/2019", "2/3/2019", "10/5/2019", "4/29/2019"],
        "Score": [0.951, 0.852, 0.354, 0.982],
    }
)
df2 = pd.DataFrame(
    {
        "Unit_Number": ["SAH‑459", "SAH‑450", "SAH‑399", "SAH‑999"],
        "ActualDate": ["1/30/2019", "2/3/2019", "9/28/2019", "1/12/2019"],
    }
)

You will need to do the same:

df1 = df1.set_index("Unit_Number")
df2 = df2.set_index("Unit_Number")

# Transform to date
df1["Date"] = pd.to_datetime(df1["Date"])
df2["ActualDate"] = pd.to_datetime(df2["ActualDate"])

They key is to reindex mfilter using the index from df1 and then reindex mfilter in df2.loc:

mfilter = df2["ActualDate"].reindex(df1.index) > df1["Date"]
df1.loc[mfilter, "Date"] = df2.loc[mfilter.reindex(df2.index).fillna(False), "ActualDate"]

This will always work

(by adrianvilloro)

參考文件

  1. Replace date from one dataframe to another if it's newer (CC BY‑SA 2.5/3.0/4.0)

#if-statement #python-3.x #pandas






相關問題

Python 和 if 語句 (Python and if statement)

Ruby 一種在條件下執行函數的巧妙方法 (Ruby a clever way to execute a function on a condition)

為什麼我的 php 代碼繞過了一些 if 語句? (Why is my php code bypassing a few if statements?)

為什麼“如果”不是C中的表達式 (Why isn't "if" an expression in C)

如何對此查詢進行選擇案例? (How can I do select case to this query?)

我應該使用方法還是常量標誌? (Should I use methods or constant flags?)

PHP - 使用哪個條件測試? (PHP - Which conditional test to use?)

如果日期較新,則將日期從一個數據幀替換為另一個數據幀 (Replace date from one dataframe to another if it's newer)

BASH:在 for 循環中使用 continue (BASH: Using a continue in a for loop)

有沒有辦法從 Tableau 中的 regexp_match 語句中排除某些關鍵字? (Is there a way to exclude certain keywords from a regexp_match statement in Tableau?)

Excel 如果單元格為空白單元格總數的空白單元格總數 (Excel If cell is blank sum number of blank cells for a total)

使用另一個數據框的條件創建一個新列 (Create a new column with a condition of another dataframe)







留言討論