問題描述
如何使用條件將一個數據框列的值與另一個數據框列的值匹配? (How do you match the value of one dataframe's column with another dataframe's column using conditionals?)
我有兩個數據框:
Row No. Subject
1 Apple
2 Banana
3 Orange
4 Lemon
5 Strawberry
row_number Subjects Special?
1 Banana Yes
2 Lemon No
3 Apple No
4 Orange No
5 Strawberry Yes
6 Cranberry Yes
7 Watermelon No
我想更改第一個數據框的行號以匹配第二個數據框。應該是這樣的:
Row No. Subject
3 Apple
1 Banana
4 Orange
2 Lemon
5 Strawberry
我已經試過這個代碼:
for index, row in df1.iterrows():
if df1['Subject'] == df2['Subjects']:
df1['Row No.'] = df2['row_number']
但是我得到了錯誤:
ValueError: Can only compare identically‑labeled Series objects
這是否意味著數據幀必須有相同數量的行和列?它們也必須貼上相同的標籤嗎?有沒有辦法繞過這個限制?
編輯:我找到了一個有前途的替代公式:
for x in df1['Subject']:
if x in df2['Subjects'].values:
df2.loc[df2['Subjects'] == x]['row_number'] = df1.loc[df1['Subject'] == x]['Row No.']
但它似乎並沒有像我想要的那樣修改第一個數據幀. 任何提示為什麼?此外,我收到此警告:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
參考解法
方法 1:
I would avoid using for
loops especially when pandas
has such great methods to handle these types of problems already.
Using pd.Series.replace
Here is a vectorized way of doing this ‑
d
is the dictionary that maps the fruit to the number in second dataframe- You can use
df.Subject.replace(d)
to now simply replace the keys in the dictd
to their values. - Overwrite the
Row No.
column with this now.
d = dict(zip(df2['Subjects'], df2['row_number']))
df1['Row No.'] = df1.Subject.replace(d)
print(df1)
Subject Row No.
0 Apple 3
1 Banana 1
2 Orange 4
3 Lemon 2
4 Strawberry 5
Using pd.merge
Let's try simply merging the 2 dataframe and replace the column completely.
ddf = pd.merge(df1['Subject'],
df2[['row_number','Subjects']],
left_on='Subject',
right_on='Subjects',
how='left').drop('Subjects',1)
ddf.columns = df1.columns[::‑1]
print(ddf)
Subject Row No.
0 Apple 3
1 Banana 1
2 Orange 4
3 Lemon 2
4 Strawberry 5
方法 2:
Assuming the first is df1 and the second is df2, this should do what you want it to:
import pandas as pd
d1 = {'Row No.': [1, 2, 3, 4, 5], 'Subject': ['Apple', 'Banana', 'Orange',
'Lemon', 'Strawberry']}
df1 = pd.DataFrame(data=d1)
d2 = {'row_number': [1, 2, 3, 4, 5, 6, 7], 'Subjects': ['Banana', 'Lemon', 'Apple',
'Orange', 'Strawberry', 'Cranberry', 'Watermelon'], 'Special?': ['Yes', 'No',
'No', 'No',
'Yes', 'Yes', 'No']}
df2 = pd.DataFrame(data=d2)
for x in df1['Subject']:
if x in df2['Subjects'].values:
df1.loc[df1['Subject'] == x, 'Row No.'] = (df2.loc[df2['Subjects'] == x]['row_number']).item()
#print(df1)
#print(df2)
In your edited answer it looks like you had the dataframes swapped and you were missing the item() to get the actual row_number value and not the Series object.
(by ACan、Akshay Sehgal、Kel Varnsen)