ValueError: Found input variables with inconsistent numbers of samples: [2935848, 2935849]


Problem description


When I run this code:

feature_names = ["date","shop_id", "item_id", "item_price", "item_cnt_day"]
feature_names

X_train = train[feature_names]
print(X_train.shape)
X_train.head()

X_sales = sales[feature_names]
print(X_sales.shape)
X_sales.head()

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X_train, X_sales, y_train, y_sales = train_test_split(X_train, X_sales, test_size=0.3)


(2935848, 5)
(2935849, 5)

I get this ValueError:

ValueError                                Traceback (most recent call last)
     13 from sklearn.metrics import mean_squared_error
     14
---> 15 X_train, X_sales, y_train, y_sales = train_test_split(X_train, X_sales, test_size=0.3)
     16

~/anaconda3/envs/aiffel/lib/python3.7/site-packages/sklearn/model_selection/_split.py in train_test_split(*arrays, **options)
   2125         raise TypeError("Invalid parameters passed: %s" % str(options))
   2126
-> 2127     arrays = indexable(*arrays)
   2128
   2129     n_samples = _num_samples(arrays[0])

~/anaconda3/envs/aiffel/lib/python3.7/site-packages/sklearn/utils/validation.py in indexable(*iterables)
    291     """
    292     result = [_make_indexable(X) for X in iterables]
--> 293     check_consistent_length(*result)
    294     return result
    295

~/anaconda3/envs/aiffel/lib/python3.7/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    255     if len(uniques) > 1:
    256         raise ValueError("Found input variables with inconsistent numbers of"
--> 257                          " samples: %r" % [int(l) for l in lengths])
    258
    259

ValueError: Found input variables with inconsistent numbers of samples: [2935848, 2935849]



Reference solutions

Method 1:

The error occurs because your two DataFrames (train and sales) have different lengths: train has 2935848 samples and sales has 2935849. Every array passed to train_test_split must have the same number of samples. Find out why the lengths differ, then add or drop a row so that they match.
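One way to track down the extra row is an outer merge with an indicator column. The sketch below uses tiny made-up frames in place of the real train and sales data, and assumes both frames share the same columns and contain no duplicated rows (duplicates would make the merge output harder to read):

```python
import pandas as pd

# Toy stand-ins for the `train` and `sales` frames (hypothetical values).
train = pd.DataFrame({
    "shop_id": [1, 1, 2],
    "item_id": [10, 11, 10],
    "item_cnt_day": [1.0, 2.0, 1.0],
})
sales = pd.DataFrame({
    "shop_id": [1, 1, 2, 2],
    "item_id": [10, 11, 10, 12],
    "item_cnt_day": [1.0, 2.0, 1.0, 5.0],
})

# Compare the row counts first; here they differ by one.
print(len(train), len(sales))

# An outer merge on all shared columns with indicator=True adds a `_merge`
# column: 'both', 'left_only' (only in train) or 'right_only' (only in sales).
diff = train.merge(sales, how="outer", indicator=True)
extra_rows = diff[diff["_merge"] != "both"]
print(extra_rows)
```

Rows marked left_only or right_only are the ones that exist in only one of the two frames.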

Second, and no less important, make sure you understand what train_test_split does and what your goal is. The function takes X and y as inputs and returns X_train, X_test, y_train, y_test. In your code you are passing two X inputs (X_train and X_sales) that have the same 5 features. If that is intentional, fine, but be aware of it.

X holds all the samples with their features, and y holds the corresponding output values you want to predict. Check this and decide whether train_test_split is really the function you are looking for.
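As a minimal sketch of the usual X / y pattern: the frame below is made-up toy data, and treating item_cnt_day as the target is an assumption, since the question does not say what is being predicted.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the question's `train` frame (hypothetical values).
train = pd.DataFrame({
    "shop_id": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "item_id": [10, 11, 10, 12, 13, 10, 11, 12, 13, 14],
    "item_price": [99.0, 149.0, 99.0, 20.0, 35.0, 99.0, 149.0, 20.0, 35.0, 50.0],
    "item_cnt_day": [1, 2, 1, 5, 3, 1, 1, 4, 2, 1],
})

# Assumption: item_cnt_day is the target, the other columns are features.
feature_names = ["shop_id", "item_id", "item_price"]
X = train[feature_names]
y = train["item_cnt_day"]

# X and y are taken from the same frame, so their lengths always match.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

Because features and target are sliced from one DataFrame, the consistency check inside train_test_split cannot fail on a length mismatch.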

Method 2:

I have this error while I'm trying to do my confusion matrix: Found input variables with inconsistent numbers of samples: [1527, 1]

This is my code:

import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# df is assumed to be loaded earlier (not shown in the original post)
x = df[['gender', 'age', 'hypertension', 'ever_married', 'work_type', 'Residence_type', 'avg_glucose_level', 'bmi', 'smoking_status', 'work_type_cat', 'gender_cat', 'Residence_type_cat']]
y = df['stroke']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=20)
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

scaler = StandardScaler()
x_train_scale = scaler.fit_transform(x_train)
x_test_scale = scaler.fit_transform(x_test)

KNN = KNeighborsClassifier()
x = df[['gender', 'age', 'hypertension', 'heart_disease', 'ever_married', 'work_type', 'Residence_type', 'avg_glucose_level', 'bmi', 'smoking_status', 'work_type_cat', 'gender_cat', 'Residence_type_cat']]
y = df['stroke']
print(x.head())
print(y.head())
KNN = KNN.fit(x, y)

# Build a single-row frame to predict on
test = pd.DataFrame()
test['gender'] = [2]
test['age'] = [3]
test['hypertension'] = [0]
test['heart_disease'] = [0]
test['ever_married'] = [2]
test['work_type'] = [4]
test['Residence_type'] = [2]
test['avg_glucose_level'] = [95.12]
test['bmi'] = [18]
test['smoking_status'] = [2]
test['work_type_cat'] = [4]
test['gender_cat'] = [1]
test['Residence_type_cat'] = [1]
y_predict = KNN.predict(test)

print(y_predict)
# The ValueError is raised here: y_test has 1527 rows, y_predict has 1
print(confusion_matrix(y_test, y_predict))
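
The [1527, 1] in the message suggests that y_test holds 1527 labels while y_predict holds a single one, because predict was called on the one-row test frame. A confusion matrix needs one prediction per true label, so a possible fix is to predict on the test split instead. The sketch below uses synthetic data in place of the stroke DataFrame (the feature values and the labelling rule are made up), and also fits the scaler on the training split only, which is an assumption about what was intended:

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the stroke data (hypothetical values).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 4))
y = (x[:, 0] + x[:, 1] > 0).astype(int)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=20
)

# Fit the scaler on the training split, then reuse it on the test split.
scaler = StandardScaler()
x_train_scale = scaler.fit_transform(x_train)
x_test_scale = scaler.transform(x_test)

knn = KNeighborsClassifier()
knn.fit(x_train_scale, y_train)

# Predict on the whole test split so that y_predict has one label
# for every row of y_test.
y_predict = knn.predict(x_test_scale)
print(len(y_test), len(y_predict))
print(confusion_matrix(y_test, y_predict))
```

A prediction for a single hand-built row can still be made separately; it just cannot be compared against y_test.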

(by 이진규, Alex Serra Marrugat, Rachel)

References

  1. ValueError: Found input variables with inconsistent numbers of samples: [2935848, 2935849] (CC BY‑SA 2.5/3.0/4.0)

#split #Python #Testing #scikit-learn #train-test-split





