
TypeError: cannot do slice indexing on <class 'pandas.core.indexes.numeric.Int64Index'> with these indexers [0.0] of <class 'float'>

I am trying to classify income as <=50k or >50k, and I wrote a cross-validation function to get the accuracy for each fold.

X = df[['age','workclass','fnlwgt','education','marital_status','occupation','relationship','race','sex']]
y = df['income']
k_fold = 10


def k_fold_generator(X, y, k_fold):
    subset_size = len(X) / k_fold
    for k in range(k_fold):
        X_train = X[:k * subset_size] + X[(k + 1) * subset_size:]
        X_test = X[k * subset_size:][:subset_size]
        y_train = y[:k * subset_size] + y[(k + 1) * subset_size:]
        y_test = y[k * subset_size:][:subset_size]

        yield X_train, y_train, X_test, y_test

The definitions above run without errors, but iterating over the generator fails:

for X_train, y_train, X_test, y_test in k_fold_generator(X, y, k_fold):
    print("Error")

TypeError: cannot do slice indexing on <class 'pandas.core.indexes.numeric.Int64Index'> with these indexers [0.0] of <class 'float'>

1 answer

    Best answer
  1. subset_size is a float.

    That is exactly why slicing, which expects integers, fails, as the error message tells you. I suggest working through a quick, basic Python tutorial before attempting more advanced tasks. :)

    Presumably you came across example code written for Python 2, where / on two integers performs integer division, and are now running it under Python 3.x, where / always returns a float. You can use subset_size = len(X) // k_fold, which forces integer division. Alternatively, you can call int(round(k * subset_size)) each time you slice. I suggest the former.
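To make the division difference concrete, here is a minimal sketch in Python 3. The row count of 25 and the plain list standing in for the data are assumptions for illustration, not values from the question.

```python
n_rows = 25   # hypothetical number of rows, not taken from the question
k_fold = 10

float_size = n_rows / k_fold    # true division in Python 3 gives a float
int_size = n_rows // k_fold     # floor division gives an int

rows = list(range(n_rows))      # a plain list standing in for the data
first_fold = rows[:int_size]    # slicing with an int works

try:
    rows[:float_size]           # slicing with a float raises TypeError
    float_slice_failed = False
except TypeError:
    float_slice_failed = True
```

Plain lists reject float slice bounds just as pandas does, so the same fix applies to both.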

    Additionally, since your training data X_train appears to be a pandas.DataFrame, you will probably need explicit positional slicing with .iloc.
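Putting both suggestions together, one possible rewrite of the generator might look like the sketch below. It is not the asker's final code. The toy DataFrame is a made-up stand-in for the real df. The use of pd.concat is an addition beyond the answer: + on two DataFrames or Series adds values element-wise (aligned on the index) rather than stacking rows, so the training folds have to be joined with a concatenation instead.

```python
import pandas as pd


def k_fold_generator(X, y, k_fold):
    """Yield (X_train, y_train, X_test, y_test) once per fold."""
    subset_size = len(X) // k_fold  # floor division keeps this an int
    for k in range(k_fold):
        start, stop = k * subset_size, (k + 1) * subset_size
        # .iloc slices by position, regardless of the index labels
        X_test = X.iloc[start:stop]
        y_test = y.iloc[start:stop]
        # pd.concat stacks the rows before and after the test fold;
        # `+` would add the values element-wise instead
        X_train = pd.concat([X.iloc[:start], X.iloc[stop:]])
        y_train = pd.concat([y.iloc[:start], y.iloc[stop:]])
        yield X_train, y_train, X_test, y_test


# Toy data standing in for the real df (hypothetical values)
df = pd.DataFrame({"age": range(20, 30), "income": [0, 1] * 5})
X, y = df[["age"]], df["income"]
folds = list(k_fold_generator(X, y, k_fold=5))
```

With floor division, any remainder rows at the end (when len(X) is not a multiple of k_fold) never land in a test fold. For real work, sklearn.model_selection.KFold handles that case and is probably the safer choice.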