
Error running chisq.test() in R - invalid 'type' (character) of argument

I am running a chi-squared test for independence on a data.frame called Comp1, which has two binary variables and 13109 observations.

I am using the test before clustering consumers based on demographics. If the two variables depend on one another, then certain values will end up together in a cluster. The two variables are a subset of another data.frame with 36 variables.

I got an error saying the data.frame contains character variables, even though the str() function shows them as factors.

Why does the error say the data.frame has character values?

data:

> str(Comp1)
'data.frame':   13109 obs. of  2 variables:
 $ HomeOwnerStatus: Factor w/ 2 levels "Own","Rent": 1 2 2 2 1 2 1 1 2 2 ...
 $ MaritalStatus  : Factor w/ 2 levels "Married","Single": 2 1 1 1 2 1 2 1 1 1 ...

example:

> #Create dataset
> homeownerstatus <- c("Own", "Rent", "Own", "Own", "Rent", "Own")
> maritalstatus <- c("Married", "Married", "Married", "Single", "Single", "Married")
> Comp1 <- data.frame(homeownerstatus, maritalstatus)

error with solution:

> #Test binary variables for independence 
> #Create matrix from data.frame
> DF4 <- as.matrix(Comp1)
> #Comparison of marital status and home owner status
> #Perform chi-squared test for independence of two variables
> chisq.test(table(Comp1))

    Chi-squared test for given probabilities

data:  table(DF4)
X-squared = 295149.5, df = 71, p-value < 2.2e-16

1 Answer

    Best answer
    chisq.test expects either two factor vectors as its x and y arguments, or a matrix or data.frame as its x argument. When a data.frame is passed, the function converts it to a matrix with as.matrix, and that step coerces the factor columns of your data.frame to character.

    > as.matrix(Comp1)
         homeownerstatus maritalstatus
    [1,] "Own"           "Married"    
    [2,] "Rent"          "Married"    
    [3,] "Own"           "Married"    
    [4,] "Own"           "Single"     
    [5,] "Rent"          "Single"     
    [6,] "Own"           "Married"
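
To see where the error in the title comes from, the coercion can be reproduced on the six-row example. This is a minimal sketch that assumes the columns are factors, as in the str() output; they are built with factor() explicitly because data.frame() no longer converts strings to factors by default since R 4.0.0. The names m and res are only illustrative.

```r
# Rebuild the six-row example with explicit factor columns
Comp1 <- data.frame(
  homeownerstatus = factor(c("Own", "Rent", "Own", "Own", "Rent", "Own")),
  maritalstatus   = factor(c("Married", "Married", "Married",
                             "Single", "Single", "Married"))
)

# as.matrix() on a data.frame of factors yields a character matrix
m <- as.matrix(Comp1)
is.character(m)
dim(m)

# Passing the data.frame itself goes through the same coercion inside
# chisq.test(), which then fails on the character values
res <- tryCatch(chisq.test(Comp1), error = function(e) e)
inherits(res, "error")
conditionMessage(res)
```

The last line should print the message quoted in the question title, since the function ends up doing arithmetic on a character matrix.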
    

    So, my suggestion would be to pass two factor vectors:

    chisq.test(Comp1$homeownerstatus, Comp1$maritalstatus)
    
            Pearson's Chi-squared test with Yates' continuity correction
    
    data:  Comp1$homeownerstatus and Comp1$maritalstatus
    X-squared = 0, df = 1, p-value = 1
    
    Warning message:
    In chisq.test(Comp1$homeownerstatus, Comp1$maritalstatus) :
      Chi-squared approximation may be incorrect
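
The warning appears because the six-row example is too small for the chi-squared approximation, not because the call is wrong. One way to check this, sketched here on the same toy data (rebuilt so the block runs on its own; the name fit is illustrative), is to look at the expected cell counts stored in the test result.

```r
# Same six-row example, with explicit factor columns
Comp1 <- data.frame(
  homeownerstatus = factor(c("Own", "Rent", "Own", "Own", "Rent", "Own")),
  maritalstatus   = factor(c("Married", "Married", "Married",
                             "Single", "Single", "Married"))
)

# Run the test on the two factor vectors; the warning is suppressed here
# only to keep the output short
fit <- suppressWarnings(
  chisq.test(Comp1$homeownerstatus, Comp1$maritalstatus)
)

# Expected counts under independence; chisq.test() warns when any is below 5
fit$expected
all(fit$expected < 5)
```

With the full 13109-observation data set the expected counts are likely to be much larger, so the warning may not appear there at all.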
    

    EDIT

    When you pass a matrix or a data.frame as the x argument, that object is treated as a contingency table, which is not what you want here. You have two binary variables whose contingency table must first be computed and then tested with the chi-squared test. So either pass each factor vector as described above or, alternatively, compute the contingency table yourself and pass it to chisq.test.

    chisq.test(table(Comp1))
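
The two suggested calls should give the same result, because chisq.test builds the same cross-tabulation internally when it receives two factors. A small sketch on the six-row example, with illustrative names tab, by_table and by_vectors:

```r
# Same six-row example, with explicit factor columns
Comp1 <- data.frame(
  homeownerstatus = factor(c("Own", "Rent", "Own", "Own", "Rent", "Own")),
  maritalstatus   = factor(c("Married", "Married", "Married",
                             "Single", "Single", "Married"))
)

# Route 1: build the 2 x 2 contingency table first, then test it
tab <- table(Comp1)
by_table <- suppressWarnings(chisq.test(tab))

# Route 2: pass the two factor vectors directly
by_vectors <- suppressWarnings(
  chisq.test(Comp1$homeownerstatus, Comp1$maritalstatus)
)

# Both routes report the same statistic and p-value
all.equal(unname(by_table$statistic), unname(by_vectors$statistic))
all.equal(by_table$p.value, by_vectors$p.value)
```

Printing tab before testing is a cheap sanity check that the counts are the ones you intended to compare.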