
Pandas pivot table - ValueError: Index contains duplicate entries, cannot reshape

I want to add columns (data for additional years) to my seaborn heatmap. This is the code I am using:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({'Month': ['January','February','March','April','May','June','July','August','September','October','November','December',
                             'January','February','March','April','May','June','July','August','September','October','November','December'],
                   'Year': [2011,2011,2011,2011,2011,2011,2011,2011,2011,2011,2011,2011,
                            2012,2012,2012,2012,2012,2012,2012,2012,2012,2012,2012,2012,],
                   'hPM2.5': [18,17,21,14,7,7,8,7,9,11,23,5,
                              18,17,21,14,7,7,8,7,9,11,23,5,
                              18,17,21,14,7,7,8,7,9,11,23,5,
                              18,17,21,14,7,7,8,7,9,11,23,5]})

cats = ['January','February','March','April','May','June',
    'July','August','September','October','November','December']
df['Month'] = df['Month'].astype('category', 
                              ordered=True,
                              categories=cats)

df2 = df.pivot("Month", "Year", "hPM2.5")
sns.heatmap(df2, annot=True)

So, to get the 2012 data in, the pivot seems to require 24 entries for Month (January, February, ... repeated) and 24 for Year (2011, 2011, ..., 2012, 2012, ...); otherwise I get: ValueError: arrays must all be same length. But when I repeat January, February, etc., I get the duplicate entries error instead. I cannot get the heatmap to work without the pivot table from the seaborn example. How can I get around this problem?
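The error in the title comes from DataFrame.pivot itself. It needs each (index, columns) pair to occur at most once because it has nowhere to put a second value for the same cell. The minimal sketch below uses made-up numbers and targets current pandas. The duplicated ("January", 2011) row is hypothetical, not the asker's data. The sketch reproduces the error and then shows pivot_table, which aggregates duplicates instead of refusing them:

```python
import pandas as pd

# Hypothetical long-format data: the (Month, Year) pair ("January", 2011)
# appears twice, so that cell would need two different values.
df = pd.DataFrame({
    "Month": ["January", "January", "February"],
    "Year": [2011, 2011, 2011],
    "hPM2.5": [18, 20, 17],
})

error_message = None
try:
    df.pivot(index="Month", columns="Year", values="hPM2.5")
except ValueError as exc:
    # pivot will not choose between the two values for the same cell
    error_message = str(exc)

# pivot_table collapses duplicate pairs with an aggregate (here: the mean)
table = df.pivot_table(index="Month", columns="Year",
                       values="hPM2.5", aggfunc="mean")
print(error_message)
print(table)
```

Whether averaging is appropriate depends on the data. If the duplicates are accidental, fixing the input is usually better than hiding them behind aggfunc.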

2 Answers

    Best answer
  1. The problem is the construction of your DataFrame: you're passing a list of length 48 for hPM2.5 but only 24 for both Month and Year.

    With all three lists at 24 entries, this works:

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    
    df = pd.DataFrame({'Month': ['January','February','March','April','May','June','July','August','September','October','November','December',
                                 'January','February','March','April','May','June','July','August','September','October','November','December'],
                       'Year': [2011,2011,2011,2011,2011,2011,2011,2011,2011,2011,2011,2011,
                                2012,2012,2012,2012,2012,2012,2012,2012,2012,2012,2012,2012,],
                       'hPM2.5': [18,17,21,14,7,7,8,7,9,11,23,5,
                                  18,17,21,14,7,7,8,7,9,11,23,5]})
    
    cats = ['January','February','March','April','May','June',
        'July','August','September','October','November','December']
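    # Make Month an ordered categorical so the heatmap rows run January..December
    # (astype('category', ordered=..., categories=...) no longer works in current pandas)
    df['Month'] = pd.Categorical(df['Month'], categories=cats, ordered=True)
    
    # One row per month, one column per year; each (Month, Year) pair occurs exactly once
    df2 = df.pivot(index="Month", columns="Year", values="hPM2.5")
    sns.heatmap(df2, annot=True)
    plt.show()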
    

    [Image: the resulting seaborn heatmap]

    Answer 2
  2. As I understand your problem, you don't want to have to repeat Jan-Dec for each year and duplicate each year for every month in your input data. If so, you can enter the data directly in the wide shape that .pivot() produces. After a little cleanup of your input data, df2.to_dict(orient="list") gives:

    {2011: [18, 17, 21, 14, 7, 7, 8, 7, 9, 11, 23, 5],
     2012: [18, 17, 21, 14, 7, 7, 8, 7, 9, 11, 23, 5]}
    

    You can then just do:

    df = pd.DataFrame({2011: [18, 17, 21, 14, 7, 7, 8, 7, 9, 11, 23, 5], 
                       2012: [18, 17, 21, 14, 7, 7, 8, 7, 9, 11, 23, 5]}, index=cats)
    sns.heatmap(df, annot=True)
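A third option keeps the long format that pivot expects but generates the repeated Month and Year columns with list arithmetic instead of typing them out. The sketch below assumes the data is available as one list of 12 monthly values per year. The `values` dict is illustrative and reuses the question's numbers. Once the frame is built, the code pivots exactly as in the first answer:

```python
import pandas as pd

cats = ['January', 'February', 'March', 'April', 'May', 'June',
        'July', 'August', 'September', 'October', 'November', 'December']
years = [2011, 2012]
# Illustrative input: one list of 12 monthly values per year
values = {
    2011: [18, 17, 21, 14, 7, 7, 8, 7, 9, 11, 23, 5],
    2012: [18, 17, 21, 14, 7, 7, 8, 7, 9, 11, 23, 5],
}

# Build the long format without typing the repeats by hand:
# all 12 months once per year, and each year once per month, in matching order.
df = pd.DataFrame({
    "Month": cats * len(years),
    "Year": [year for year in years for _ in cats],
    "hPM2.5": [v for year in years for v in values[year]],
})
df["Month"] = pd.Categorical(df["Month"], categories=cats, ordered=True)

# Each (Month, Year) pair now occurs exactly once, so pivot succeeds.
wide = df.pivot(index="Month", columns="Year", values="hPM2.5")
print(wide)
```

The resulting `wide` frame can be passed straight to `sns.heatmap(wide, annot=True)`. It holds the same numbers as the frame built directly from the dict in the second answer, so the choice between the two mostly depends on which shape the raw data arrives in.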