python – Pandas Can only compare identically-labeled DataFrame objects error

python – Pandas Can only compare identically-labeled DataFrame objects error

Heres a small example to demonstrate this (which only applied to DataFrames, not Series, until Pandas 0.19 where it applies to both):

In [1]: df1 = pd.DataFrame([[1, 2], [3, 4]])

In [2]: df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])

In [3]: df1 == df2
Exception: Can only compare identically-labeled DataFrame objects

One solution is to sort the index first (Note: some functions require sorted indexes):

In [4]: df2.sort_index(inplace=True)

In [5]: df1 == df2
Out[5]: 
      0     1
0  True  True
1  True  True

Note: == is also sensitive to the order of columns, so you may have to use sort_index(axis=1):

In [11]: df1.sort_index().sort_index(axis=1) == df2.sort_index().sort_index(axis=1)
Out[11]: 
      0     1
0  True  True
1  True  True

Note: This can still raise (if the index/columns arent identically labelled after sorting).

You can also try dropping the index column if it is not needed to compare:

print(df1.reset_index(drop=True) == df2.reset_index(drop=True))

I have used this same technique in a unit test like so:

from pandas.util.testing import assert_frame_equal

assert_frame_equal(actual.reset_index(drop=True), expected.reset_index(drop=True))

python – Pandas Can only compare identically-labeled DataFrame objects error

At the time when this question was asked there wasnt another function in Pandas to test equality, but it has been added a while ago: pandas.equals

You use it like this:

df1.equals(df2)

Some differenes to == are:

  • You dont get the error described in the question
  • It returns a simple boolean.
  • NaN values in the same location are considered equal
  • 2 DataFrames need to have the same dtype to be considered equal, see this stackoverflow question

Leave a Reply

Your email address will not be published.