I'm using Pandas to compare the outputs of two files loaded into two data frames (uat, prod):

    uat = uat[["Customer Number", "Product"]]
    prod = prod[["Customer Number", "Product"]]
    print(uat["Customer Number"] == prod["Customer Number"])
    print(uat["Product"] == prod["Product"])
    print(uat == prod)

The first two match exactly:

    74357    True
    74356    True
    Name: Customer Number, dtype: bool
    74357    True
    74356    True
    Name: Product, dtype: bool

For the third print, I get an error: Can only compare identically-labeled DataFrame objects. If the first two compared fine, what's wrong with the third?

Thanks





Tags: python, pandas

asked Aug 31 '13 at 12:54 by user1804633
edited Apr 7 '16 at 9:27 by Ben

4 Answers

84
Here's a small example to demonstrate this (which only applied to DataFrames, not Series, until Pandas 0.19, where it applies to both):

    In [1]: df1 = pd.DataFrame([[1, 2], [3, 4]])
    In [2]: df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])
    In [3]: df1 == df2
    Exception: Can only compare identically-labeled DataFrame objects

One solution is to sort the index first (Note: some functions require sorted indexes):

    In [4]: df2.sort_index(inplace=True)
    In [5]: df1 == df2
    Out[5]:
          0     1
    0  True  True
    1  True  True

Note: == is also sensitive to the order of columns, so you may have to use sort_index(axis=1):

    In [11]: df1.sort_index().sort_index(axis=1) == df2.sort_index().sort_index(axis=1)
    Out[11]:
          0     1
    0  True  True
    1  True  True

Note: This can still raise (if the index/columns aren't identically labelled after sorting).
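
As a rough sketch of how to check up front whether == will raise, the row and column labels can be compared with Index.equals. When the labels are the same but in a different order, reindex_like reorders one frame to match the other. The data is taken from the example above; the helper name same_labels is made up for illustration and is not part of pandas.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]])
df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])


def same_labels(a, b):
    """Hypothetical helper: True if both frames carry identical labels in identical order."""
    return a.index.equals(b.index) and a.columns.equals(b.columns)


print(same_labels(df1, df2))  # the row labels are in a different order

# Reorder df2's rows and columns to match df1, then compare element-wise.
# Labels that exist in df1 but not in df2 would be filled with NaN.
aligned = df2.reindex_like(df1)
print(same_labels(df1, aligned))
result = df1 == aligned
print(result)
```

Unlike sorting, reindex_like does not fail on labels that are missing from one side; it fills them with NaN, which then compare as unequal.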


answered Aug 31 '13 at 13:53 by Andy Hayden
edited Nov 14 '17 at 20:06

40
You can also try dropping the index column if it is not needed for the comparison:

    print(df1.reset_index(drop=True) == df2.reset_index(drop=True))

I have used this same technique in a unit test like so:

    # pandas.testing is the public module; the older pandas.util.testing path is deprecated
    from pandas.testing import assert_frame_equal
    assert_frame_equal(actual.reset_index(drop=True), expected.reset_index(drop=True))
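
A self-contained sketch of the same idea, with made-up data for actual and expected (the column names and index values are assumptions for illustration only):

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Same values, but different row labels (illustrative data).
actual = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[10, 11])
expected = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])

# reset_index(drop=True) replaces both indexes with 0..n-1, so rows are
# compared by position rather than by label.
positional = actual.reset_index(drop=True) == expected.reset_index(drop=True)
print(positional)

# Raises AssertionError if the frames differ; returns None otherwise.
assert_frame_equal(actual.reset_index(drop=True), expected.reset_index(drop=True))
```

Because the comparison is positional, this only makes sense when both frames are already in the same row order.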
answered Apr 28 '16 at 21:57 by CoreDump
edited Apr 28 '16 at 22:06

11
At the time this question was asked there wasn't another function in Pandas to test equality, but one has since been added: pandas.DataFrame.equals

You use it like this:

    df1.equals(df2)

Some differences from == are:

- You don't get the error described in the question.
- It returns a simple boolean.
- NaN values in the same location are considered equal.
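
A short sketch of those differences, using small made-up frames (the column name x and the values are illustrative assumptions):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"x": [1.0, np.nan]})
df2 = pd.DataFrame({"x": [1.0, np.nan]})
shuffled = pd.DataFrame({"x": [np.nan, 1.0]}, index=[1, 0])

# Element-wise ==: NaN never equals NaN, so the second row is False.
elementwise = df1 == df2
print(elementwise)

# equals(): a single boolean, and NaNs in the same position count as equal.
print(df1.equals(df2))

# Differently ordered labels: == would raise, equals() just returns False.
print(df1.equals(shuffled))
```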
answered Oct 16 '20 at 8:37 by flyingdutchman
edited Dec 4 '20 at 10:36



2
When you compare two DataFrames, you must ensure that the number of records in the first DataFrame matches the number of records in the second DataFrame. In our example, each of the two DataFrames had 4 records, with 4 products and 4 prices.

If, for example, one of the DataFrames had 5 products while the other had 4, and you tried to run the comparison, you would get the following error:

ValueError: Can only compare identically-labeled Series objects

This should work:

    import pandas as pd
    import numpy as np

    firstProductSet = {"Product1": ["Computer", "Phone", "Printer", "Desk"],
                       "Price1": [1200, 800, 200, 350]}
    df1 = pd.DataFrame(firstProductSet, columns=["Product1", "Price1"])

    secondProductSet = {"Product2": ["Computer", "Phone", "Printer", "Desk"],
                        "Price2": [900, 800, 300, 350]}
    df2 = pd.DataFrame(secondProductSet, columns=["Product2", "Price2"])

    # add the Price2 column from df2 to df1
    df1["Price2"] = df2["Price2"]
    # create a new column in df1 to check if the prices match
    df1["pricesMatch?"] = np.where(df1["Price1"] == df2["Price2"], "True", "False")
    # create a new column in df1 for the price difference
    df1["priceDiff?"] = np.where(df1["Price1"] == df2["Price2"], 0, df1["Price1"] - df2["Price2"])
    print(df1)

Example from https://datatofish.com/compare-values-dataframes/
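
To illustrate the length mismatch described above, here is a minimal sketch with one 5-record and one 4-record Series (the fifth price, 150, is invented for the example). It also shows Series.eq, which is not mentioned in the answer but is one way to compare Series whose labels differ, since it aligns the two indexes instead of raising.

```python
import pandas as pd

prices_a = pd.Series([1200, 800, 200, 350, 150])  # 5 records
prices_b = pd.Series([900, 800, 300, 350])        # 4 records

# The == operator refuses to compare Series with different labels.
try:
    prices_a == prices_b
except ValueError as err:
    message = str(err)
    print(message)

# Series.eq aligns on the union of both indexes; a label missing from one
# side compares as not equal.
matches = prices_a.eq(prices_b)
print(matches)
```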