您的位置：首页 > 编程语言 > Java开发

How to Write an Equality Method in Java

2009-06-22 18:02 691 查看

by Martin Odersky, Lex Spoon, and Bill Venners
June 1, 2009

Summary
This article describes a technique for overriding the

equals

method that preserves the contract of

equals

even when subclassses of concrete classes add new fields.
In Item 8 of Effective Java1, Josh Bloch describes the difficulty of preserving the

equals

contract when subclassing as a “fundamental problem of equivalence relations in object-oriented languages.” Bloch writes:

There is no way to extend an instantiable class and add a value component while preserving the equals contract, unless you are willing to forgo the benefits of object-oriented abstraction.
Chapter 28 of Programming in Scala shows an approach that allows subclasses to extend an instantiable class, add a value component, and nevertheless preserve the

equals

contract. Although in the technique is described in the book in the context of defining Scala classes, it is also applicable to classes defined in Java. In this article, we present the technique using text adapted from the relevant section of Programming in Scala, but with the code examples translated from Scala into Java.

Commmon equality pitfalls

Class

java.lang.Object

defines an

equals

method, which subclasses may override. Unfortunately, it turns out that writing a correct equality method is surprisingly difficult in object-oriented languages. In fact, after studying a large body of Java code, the authors of a 2007 paper concluded that almost all implementations of

equals

methods are faulty.2

This is problematic, because equality is at the basis of many other things. For one, a faulty equality method for a type

might mean that you cannot reliably put an object of type

in a collection. You might have two elements

elem1

elem2

of type

that are equal, i.e., “

elem1.equals(elem2)

” yields

true

. Nevertheless, with commonly occurring faulty implementations of the

equals

method, you could still see behavior like the following:

Set<C> hashSet = new java.util.HashSet<C>();
hashSet.add(elem1);
hashSet.contains(elem2);    // returns false!

Here are four common pitfalls3 that can cause inconsistent behavior when overriding

equals

:

Defining

equals

with the wrong signature.

Changing

equals

without also changing

hashCode

.

Defining

equals

in terms of mutable fields.

Failing to define

equals

as an equivalence relation.

These four pitfalls are discussed in the remainder of this section.

Pitfall #1: Defining

equals

with the wrong signature.

Consider adding an equality method to the following class of simple points:

public class Point {

private final int x;
private final int y;

public Point(int x, int y) {
this.x = x;
this.y = y;
}

public int getX() {
return x;
}

public int getY() {
return y;
}

// ...
}

A seemingly obvious, but wrong way would be to define it like this:

// An utterly wrong definition of equals
public boolean equals(Point other) {
return (this.getX() == other.getX() && this.getY() == other.getY());
}

What's wrong with this method? At first glance, it seems to work OK:

Point p1 = new Point(1, 2);
Point p2 = new Point(1, 2);

Point q = new Point(2, 3);

System.out.println(p1.equals(p2)); // prints true

System.out.println(p1.equals(q)); // prints false

However, trouble starts once you start putting points into a collection:

import java.util.HashSet;

HashSet<Point> coll = new HashSet<Point>();
coll.add(p1);

System.out.println(coll.contains(p2)); // prints false

How can it be that

coll

does not contain

p2

, even though

p1

was added to it, and

p1

and

p2

are equal objects? The reason becomes clear in the following interaction, where the precise type of one of the compared points is masked. Define

p2a

as an alias of

p2

, but with type

Object

instead of

Point

Object p2a = p2;

Now, were you to repeat the first comparison, but with the alias

p2a

instead of

p2

, you would get:

System.out.println(p1.equals(p2a)); // prints false

What went wrong? In fact, the version of

equals

given previously does not override the standard method

equals

, because its type is different. Here is the type of the

equals

method as it is defined in the root class

Object

public boolean equals(Object other)

Because the

equals

method in

Point

takes a

Point

instead of an

Object

as an argument, it does not override

equals

Object

. Instead, it is just an overloaded alternative. Overloading in Java is resolved by the static type of the argument, not the run-time type. So as long as the static type of the argument is

Point

, the

equals

method in

Point

is called. However, once the static argument is of type

Object

, the

equals

method in

Object

is called instead. This method has not been overridden, so it is still implemented by comparing object identity. That's why the comparison “

p1.equals(p2a)

” yields

false

even though points

p1

and

p2a

have the same

and

values. That's also why the

contains

method in

HashSet

returned

false

. Since that method operates on generic sets, it calls the generic

equals

method in

Object

instead of the overloaded variant in

Point

.

A better

equals

method is the following:

// A better definition, but still not perfect
@Override public boolean equals(Object other) {
boolean result = false;
if (other instanceof Point) {
Point that = (Point) other;
result = (this.getX() == that.getX() && this.getY() == that.getY());
}
return result;
}

Now

equals

has the correct type. It takes a value of type

Object

as parameter and it yields a

boolean

result. The implementation of this method uses

instanceof

and a cast. It first tests whether the

other

object is also of type

Point

. If it is, it compares the coordinates of the two points and returns the result. Otherwise the result is

false

Pitfall #2: Changing

equals

without also changing

hashCode

If you repeat the comparison of

p1

and

p2a

with the latest definition of

Point

defined previously, you will get

true

, as expected. However, if you repeat the

HashSet.contains

test, you will probably still get

false

Point p1 = new Point(1, 2);
Point p2 = new Point(1, 2);

HashSet<Point> coll = new HashSet<Point>();
coll.add(p1);

System.out.println(coll.contains(p2)); // prints false (probably)

In fact, this outcome is not 100% certain. You might also get

true

from the experiment. If you do, you can try with some other points with coordinates 1 and 2. Eventually, you'll get one which is not contained in the set. What goes wrong here is that

Point

redefined

equals

without also redefining

hashCode

.

Note that the collection in the example above is a

HashSet

. This means elements of the collection are put in “hash buckets” determined by their hash code. The

contains

test first determines a hash bucket to look in and then compares the given elements with all elements in that bucket. Now, the last version of class

Point

did redefine

equals

, but it did not at the same time redefine

hashCode

. So

hashCode

is still what it was in its version in class

Object

: some transformation of the address of the allocated object. The hash codes of

p1

and

p2

are almost certainly different, even though the fields of both points are the same. Different hash codes mean with high probability different hash buckets in the set. The

contains

test will look for a matching element in the bucket which corresponds to

p2

's hash code. In most cases, point

p1

will be in another bucket, so it will never be found.

p1

and

p2

might also end up by chance in the same hash bucket. In that case the test would return

true

.

The problem was that the last implementation of

Point

violated the contract on

hashCode

as defined for class

Object

:

If two objects are equal according to the

equals(Object)

method, then calling the

hashCode

method on each of the two objects must produce the same integer result.
In fact, it's well known in Java that

hashCode

and

equals

should always be redefined together. Furthermore,

hashCode

may only depend on fields that

equals

depends on. For the

Point

class, the following would be a suitable definition of

hashCode

public class Point {

private final int x;
private final int y;

public Point(int x, int y) {
this.x = x;
this.y = y;
}

public int getX() {
return x;
}

public int getY() {
return y;
}

@Override public boolean equals(Object other) {
boolean result = false;
if (other instanceof Point) {
Point that = (Point) other;
result = (this.getX() == that.getX() && this.getY() == that.getY());
}
return result;
}

@Override public int hashCode() {
return (41 * (41 + getX()) + getY());
}
}

This is just one of many possible implementations of

hashCode

. Adding the constant

to one integer field

, multiplying the result with the prime number

, and adding to that result the other integer field

gives a reasonable distribution of hash codes at a low cost in running time and code size.

Adding

hashCode

fixes the problems of equality when defining classes like

Point

. However, there are still other trouble spots to watch out for.

Pitfall #3: Defining

equals

in terms of mutable fields

Consider the following slight variation of class

Point

public class Point {

private int x;
private int y;

public Point(int x, int y) {
this.x = x;
this.y = y;
}

public int getX() {
return x;
}

public int getY() {
return y;
}

public void setX(int x) { // Problematic
this.x = x;
}

public void setY(int y) {
this.y = y;
}

@Override public boolean equals(Object other) {
boolean result = false;
if (other instanceof Point) {
Point that = (Point) other;
result = (this.getX() == that.getX() && this.getY() == that.getY());
}
return result;
}

@Override public int hashCode() {
return (41 * (41 + getX()) + getY());
}
}

The only difference is that the fields

and

are no longer final, and two

set

methods have been added that allow clients to change the

and

values. The

equals

and

hashCode

methods are now defined in terms of these mutable fields, so their results change when the fields change. This can have strange effects once you put points in collections:

Point p = new Point(1, 2);

HashSet<Point> coll = new HashSet<Point>();
coll.add(p);

System.out.println(coll.contains(p)); // prints true

Now, if you change a field in point

, does the collection still contain the point? We'll try it:

p.setX(p.getX() + 1);

System.out.println(coll.contains(p)); // prints false (probably)

This looks strange. Where did

go? More strangeness results if you check whether the iterator of the set contains

Iterator<Point> it = coll.iterator();
boolean containedP = false;
while (it.hasNext()) {
Point nextP = it.next();
if (nextP.equals(p)) {
containedP = true;
break;
}
}

System.out.println(containedP); // prints true

So here's a set that does not contain

, yet

is among the elements of the set! What happened, of course, is that after the change to the

field, the point

ended up in the wrong hash bucket of the set

coll

. That is, its original hash bucket no longer corresponded to the new value of its hash code. In a manner of speaking, the point

“dropped out of sight” in the set

coll

even though it still belonged to its elements.

The lesson to be drawn from this example is that when

equals

and

hashCode

depend on mutable state, it may cause problems for users. If they put such objects into collections, they have to be careful never to modify the depended-on state, and this is tricky. If you need a comparison that takes the current state of an object into account, you should usually name it something else, not

equals

. Considering the last definition of

Point

, it would have been preferable to omit a redefinition of

hashCode

and to name the comparison method

equalContents

, or some other name different from

equals

Point

would then have inherited the default implementation of

equals

and

hashCode

. So

would have stayed locatable in

coll

even after the modification to its

field.

Pitfall #4: Failing to define

equals

as an equivalence relation

The contract of the

equals

method in

Object

specifies that

equals

must implement an equivalence relation on non-

null

objects:

It is reflexive: for any non-null value

It is symmetric: for any non-null values

It is transitive: for any non-null values

It is consistent: for any non-null values

For any non-null value

, the expression

x.equals(x)

should return

true

and

x.equals(y)

should return

true

if and only if

y.equals(x)

returns

true

, and

, if

x.equals(y)

returns

true

and

y.equals(z)

returns

true

, then

x.equals(z)

should return

true

and

, multiple invocations of

x.equals(y)

should consistently return

true

or consistently return

false

, provided no information used in equals comparisons on the objects is modified.

x.equals(null)

should return

false

.
The definition of

equals

developed so far for class

Point

satisfies the contract for

equals

. However, things become more complicated once subclasses are considered. Say there is a subclass

ColoredPoint

Point

that adds a field

color

of type

Color

. Assume

Color

is defined as an enum:

public enum Color {
RED, ORANGE, YELLOW, GREEN, BLUE, INDIGO, VIOLET;
}

ColoredPoint

overrides

equals

to take the new

color

field into account:

public class ColoredPoint extends Point { // Problem: equals not symmetric

private final Color color;

public ColoredPoint(int x, int y, Color color) {
super(x, y);
this.color = color;
}

@Override public boolean equals(Object other) {
boolean result = false;
if (other instanceof ColoredPoint) {
ColoredPoint that = (ColoredPoint) other;
result = (this.color.equals(that.color) && super.equals(that));
}
return result;
}
}

This is what many programmers would likely write. Note that in this case, class

ColoredPoint

need not override

hashCode

. Because the new definition of

equals

ColoredPoint

is stricter than the overridden definition in

Point

(meaning it equates fewer pairs of objects), the contract for

hashCode

stays valid. If two colored points are equal, they must have the same coordinates, so their hash codes are guaranteed to be equal as well.

Taking the class

ColoredPoint

by itself, its definition of

equals

looks OK. However, the contract for

equals

is broken once points and colored points are mixed. Consider:

Point p = new Point(1, 2);

ColoredPoint cp = new ColoredPoint(1, 2, Color.RED);

System.out.println(p.equals(cp)); // prints true

System.out.println(cp.equals(p)); // prints false

The comparison “

p equals cp

” invokes

equals

method, which is defined in class

Point

. This method only takes into account the coordinates of the two points. Consequently, the comparison yields

true

. On the other hand, the comparison “

cp equals p

” invokes

cp

equals

method, which is defined in class

ColoredPoint

. This method returns

false

, because

is not a

ColoredPoint

. So the relation defined by

equals

is not symmetric.

The loss in symmetry can have unexpected consequences for collections. Here's an example:

Set<Point> hashSet1 = new java.util.HashSet<Point>();
hashSet1.add(p);
System.out.println(hashSet1.contains(cp));    // prints false

Set<Point> hashSet2 = new java.util.HashSet<Point>();
hashSet2.add(cp);
System.out.println(hashSet2.contains(p));    // prints true

So even though

and

cp

are equal, one

contains

test succeeds whereas the other one fails.

How can you change the definition of

equals

so that it becomes symmetric? Essentially there are two ways. You can either make the relation more general or more strict. Making it more general means that a pair of two objects,

and

, is taken to be equal if either comparing

with

or comparing

with

yields

true

. Here's code that does this:

public class ColoredPoint extends Point { // Problem: equals not transitive

private final Color color;

public ColoredPoint(int x, int y, Color color) {
super(x, y);
this.color = color;
}

@Override public boolean equals(Object other) {
boolean result = false;
if (other instanceof ColoredPoint) {
ColoredPoint that = (ColoredPoint) other;
result = (this.color.equals(that.color) && super.equals(that));
}
else if (other instanceof Point) {
Point that = (Point) other;
result = that.equals(this);
}
return result;
}
}

The new definition of

equals

ColoredPoint

checks one more case than the old one: If the

other

object is a

Point

but not a

ColoredPoint

, the method forwards to the

equals

method of

Point

. This has the desired effect of making

equals

symmetric. Now, both “

cp.equals(p)

” and “

p.equals(cp)

” result in

true

. However, the contract for

equals

is still broken. Now the problem is that the new relation is no longer transitive! Here's a sequence of statements that demonstrates this. Define a point and two colored points of different colors, all at the same position:

ColoredPoint redP = new ColoredPoint(1, 2, Color.RED);
ColoredPoint blueP = new ColoredPoint(1, 2, Color.BLUE);

Taken individually,

redp

is equal to

and

is equal to

bluep

System.out.println(redP.equals(p)); // prints true

System.out.println(p.equals(blueP)); // prints true

However, comparing

redP

and

blueP

yields

false

System.out.println(redP.equals(blueP)); // prints false

Hence, the transitivity clause of

equals

's contract is violated.

Making the

equals

relation more general seems to lead to a dead end. We'll try to make it stricter instead. One way to make

equals

stricter is to always treat objects of different classes as different. That could be achieved by modifying the

equals

methods in classes

Point

and

ColoredPoint

. In class

Point

, you could add an extra comparison that checks whether the run-time class of the other

Point

is exactly the same as this

Point

's class, as follows:

// A technically valid, but unsatisfying, equals method
public class Point {

private final int x;
private final int y;

public Point(int x, int y) {
this.x = x;
this.y = y;
}

public int getX() {
return x;
}

public int getY() {
return y;
}

@Override public boolean equals(Object other) {
boolean result = false;
if (other instanceof Point) {
Point that = (Point) other;
result = (this.getX() == that.getX() && this.getY() == that.getY()
&& this.getClass().equals(that.getClass()));
}
return result;
}

@Override public int hashCode() {
return (41 * (41 + getX()) + getY());
}
}

You can then revert class

ColoredPoint

's implementation back to the version that previously had violated the symmetry requirement:4

public class ColoredPoint extends Point { // No longer violates symmetry requirement

private final Color color;

public ColoredPoint(int x, int y, Color color) {
super(x, y);
this.color = color;
}

@Override public boolean equals(Object other) {
boolean result = false;
if (other instanceof ColoredPoint) {
ColoredPoint that = (ColoredPoint) other;
result = (this.color.equals(that.color) && super.equals(that));
}
return result;
}
}

Here, an instance of class

Point

is considered to be equal to some other instance of the same class only if the objects have the same coordinates and they have the same run-time class, meaning

.getClass()

on either object returns the same value. The new definitions satisfy symmetry and transitivity because now every comparison between objects of different classes yields

false

. So a colored point can never be equal to a point. This convention looks reasonable, but one could argue that the new definition is too strict.

Consider the following slightly roundabout way to define a point at coordinates

(1, 2)

Point pAnon = new Point(1, 1) {
@Override public int getY() {
return 2;
}
};

pAnon

equal to

? The answer is no because the

java.lang.Class

objects associated with

and

pAnon

are different. For

it is

Point

, whereas for

pAnon

it is an anonymous subclass of

Point

. But clearly,

pAnon

is just another point at coordinates

(1, 2)

. It does not seem reasonable to treat it as being different from

The

canEqual

method

So it seems we are stuck. Is there a sane way to redefine equality on several levels of the class hierarchy while keeping its contract? In fact, there is such a way, but it requires one more method to redefine together with

equals

and

hashCode

. The idea is that as soon as a class redefines

equals

(and

hashCode

), it should also explicitly state that objects of this class are never equal to objects of some superclass that implement a different equality method. This is achieved by adding a method

canEqual

to every class that redefines

equals

. Here's the method's signature:

public boolean canEqual(Object other)

The method should return

true

if the

other

object is an instance of the class in which

canEqual

is (re)defined,

false

otherwise. It is called from

equals

to make sure that the objects are comparable both ways. Here is a new (and final) implementation of class

Point

along these lines:

public class Point {

private final int x;
private final int y;

public Point(int x, int y) {
this.x = x;
this.y = y;
}

public int getX() {
return x;
}

public int getY() {
return y;
}

@Override public boolean equals(Object other) {
boolean result = false;
if (other instanceof Point) {
Point that = (Point) other;
result = (that.canEqual(this) && this.getX() == that.getX() && this.getY() == that.getY());
}
return result;
}

@Override public int hashCode() {
return (41 * (41 + getX()) + getY());
}

public boolean canEqual(Object other) {
return (other instanceof Point);
}
}

The

equals

method in this version of class

Point

contains the additional requirement that the other object can equal this one, as determined by the

canEqual

method. The implementation of

canEqual

Point

states that all instances of

Point

can be equal.

Here's the corresponding implementation of

ColoredPoint

public class ColoredPoint extends Point { // No longer violates symmetry requirement

private final Color color;

public ColoredPoint(int x, int y, Color color) {
super(x, y);
this.color = color;
}

@Override public boolean equals(Object other) {
boolean result = false;
if (other instanceof ColoredPoint) {
ColoredPoint that = (ColoredPoint) other;
result = (that.canEqual(this) && this.color.equals(that.color) && super.equals(that));
}
return result;
}

@Override public int hashCode() {
return (41 * super.hashCode() + color.hashCode());
}

@Override public boolean canEqual(Object other) {
return (other instanceof ColoredPoint);
}
}

It can be shown that the new definition of

Point

and

ColoredPoint

keeps the contract of

equals

. Equality is symmetric and transitive. Comparing a

Point

to a

ColoredPoint

always yields

false

. Indeed, for any point

and colored point

cp

, “

p.equals(cp)

” will return

false

because “

cp.canEqual(p)

” will return

false

. The reverse comparison, “

cp.equals(p)

”, will also return

false

, because

is not a

ColoredPoint

, so the first

instanceof

check in the body of

equals

ColoredPoint

will fail.

On the other hand, instances of different subclasses of

Point

can be equal, as long as none of the classes redefines the equality method. For instance, with the new class definitions, the comparison of

and

pAnon

would yield true. Here are some examples:

Point p = new Point(1, 2);

ColoredPoint cp = new ColoredPoint(1, 2, Color.INDIGO);

Point pAnon = new Point(1, 1) {
@Override public int getY() {
return 2;
}
};
Set<Point> coll = new java.util.HashSet<Point>();
coll.add(p);

System.out.println(coll.contains(p)); // prints true

System.out.println(coll.contains(cp)); // prints false

System.out.println(coll.contains(pAnon)); // prints true

These examples demonstrate that if a superclass

equals

implementation defines and calls

canEqual

, then programmers who implement subclasses can decide whether or not their subclasses may be equal to instances of the superclass. Because

ColoredPoint

overrides

canEqual

, for example, a colored point may never be equal to a plain-old point. But because the anonymous subclass referenced from

pAnon

does not override

canEqual

, its instance can be equal to a

Point

instance.

One potential criticism of the

canEqual

approach is that it violates the Liskov Substitution Principle (LSP). For example, the technique of implementing

equals

by comparing run-time classes, which led to the inability to define a subclass whose instances can equal instances of the superclass, has been described as a violation of the LSP.5 The reasoning is that the LSP states you should be able to use (substitute) a subclass instance where a superclass instance is required. In the previous example, however, “

coll.contains(cp)

” returned

false

even though

cp

and

values matched those of the point in the collection. Thus it may seem like a violation of the LSP, because you can't use a

ColoredPoint

here where a

Point

is expected. We believe this is the wrong interpretation, though, because the LSP doesn't require that a subclass behaves identically to its superclass, just that it behaves in a way that fulfills the contract of its superclass.

The problem with writing an

equals

method that compares run-time classes is not that it violates the LSP, but that it doesn't give you a way to create a subclass whose instances can equal superclass instances. For example, had we used the run-time class technique in the previous example, “

coll.contains(pAnon)

” would have returned

false

, and that's not what we wanted. By contrast, we really did want “

coll.contains(cp)

” to return

false

, because by overriding

equals

ColoredPoint

, we were basically saying that an indigo-colored point at coordinates (1, 2) is not the same thing as an uncolored point at (1, 2). Thus, in the previous example we were able to pass two different

Point

subclass instances to the collection's

contains

method, and we got back two different answers, both correct.

内容来自用户分享和网络整理，不保证内容的准确性，如有侵权内容，可联系管理员处理

标签：

相关文章推荐

新的分享

章节导航

How to Write an Equality Method in Java

Commmon equality pitfalls

Pitfall #1: Defining
equals
with the wrong signature.

Pitfall #2: Changing
equals
without also changing
hashCode

Pitfall #3: Defining
equals
in terms of mutable fields

Pitfall #4: Failing to define
equals
as an equivalence relation

The
canEqual
method