(泛型)Java theory and practice: Generics gotchas
2016-10-08 22:05
399 查看
Generic types (or generics) bear a superficial resemblance to templates in C++, both in their syntax and in their expected use cases (such as container classes). But the similarity is only skin-deep -- generics in the Java language are implemented almost entirely
in the compiler, which performs type checking and type inference, and then generates ordinary, non-generic bytecodes. This implementation technique, called erasure (where
the compiler uses the generic type information to ensure type safety, but then erases it before generating the bytecode), has some surprising, and sometimes confusing, consequences. While generics are a big step forward for type safety in Java classes, learning
to use generics will almost certainly provide some opportunity for head-scratching (and sometimes cursing) along the way.
Note: This article assumes familiarity
with the basics of generics in JDK 5.0.
While you might find it helpful to think of collections as being an abstraction of arrays, they have some special properties that collections do not. Arrays in the Java language are covariant -- which means that if
it does), then not only is an
a
but an
also a
and you are free to pass or assign an
a
called for. (More formally, if
a supertype of
then
a supertype of
You might think the same is true of generic types as well -- that
a supertype of
and that you can pass a
a
expected. Unfortunately, it doesn't work that way.
It turns out there's a good reason it doesn't work that way: It would break the type safety generics were supposed to provide. Imagine you could assign a
a
Then the following code would allow you to put something that wasn't an
a
Because
a
adding a
it seems perfectly legal. But if
aliased with
then it would break the type-safety promise implicit in the definition of
that it is a list of integers, which is why generic types cannot be covariant.
Another consequence of the fact that arrays are covariant but generics are not is that you cannot instantiate an array of a generic type (
The last line will throw a
because you've managed to cram a
what should have been a
Because array covariance would have allowed you to subvert the type safety of generics, instantiating arrays of generic types (except for types whose type arguments are unbounded wildcards, like
has been disallowed.
Back
to top
Because of erasure,
the same class, and the compiler only generates one class when compiling
in C++). As a result, the compiler doesn't know what type is represented by
compiling the
and so you can't do certain things with a type parameter (the
within the class definition of
you could do if you knew what class was being represented.
Because the runtime cannot tell a
a
runtime, they're both just
constructing variables whose type is identified by a generic type parameter is problematic. This lack of type information at runtime poses a problem for generic container classes and for generic classes that want to make defensive copies.
Consider the generic class
Suppose the
wants to make a defensive copy of the
on entry? You don't have many options. You'd like to implement
this:
But you cannot use a type parameter to access a constructor because, at compile time, you don't know what class is being constructed and therefore what constructors are available. There's no way to express a constraint like "
have a copy constructor" (or even a no-arg constructor) using generics, so accessing constructors for classes represented by generic type parameters is out.
What about
Let's say
defined to make
Unfortunately, you still can't call
Why? Because
protected access in
to call
you have to call it through a reference to a class that has overridden
be public. But
not known to redeclare
public, so cloning is also out.
OK, so you can't copy a reference to a type whose class is totally unknown at compile time. What about wildcard types? Suppose you want to make a defensive copy of a parameter whose type is
You know that
a copy constructor. You've also been told it's better to use
of the raw type
you don't know the type of the set's contents, because that approach is likely to emit fewer unchecked conversion warnings. So you try this:
Unfortunately, you can't invoke a generic constructor with a wildcard type argument, even though you know such a constructor exists. However, you can do this:
This construction is not the most obvious, but it is type-safe and will do what you think
How would you implement
The
is supposed to manage an array of
so you might expect the constructor for
create an array of
But this code does not work -- you cannot instantiate an array of a type represented by a type parameter. The compiler doesn't know what type
represents, so it cannot instantiate an array of
The Collections classes use an ugly trick to get around this problem, one that generates an unchecked conversion warning when the Collections classes are compiled. The constructor for the real implementation of
likes this:
Why does this code not generate an
accessed? After all, you can't assign an array of
an array of
Well, because generics are implemented by erasure, the type of
actually
because
the erasure of
This means that the class is really expecting
be an array of
but the compiler does extra type checking to ensure that it contains only objects of type
So this approach will work, but it's ugly, and not really something to emulate (even the authors of the generified Collections framework say so -- see Resources).
An alternate approach would have been to declare
an array of
and cast it to
it is used. You would still get unchecked conversion warnings (as you do with the previous approach), but it would have made some unstated assumptions (such as the fact that
not escape the implementation of
more clear.
The best approach would be to pass a class literal (
into the constructor, so that the implementation could know, at runtime, the value of
The reason this approach was not taken was for backward compatibility -- then the new generified collection classes would not be compatible with previous versions of the Collections framework.
Here's how
have looked under this approach:
But wait! There's still an ugly, unchecked cast there when calling
Why? Again, it's because of backward compatibility. The signature of
instead of the type-safe:
Why was
this way? Again, the frustrating answer is to preserve backward compatibility. To create an array of a primitive type, such as
you call
the
from the appropriate wrapper class (in the case of
you would pass
the class literal). Generifying
a parameter of
of
have been more type-safe for reference types, but would have made it impossible to use
create an instance of a primitive array. Perhaps in the future, an alternate version of
be provided for reference types so you can have it both ways.
You may see a pattern beginning to emerge here -- many of the problems, or compromises, associated with generics are not issues with generics themselves, but side effects of the requirement to preserve backward compatibility with existing code.
Back
to top
Converting existing library classes to work smoothly with generics was no small feat, but as is often the case, backward-compatibility does not come for free. I've already talked about two examples where backward compatibility limited the generification of
the class libraries.
Another method that would probably have been generified differently had backward compatibility not been an issue is
The array passed into
two purposes -- if the collection is small enough to fit in the array provided, the contents will simply be placed in that array. Otherwise, a new array of the same type will be created, using reflection, to receive the results. If the Collections framework
were being rewritten from scratch, it is likely that the argument to
not be an array, but instead a class literal:
Because the Collections framework is widely emulated as an example of good class design, it is worth noting the areas in which its design was constrained by backwards compatibility, so that those aspects are not emulated blindly.
One element of the generifed Collections API that is often confusing at first is the signatures of
and
You might expect the signatures for
be:
But it is in fact:
Why is this? Again, the answer lies in backward compatibility. The interface contract of
"if
contained in
remove it; otherwise, do nothing." If
a generic collection,
not have to be type-compatible with the type parameter of
If
generified to only be callable if its argument was type-compatible (
If the above fragment were generified in the obvious way (making
then the code above would not compile if the signature of
its argument to be a
and
to be defined with a weaker type constraint than they might have had they been redesigned from scratch for generics.
Existing classes that were designed before generics may have semantics that resist the "obvious" generification approach. In these cases, you will probably make compromises like the ones described here, but when designing new generic classes from scratch, it
is valuable to understand which idioms from the Java class libraries are the result of backward compatibility, so they are not emulated inappropriately.
Back
to top
Because generics are implemented almost entirely in the Java compiler, and not in the runtime, nearly all type information about generic types has been "erased" by the time the bytecode is generated. In other words, the compiler generates pretty much the same
code you would have written by hand without generics, casts and all, after checking the type-safety of your program. Unlike in C++,
the same class (although they are different types, both subtypes of
a distinction that is more important in JDK 5.0 than in previous versions of the language).
One of the implications of erasure is that a class cannot implement both
because both of these are in fact the same interface, specifying the same
It might seem sensible to want to declare a
that is comparable to both
and
but to the Java compiler, you would be trying to declare the same method twice:
Another consequence of erasure is that using casts or
generic type parameters doesn't make any sense. The following code will not improve the type safety of your code at all:
The compiler will simply emit an unchecked conversion warning, because it doesn't know if the cast is safe or not. The
will not in fact do any casting at all --
simply be replaced with its erasure (
and the object passed in will be cast to
not what was intended.
Erasure is also responsible for the construction issues described above -- that you cannot create an object of generic type because the compiler doesn't know what constructor to call. If your generic class needs to be constructing objects whose type is specified
by generic type parameters, its constructors should take a class literal (
and store it so that instances can be created with reflection.
Back
to top
Generics are a big step forward in the type-safety of the Java language, but the design of the generics facility, and the generification of the class library, were not without compromises. Extending the virtual machine instruction set to support generics was
deemed unacceptable, because that might make it prohibitively difficult for JVM vendors to update their JVM. Accordingly, the approach of erasure, which could be implemented entirely in the compiler, was adopted. Similarly, in generifying the Java class libraries,
the desire to maintain backward compatibility placed many constraints on how the class libraries could be generified, resulting in some confusing and frustrating constructions (like
These are not problems with generics per se, but with the practicality of language evolution and compatibility. But they can make generics a little more confusing, and frustrating, to learn and use.
in the compiler, which performs type checking and type inference, and then generates ordinary, non-generic bytecodes. This implementation technique, called erasure (where
the compiler uses the generic type information to ensure type safety, but then erases it before generating the bytecode), has some surprising, and sometimes confusing, consequences. While generics are a big step forward for type safety in Java classes, learning
to use generics will almost certainly provide some opportunity for head-scratching (and sometimes cursing) along the way.
Note: This article assumes familiarity
with the basics of generics in JDK 5.0.
Generics are not covariant
While you might find it helpful to think of collections as being an abstraction of arrays, they have some special properties that collections do not. Arrays in the Java language are covariant -- which means that if Integerextends
Number(which
it does), then not only is an
Integeralso
a
Number,
but an
Integer[]is
also a
Number[],
and you are free to pass or assign an
Integer[]where
a
Number[]is
called for. (More formally, if
Numberis
a supertype of
Integer,
then
Number[]is
a supertype of
Integer[].)
You might think the same is true of generic types as well -- that
List<Number>is
a supertype of
List<Integer>,
and that you can pass a
List<Integer>where
a
List<Number>is
expected. Unfortunately, it doesn't work that way.
It turns out there's a good reason it doesn't work that way: It would break the type safety generics were supposed to provide. Imagine you could assign a
List<Integer>to
a
List<Number>.
Then the following code would allow you to put something that wasn't an
Integerinto
a
List<Integer>:
List<Integer> li = new ArrayList<Integer>(); List<Number> ln = li; // illegal ln.add(new Float(3.1415));
Because
lnis
a
List<Number>,
adding a
Floatto
it seems perfectly legal. But if
lnwere
aliased with
li,
then it would break the type-safety promise implicit in the definition of
li--
that it is a list of integers, which is why generic types cannot be covariant.
More covariance troubles
Another consequence of the fact that arrays are covariant but generics are not is that you cannot instantiate an array of a generic type (new List<String>[3]is illegal), unless the type argument is an unbounded wildcard (
new List<?>[3]is legal). Let's see what would happen if you were allowed to declare arrays of generic types:
List<String>[] lsa = new List<String>[10]; // illegal Object[] oa = lsa; // OK because List<String> is a subtype of Object List<Integer> li = new ArrayList<Integer>(); li.add(new Integer(3)); oa[0] = li; String s = lsa[0].get(0);
The last line will throw a
ClassCastException,
because you've managed to cram a
List<Integer>into
what should have been a
List<String>.
Because array covariance would have allowed you to subvert the type safety of generics, instantiating arrays of generic types (except for types whose type arguments are unbounded wildcards, like
List<?>)
has been disallowed.
Back
to top
Construction delays
Because of erasure, List<Integer>and
List<String>are
the same class, and the compiler only generates one class when compiling
List<V>(unlike
in C++). As a result, the compiler doesn't know what type is represented by
Vwhen
compiling the
List<V>class,
and so you can't do certain things with a type parameter (the
Vin
List<V>)
within the class definition of
List<V>that
you could do if you knew what class was being represented.
Because the runtime cannot tell a
List<String>from
a
List<Integer>(at
runtime, they're both just
Lists),
constructing variables whose type is identified by a generic type parameter is problematic. This lack of type information at runtime poses a problem for generic container classes and for generic classes that want to make defensive copies.
Consider the generic class
Foo:
class Foo<T> { public void doSomething(T param) { ... } }
Suppose the
doSomething()method
wants to make a defensive copy of the
paramargument
on entry? You don't have many options. You'd like to implement
doSomething()like
this:
public void doSomething(T param) { T copy = new T(param); // illegal }
But you cannot use a type parameter to access a constructor because, at compile time, you don't know what class is being constructed and therefore what constructors are available. There's no way to express a constraint like "
Tmust
have a copy constructor" (or even a no-arg constructor) using generics, so accessing constructors for classes represented by generic type parameters is out.
What about
clone()?
Let's say
Foowas
defined to make
Textend
Cloneable:
class Foo<T extends Cloneable> { public void doSomething(T param) { T copy = (T) param.clone(); // illegal } }
Unfortunately, you still can't call
param.clone().
Why? Because
clone()has
protected access in
Objectand,
to call
clone(),
you have to call it through a reference to a class that has overridden
clone()to
be public. But
Tis
not known to redeclare
clone()as
public, so cloning is also out.
Constructing wildcard references
OK, so you can't copy a reference to a type whose class is totally unknown at compile time. What about wildcard types? Suppose you want to make a defensive copy of a parameter whose type is Set<?>.
You know that
Sethas
a copy constructor. You've also been told it's better to use
Set<?>instead
of the raw type
Setwhen
you don't know the type of the set's contents, because that approach is likely to emit fewer unchecked conversion warnings. So you try this:
class Foo { public void doSomething(Set<?> set) { Set<?> copy = new HashSet<?>(set); // illegal } }
Unfortunately, you can't invoke a generic constructor with a wildcard type argument, even though you know such a constructor exists. However, you can do this:
class Foo { public void doSomething(Set<?> set) { Set<?> copy = new HashSet<Object>(set); } }
This construction is not the most obvious, but it is type-safe and will do what you think
new HashSet<?>(set)would do.
Constructing arrays
How would you implement ArrayList<V>?
The
ArrayListclass
is supposed to manage an array of
V,
so you might expect the constructor for
ArrayList<V>to
create an array of
V:
class ArrayList<V> { private V[] backingArray; public ArrayList() { backingArray = new V[DEFAULT_SIZE]; // illegal } }
But this code does not work -- you cannot instantiate an array of a type represented by a type parameter. The compiler doesn't know what type
Vreally
represents, so it cannot instantiate an array of
V.
The Collections classes use an ugly trick to get around this problem, one that generates an unchecked conversion warning when the Collections classes are compiled. The constructor for the real implementation of
ArrayListlooks
likes this:
class ArrayList<V> { private V[] backingArray; public ArrayList() { backingArray = (V[]) new Object[DEFAULT_SIZE]; } }
Why does this code not generate an
ArrayStoreExceptionwhen
backingArrayis
accessed? After all, you can't assign an array of
Objectto
an array of
String.
Well, because generics are implemented by erasure, the type of
backingArrayis
actually
Object[],
because
Objectis
the erasure of
V.
This means that the class is really expecting
backingArrayto
be an array of
Objectanyway,
but the compiler does extra type checking to ensure that it contains only objects of type
V.
So this approach will work, but it's ugly, and not really something to emulate (even the authors of the generified Collections framework say so -- see Resources).
An alternate approach would have been to declare
backingArrayas
an array of
Object,
and cast it to
V[]everywhere
it is used. You would still get unchecked conversion warnings (as you do with the previous approach), but it would have made some unstated assumptions (such as the fact that
backingArrayshould
not escape the implementation of
ArrayList)
more clear.
The road not taken
The best approach would be to pass a class literal (Foo.class)
into the constructor, so that the implementation could know, at runtime, the value of
T.
The reason this approach was not taken was for backward compatibility -- then the new generified collection classes would not be compatible with previous versions of the Collections framework.
Here's how
ArrayListwould
have looked under this approach:
public class ArrayList<V> implements List<V> { private V[] backingArray; private Class<V> elementType; public ArrayList(Class<V> elementType) { this.elementType = elementType; backingArray = (V[]) Array.newInstance(elementType, DEFAULT_LENGTH); } }
But wait! There's still an ugly, unchecked cast there when calling
Array.newInstance().
Why? Again, it's because of backward compatibility. The signature of
Array.newInstance()is:
public static Object newInstance(Class<?> componentType, int length)
instead of the type-safe:
public static<T> T[] newInstance(Class<T> componentType, int length)
Why was
Arraygenerified
this way? Again, the frustrating answer is to preserve backward compatibility. To create an array of a primitive type, such as
int[],
you call
Array.newInstance()with
the
TYPEfield
from the appropriate wrapper class (in the case of
int,
you would pass
Integer.TYPEas
the class literal). Generifying
Array.newInstance()with
a parameter of
Class<T>instead
of
Class<?>would
have been more type-safe for reference types, but would have made it impossible to use
Array.newInstance()to
create an instance of a primitive array. Perhaps in the future, an alternate version of
newInstance()will
be provided for reference types so you can have it both ways.
You may see a pattern beginning to emerge here -- many of the problems, or compromises, associated with generics are not issues with generics themselves, but side effects of the requirement to preserve backward compatibility with existing code.
Back
to top
Generifying existing classes
Converting existing library classes to work smoothly with generics was no small feat, but as is often the case, backward-compatibility does not come for free. I've already talked about two examples where backward compatibility limited the generification ofthe class libraries.
Another method that would probably have been generified differently had backward compatibility not been an issue is
Collections.toArray(Object[]).
The array passed into
toArray()serves
two purposes -- if the collection is small enough to fit in the array provided, the contents will simply be placed in that array. Otherwise, a new array of the same type will be created, using reflection, to receive the results. If the Collections framework
were being rewritten from scratch, it is likely that the argument to
Collections.toArray()would
not be an array, but instead a class literal:
interface Collection<E> { public T[] toArray(Class<T super E> elementClass); }
Because the Collections framework is widely emulated as an example of good class design, it is worth noting the areas in which its design was constrained by backwards compatibility, so that those aspects are not emulated blindly.
One element of the generifed Collections API that is often confusing at first is the signatures of
containsAll(),
removeAll(),
and
retainAll().
You might expect the signatures for
remove()and
removeAll()to
be:
interface Collection<E> { public boolean remove(E e); // not really public void removeAll(Collection<? extends E> c); // not really }
But it is in fact:
interface Collection<E> { public boolean remove(Object o); public void removeAll(Collection<?> c); }
Why is this? Again, the answer lies in backward compatibility. The interface contract of
x.remove(o)means
"if
ois
contained in
x,
remove it; otherwise, do nothing." If
xis
a generic collection,
odoes
not have to be type-compatible with the type parameter of
x.
If
removeAll()were
generified to only be callable if its argument was type-compatible (
Collection<? extends E>), then certain sequences of code that were legal before generics would become illegal, like this one:
// a collection of Integers Collection c = new HashSet(); // a collection of Objects Collection r = new HashSet(); c.removeAll(r);
If the above fragment were generified in the obvious way (making
ca
Collection<Integer>and
ra
Collection<Object>),
then the code above would not compile if the signature of
removeAll()required
its argument to be a
Collection<? extends E>, instead of being a no-op. One of the key goals of generifying the class libraries was to not break or change the semantics of existing code, so
remove(),
removeAll(),
retainAll(),
and
containsAll()had
to be defined with a weaker type constraint than they might have had they been redesigned from scratch for generics.
Existing classes that were designed before generics may have semantics that resist the "obvious" generification approach. In these cases, you will probably make compromises like the ones described here, but when designing new generic classes from scratch, it
is valuable to understand which idioms from the Java class libraries are the result of backward compatibility, so they are not emulated inappropriately.
Back
to top
Implications of erasure
Because generics are implemented almost entirely in the Java compiler, and not in the runtime, nearly all type information about generic types has been "erased" by the time the bytecode is generated. In other words, the compiler generates pretty much the samecode you would have written by hand without generics, casts and all, after checking the type-safety of your program. Unlike in C++,
List<Integer>and
List<String>are
the same class (although they are different types, both subtypes of
List<?>--
a distinction that is more important in JDK 5.0 than in previous versions of the language).
One of the implications of erasure is that a class cannot implement both
Comparable<String>and
Comparable<Number>,
because both of these are in fact the same interface, specifying the same
compareTo()method.
It might seem sensible to want to declare a
DecimalStringclass
that is comparable to both
Strings
and
Numbers,
but to the Java compiler, you would be trying to declare the same method twice:
public class DecimalString implements Comparable<Number>, Comparable<String> { ... } // nope
Another consequence of erasure is that using casts or
instanceofwith
generic type parameters doesn't make any sense. The following code will not improve the type safety of your code at all:
public <T> T naiveCast(T t, Object o) { return (T) o; }
The compiler will simply emit an unchecked conversion warning, because it doesn't know if the cast is safe or not. The
naiveCast()method
will not in fact do any casting at all --
Twill
simply be replaced with its erasure (
Object),
and the object passed in will be cast to
Object--
not what was intended.
Erasure is also responsible for the construction issues described above -- that you cannot create an object of generic type because the compiler doesn't know what constructor to call. If your generic class needs to be constructing objects whose type is specified
by generic type parameters, its constructors should take a class literal (
Foo.class)
and store it so that instances can be created with reflection.
Back
to top
Summary
Generics are a big step forward in the type-safety of the Java language, but the design of the generics facility, and the generification of the class library, were not without compromises. Extending the virtual machine instruction set to support generics wasdeemed unacceptable, because that might make it prohibitively difficult for JVM vendors to update their JVM. Accordingly, the approach of erasure, which could be implemented entirely in the compiler, was adopted. Similarly, in generifying the Java class libraries,
the desire to maintain backward compatibility placed many constraints on how the class libraries could be generified, resulting in some confusing and frustrating constructions (like
Array.newInstance()).
These are not problems with generics per se, but with the practicality of language evolution and compatibility. But they can make generics a little more confusing, and frustrating, to learn and use.
相关文章推荐
- (泛型)Java theory and practice: Generics gotchas
- Java theory and practice: Thread pools and work queues--reference
- The Java™ Tutorials — Generics :Generic Methods and Bounded Type Parameters 泛型方法和受限类型参数
- [转]《Java Generics and Collections》读书笔记一:java泛型基本问题
- Java theory and practice: Dealing with InterruptedException
- Java theory and practice: Fixing the Java Memory Model, Part 1
- Java theory and practice: Fixing the Java Memory Model, Part 1
- Java theory and practice: More flexible, scalable locking in JDK 5.0
- Java theory and practice
- Java theory and practice: Fixing the Java Memory Model, Part 2
- Java theory and practice: Fixing the Java Memory Model, Part 2
- Java theory and practice
- Java theory and practice: Good housekeeping practices
- Java theory and practice: Safe construction techniques
- Java theory and practice: Hashing it out
- Java theory and practice: Thread pools and work queues
- The Java™ Tutorials — Generics :Wildcards and Subtyping 泛型和子类
- Java theory and practice: More flexible, scalable locking in JDK 5.0
- 解读【Java theory and practice: Managing volatility】
- JAVA generics 泛型