Map implementations in java

A number of classes implement the Map interface, including HashMap, TreeMap, LinkedHashMap, WeakHashMap, ConcurrentHashMap, and Properties. The most generally useful class is HashMap.

  • java.util.HashMap is implemented with a hash table. Access time is O(1). Entries are unsorted.
    – similar to HashTable but allows null keys and values
    – not thread safe
  • java.util.LinkedHashMap is implemented with a hash table. Access time is O(1). Entries are sorted in either entry order or order of last access, which is useful for implementing a LRU (least recently used) caching policy.
  • java.util.TreeMap is implemented as a balanced binary tree. Access time is O(log N). Entries are sorted. 
  • java.util.HashTable(old classes)
    – updated class from earlier Java versions
    – does not allow null keys or values
    – thread safe
  • java.util.WeakHashMap –
    – like HashMap
    – entry is automatically removed from HashMap if no more “ordinary” references to key
  • java.util.IdentityHashMap
  • java.util.EnumMap
  • java.util.Properties

Custom Implementations in collections

Many programmers will never need to implement their own collections classes. You can go pretty far using the implementations described in the previous sections of this lesson. Someday, however, you might find yourself wanting to write your own implementation of a core collection interface. It turns out that it’s easy to do with the abstract implementations provided by the Java platform. But before we discuss how to write an implementation, let’s discuss why you might want to do such a thing.

Reasons to Write Your Own Implementation

This list enumerates several kinds of collections you might want to implement. It is not intended to be exhaustive.

  • Persistent: All of the built-in collection implementations reside in main memory, and vanish when the VM exits. Suppose you want a collection that will still be present the next time the VM starts. One fine way to implement such a collection is to build a veneer over an external database. Such a collection might conceivably be concurrently accessible by multiple VMs, since it resides outside the VM.
  • Application-specific: This is a very broad category. One example is an unmodifiable Map containing real-time telemetry data. The keys might represent locations, and the values could be read from sensors at these locations in response to the get operation.
  • Highly Concurrent: The built-in collections are not designed to support high concurrency. The synchronization wrappers (and the legacy implementations) lock the entire collection every time it’s accessed. Suppose you’re building a server and you need a Map implementation that can be accessed by many threads concurrently. It is reasonably straightforward to build a hash table that locks each bucket separately, allowing multiple threads to access the table concurrently (assuming they’re accessing keys that hash to different buckets).
  • High-performance, Special-purpose: There are many data structures that take advantage of restricted usage to offer better performance than is possible with general-purpose implementations. For example, consider a Set whose elements are restricted to a small, fixed universe. Such a Set can be represented as a bit-vector, which offers blindingly fast performance as well low memory usage. Another example concerns a List containing long runs of identical element values. Such lists, which occur frequently in text processing, can be run-length encoded: runs can be represented as a single object containing the repeated element and the number of consecutive repetitions. This example is interesting because it trades off two aspects of performance: It requires far less space than an ArrayList, but more time.
  • High-performance, General-purpose: The engineers who designed the collections framework tried to provide the best general-purpose implementations for each interface, but there are many, many data structures that could have been used, and new ones are invented every day. Maybe you can come up with something faster!
  • Enhanced functionality: Suppose you need a Map (or Set) implementation that offers constant time access, as well as insertion-order iteration. This combination can be achieved with a hash table, all of whose elements are further joined, in insertion order, into a doubly-linked list. Alternatively, suppose you need an efficient bag implementation (also known as a multiset): a Collection that offers constant time access while allowing duplicate elements. It’s reasonably straightforward to implement such a collection atop a HashMap.
  • Convenience: You may want additional convenience implementations beyond those offered by the Java platform. For instance, you may have a frequent need for immutable Map objects representing a single key-value mapping, or List objects representing a contiguous range of Integers, or whatever.
  • Adapter: Suppose you are using some legacy API that has its own ad hoc collections API. You can write an adapter implementation that permits these collections to operate in the Java Collections Framework. An adapter implementation is a thin veneer that wraps objects of one type and makes them behave like objects of another type by translating operations on the latter type into operations on the former.

How to Write a Custom Implementation

Writing a custom implementation is surprisingly easy with the aid of the abstract implementations furnished by the Java platform. Abstract implementations are skeletal implementations of the core collection interfaces designed expressly to facilitate custom implementations. We’ll start with an example. Here’s an implementation of Arrays.asList.

public static List asList(Object[] a) {
return new ArrayList(a);
}

private static class ArrayList extends AbstractList
implements java.io.Serializable
{
private Object[] a;

ArrayList(Object[] array) {
a = array;
}

public Object get(int index) {
return a[index];
}

public Object set(int index, Object element) {
Object oldValue = a[index];
a[index] = element;
return oldValue;
}

public int size() {
return a.length;
}
}

Believe it or not, this is almost exactly the implementation contained in the JDK. It’s that simple! You provide a constructor, the get, set, and size methods, and AbstractList does all the rest. You get the ListIterator, bulk operations, search operations, hash code computation, comparison and string representation for free. Suppose you want to make the implementation a bit faster. The API documentation for the abstract implementations describes precisely how each method is implemented so you’ll know which methods to override in order to get the performance that you want. The performance of the implementation above is fine, but it can be improved a bit. In particular, the toArray method iterates over the List copying one element at a time. Given the internal representation, it’s a lot faster and more sensible just to clone the array:

public Object[] toArray() {
return (Object[]) a.clone();
}

With the addition of this override, and a similar one for toArray(Object[]), this implementation is exactly the one found in the JDK. In the interests of full disclosure, it’s a bit tougher to use the other abstract implementations because they require you to write your own iterator, but it’s still not that hard. The abstract implementations are summarized below:

  • AbstractCollection(in the API reference documentation): A Collection that is neither a Set nor a List, such as a bag. At a minimum, you must provide the iterator and the size method.
  • AbstractSet(in the API reference documentation): A Set. Use is identical to AbstractCollection.
  • AbstractList(in the API reference documentation): A List backed by a random-access data store (such as an array). At a minimum, you must provide the positional access methods (get(int) and, optionally, set(int), remove(int), and add(int)) and the size method. The abstract class takes care of listIterator (and iterator).
  • AbstractSequentialList(in the API reference documentation): A List backed by a sequential-access data store (such as a linked list). At a minimum, you must provide the listIterator and size methods. The abstract class takes care of the positional access methods. (This is the opposite of AbstractList.)
  • AbstractMap(in the API reference documentation): A Map. At a minimum you must provide the entrySet view. This is typically implemented with the AbstractSet class. If the Map is modifiable, you must also provide the put method.

The process of writing a custom implementation is summarized below:

  1. Choose the appropriate abstract implementation class from the list above.
  2. Provide implementations for all of the class’s abstract methods. If your custom collection is to be modifiable, you’ll have to override one or more concrete methods as well. The API documentation for the abstract implementation class will tell you which methods to override.
  3. Test and, if necessary, debug the implementation. You now have a working custom collection implementation!
  4. If you’re concerned about performance, read the abstract implementation class’s API documentation for all of the methods whose implementations you’re inheriting. If any of them seem too slow, override them. If you override any methods, be sure to measure the performance of the method before and after the override! How much effort you put into tweaking the performance should be a function of how much use the implementation will get, and how performance-critical the use. (Often, this step is best omitted.)

Convenience Implementations

This section describes several mini-implementations that can be more convenient and more efficient then the general purpose implementations when you don’t need the full power of a general purpose implementation. All of the implementations in this section are made available via static factory methods or exported constants, rather than public classes.

List-view of an Array

The Arrays.asList(in the API reference documentation)method returns a List-view of its array argument. Changes to the List write through to the array and vice-versa. The size of the collection is that of the array, and cannot be changed. If the add or remove method is called on the List, an UnsupportedOperationException will result. The normal use of this implementation is as a bridge between array-based and collection-based APIs. It allows you to pass an array to a method expecting a Collection or a List. However, this implementation has another use. If you need a fixed-size List, it’s more efficient than any general-purpose List implementation. Here’s the idiom:

List l = Arrays.asList(new Object[size]);

Note that a reference to the backing array is not retained.

Immutable Multiple-Copy List

Occasionally you’ll need an immutable List consisting of multiple copies of the same element. The Collections.nCopies(in the API reference documentation)method returns such a List. This implementation has two main uses. The first is to initialize a newly created List. For example, suppose you want an ArrayList initially consisting of 1000 null elements. The following incantation does the trick:

List l = new ArrayList(Collections.nCopies(1000, null));

Of course, the initial value of each element needn’t be null. The second main use is to grow an existing List. For example, suppose you want to add 69 copies of the string fruit bat to the end of a List. It’ not clear why you’d want to do such a thing, but let’s just suppose you did. Here’s how you’d do it:

lovablePets.addAll(Collections.nCopies(69, "fruit bat"));

By using the form of addAll that takes an index as well as a Collection, you can add the new elements to the middle of a List instead of at the end.

Immutable Singleton Set

Sometimes you’ll need an immutable singleton Set, which consists of a single, specified element. The Collections.singleton(in the API reference documentation)method returns such a Set. One use of this implementation is this idiom to remove all occurrences of a specified element from a Collection:

c.removeAll(Collections.singleton(e));

There’s a related idiom to remove from a Map all elements that map to a specified value. For example, suppose you have a Map, profession, that maps people to their line of work. Suppose you want to eliminate all of the lawyers. This one-liner will do the deed:

profession.values().removeAll(Collections.singleton(LAWYER));

One more use of this implementation is to provide a single input value to a method that is written to accept a Collection of values.

Empty Set and Empty List Constants

The Collections class provides two constants, representing the empty Set and the empty List, Collections.EMPTY_SET(in the API reference documentation)and Collections.EMPTY_LIST(in the API reference documentation). It’s not clear that these constants really qualify as implementations, but this lesson seemed like as good a place to mention them as any. The main use of these constants is as input to methods that take a Collection of values, when you don’t want to provide any values at all.

General Purpose Implementations

The general-purpose implementations are summarized in the table below. The table highlights their regular naming pattern: names are all of the form <Implementation> <Interface>, where <Interface> is the core collection interface implemented by the class, and <Implementation> signifies the data structure underlying the implementation.

Implementations
Hash Table Resizable Array Balanced Tree Linked List
Interfaces Set HashSet   TreeSet  
List   ArrayList   LinkedList
Map HashMap   TreeMap  

JDK 1.2 provides two implementations of each interface (with the exception of Collection(in the API reference documentation), which has no direct implementations, but serves as a least common denominator for the other collection interfaces). In each case, one implementation is clearly the primary implementation: the one to use, all other things being equal. The primary implementations are HashSet, ArrayList and HashMap. Note that the SortedSet(in the API reference documentation)and SortedMap(in the API reference documentation)interfaces do not have rows in the table above. Each of these interfaces has one implementation and these implementations (TreeSet and TreeMap) are listed in the Set and Map rows. Not only do the implementations have consistent names, but they have consistent behavior as well. All of them implement all the optional operations contained in their interfaces. All permit null elements, keys and values. Each one is unsynchronized. All have fail-fast iterators, which detect illegal concurrent modification during iteration and fail quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future. All are Serializable, and all support a public clone method.
The fact that the new implementations are unsynchronized represents a break with the past: Vector and Hashtable were synchronized in versions of the JDK prior to 1.2. The new approach was taken because it was recognized that collections are frequently used in a manner where the synchronization is of no benefit. Such uses include single-threaded use, read-only use, and use as part of a larger data object that does its own synchronization. In general, it is good API design practice not to make users pay for a feature that they generally don’t use. Further, unnecessary synchronization can result in deadlock under certain circumstances.
If you need a synchronized collection, the synchronization wrappers, described in the next section, allow any collection to be transformed into a synchronized collection. Thus, synchronization is optional for the new collection implementations where it was mandatory for the old.
As a rule of thumb, you should be thinking about the interfaces rather than the implementations. That is why there are no programming examples in this lesson. For the most part, the choice of implementation affects only performance. The preferred style, as was mentioned in the interfaces lesson, is to choose an implementation when a collection is created, and immediately assign the new collection to a variable of the corresponding interface type (or pass the collection to a method expecting an argument of the interface type). In this way, the program does not become dependent on any added methods in a given implementation, leaving the programmer or maintainer with the freedom to change implementations at the drop of a hat, if performance concerns dictate that it is the right thing to do.
The general purposes implementations are briefly discussed below. The performance of the implementations is described using words like constant, log, linear, n log(n) and quadratic. These words refer to the asymptotic upper bound on the time complexity of performing the operation. All of this is quite a mouthful, and it doesn’t matter much if you don’t know what it means. If you’re interested, any good algorithms textbook explains this stuff. One thing to keep in mind is that this sort of performance metric has its limitations. Sometimes the nominally slower implementation may be faster for the collection size that you’re actually using. When in doubt, measure the performance.

Set

The two general purpose Set(in the API reference documentation)implementations are HashSet(in the API reference documentation)and TreeSet(in the API reference documentation). It’s very straightforward to decide which of these two to use. HashSet is much faster (constant time vs. log time for most operations), but offers no ordering guarantees. If you need to use the operations in the SortedSet, or in-order iteration is important to you, use TreeSet. Otherwise, use HashSet. It’s a fair bet that you’ll end up using HashSet most of the time. One thing worth keeping in mind about HashSet is that iteration is linear in the sum of the number of entries and the number of buckets (the capacity). Thus, it’s important to choose an appropriate initial capacity if iteration performance is important. Choosing a capacity that’s too high can waste space as well as time. The default initial capacity is 101, and that’s often more that you need. The initial capacity may be specified using the int constructor. To allocate a HashSet whose initial capacity is 17:

Set s= new HashSet(17);

HashSets have one other “tuning parameter” called the load factor. If you care deeply about the space consumption of your HashSet, read the HashSet documentation for more information. Otherwise just live with the default. If you accept the default load factor but you do want to specify an initial capacity, pick a number that’s about twice the size that you expect the Set to grow to. If your guess is way off, it may have to grow or you may waste a bit of space, but either way it’s no big problem. If you know a prime number of about the right size, use it. If not, use an odd number. Or use an even number. It doesn’t really matter much; these things might make the HashSet perform a wee bit better, but nothing to write home about.
TreeSet has no tuning parameters. With the exception of clone, neither HashSet nor TreeSet have any operations other than those required by their respective interfaces (Set and TreeSet).

List

The two general purpose List(in the API reference documentation)implementations are ArrayList(in the API reference documentation)and LinkedList(in the API reference documentation). Most of the time, you’ll probably use ArrayList. It offers constant time positional access, and it’s just plain fast, because it does not have to allocate a node object for each element in the List, and it can take advantage of the native method System.arraycopy when it has to move multiple elements at once. Think of ArrayList as Vector without the synchronization overhead. If you frequently add elements to the beginning of the List, or iterate over the List deleting elements from its interior, you might want to consider LinkedList. These operations are constant time in a LinkedList but linear time in an ArrayList. But you pay a big price! Positional access is linear time in a LinkedList and constant time in an ArrayList. Furthermore, the constant factor for LinkedList is much worse. If you think that you want to use a LinkedList, measure the performance with both LinkedList and ArrayList. You may be surprised.
ArrayList has one tuning parameter, the initial capacity. It refers to the number of elements the ArrayList can hold before it has to grow. There’s not much to say about it. The only ArrayList operations that are not required by List are ensureCapacity and trimToSize (which alter the excess capacity), and clone.
LinkedList has no tuning parameters, and seven optional operations, one of which is clone. The other six are addFirst, getFirst, removeFirst, addLast, getLast, and removeLast; I have very mixed feelings about them. They make it a bit more convenient to use a LinkedList as a queue or a double-ended queue (dequeue), but they prevent you from easily switching representations when you discover that ArrayList is faster.
If you need synchronization, a Vector(in the API reference documentation)will be slightly faster than an ArrayList synchronized with Collections.synchronizedList, but Vector has loads of legacy operations, so be extra careful to always manipulate the Vector with the List interface, or you’ll be stuck with it for life.
If your List is fixed in size (that is, you’ll never use remove, add or any of the bulk operations other than containsAll) you have a third option that’s definitely worth considering. See Arrays.asList in the convenience implementations section.

Map

The two general purpose Map(in the API reference documentation)implementations are HashMap(in the API reference documentation)and TreeMap(in the API reference documentation). The situation for Map is exactly analogous to Set. If you need SortedMap operations or in-order Collection-view iteration, go for TreeMap; otherwise, go for HashMap. Everything else in the Set section (above) also applies to Map so just re-read it. Completeness requires that we mention Hashtable(in the API reference documentation). As with Vector and ArrayList, if you need synchronization, a Hashtable will be slightly faster than a HashMap synchronized with Collections.synchronizedMap. Again, Hashtable has loads of legacy operations, so be extra careful always to manipulate it with the Map interface, or you’ll be stuck with it for life.

Wrapper Implementations

Wrapper implementations are implementations that delegate all of their real work to a specified collection, but add some extra functionality on top of what this collection offers. For design patterns fans, this is an example of the decorator pattern. While it may seem a bit exotic, it’s really pretty straightforward.
These implementations are anonymous: rather than providing a public class, the JDK provides a static factory method. All of these implementations are found in the Collections(in the API reference documentation) API which consists solely of static methods.

Synchronization Wrappers

The synchronization wrappers add automatic synchronization (thread-safety) to an arbitrary collection. There is one static factory method for each of the six core collection interfaces:

public static Collection synchronizedCollection(Collection c);
public static Set synchronizedSet(Set s);
public static List synchronizedList(List list);
public static Map synchronizedMap(Map m);
public static SortedSet synchronizedSortedSet(SortedSet s);
public static SortedMap synchronizedSortedMap(SortedMap m);

Each of these methods returns a synchronized (thread-safe) Collection backed by the specified collection. In order to guarantee serial access, it is critical that all access to the backing collection is accomplished through the returned collection. The easy way to guarantee this is to not to keep a reference to the backing collection. Creating the synchronized collection like this does the trick:

List list = Collections.synchronizedList(new ArrayList());

A collection created in this fashion is every bit as thread-safe as as a “normally” synchronized collection like a Vector(in the API reference documentation). In the face of concurrent access, it is imperative that the user manually synchronize on the returned collection when iterating over it. This is because iteration is accomplished via multiple calls into the collection, which must be composed into a single atomic operation. The idiom to iterate over a wrapper-synchronized collection is:

Collection c = Collections.synchronizedCollection(myCollection);
synchronized(c) {
Iterator i = c.iterator(); // Must be in the synchronized block!
while (i.hasNext())
foo(i.next());
}

Failure to follow this advice may result in non-deterministic behavior. The idiom for iterating over a Collection-view of a synchronized Map is similar, but with one wrinkle. It is imperative that the user manually synchronize on the synchronized Map when iterating over any of its Collection-views, rather than synchronizing on the Collection-view itself:

Map m = Collections.synchronizedMap(new HashMap());
...
Set s = m.keySet(); // Needn't be in synchronized block
...
synchronized(m) { // Synchronizing on m, not s!
Iterator i = s.iterator(); // Must be in synchronized block
while (i.hasNext())
foo(i.next());
}

One minor downside of the wrapper implementation approach is that you do not have the ability to execute any non-interface operations of a wrapped implementation. So, for instance, in the List example above, one cannot call ArrayList‘s ensureCapacity operation on the wrapped ArrayList.

Unmodifiable Wrappers

The unmodifiable wrappers are conceptually similar to the synchronization wrappers, but simpler. Rather than adding functionality to the wrapped collection, they take it away. In particular, they take away the ability to modify the collection, by intercepting all of the operations that would modify the collection, and throwing an UnsupportedOperationException. The unmodifiable wrappers have two main uses:

  • To make a collection immutable once it has been built. In this case, it’s good practice not to maintain a reference to the backing collection. This absolutely guarantees immutability.
  • To allow “second-class citizens” read-only access to your data structures. You keep a reference to the backing collection, but hand out a reference to the wrapper. In this way, the second-class citizens can look but not touch, while you maintain full access.

Like the synchronization wrappers, there is one static factory method for each of the six core collection interfaces:

public static Collection unmodifiableCollection(Collection c);
public static Set unmodifiableSet(Set s);
public static List unmodifiableList(List list);
public static Map unmodifiableMap(Map m);
public static SortedSet unmodifiableSortedSet(SortedSet s);
public static SortedMap unmodifiableSortedMap(SortedMap m);

Implementations in collections

Implementations are the actual data objects used to store collections, which implement the core collection interfaces described in the previous lesson. The sections that follow describe three kinds of implementations:

General-purpose Implementations

General-purpose implementations are the public classes that provide the primary implementations of the core collection interfaces.

Wrapper Implementations

Wrapper implementations are used in combination with other implementations (often the general-purpose implementations) to provide added functionality.

Convenience Implementations

Convenience implementations are mini-implementations, typically made available via static factory methods that provide convenient, efficient alternatives to the general-purpose implementations for special collections (like singleton sets).

Additionally, you can build your own implementations, based on the JDK’s abstract implementations. This is described in a separate lesson because it’s an advanced topic. It’s not particularly hard, but relatively few people will need to do it.

Vector Tutorial in java

Introduction to Vectors in java

Operating on Vector in java – Common operations like create,add, Delete

How To Iterate over the elements of a Vector

Common Vector Methods

Insuring use of the new Vector API Methods

Replacements for old methods

 

 

ArrayList<E> – Some basic methods and constructors

java.util.ArrayList allows for expandable arrays, and is basically the same as the older the Collections Vector class.

ArrayList extends AbstractList and implements List, Cloneable, SerializableArrayList capacity .grows automatically. The ArrayList is not synchronized. It permits all elements including null.
An ArrayList has these characteristics:

  • An ArrayList automatically expands as data is added.
  • Access to any element of an ArrayList is O(1). Insertions and deletions are O(N).
  • An ArrayList has methods for inserting, deleting, and searching.
  • An ArrayList can be traversed using a foreach loop, iterators, or indexes.

Implementation. ArrayLists are implemented with an underlying array, and when that array is full and an additional element is added, a new, larger, array is allocated and the elements are copied from the old to the new. Because it takes time to create a bigger array and copy the elements from the old array to the new array, it is a slightly faster to create an ArrayList with a size that it will commonly be when full. Of course, if you knew the final size, you could simply use an array. However, for non-critical sections of code programmers typically don’t specify an initial size.

Common ArrayList methods and constructors

Here are some of the most useful ArrayList methods. Assume these declarations. Note that E is the notation for the type of an element in a collection. Sun recommends using single upper case letters for generic types.

int i;
ArrayList<E> a;
E e;
Iterator<E> iter;
ListIterator<E>liter;
E[] earray;
Object[] oarray;
Result Method Description
Constructors
a = new ArrayList() Creates ArrayList with initial default capacity 10.
a = new ArrayList(cap) Creates ArrayList with initial int capacity cap.
a = new ArrayList(coll) Creates ArrayList from the Collection coll.
Adding elements
a.add(e) adds e to end of ArrayList a
a.add(i, e) Inserts e at index i, shifting elements up as necessary.
Replacing an element
a.set(i,e) Sets the element at index i to e.
Getting the elements
e = a.get(i) Returns the object at index i.
oarray = a.toArray() Returns values in array of objects.
earray = a.toArray(E[]) The array parameter should be of the E class. Returns values in that array (or a larger array is allocated if necessary).
Iterators
iter = a.iterator() Returns an Iterator for forward traversal.
liter = a.listIterator(i) Returns a ListIterator for forward / backward / modifying traversal, starting at index i. Start from end with a.listIterator(a.size())
liter = a.listIterator() Returns a ListIterator for forward / backward / modifying traversal.
Searching
b = a.contains(e) Returns true if ArrayList a contains e
i = a.indexOf(e) Returns index of first occurrence of e, or -1 if not there.
i = a.lastIndexOf(e) Returns index of last occurrence of e, or -1 if not there.
Removing elements
a.clear() removes all elements from ArrayList a
a.remove(i) Removes the element at position i.
a.removeRange(i, j) Removes the elements from positions i thru j.
Other
i = a.size() Returns the number of elements in ArrayList a.

 

Adding elements to the end of an ArrayList, getting them by index

ArrayList<E> a = new ArrayList<E>();  // Default size.
E s; // Declare s to be an object type E.
. . .
a.add(s); // Adds s to the end of the ArrayList a
. . .
s = a.get(i); // Assigns ith element from a to s.