Java Collections Tutorial for Beginners and Professionals

8 Flares Twitter 0 Facebook 0 Google+ 8 Pin It Share 0 LinkedIn 0 Filament.io 8 Flares ×

Java Collections framework tutorial

Table of Contents

  • Introduction to Collections
  • Data Structures
  • Core Interfaces
  • Collection Interface
    • List
      • ArrayList
      • Vector
      • Stack
      • LinkedList
    • Set
      • HashSet
    • SortedSet
      • SortedHashSet
    • Queue
    • Deque
  • Map Interface
    • Map
      • HashMap
    • SortedMap

Introduction to Collections

Collections are nothing but a natural group of items. For example a pack of cards is a natural grouping. The collections provides methods to store, retrieve, remove and manipulate the data. Various concrete classes that implements different Collection interfaces, provides implementation of algorithms for efficient operations on the group of data. Typically it provides java implementations for common Abstract data types and different algorithms described in fundamental computer science courses.

The different components of Collection framework are interfaces (to provide an abstract view of methods), implementations (concrete implementations of the interfaces) and algorithms (standard implementation of common algorithms for sorting, searching, insertion, deletion etc). Before we look into the Collections framework, it is a good idea to quickly understand few common data structures in computer science.

Data structures

Let us look at some of the common data structures in computer science. Through collections framework, java provides implementations for most of them.  Let us first understand what an Abstract Data type is. It is important to understand this to get a good grasp of data structures.

Abstract Data Type

An abstract data types hides the specification of objects and operations on them from their representation and implementation of operations. This means you can provide different implementations for how objects are stored and provide different algorithms for search, insert, delete etc.  ArrayList, Vector and LinkedList are different implementations for List ADT.

List ADT

Instances : Sequence of arbitrary number of items

Operations

  • add( index, item)
  • remove(index)
  • isEmpty()
  • get(index)
  • size()

Features of List: 

  1. List is a sequence of values.
  2. It can contain duplicate values.
  3. It contains items of the same type.
  4. Except the first and last element, all others has a unique predecessor and a unique successor. So it is a linear sequence of items.

An implementation of List in Java is ArrayList class.

Set ADT

Set is a container of distinct objects. No duplicates are allowed. There is no explicit notion of a key or even an order.

Example: A collection of unique books, collection of unique numbers

The set ADT supports the following fundamental methods acting on a set A

  • union(B) :  create union of A and B and replace A with that. That is A = A U B
  • intersect(B) : create intersection of A and B and replace A with that. A = A Intersection symbol B
  • subtract(B) :  Substitue A with Difference of A and B. A = A – B

java.util.Set is the java implementation of the Set ADT.

Mapping of the ADT method to java.uti.Set methods

  • union(B) =  addAll(Collection B)
  • intersect(B) = retainAll(Collection B)
  • subtract(B) = removeAll(Collection B)
  • Java could have used the union, intersect and subtract method names as it is. But it didn’t. To me it is confusing and un-warranted.

We now have enough background information to deep dive into the Collection framework.

Core Interfaces

The Collections framework is organized into two interfaces and their child interfaces. The two hierarchies are Collection and Map.

Collections Framework

The Collection Interface

Is at the top of the collections hierarchy.

The List Interface

Extends Collection represents a sequence of elements.

The Set

No duplicates are allowed. Supports intersect, union and subtraction

The SortedSet

Extends Set to handle sorted sets

The Map

Stores data (values) indexed via keys. Key – Value data structure

The SortedMap

Extends Map such that keys are maintained in ascending order.

Iterator

Let us explore few popular concrete implementations of these interfaces and its usage.

List (java.util.List)

Popular Concrete Implementations: ArrayList, Vector, Stack, LinkedList

ArrayList: Resizable array based implementation of List data type. Access to this class is NOT synchronized. Hence not thread safe. Code sample below shows the usage of various methods in ArrayList class.

ArrayList Example

Vector:  It is also a List and elements are accessed using the index. The size of a Vector can grow or shrink as needed to accommodate adding and removing items after the Vector has been created. It optimizes storage by growing and shrinking depending on the number of elements. Access to vector elements are synchronized and hence thread safe. If a particular vector instance is going to be accessed by only a single thread always, it is better to change it an ArrayList. Helps in improving the performance.

Stack: Stack is a Last In First out ( LIFO) structure. When you add items, it get stacked vertically like a stack of bricks. A element can only be removed from the top of the stack and hence the last added elements gets removed or serviced first.  The common methods are push(x) that adds element x to the top of the stack and pop() that removes the top element from the stack and returns the same. Other useful methods are peek() and empty().  Peek is very useful, it tell us what object is at the top of the stack without removing it from the stack. This is very handy in certain situations.

Stack Example

LinkedList :  LinkedLIst is also a list ( sequence of items). But it is very efficient in storage and dynamically grows or shrinks as per the storage requirements. There is no wastage or up-front allocation of memory. If you compare it with an ArrayList, which uses an array internally, LinkedLIst is much more efficient in terms of space management. Remember, if a new element needs to be inserted into the ArraylList and the array (used internally by the ArrayList)  is full, a new array needs to be created with more capacity and all elements from the earlier array needs to be copied to new array. There is no such problem with a LinkedLIst. The main draw back is finding an element is not as efficient as an ArrayList as the LinkedList needs to be traversed from the head.

 LinkedLIst Example

Set (java.util.Set)

A collection that contains no duplicate elements. More formally, sets contain no pair of elements a and b such that a.equals(b), and at most one null element. As implied by its name, this interface models the mathematical set abstraction. If you want to eliminate duplicate, set is your data structure. Java provides implementations of the set interface namely AbstractSet, ConcurrentSkipListSet, CopyOnWriteArraySet, EnumSet, HashSet, JobStateReasons, LinkedHashSet, TreeSet

Let us explore the most popular of this, HashSet.

HashSet: It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element. This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. it’s very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important. A HashSet is not synchronized and hence thread safe.

Map (java.util.Map)

Map maintains the data as key – value pairs. Map cannot contain duplicate keys. But it is perfectly correct for different keys to contain same value. It is very useful for quick look up of data based on a given key. Perfect for indexing. Java provides implementations of the Map interface namely AbstractMap, Attributes, AuthProvider, ConcurrentHashMap, ConcurrentSkipListMap, EnumMap, HashMap, Hashtable, IdentityHashMap, LinkedHashMap, PrinterStateReasons, Properties, Provider, RenderingHints, SimpleBindings, TabularDataSupport, TreeMap, UIDefaults, WeakHashMap.

Let us explore the most popular of this, the HashMap

HashMap :  A HashMap is roughly equivalent to a Hashtable, except that it is not synchronized. Hence when you don’t need thread safety use HashMap instead of HashTable. HashMap allows null for key and values. It doesn’t guarantee that the order of the elements will remain constant over time.

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.  It is thus important to make sure that the Hashtable doesn’t rehash very frequently. Allocate the initial capacity and load factor depending on the anticipated data storage requirements. A HashMap is not thread safe. If you need thread safety use a HashTable.

HashMap Example

Generics and Collection

Since JDK 5, java added support for generics. Hence all collection implementations now support generic style type definition allowing the developer to declare up-front what type of object a particular collection will hold. This ensures during compile time itself compiler can throw a compiler exception if a different type is being inserted.

 

If you try to add something other than string to the above list, you will receive a compiler error.

 

References

https://docs.oracle.com/javase/tutorial/collections/intro/

 http://docs.oracle.com/javase/7/docs/api/java/util/Collection.html

http://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html