data-structures-all-langs/set at main · iseeyah/data-structures-all-langs

Name	Name	Last commit message	Last commit date
parent directory ..
java	java
README.md	README.md

Introduction

The Set represents unordered groups of unique items (as in mathematical sets). They’re used when the order of items you need to store is meaningless, or if you must ensure no items in the group occurs more than once. The most common Set operations are:

add(e): add an item to the set or produce an error if the item is already in the set.
contains(e): checks if an item exists in a set
list(): list the items in the set.
remove(e): remove an item from the set.¹

PS: Operations can be named differently in different languages

Internals

The Set's internal implementation is very similar to that of the Map's, and like the Map, an array is the underlying data structure used by a Set. The difference is that the Set does not store any key-value mappings, it stores an independent object i.e. whatever object is passed in :). It is important that the objects to be stored in a set are immutable.

Operations

add(e)

Pseudocode:
* Generate a hash (a.k.a index for the underlying array) for the object (or element) **e**, using a hash function
* Check if object (or element) **e** exists at index. See `contains(e)`'s description for more info
* Store object (or element) **e** at the index generated
* Handle collision, if necessary

Time Complexity : time to find address (always constant) + time to traverse bucket (depends on the data structure)

Time to find address	Bucket	Time to traverse bucket	Worst Case for `add(e)`	Amortized Time
Constant Time/O(1)	List	Linear Time/O(N)	Linear Time/O(N)	Constant Time/O(1)
Constant Time/O(1)	LinkedList	Linear Time/O(N)	Linear Time/O(N)	Constant Time/O(1)
Constant Time/O(1)	Binary Search Tree	Logarithmic Time/O(LogN)	Logarithmic Time/O(LogN)	Constant Time/O(1)

contains(e)

Pseudocode:
* Generate a hash (a.k.a index for the underlying array) for the object (or element) **e**, using a hash function
* Check if object (or element) **e** exists at index. It is this step that ensure uniqueness in a set. It is for this step that immutability is important. 
    - To check if an object exists at an element, we traverse the bucket until we find it.

Time Complexity : time to find address (always constant) + time to traverse bucket (depends on the data structure)

Time to find address	Bucket	Time to traverse bucket	Worst Case for `contains(e)`	Amortized Time
Constant Time/O(1)	List	Linear Time/O(N)	Linear Time/O(N)	Constant Time/O(1)
Constant Time/O(1)	LinkedList	Linear Time/O(N)	Linear Time/O(N)	Constant Time/O(1)
Constant Time/O(1)	Binary Search Tree	Logarithmic Time/O(LogN)	Logarithmic Time/O(LogN)	Constant Time/O(1)

list()

Pseudocode:
* Traverse each index in the underlying array
* At each index's bucket and add each element found to a list
* Return list

PS: This operation doesn't list out the elements in the order they were inserted - this is why we say sets are unordered. It lists them according to their location in the underlying array, which is based on the hash that was generated for each element.

Time Complexity : time to traverse array (linear) + time to traverse bucket at each index (depends on the data structure)

Time to traverse array	Bucket	Time to traverse bucket	Worst Case for `list()`
Linear Time/O(N)	List	Linear Time/O(N)	Linear Time/O(N)
Linear Time/O(N)	LinkedList	Linear Time/O(N)	Linear Time/O(N)
Linear Time/O(N)	Binary Search Tree	Logarithmic Time/O(LogN)	Logarithmic Time/O(LogN)

remove(e)

Pseudocode:
* Generate a hash (a.k.a index for the underlying array) for the object (or element) **e**, using a hash function
* Delete object (or element) **e**  at the index, if there's no collision
* If there's a collision, traverse bucket at index to find the correct object and delete it

Time Complexity : time to find address (always constant) + time to delete from bucket (depends on the data structure)

Time to find address	Bucket	Time to delete from bucket	Worst Case for `remove(e)`	Amortized Time
Constant Time/O(1)	List	Linear Time/O(N)	Linear Time/O(N)	Constant Time/O(1)
Constant Time/O(1)	LinkedList	Constant Time/O(1)	Constant Time/O(1)	Constant Time/O(1)
Constant Time/O(1)	Binary Search Tree	Logarithmic Time/O(LogN)	Logarithmic Time/O(LogN)	Constant Time/O(1)

See explanation for Hashing and Collision here

Reference(s)

Computer Science Distilled

Author(s)

Ayomide Oyekanmi

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

set

set

README.md

Introduction

PS: Operations can be named differently in different languages

Internals

Operations

Reference(s)

Author(s)

Files

set

Directory actions

More options

Directory actions

More options

Latest commit

History

set

Folders and files

parent directory

README.md

Introduction

PS: Operations can be named differently in different languages

Internals

Operations

Reference(s)

Author(s)