7.2 Sorting, Searching, Hashing and Graphs

Sorting and searching are fundamental operations in computer science, widely used for data organization and retrieval. Graphs are essential structures in many applications like networks, social media, and routing algorithms. Understanding these concepts is vital for efficient problem-solving.

1. Sorting Techniques

Sorting arranges the elements of a data structure, such as an array or linked list, into a defined order. The role of sorting is to organize data so that searching, accessing, or processing it becomes more efficient.

Types of Sorting:

Internal Sorting: Involves sorting data that fits entirely in the computer's main memory (RAM). Examples: Bubble Sort, Quick Sort, Merge Sort.

External Sorting: Used when the data to be sorted is too large to fit into main memory and must be kept in external storage (such as disk files). Example: external merge sort.

Common Sorting Algorithms

1. Insertion Sort:

Description: Builds the sorted array one element at a time by inserting each new element into its correct position.

Example: Array: [5, 2, 9, 1]

Steps:

  1. Start with the second element (2):

    • Compare 2 with 5. Since 2 is smaller than 5, move 5 one position to the right and insert 2 at the first position.

    • Array after the first step: [2, 5, 9, 1]

  2. Move to the third element (9):

    • Compare 9 with 5. Since 9 is greater, it stays in its place.

    • Array remains: [2, 5, 9, 1]

  3. Move to the fourth element (1):

    • Compare 1 with 9. Since 1 is smaller, shift 9 one position to the right.

    • Compare 1 with 5. Since 1 is smaller, shift 5 one position to the right.

    • Compare 1 with 2. Since 1 is smaller, shift 2 one position to the right.

    • Insert 1 at the first position.

    • Array after the final step: [1, 2, 5, 9]

Final Sorted Array: [1, 2, 5, 9]

  • Best Case: O(n) (When the array is already sorted)

    • In the best case, each new element is greater than or equal to the last element, so only one comparison per element is made, resulting in a linear time complexity.

  • Worst Case: O(n²) (When the array is sorted in reverse order)

    • In the worst case, each new element has to be compared with every element before it, leading to a quadratic time complexity.
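
To make the steps concrete, here is a minimal Python sketch of insertion sort (illustrative only, sorting a plain list in place):

    def insertion_sort(arr):
        # Grow the sorted prefix one element at a time.
        for i in range(1, len(arr)):
            key = arr[i]
            j = i - 1
            # Shift larger elements one position to the right.
            while j >= 0 and arr[j] > key:
                arr[j + 1] = arr[j]
                j -= 1
            arr[j + 1] = key
        return arr

    print(insertion_sort([5, 2, 9, 1]))  # [1, 2, 5, 9]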


2. Selection Sort:

Description: Repeatedly selects the smallest element from the unsorted portion of the array and swaps it with the first unsorted element.

Example: Array: [5, 2, 9, 1]

Steps:

  • Select 1 (smallest), swap with 5: [1, 2, 9, 5]

  • Select 2 (smallest remaining), already in place: [1, 2, 9, 5]

  • Select 5 (smallest remaining), swap with 9: [1, 2, 5, 9]

Final Sorted Array: [1, 2, 5, 9]

  • Best Case: O(n²) (No improvement for already sorted arrays)

    • Regardless of whether the array is sorted or not, Selection Sort always compares every element with all the others, resulting in a quadratic time complexity in all cases.

  • Worst Case: O(n²) (Worst case remains the same as the best case)

    • Even in the worst case, it performs the same number of comparisons and swaps.
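
A minimal Python sketch of selection sort, mirroring the steps above:

    def selection_sort(arr):
        for i in range(len(arr)):
            # Find the smallest element in the unsorted portion.
            min_idx = i
            for j in range(i + 1, len(arr)):
                if arr[j] < arr[min_idx]:
                    min_idx = j
            # Swap it with the first unsorted element.
            arr[i], arr[min_idx] = arr[min_idx], arr[i]
        return arr

    print(selection_sort([5, 2, 9, 1]))  # [1, 2, 5, 9]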


3. Merge Sort:

Description: Divides the array into two halves, recursively sorts them, and merges them back together.

  • Time Complexity: O(n log n)

Example: Array: [5, 2, 9, 1]

Steps:

  • Divide: [5, 2] and [9, 1]

  • Sort: [2, 5] and [1, 9]

  • Merge: [1, 2, 5, 9]

Final Sorted Array: [1, 2, 5, 9]

  • Best Case: O(n log n) (Regardless of the initial order of the array)

    • Merge Sort always divides the array into two halves, sorts each half recursively, and merges them back together. The time complexity remains O(n log n) even if the array is already sorted.

  • Worst Case: O(n log n) (Same as the best case)

    • The time complexity is still O(n log n) because the algorithm always splits the array and performs a merge step regardless of the input.
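
A minimal Python sketch of merge sort; this version returns a new list rather than sorting in place:

    def merge_sort(arr):
        if len(arr) <= 1:
            return arr
        mid = len(arr) // 2
        left = merge_sort(arr[:mid])   # Sort each half recursively.
        right = merge_sort(arr[mid:])
        merged, i, j = [], 0, 0
        # Merge the two sorted halves.
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        return merged + left[i:] + right[j:]

    print(merge_sort([5, 2, 9, 1]))  # [1, 2, 5, 9]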


4. Radix Sort:

Description: Sorts numbers digit by digit from the least significant digit to the most significant.

  • Time Complexity: O(nk) (n = number of elements, k = number of digits)

Example: Array: [170, 45, 75, 90, 802, 24, 2, 66]

Steps:

  • Sort by least significant digit: [170, 90, 802, 2, 24, 45, 75, 66]

  • Sort by second least significant digit: [802, 2, 24, 45, 66, 170, 75, 90]

  • Sort by third digit: [2, 24, 45, 66, 75, 90, 170, 802]

Final Sorted Array: [2, 24, 45, 66, 75, 90, 170, 802]

  • Best Case: O(nk) (When the number of digits k is small)

    • The best case occurs when the number of digits for the numbers being sorted is small, resulting in a linear time complexity based on the number of elements (n).

  • Worst Case: O(nk) (When the number of digits k is large)

    • Radix Sort's time complexity depends on both the number of elements (n) and the number of digits (k). The worst case happens when k is large, but it still remains linear in terms of the number of elements.
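
A minimal Python sketch of least-significant-digit radix sort; it assumes non-negative integers and uses ten buckets per digit pass:

    def radix_sort(arr):
        exp = 1
        while max(arr) // exp > 0:
            # Distribute elements into buckets by the current digit.
            buckets = [[] for _ in range(10)]
            for num in arr:
                buckets[(num // exp) % 10].append(num)
            # Collecting buckets in order keeps the sort stable.
            arr = [num for bucket in buckets for num in bucket]
            exp *= 10
        return arr

    print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
    # [2, 24, 45, 66, 75, 90, 170, 802]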


5. Heap Sort:

Description: Uses a binary heap (priority queue) to sort the array.

Example: Array: [5, 2, 9, 1]

Steps:

  1. Build a max-heap from the array.

  2. Swap the root (maximum element) with the last element.

  3. Heapify the remaining heap to restore the heap property.

  4. Repeat steps 2 and 3 until all elements are sorted.

Visual of Each Step (elements after the | are in their final sorted positions):

  1. Initial Array: [5, 2, 9, 1]

    • Build Max-Heap: [9, 2, 5, 1]

  2. Swap root (9) with last element (1):

    • After swap: [1, 2, 5 | 9]

    • Heapify the remaining heap: [5, 2, 1 | 9]

  3. Swap root (5) with last heap element (1):

    • After swap: [1, 2 | 5, 9]

    • Heapify the remaining heap: [2, 1 | 5, 9]

  4. Swap root (2) with last heap element (1):

    • After swap: [1 | 2, 5, 9]

    • A one-element heap is already valid, so the sort is complete.

Final Sorted Array: [1, 2, 5, 9]

  • Best Case: O(n log n) (Heapification is the same regardless of the order of the array)

    • Heapify after each swap has a complexity of O(log n), and we perform this operation for each element in the heap. This step takes O(n log n) time in total.

  • Worst Case: O(n log n) (Same as the best case)

    • The time complexity remains O(n log n) for both best and worst cases, as the algorithm always performs a series of heap operations.
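
A minimal Python sketch of heap sort, written with an explicit sift-down helper rather than a library heap:

    def heap_sort(arr):
        n = len(arr)

        def sift_down(root, end):
            # Restore the max-heap property for the subtree at `root`,
            # considering only elements before index `end`.
            while 2 * root + 1 < end:
                child = 2 * root + 1
                if child + 1 < end and arr[child + 1] > arr[child]:
                    child += 1
                if arr[root] >= arr[child]:
                    break
                arr[root], arr[child] = arr[child], arr[root]
                root = child

        # Build a max-heap, then repeatedly move the root to the end.
        for i in range(n // 2 - 1, -1, -1):
            sift_down(i, n)
        for end in range(n - 1, 0, -1):
            arr[0], arr[end] = arr[end], arr[0]
            sift_down(0, end)
        return arr

    print(heap_sort([5, 2, 9, 1]))  # [1, 2, 5, 9]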


2. Searching Techniques

1. Sequential Search

  • Description: Sequential search (also known as linear search) is the simplest search technique. It involves checking each element of the list one by one, from the beginning to the end, until the target element is found or the list is exhausted.

  • Example: If you are looking for a specific number in an unsorted list of integers, you would start from the first element and check each subsequent element until the target number is found.

    Example: List: [4, 10, 15, 23, 42], Target: 15. Start at 4, check 10, and find 15 at the third position.

  • Applications:

    • Searching in small, unsorted datasets.

    • Searching for an item in an array or list where there are no specific ordering requirements.

  • Time Complexity: O(n) — The time complexity is linear because, in the worst case, you might need to check every element of the list (where n is the number of elements).
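
A minimal Python sketch of sequential search, returning the index of the target or -1 if it is absent:

    def sequential_search(items, target):
        # Check each element in turn until the target is found.
        for i, value in enumerate(items):
            if value == target:
                return i
        return -1

    print(sequential_search([4, 10, 15, 23, 42], 15))  # 2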


2. Binary Search

  • Description: Binary search is a much more efficient search algorithm that works only on sorted arrays or lists. It repeatedly divides the search interval in half. After each division, it checks whether the target is less than, greater than, or equal to the middle element, effectively halving the search space in each step.

  • Example: If you want to search for 15 in the sorted array [4, 10, 15, 23, 42], the binary search would:

    • Compare 15 with the middle element, 15.

    • Since the middle element is the target, the search stops.

  • Applications:

    • Searching in large datasets or databases where the list is already sorted.

    • Used in binary search trees (BST) for fast lookups.

  • Time Complexity: O(log n) — The time complexity is logarithmic because the size of the search space is halved in each step, resulting in fewer steps to find the target element.
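
A minimal Python sketch of binary search over a sorted list:

    def binary_search(sorted_items, target):
        low, high = 0, len(sorted_items) - 1
        while low <= high:
            mid = (low + high) // 2        # Middle of the current interval.
            if sorted_items[mid] == target:
                return mid
            elif sorted_items[mid] < target:
                low = mid + 1              # Target lies in the right half.
            else:
                high = mid - 1             # Target lies in the left half.
        return -1

    print(binary_search([4, 10, 15, 23, 42], 15))  # 2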


3. Tree Search

  • Description: Tree search refers to searching in a tree structure, like a binary search tree (BST) or other types of trees. In this method, you start at the root and traverse through the tree nodes based on comparisons with the target value. At each node, the search proceeds to the left or right child depending on whether the target value is smaller or larger than the current node’s value.

  • Example: If you are searching for 15 in the following binary search tree:

        20
       /  \
     10    30
      \      \
      15     40
    • Start at the root, 20. Since 15 is smaller, move to the left child (10).

    • Compare 15 with 10, and since 15 is larger, move to the right child (15), which is the target.

  • Applications:

    • Binary Search Trees (BSTs) for efficient searching, insertion, and deletion.

    • Database indexing and search operations (e.g., B-trees, AVL trees).

    • Game AI in decision trees or move search.

  • Time Complexity:

    • O(log n) for balanced trees (e.g., AVL or Red-Black trees), as the tree height is logarithmic relative to the number of elements.

    • O(n) in the worst case for unbalanced trees, as the tree might degenerate into a linked list.
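
A minimal Python sketch of searching the example tree above; the bare-bones Node class here is purely illustrative:

    class Node:
        def __init__(self, value):
            self.value = value
            self.left = None
            self.right = None

    def bst_search(node, target):
        # Walk down from the root, going left or right at each node.
        while node is not None:
            if target == node.value:
                return node
            node = node.left if target < node.value else node.right
        return None

    # The tree from the example: 20 with children 10 and 30, etc.
    root = Node(20)
    root.left, root.right = Node(10), Node(30)
    root.left.right, root.right.right = Node(15), Node(40)
    print(bst_search(root, 15).value)  # 15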


3. Hashing Techniques

Hashing is a technique used to quickly locate a data record given its search key. A hash function maps the key to a fixed-size value called a hash code, which is used as an index into a hash table where the record is stored.


1. Hash Function

  • Description: A hash function takes an input (or "key") and returns a fixed-size value called a hash code, which is used to index the data in a hash table. Different keys may occasionally produce the same hash code; such collisions are discussed below.

  • Example: If you want to store a person's name in a hash table, the hash function might take their name (e.g., "Alice") and convert it into a hash code like 9876534, which will be used as an index to store or retrieve the data from the hash table.

  • Applications:

    • Data retrieval: Efficiently finding data based on a unique key (e.g., looking up a phone number by a name).

    • Hash-based data structures like hash tables, hash maps, and dictionaries.

    • Cryptographic functions (e.g., in digital signatures or blockchain systems).
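
As a toy illustration only (real hash functions mix bits far more thoroughly), a string key can be reduced to a table index like this:

    def simple_hash(key, table_size):
        # Sum the character codes, then take the remainder so the
        # result is a valid index into a table of `table_size` slots.
        return sum(ord(ch) for ch in key) % table_size

    print(simple_hash("Alice", 5))  # 3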


2. Hash Tables

  • Description: A hash table (or hash map) is a data structure that stores key-value pairs. It uses a hash function to compute an index (or hash code) into an array of slots, from which the desired value can be found. The main advantage of hash tables is that they provide fast insertion, deletion, and search operations, ideally with O(1) time complexity.

  • Example: In a phonebook application, the name of the person (e.g., "Alice") could be the key, and the corresponding phone number (e.g., "555-1234") would be the value. A hash table could store this pair in an index location derived from the hash code of the name "Alice". This allows quick look-up when you search for the phone number by name.

  • Applications:

    • Dictionaries and hash maps in programming languages (e.g., Python's dict or Java's HashMap).

    • Database indexing: Storing and retrieving data in an indexed form.

    • Caching: Storing results of expensive function calls for quick retrieval.

  • Time Complexity:

    • O(1) for average cases (insertion, deletion, and search).

    • O(n) in the worst case (if many keys hash to the same index, causing collisions).


3. Collision Resolution Techniques

A collision occurs when two different keys hash to the same index. Collisions need to be resolved to ensure that both keys can be stored and retrieved correctly.

A. Chaining

  • Description: In chaining, each slot in the hash table contains a linked list (or another data structure) that holds all the keys that hash to the same index. When a collision occurs, the new key is simply added to the linked list at the corresponding slot.

  • Example: Let's assume you have a hash table of size 5 and a hash function that returns the following hash values for keys:

    • Key "Alice" -> Hash code: 1

    • Key "Bob" -> Hash code: 1

    • Key "Charlie" -> Hash code: 3

    Since both "Alice" and "Bob" hash to index 1, they are stored in a linked list at that index, like this:

    Hash Table:
    [0] -> null
    [1] -> Alice -> Bob
    [2] -> null
    [3] -> Charlie
    [4] -> null
  • Applications:

    • Useful when the hash table has a low load factor (few collisions).

    • Used in most basic implementations of hash tables.

  • Time Complexity:

    • Average case: O(1) for search, insertion, and deletion (assuming a good hash function and low collision rate).

    • Worst case: O(n) for search, insertion, and deletion (when all keys hash to the same value).
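
A minimal Python sketch of a chained hash table; each slot holds a list of (key, value) pairs:

    class ChainedHashTable:
        def __init__(self, size=5):
            self.slots = [[] for _ in range(size)]  # One chain per slot.

        def _index(self, key):
            return hash(key) % len(self.slots)

        def put(self, key, value):
            chain = self.slots[self._index(key)]
            for i, (k, _) in enumerate(chain):
                if k == key:               # Key already present: update it.
                    chain[i] = (key, value)
                    return
            chain.append((key, value))     # A collision just extends the chain.

        def get(self, key):
            for k, v in self.slots[self._index(key)]:
                if k == key:
                    return v
            return None

    table = ChainedHashTable()
    table.put("Alice", "555-1234")
    table.put("Bob", "555-9876")
    print(table.get("Alice"))  # 555-1234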

B. Open Addressing

  • Description: Open addressing resolves collisions by probing (or searching) for the next available slot in the table. When a collision occurs, the algorithm searches for another empty slot in the hash table and inserts the key there. Several probing methods exist, such as linear probing, quadratic probing, and double hashing.

    • Linear Probing: When a collision occurs at index i, the algorithm checks the next slot (i+1), and so on, until an empty slot is found.

    • Quadratic Probing: It checks slots at i+1², i+2², and so on, reducing clustering.

    • Double Hashing: A second hash function is used to calculate the next probe index.

  • Example: If the hash table size is 5, and the hash function for the key "Alice" produces index 1, and the key "Bob" also hashes to index 1, linear probing will attempt the next slot (2), and store Bob there.

    Hash Table with Linear Probing:
    [0] -> null
    [1] -> Alice
    [2] -> Bob
    [3] -> null
    [4] -> null
  • Applications:

    • Suitable for hash tables with a high load factor.

    • Used in environments where the hash table size is fixed, and resizing is not feasible.

  • Time Complexity:

    • Average case: O(1) for search, insertion, and deletion (with a low load factor).

    • Worst case: O(n), when a high load factor forces the probe sequence to scan many occupied slots.
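
A minimal Python sketch of insertion with linear probing; it assumes the table never fills completely and omits search, deletion, and resizing:

    def probe_insert(table, key, value):
        # Start at the hashed slot and step forward (wrapping around)
        # until an empty slot is found.
        i = hash(key) % len(table)
        while table[i] is not None:
            i = (i + 1) % len(table)
        table[i] = (key, value)

    table = [None] * 5
    probe_insert(table, "Alice", "555-1234")
    probe_insert(table, "Bob", "555-9876")
    print(table)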


4. Graphs

Graphs are mathematical structures used to model pairwise relations between objects. They are used in applications like social networks, transportation systems, and routing algorithms.

Graph Representation:

  1. Adjacency Matrix: A 2D array where the element at (i, j) indicates if there's an edge between nodes i and j.

  2. Adjacency List: A collection of lists or arrays where each list represents a node and contains the nodes to which it is connected.
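
For illustration, here is a small undirected graph with nodes 0-3 and edges 0-1, 0-2, and 1-3 in both representations (Python):

    adjacency_list = {
        0: [1, 2],
        1: [0, 3],
        2: [0],
        3: [1],
    }

    adjacency_matrix = [
        # 0  1  2  3
        [0, 1, 1, 0],  # node 0
        [1, 0, 0, 1],  # node 1
        [1, 0, 0, 0],  # node 2
        [0, 1, 0, 0],  # node 3
    ]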


Graph Algorithms

1. Traversal Algorithms

These algorithms are used to explore all the nodes and edges in a graph.

  • Depth-First Search (DFS): Explores a graph by going as deep as possible down a branch before backtracking (both traversals are sketched after this list).

    • Time Complexity: O(V + E)

    • Applications:

      • Pathfinding in games and puzzles.

      • Solving mazes.

      • Topological sorting in task scheduling.

  • Breadth-First Search (BFS): Explores a graph level by level, starting from a source node.

    • Time Complexity: O(V + E)

    • Applications:

      • Finding the shortest path in unweighted graphs (e.g., in a social network).

      • Web crawlers (exploring websites level by level).

      • Broadcasting in networks.
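
Minimal Python sketches of both traversals, as referenced above, on the adjacency-list graph from the previous example:

    from collections import deque

    def dfs(graph, start, visited=None):
        # Go as deep as possible before backtracking (recursive form).
        if visited is None:
            visited = set()
        visited.add(start)
        for neighbor in graph[start]:
            if neighbor not in visited:
                dfs(graph, neighbor, visited)
        return visited

    def bfs(graph, start):
        # Explore level by level using a FIFO queue.
        visited, queue, order = {start}, deque([start]), []
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in graph[node]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
        return order

    graph = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
    print(bfs(graph, 0))  # [0, 1, 2, 3]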


2. Shortest Path Algorithms

These algorithms are used to find the shortest path between nodes in a graph.

  • Greedy Algorithms: These algorithms make the locally optimal choice at each step in the hope of reaching the global optimum.

    • Dijkstra’s Algorithm: Finds the shortest path from a source node to all other nodes in a graph with non-negative edge weights (a minimal sketch follows this list).

      • Time Complexity: O(V²) for basic implementation, O(E log V) with a priority queue.

      • Applications:

        • GPS and navigation systems.

        • Network routing (e.g., data packet transmission in the internet).

    • A* Search Algorithm: Combines Dijkstra’s algorithm with heuristics to prioritize the search based on estimated distances to the goal.

      • Time Complexity: Depends on the heuristic; with an uninformative heuristic it degrades to Dijkstra’s O(E log V), while a good heuristic can explore far fewer nodes.

      • Applications:

        • Video game AI pathfinding.

        • Robotics and automated vehicle navigation.

  • Dynamic Programming Algorithms: These algorithms break the problem into simpler subproblems and store the results of those subproblems.

    • Bellman-Ford Algorithm: Finds the shortest path in graphs with negative weights, and can also detect negative weight cycles.

      • Time Complexity: O(V * E)

      • Applications:

        • Financial modeling (e.g., finding the shortest path in a network of investments with potential losses).

        • Detection of negative weight cycles in graphs.

    • Floyd-Warshall Algorithm: Finds the shortest paths between all pairs of nodes in a graph.

      • Time Complexity: O(V³)

      • Applications:

        • Transitive closure (e.g., finding if a person is indirectly connected to another in a social network).

        • All-pairs shortest path in routing and networking.
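
A minimal Python sketch of Dijkstra's algorithm with a priority queue (the O(E log V) variant); the small example graph is illustrative only:

    import heapq

    def dijkstra(graph, source):
        # graph: {node: [(neighbor, weight), ...]}, non-negative weights.
        dist = {node: float("inf") for node in graph}
        dist[source] = 0
        pq = [(0, source)]            # Priority queue of (distance, node).
        while pq:
            d, node = heapq.heappop(pq)
            if d > dist[node]:        # Stale entry: a shorter path was found.
                continue
            for neighbor, weight in graph[node]:
                new_dist = d + weight
                if new_dist < dist[neighbor]:
                    dist[neighbor] = new_dist
                    heapq.heappush(pq, (new_dist, neighbor))
        return dist

    graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2)], "C": []}
    print(dijkstra(graph, "A"))  # {'A': 0, 'B': 1, 'C': 3}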


3. Minimum Spanning Tree (MST) Algorithms

These algorithms are used to find a subset of the edges that connect all the vertices in a weighted graph, without forming any cycles, and with the minimum total edge weight.

  • Prim’s Algorithm: A greedy algorithm that grows the MST by repeatedly adding the smallest edge connecting the tree to an outside vertex (a minimal sketch follows this list).

    • Time Complexity: O(E log V) with priority queues.

    • Applications:

      • Network design (e.g., electrical grids, communication networks).

      • Approximate solutions to cluster problems (e.g., data clustering).

  • Kruskal’s Algorithm: Adds edges in increasing weight order to form the MST, ensuring no cycles are formed.

    • Time Complexity: O(E log E), dominated by sorting the edges; this is equivalent to O(E log V), since E ≤ V².

    • Applications:

      • Designing minimal-cost infrastructure (e.g., telecommunication lines, transportation networks).
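
A minimal Python sketch of Prim's algorithm, as referenced above, using a priority queue of candidate edges; the example graph is illustrative only:

    import heapq

    def prim(graph, start):
        # graph: {node: [(neighbor, weight), ...]}, undirected.
        visited = {start}
        edges = [(w, start, v) for v, w in graph[start]]
        heapq.heapify(edges)
        mst = []
        while edges:
            w, u, v = heapq.heappop(edges)
            if v in visited:
                continue               # Edge would not reach a new vertex.
            visited.add(v)
            mst.append((u, v, w))
            for nxt, nw in graph[v]:
                if nxt not in visited:
                    heapq.heappush(edges, (nw, v, nxt))
        return mst

    graph = {
        "A": [("B", 1), ("C", 3)],
        "B": [("A", 1), ("C", 2)],
        "C": [("A", 3), ("B", 2)],
    }
    print(prim(graph, "A"))  # [('A', 'B', 1), ('B', 'C', 2)]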


4. Graph Search and Cycle Detection Algorithms

These algorithms are used to detect cycles or explore specific features in a graph.

  • Cycle Detection: Used to detect cycles in a graph.

    • DFS-based Cycle Detection: Detects cycles in a directed graph by checking back edges during a DFS traversal.

      • Time Complexity: O(V + E)

      • Applications:

        • Detecting deadlocks in operating systems.

        • Dependency resolution in compilers.

        • Verifying the integrity of network connections.

    • Union-Find (Disjoint Set): Detects cycles in an undirected graph by checking whether adding an edge would connect two vertices already in the same set (a minimal sketch follows this list).

      • Time Complexity: O(α(V)) amortized per operation, where α is the inverse Ackermann function (effectively constant in practice).

      • Applications:

        • Network connectivity checking.

        • Kruskal’s Algorithm for MSTs.
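
A minimal Python sketch of Union-Find cycle detection, as referenced above; union by rank is omitted for brevity, so this version does not reach the full O(α(V)) bound:

    def find(parent, x):
        # Path compression: point nodes closer to the root as we walk up.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def has_cycle(num_vertices, edges):
        parent = list(range(num_vertices))
        for u, v in edges:
            ru, rv = find(parent, u), find(parent, v)
            if ru == rv:        # Already connected: this edge closes a cycle.
                return True
            parent[ru] = rv     # Union the two components.
        return False

    print(has_cycle(3, [(0, 1), (1, 2), (2, 0)]))  # True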


5. Topological Sorting Algorithms

These algorithms are used to arrange vertices in a directed acyclic graph (DAG) in a linear order such that for every directed edge from vertex u to vertex v, u comes before v.

  • Topological Sort: A linear ordering of vertices such that for every directed edge uv, vertex u comes before vertex v (a minimal sketch follows this list).

    • Time Complexity: O(V + E)

    • Applications:

      • Task scheduling (e.g., project management).

      • Resolving symbol dependencies in compilers.
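
A minimal Python sketch of topological sorting using Kahn's algorithm, which repeatedly removes vertices with no incoming edges:

    from collections import deque

    def topological_sort(graph):
        # graph: {node: [successors]}; must be a DAG for a full ordering.
        indegree = {node: 0 for node in graph}
        for node in graph:
            for succ in graph[node]:
                indegree[succ] += 1
        queue = deque(n for n in graph if indegree[n] == 0)
        order = []
        while queue:
            node = queue.popleft()
            order.append(node)
            for succ in graph[node]:
                indegree[succ] -= 1
                if indegree[succ] == 0:
                    queue.append(succ)
        return order  # Shorter than len(graph) if the graph has a cycle.

    tasks = {"shop": ["cook"], "cook": ["eat"], "eat": []}
    print(topological_sort(tasks))  # ['shop', 'cook', 'eat']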


6. Connected Components and Component Detection Algorithms

These algorithms are used to find and explore the connected components in a graph.

  • Connected Components in an Undirected Graph: Identifies all connected components in an undirected graph using DFS or BFS (a minimal sketch follows this list).

    • Time Complexity: O(V + E)

    • Applications:

      • Social network analysis.

      • Community detection in large graphs.

  • Biconnected Components: Identifies maximal biconnected components in a graph, useful in network design and reliability analysis.

    • Time Complexity: O(V + E)

    • Applications:

      • Network reliability and robustness analysis.
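
A minimal Python sketch of connected-component detection, as referenced above; each DFS sweep from an unvisited vertex collects one component:

    def connected_components(graph):
        visited, components = set(), []
        for start in graph:
            if start in visited:
                continue
            stack, comp = [start], []   # Iterative DFS from `start`.
            visited.add(start)
            while stack:
                node = stack.pop()
                comp.append(node)
                for neighbor in graph[node]:
                    if neighbor not in visited:
                        visited.add(neighbor)
                        stack.append(neighbor)
            components.append(comp)
        return components

    graph = {0: [1], 1: [0], 2: [3], 3: [2]}
    print(connected_components(graph))  # [[0, 1], [2, 3]]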


Conclusion

  • Sorting and searching are foundational techniques in computer science, providing efficient methods for data organization, retrieval, and analysis, with algorithms optimized for various scenarios based on time and space complexity.

  • Graph structures and algorithms, such as DFS, BFS, and MSTs, are indispensable in solving real-world problems like networking, routing, and scheduling, showcasing their versatility and importance.

  • Mastering these concepts is essential for efficient problem-solving and forms the backbone of advanced computational and data-driven applications.
