What is Graphs Theory

Mathematics

In mathematics, graph theory is the study of graphs, which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of vertices (also called nodes or points) which are connected by edges (also called links or lines).

Computer Science

it is considered an abstract data type that is really good for representing connections or relations – unlike the tabular data structures of relational database systems, which are ironically very limited in expressing relations.

Graph Data Structure

A graph is a non-linear data structure consisting of nodes and edges. The nodes are sometimes also referred to as vertices and the edges are lines or arcs that connect any two nodes in the graph. More formally a Graph can be defined as, a Graph consists of a finite set of vertices(or nodes) and set of Edges which connect a pair of nodes.

In graph theory, a graph is a mathematical structure consisting of a set of objects, called vertices or nodes, and a set of connections, called edges, which link pairs of vertices. The notation:

\[ G = (V, E) \]

is used to represent a graph, where \(G\) is the graph, \(V\) is the set of vertices, and \(\bigvee\) is the set of edges.

The nodes of a graph can represent any objects, such as cities, people, web pages, or molecules, and the edges represent the relationships or connections between them.

import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph()
G.add_edges_from([('A', 'B'), ('A', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'F'), ('C', 'G')])

plt.axis('off')
nx.draw_networkx(G,
                 pos=nx.spring_layout(G, seed=0),
                 node_size=600,
                 cmap='coolwarm',
                 font_size=14,
                 font_color='white',
                 )
plt.show()

/home/docs/checkouts/readthedocs.org/user_builds/ml-math/envs/latest/lib/python3.11/site-packages/networkx/drawing/nx_pylab.py:450: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  node_collection = ax.scatter(

../_images/5afa37be0e9b62098dc24608fe8d03811b2dd74f5dea94e01a21d258b1081dc8.png

Graphs Terminology

The following are the most commonly used terms in graph theory with respect to graphs:

Vertex - A vertex, also called a “node”, is a fundamental part of a graph. In the context of graphs, a vertex is an object which may contain zero or more items called attributes.
Edge - An edge is a connection between two vertices. An edge may contain a weight/value/cost.
Path - A path is a sequence of edges connecting a sequence of vertices.
Cycle - A cycle is a path of edges that starts and ends on the same vertex.
Weighted Graph/Network - A weighted graph is a graph with numbers assigned to its edges. These numbers are called weights.
Unweighted Graph/Network - An unweighted graph is a graph in which all edges have equal weight.
Directed Graph/Network - A directed graph is a graph where all the edges are directed.
Undirected Graph/Network - An undirected graph is a graph where all the edges are not directed.
Adjacent Vertices - Two vertices in a graph are said to be adjacent if there is an edge connecting them.

Types of Graphs

There are many types of graphs:

Directed Graphs
Undirected Graphs
Weighted Graph
Cyclic Graph
Acyclic Graph
Directed Acyclic Graph

Graphs have many properties, including the direction of travel for each relationship. The decision of which graph type you use usually depends on the use case.

There may not be an explicit direction from one node to another, such as with friendships in a social network, but in others, there might be clearly defined directions, such as flights and airports dataset.

Directed Graphs

In a directed graph, all the edges are directed. That means, each edge is associated with a direction. For example, if there is an edge from node A to node B, then the edge is directed from A to B and not the other way around.

Directed graph, also called a digraph.

https://media.geeksforgeeks.org/wp-content/cdn-uploads/SCC1.png

DG = nx.DiGraph()
DG.add_edges_from([('A', 'B'), ('A', 'C'), ('B', 'D'),  
('B', 'E'), ('C', 'F'), ('C', 'G')])
nx.draw_networkx(DG, pos=nx.spring_layout(DG, seed=0), node_size=600, cmap='coolwarm', font_size=14, font_color='white')
plt.show()

../_images/b9c0bc6a7dc75ab7de7bf1b7ca6be88c472b3395870a611900403cff9648c9d5.png

Undirected Graphs

In an undirected graph, all the edges are undirected. That means, each edge is associated with a direction. For example, if there is an edge from node A to node B, then the edge is directed from A to B and not the other way around.

G = nx.Graph()
G.add_edges_from([('A', 'B'), ('A', 'C'), ('B', 'D'),  
('B', 'E'), ('C', 'F'), ('C', 'G')])

nx.draw_networkx(G, pos=nx.spring_layout(G, seed=0), node_size=600, cmap='coolwarm', font_size=14, font_color='white')
plt.show()

../_images/832aee7ac81abcb2b4bc84588c290c491b78097c5eaa67716e7470f305e67926.png

Weighted Graph

In a weighted graph, each edge is assigned a weight or a cost. The weight can be positive, negative or zero. The weight of an edge is represented by a number. A graph G= (V, E) is called a labeled or weighted graph because each edge has a value or weight representing the cost of traversing that edge.

WG = nx.Graph()
WG.add_edges_from([('A', 'B', {"weight": 10}), ('A', 'C', {"weight": 20}), ('B', 'D', {"weight": 30}), ('B', 'E', {"weight": 40}), ('C', 'F', {"weight": 50}), ('C', 'G', {"weight": 60})])
labels = nx.get_edge_attributes(WG, "weight")

Cyclic Graph

A graph is said to be cyclic if it contains a cycle. A cycle is a path of edges that starts and ends on the same vertex. A graph that contains a cycle is called a cyclic graph.

Acyclic Graph

When there are no cycles in a graph, it is called an acyclic graph.

Directed Acyclic Graph

It’s also known as a directed acyclic graph (DAG), and it’s a graph with directed edges but no cycle. It represents the edges using an ordered pair of vertices since it directs the vertices and stores some data.

Trees

A tree is a special type of graph that has a root node, and every node in the graph is connected by edges. It’s a directed acyclic graph with a single root node and no cycles. A tree is a special type of graph that has a root node, and every node in the graph is connected by edges. It’s a directed acyclic graph with a single root node and no cycles.

Biprartite Graph

A bipartite graph is a graph whose vertices can be divided into two independent sets, U and V, such that every edge (u, v) either connects a vertex from U to V or a vertex from V to U. In other words, for every edge (u, v), either u belongs to U and v to V, or u belongs to V and v to U. We can also say that there is no edge that connects vertices of same set.

Examples of Bipartite Graphs

Authors-to-Papers (they authored)
Actors-to-Movies (they appeared in)
Users-to-Movies (they rated)
Recipes-to-Ingredients (they contain)
Author collaboration networks
Movie co-rating networks

Projections of Bipartite Graphs

Homogeneous graph

Homogeneous graph consists of one type of nodes and edges, and heterogeneous graph has multiple types of nodes or edges.

An example of a homogeneous graph is an online social network with nodes representing people and edges representing friendship. means we have same node for all types of node and similary all edges are same.

Heterogeneous graph

Heterogeneous graphs come with different types of information attached to nodes and edges. Thus, a single node or edge feature tensor cannot hold all node or edge features of the whole graph, due to differences in type and dimensionality.

A graph with two or more types of node and/or two or more types of edge is called heterogeneous. An online social network with edges of different types, say ‘friendship’ and ‘co-worker’, between nodes of ‘person’ type is an example of a heterogeneous

Node degrees

The degree of a node is the number of edges connected to it. A node with no edges is called an isolated node. A node with only one edge is called a leaf node.

Degree of a vertex/Node

The degree of a vertex is the number of edges incident to it. In the following figure, the degree of vertex A is 3, the degree of vertex B is 4, and the degree of vertex C is 2.

In-Degree and Out-Degree of a Vertex/node

In a directed graph, the in-degree of a vertex is the number of edges that are incident to the vertex. The out-degree of a vertex is the number of edges that are incident to the vertex.

G = nx.Graph()
G.add_edges_from([('A', 'B'), ('A', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'F'), ('C', 'G')])
print(f"deg(A) = {G.degree['A']}")
DG = nx.DiGraph()
DG.add_edges_from([('A', 'B'), ('A', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'F'), ('C', 'G')])

print(f"deg^-(A) = {DG.in_degree['A']}")
print(f"deg^+(A) = {DG.out_degree['A']}")

Danger

Drawsback of Node Degree feature

The degree k of node v is the number of edges (neighboring nodes) the node has
Treats all neighboring nodes equally.

Node Centrality

Centrality quantifies the importance of a vertex or node in a network. It helps us to identify key nodes in a graph based on their connectivity and influence on the flow of information or interactions within the network.

There are several measures of centrality, each providing a different perspective on the importance of a node:

Degree centrality
Engienvector centrality
Betweenness centrality
Closeness centrality

Note

Node degree counts the neighboring nodes without capturing their importance.
Node centrality \(c_v\) takes the node importance in a graph into account

Graph Representation

There are two ways to represent a graph:

Adjacency Matrix
Edge List
Adjacency List

Each data structure has its own advantages and disadvantages that depend on the specific application and requirements.

Adjacency Matrix

In an adjacency matrix, each row represents a vertex and each column represents another vertex. If there is an edge between the two vertices, then the corresponding entry in the matrix is 1, otherwise it is 0. The following figure shows an adjacency matrix for a graph with 4 vertices.

Calculating the degree of node in adjacency matrix

Sum all the 1 in the row corresponding to the node. The sum of the elements in the row corresponding to the node is the degree of the node.

Drawbacks of adjacency matrix

The adjacency matrix representation of a graph is not suitable for a graph with a large number of vertices. This is because the number of entries in the matrix is proportional to the square of the number of vertices in the graph.
The adjacency matrix representation of a graph is not suitable for a graph with parallel edges. This is because the matrix can only store a single value for each pair of vertices.
One of the main drawbacks of using an adjacency matrix is its space complexity: as the number of nodes in the graph grows, the space required to store the adjacency matrix increases exponentially. adjacency matrix has a space complexity of \(O\left(|V|^2\right)_{\text {, where }}|V|_{\text{repre- }}\) sents the number of nodes in the graph.

Overall, while the adjacency matrix is a useful data structure for representing small graphs, it may not be practical for larger ones due to its space complexity. Additionally, the overhead of adding or removing nodes can make it inefficient for dynamically changing graphs.

Edge list

An edge list is a list of all the edges in a graph. Each edge is represented by a tuple or a pair of vertices. The edge list can also include the weight or cost of each edge. This is the data structure we used to create our graphs with networkx:

edge_list = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]

checking whether two vertices are connected in an edge list requires iterating through the entire list, which can be time-consuming for large graphs with many edges. Therefore, edge lists are more commonly used in applications where space is a concern.

Adjacency List

In an adjacency list, each vertex stores a list of adjacent vertices. The following figure shows an adjacency list for a graph with 4 vertices.

However, checking whether two vertices are connected can be slower than with an adjacency matrix. This is because it requires iterating through the adjacency list of one of the vertices, which can be time-consuming for large graphs.

Graph Traversals

Traversal means to walk along the edges of a graph in specific ways.

Graph algorithms are critical in solving problems related to graphs, such as finding the shortest path between two nodes or detecting cycles. This section will discuss two graph traversal algorithms: BFS and DFS.

Graph traversal is the process of visiting (checking and/or updating) each vertex in a graph, exactly once. Such traversals are classified by the order in which the vertices are visited. The order may be defined by a specific rule, for example, depth-first search and breadth-first search.

Link: https://medium.com/basecs/breaking-down-breadth-first-search-cebe696709d9

While DFS uses a stack data structure, BFS leans on the queue data structure.

Depth First Search

We know that depth-first search is the process of traversing down through one branch of a tree until we get to a leaf, and then working our way back to the “trunk” of the tree. In other words, implementing a DFS means traversing down through the subtrees of a binary search tree.

DFS is a recursive algorithm that starts at the root node and explores as far as possible along each branch before backtracking.

It chooses a node and explores all of its unvisited neighbors, visiting the first neighbor that has not been explored and backtracking only when all the neighbors have been visited. By doing so, it explores the graph by following as deep a path from the starting node as possible before backtracking to explore other branches. This continues until all nodes have been explored.

In depth first, Continue down the edges from the start vertex to the last vertex on that path or until the maximum traversal depth is reached, then walk down the other paths.

DFS Algorithm goes ‘deep’ instead of ‘wide’

https://miro.medium.com/v2/resize:fit:1400/1*LUL63FWqneOfsLKqMtHKFw.gif

In depth-first search, once we start down a path, we don’t stop until we get to the end. In other words, we traverse through one branch of a tree until we get to a leaf, and then we work our way back to the trunk of the tree.

Stack data structure is used to implement DFS. The algorithm starts with a particular node of a graph, then goes to any of its adjacent nodes and repeats this process until it finds the goal. If it reaches a node from which there is no unexplored edge leading to an unvisited node, then it backtracks to the last visited node and repeats the process.

G = nx.Graph()
G.add_edges_from([('A', 'B'), ('A', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'F'), ('C', 'G')])

visited = []

def dfs(visited, graph, node):
    if node not in visited:
        visited.append(node)
    # We then iterate through each neighbor of the current node. For each neighbor, we recursively call the dfs() function
    # passing in visited, graph, and the neighbor as arguments:
        for neighbor in graph[node]:
            visited = dfs(visited, graph, neighbor)
    # The dfs() function continues to explore the graph depth-first, visiting all the neighbors of each node until there
    # are no more unvisited neighbors. Finally, the visited list is returned
    return visited

dfs(visited, G, 'A')

['A', 'B', 'D', 'E', 'C', 'F', 'G']

Once again, the order we obtained is the one we anticipated in Figure. DFS is useful in solving various problems, such as finding connected components, topological sorting, and solving maze problems. It is particularly useful in finding cycles in a graph since it traverses the graph in a depth-first order, and a cycle exists if, and only if, a node is visited twice during the traversal.

Additionally, many other algorithms in graph theory build upon BFS and DFS, such as Dijkstra’s shortest path algorithm, Kruskal’s minimum spanning tree algorithm, and Tarjan’s strongly connected components algorithm. Therefore, a solid understanding of BFS and DFS is essential for anyone who wants to work with graphs and develop more advanced graph algorithms.

Breadth First Search

Breadth First Search (BFS) is an algorithm for traversing or searching tree or graph data structures. It starts at the tree root (or some arbitrary node of a graph, sometimes referred to as a ‘search key’[1]), and explores the neighbor nodes first, before moving to the next level neighbors.

It works by maintaining a queue of nodes to visit and marking each visited node as it is added to the queue. The algorithm then dequeues the next node in the queue and explores all its neighbors, adding them to the queue if they haven’t been visited yet.

Follow all edges from the start vertex to the next level, then follow all edges of their neighbors by another level and continue this pattern until there are no more edges to follow or the maximum traversal depth is reached.

Let’s now see how we can implement it in Python

def bfs(graph, node):
    # We initialize two lists (visited and queue) and add the starting node. The visited list keeps track of the nodes that have been
    # visited #during the search, while the queue list stores the nodes that need to be visited:

    visited, queue = [node], [node]

    while queue:
    # When We enter a while loop that continues until the queue list is empty.
    # Inside the loop, we remove the first node in the queue list using the pop(0) method and store the result in the node variable

        node = queue.pop(0)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.append(neighbor)
                queue.append(neighbor)
    # We iterate through the neighbors of the node using a for loop. For each neighbor that has not been visited yet,
    # we add it to the visited list and to the end of the queue list using the append() method. When it’s complete,
    # we return the visited list:
    return visited

bfs(G, 'A')

['A', 'B', 'C', 'D', 'E', 'F', 'G']

The order we obtained is the one we anticipated in Figure.

BFS is particularly useful in finding the shortest path between two nodes in an unweighted graph. This is because the algorithm visits nodes in order of their distance from the starting node, so the first time the target node is visited, it must be along the shortest path from the starting node.

Both algorithms return the same amount of paths if all other traversal options are the same, but the order in which edges are followed and vertices are visited is different.

Depth-first	Breadth-first
\(S \rightarrow A\)	\(S \rightarrow A\)
\(S \rightarrow A \rightarrow D\)	\(S \rightarrow B\)
\(S \rightarrow A \rightarrow E\)	\(S \rightarrow C\)
\(S \rightarrow B\)	\(S \rightarrow A \rightarrow D\)
\(S \rightarrow B \rightarrow F\)	\(S \rightarrow A \rightarrow E\)
\(S \rightarrow C\)	\(S \rightarrow B \rightarrow F\)

Note that there is no particular order in which the edges of a single vertex are followed. Hence, S→C may be returned before S→A and S→B. Shorter paths are returned before longer paths using a breadth-first search still.

Topological Sort

Topological sorting for Directed Acyclic Graph (DAG) is a linear ordering of vertices such that for every directed edge uv, vertex u comes before v in the ordering. Topological Sorting for a graph is not possible if the graph is not a DAG.

Graph Algorithms

Graph algorithms are used to solve problems that involve graphs. Graph algorithms are used to find the shortest path between two nodes, find the minimum spanning tree, find the strongly connected components, find the shortest path from a single node to all other nodes, find the bridges and articulation points, find the Eulerian path and circuit, find the maximum flow, find the maximum matching, find the biconnected components, find the Hamiltonian path and circuit, find the dominating set, find the shortest path from a single node to all other nodes, find the bridges and articulation points, find the Eulerian path and circuit, find the maximum flow, find the maximum matching, find the biconnected components, find the Hamiltonian path and circuit, find the dominating set, etc.