
Applications of network theory to data analysis

Published on Aug 31, 2019

By Liubov Tupikina and Ilya Zakharov.

Illustration generated by L.T., showing a Babel graph embedded into 2D.


Today, in many fields of science the key research objects are understood as complex systems: collections of elements with a number of interactions between them. Mathematical graph theory is becoming a popular tool for describing and investigating the characteristics of such systems. The basic idea of the graph-theoretical approach is to represent the elements of a system as nodes and the interactions between them as edges. The whole system can then be called a graph or, more popularly, a network. According to graph theory, each network can be described by a number of specific characteristics (or metrics) that capture its global or local structure.

The main question in applying graph theory to a complex system is how exactly to define the nodes and edges. An example familiar to everybody today is a GIS (geoinformation system). In a GIS, objects on a map (e.g. buildings such as shops or houses) can be represented as nodes, and the roads between them as edges. The graph-theoretical approach can then be used, for example, to find the optimal (e.g. shortest) route between a pair of houses. Another example is the internet and the World Wide Web. If we are talking about the physical organisation of the internet, the nodes are individual computers (or large data servers) and the edges are the wires between them (e.g. the optical fibres connecting computers in a network). A natural question here is how to avoid bottlenecks in such a system. If we are talking about the internet as a virtual space, the nodes can be web pages and the edges hyperlinks from one page to another; the question then is how important a specific web page is for the whole system (what its rank is).
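Both questions above (shortest routes and page rank) can be sketched in a few lines with the widely used `networkx` library. The toy "city map" below is invented for illustration; node names and edge weights are arbitrary:

```python
import networkx as nx

# Build a toy "city map": nodes are buildings, weighted edges are roads
# whose weight is the road length.
G = nx.Graph()
G.add_weighted_edges_from([
    ("home", "shop", 2.0),
    ("home", "park", 1.0),
    ("park", "shop", 0.5),
    ("shop", "office", 3.0),
    ("park", "office", 4.0),
])

# The optimal (shortest total length) route between two buildings.
route = nx.shortest_path(G, "home", "office", weight="weight")
print(route)  # ['home', 'park', 'shop', 'office']

# For the web analogy: rank nodes by importance with PageRank.
ranks = nx.pagerank(G)
most_important = max(ranks, key=ranks.get)
```

The same two calls, `shortest_path` and `pagerank`, work unchanged on graphs with thousands of nodes, which is what makes the network representation so convenient.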

There is one problem that one finds in various fields of science: climatology, neuroscience, IT (cloud-service analysis), medical research.

The problem is phrased as follows:

given data with N (say, 10 000) data points, each producing a measurement that varies in time, we need to find a simple representation of these time-series.

For simplicity, consider an example from neuroscience:

imagine we measure signals from 10 000 points on the head of a patient during a time T = 10 hours, and we want to investigate which illness this patient has. There are many ways to tackle this problem.

* First, what is a time-series? A time-series is the series of numbers one gets when measuring something at a data point over time; in neuroscience, for example, this can be an EEG signal. These signals vary with some periodicity.

* We should also explain what a data point is. It is usually a point from an artificially created parcellation of the human brain. These data points may be, for example, electrode positions regularly distributed over the brain surface.
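A convenient way to hold such a recording is a 2D array of shape (channels, samples). The sketch below generates synthetic "EEG-like" data just to fix the data layout; the channel count, sampling rate, and signal shape are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

n_channels = 16   # stand-in for the 10 000 sensor positions
fs = 250          # sampling rate in Hz (assumed)
duration = 4      # seconds of recording
t = np.arange(0, duration, 1 / fs)

# Each channel is a noisy 10 Hz oscillation: a crude model of a
# periodically varying signal measured at one data point.
signals = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal((n_channels, t.size))

print(signals.shape)  # (16, 1000)
```

All the methods discussed below (PCA, Granger causality, functional networks) take such an array, or pairs of its rows, as input.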

Existing methods


Usually, before starting to analyze such data, one performs so-called independent component analysis (ICA) or principal component analysis (PCA).

In fact, these methods are based on the principles of linear algebra, the field of mathematics that studies matrices and linear maps.
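To make the linear-algebra connection concrete, here is a minimal PCA sketch using only the singular value decomposition of the centred data matrix (the synthetic data and dimensions are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# 200 observations of 5 variables; make two columns strongly correlated
# so that the data is effectively lower-dimensional.
X = rng.standard_normal((200, 5))
X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(200)

# PCA via SVD of the centred data matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Fraction of variance explained by each principal component.
evr = s**2 / np.sum(s**2)

# Project onto the first two components: a low-dimensional representation.
X_reduced = Xc @ Vt[:2].T
print(X_reduced.shape)  # (200, 2)
```

The rows of `Vt` are the principal directions; keeping only the first few of them is exactly the "easy representation" of high-dimensional measurements that the problem statement above asks for.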


After preprocessing the data with PCA, one can apply a very common method such as Granger causality. Granger causality is based on a simple idea:

we take two time-series, T1(t) and T2(t), and fix a timestamp t0. If we can predict the future of T1 (t > t0) from the past of T2 (t < t0), and at the same time cannot do the reverse, i.e. predict the future of T2 (t > t0) from the past of T1 (t < t0), then one can interpret this as T2 causing T1.

Moreover, there are many other methods, such as PCMCI, MCI and others.

One can then ask: what is the suitable timestamp t0 to use so that our prediction has any physical meaning?

Choosing the right time window [0, t0] for predictions is quite a challenging question. Scientists often use underlying assumptions about the system to guide the choice of t0. For example, in brain research this may be the duration of signal transfer between two zones of the brain, or other physical properties.

Quite often, methods from one field can be used in other fields. This happened with climatology and neuroscience:
two fields that may look quite far apart are using similar methods. To name but a few: event-synchronisation methods [Quiroga, N. Malik] and the functional-networks framework [the picture is taken from the article by J. Donges et al.].

Illustration of methods

Workflow of functional network analysis illustrated for climate networks [modified from T.Nocke et al.].

In step 1, a discretized time-series representation of the fields of interest is chosen, usually prescribed by the available gridded or station data.
Step 2 includes time-series preprocessing and the computation of similarity measures S_{ij} quantifying statistical interdependencies between pairs of climatological time-series. In step 3, the construction of a climate network from the similarity matrix (causality or correlation matrix) S typically involves some thresholding criterion (see Nocke et al. and Tominski et al. for details on the climate network shown here, which was visualized using the software CGV).
In step 4, the obtained climate network is investigated by drawing on the tools of complex network theory. Finally, in step 5, the results of this analysis need to be interpreted in terms of the underlying dynamical Earth system.
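Steps 1-4 of this workflow can be sketched end to end with `numpy` and `networkx`. The synthetic time-series, the choice of absolute correlation as the similarity S_{ij}, and the threshold value are all illustrative assumptions, not the specific choices of the cited papers:

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(3)

# Steps 1-2: time-series for 6 "grid points" sharing a common driver,
# and their similarity matrix S_ij (absolute Pearson correlation).
n_nodes, n_samples = 6, 400
common = rng.standard_normal(n_samples)
X = common + rng.standard_normal((n_nodes, n_samples))  # correlated channels
S = np.abs(np.corrcoef(X))

# Step 3: threshold S to obtain the adjacency matrix of the network.
threshold = 0.3
A = (S > threshold).astype(int)
np.fill_diagonal(A, 0)  # no self-loops

# Step 4: analyze the resulting functional network with standard tools.
G = nx.from_numpy_array(A)
print(nx.density(G), nx.average_clustering(G))
```

Step 5, interpretation, is the part no library can do: one has to ask whether the resulting links reflect genuine physical coupling in the underlying system or artifacts of the similarity measure and threshold.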

(work in progress)


  1. J. Donges et al., Chaos, 2015.

  2. T. Nocke et al., 2015.
