Theory - [T1, T2, T3]
General description
Statistics can be defined as the discipline of developing methods to collect, analyze, interpret and present data. Its ultimate goal is therefore to gain deeper knowledge of a population, starting from an initial dataset that describes a certain attribute of interest.
Variables and Attributes
When a phenomenon is to be studied, it is very important to choose what data we want to gather from the population (or sample) that is to be analysed.
A statistical attribute is a characteristic of an object that we want to study, while a variable is the logical set of values that an attribute can assume. For example, age is an attribute that can be operationalized in many ways, for instance with only two values: “old” and “young”. In this case the attribute “age” is operationalized as a binary variable.
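The operationalization of “age” as a binary variable can be sketched as a function that maps each age in years onto one of the two labels. This is only an illustration: the cut-off `OLD_AGE_THRESHOLD = 65` and the function name `operationalize_age` are hypothetical choices, not something the definition prescribes.

```python
# Hypothetical cut-off: the text does not fix one, 65 is only an example.
OLD_AGE_THRESHOLD = 65

def operationalize_age(age_years: int) -> str:
    """Map the attribute "age" (in years) onto a binary variable."""
    return "old" if age_years >= OLD_AGE_THRESHOLD else "young"

ages = [23, 71, 45, 65]
binary_ages = [operationalize_age(a) for a in ages]
print(binary_ages)  # ['young', 'old', 'young', 'old']
```

Choosing a different threshold, or more than two labels, would be a different operationalization of the same attribute.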
Variables can be classified as quantitative or qualitative:
- Qualitative variables take on values that are names or labels, for example the color of a ball (red, green)
- Quantitative variables are numeric and represent a measurable quantity. They can be either discrete, if they can take only a countable set of distinct values, or continuous, if they can take any value within an interval (between the minimum and the maximum).
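The type of a variable determines which summaries make sense: labels of a qualitative variable can only be counted (and their mode found), while quantitative values also support arithmetic summaries such as the mean or median. The sketch below uses Python's standard `collections.Counter` and `statistics` module; the data (failed logins per day as a discrete variable, response times in seconds as a continuous one) are hypothetical.

```python
from collections import Counter
from statistics import mean, median

# Qualitative variable: labels, summarized by counting occurrences.
ball_colors = ["red", "green", "red", "red", "green"]
color_counts = Counter(ball_colors)
mode_color = color_counts.most_common(1)[0][0]

# Quantitative, discrete (hypothetical): failed logins per day, a countable set of values.
failed_logins = [0, 2, 1, 0, 5]
# Quantitative, continuous (hypothetical): response times in seconds, any value in an interval.
response_times = [0.21, 0.35, 0.18, 0.42, 0.29]

print(color_counts)            # Counter({'red': 3, 'green': 2})
print(mode_color)              # red
print(mean(failed_logins))     # 1.6
print(median(response_times))  # 0.29
```

Computing a mean of `ball_colors` would be meaningless, which is why the qualitative/quantitative distinction matters before any analysis.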
Dataset
A dataset is a collection of data that formally describes the population (or sample) being analysed. It is mathematically represented as a matrix in which each row corresponds to a statistical unit and each column corresponds to an attribute, listing the value that attribute takes for every unit in the dataset.
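The row/column structure can be sketched with plain Python lists: each inner list is one statistical unit, and reading the same position across all rows gives one attribute's column. The attribute names, values and the helper `column` are hypothetical and only illustrate the layout.

```python
# Hypothetical dataset: attribute names and values are illustrative only.
attributes = ["age", "country", "failed_logins"]
dataset = [
    [23, "IT", 0],  # statistical unit 1
    [71, "FR", 3],  # statistical unit 2
    [45, "IT", 1],  # statistical unit 3
]

def column(data, attrs, name):
    """Return how attribute `name` is expressed across all units (one matrix column)."""
    j = attrs.index(name)
    return [row[j] for row in data]

n_units, n_attributes = len(dataset), len(attributes)
print(n_units, n_attributes)                         # 3 3
print(column(dataset, attributes, "failed_logins"))  # [0, 3, 1]
```

In practice a library such as pandas stores the same structure as a `DataFrame`, but the units-by-attributes layout is identical.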
Applications in Cybersecurity
The use of statistics in the field of cybersecurity is relatively new: in recent years it has been applied to achieve active protection of computer systems.
Detection of anomalies
A notable application is the use of statistics to detect anomalies in a system. In particular, it can be used to build a profile of typical user behaviour and to detect when a set of collected data differs significantly from that profile.
This is usually achieved by analysing big data, one of many applications of statistics found across the whole information technology industry.
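One simple way to realise the profile-and-compare idea is a z-score test: summarize a user's past behaviour by its mean and standard deviation, then flag new observations that lie too many standard deviations away. This is a minimal sketch, not the method of any particular system; the metric (megabytes downloaded per day), the history values and the cut-off `threshold=3.0` are all assumptions.

```python
from statistics import mean, stdev

def build_profile(history):
    """Summarize typical behaviour as (mean, sample standard deviation)."""
    return mean(history), stdev(history)

def is_anomalous(value, profile, threshold=3.0):
    """Flag a value whose z-score exceeds `threshold` (an assumed cut-off)."""
    mu, sigma = profile
    if sigma == 0:
        # No variation in the history: anything different is unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical history: megabytes downloaded per day by one user.
history = [120, 135, 110, 128, 140, 117, 125]
profile = build_profile(history)

print(is_anomalous(130, profile))  # False
print(is_anomalous(900, profile))  # True
```

The z-score assumes the behaviour is roughly stable and unimodal; real deployments typically use richer profiles, but the principle of measuring distance from a learned "normal" is the same.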
Cryptanalytic attacks
Statistics can also be used to carry out statistical attacks on certain cryptographic systems.
For example, frequency analysis can be used to break ciphers such as the Vigenère cipher or XOR encryption with a short repeating key. These are very simple examples, but they show the power of statistics in carrying out cryptanalytic attacks.
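The simplest case of a frequency attack on XOR is a single-byte key: try all 256 keys and keep the one whose output looks most like English, scoring each candidate by how often it uses characters that are common in English text. This is a sketch under stated assumptions: the weights in `ENGLISH_FREQ` are rough approximate percentages, and the plaintext and key `0x5A` are made up for the demonstration. A repeating-key XOR with known key length reduces to this attack applied separately to each key position.

```python
# Rough, approximate weights (percent) for space and lowercase letters in English.
ENGLISH_FREQ = {
    " ": 13.0, "e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0, "n": 6.7,
    "s": 6.3, "h": 6.1, "r": 6.0, "d": 4.3, "l": 4.0, "c": 2.8, "u": 2.8,
    "m": 2.4, "w": 2.4, "f": 2.2, "g": 2.0, "y": 2.0, "p": 1.9, "b": 1.5,
    "v": 1.0, "k": 0.8, "j": 0.15, "x": 0.15, "q": 0.1, "z": 0.07,
}

def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with the same single-byte key (encrypts and decrypts)."""
    return bytes(b ^ key for b in data)

def english_score(candidate: bytes) -> float:
    """Sum of frequency weights; higher means more English-like."""
    return sum(ENGLISH_FREQ.get(chr(b), 0.0) for b in candidate)

def break_single_byte_xor(ciphertext: bytes) -> tuple[int, bytes]:
    """Try all 256 keys and keep the one whose plaintext scores highest."""
    best_key = max(range(256), key=lambda k: english_score(xor_bytes(ciphertext, k)))
    return best_key, xor_bytes(ciphertext, best_key)

plaintext = b"statistics can reveal the key when the message is long enough"
ciphertext = xor_bytes(plaintext, 0x5A)
key, recovered = break_single_byte_xor(ciphertext)
print(hex(key), recovered.decode())
```

The attack works because the key never changes: the ciphertext preserves the frequency distribution of the plaintext, only relabelled, so statistics can undo the relabelling. Longer messages make the correct key stand out more clearly.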
Sources
- https://en.wikipedia.org/wiki/Statistics
- https://en.wikipedia.org/wiki/Variable_and_attribute_(research)
- https://www.stat.uci.edu/what-is-statistics/
- https://www.imperial.ac.uk/statistics/research/statistical-cyber-security/
- https://www.ukessays.com/essays/computer-science/statistical-techniques-for-cryptanalysis.php