IDG Contributor Network: Data science in service of detection vs. investigation

Security companies regularly extol the use of data science, machine learning and artificial intelligence in their products. However, there are important conceptual differences in how these approaches apply to different cybersecurity use cases.

To start with, I want to clarify the meaning behind these terms.

  • Data science involves using fancy math to extract knowledge from data, either via machine learning or via more straightforward data analytics techniques.
  • Machine learning involves using an algorithm to construct a model, which is trained to recognize patterns by feeding it (usually large amounts of) data. Models may be supervised (trained on labeled data and usually tuned by a data scientist) or unsupervised (trained on unlabeled data, typically to find structure in it such as clusters, anomalies and outliers).
  • Artificial intelligence is a more abstract concept that can involve techniques like machine learning but can also include approaches such as interviewing subject matter experts and writing code to approximate their thought processes (we used to call these “expert systems”).
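
To make the supervised/unsupervised distinction concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not how any particular security product works: the feature names, the toy numbers, the nearest-centroid classifier and the two-standard-deviation outlier rule are all hypothetical choices made for this example.

```python
from statistics import mean, stdev

# Hypothetical toy data: (logins_per_hour, megabytes_uploaded) per host.
# All values and labels are invented for illustration.
labeled = [
    ((3.0, 10.0), "benign"),
    ((4.0, 12.0), "benign"),
    ((5.0, 9.0), "benign"),
    ((40.0, 300.0), "malicious"),
    ((38.0, 280.0), "malicious"),
]


def train_centroids(samples):
    """Supervised: learn one mean point (centroid) per label from labeled data."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def classify(centroids, point):
    """Return the label whose centroid is nearest to `point`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], point))


def find_outliers(values, threshold=2.0):
    """Unsupervised: no labels; flag values far from the mean of the data itself.

    A value is flagged when it lies more than `threshold` sample standard
    deviations from the mean (an assumed, deliberately simple rule).
    """
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]


if __name__ == "__main__":
    centroids = train_centroids(labeled)
    print(classify(centroids, (35.0, 250.0)))  # nearest to the "malicious" centroid

    # Unlabeled upload volumes; one host stands out from the rest.
    uploads = [10, 12, 11, 9, 10, 11, 12, 10, 9, 95]
    print(find_outliers(uploads))
```

The supervised half can only recognize the categories it was given labels for, while the unsupervised half needs no labels but can only say that something is unusual, not whether it is malicious.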

Because artificial intelligence can be a somewhat abstract concept, I will focus on the application of data analytics and machine learning to the threat detection and incident investigation use cases.

Threat detection

Data science is broadly applicable to a number of different threat detection problems: