Context and problem statement
As part of its work, the client's security team needs a tool to help detect “beaconing”: traffic sent at regular intervals from the victim’s network to infrastructure controlled by an adversary. Such traffic can be a sign of a malware infection or of a compromised host performing data exfiltration.
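To make "regular intervals" concrete, here is a minimal sketch of one possible regularity measure: the coefficient of variation of the gaps between successive requests. This is an illustrative assumption, not the measure used in the project; the function name `interval_regularity` and the sample timestamps are hypothetical.

```python
from statistics import mean, pstdev


def interval_regularity(timestamps):
    """Coefficient of variation of the gaps between requests.

    Values near 0 mean the requests are evenly spaced (beacon-like);
    larger values mean bursty, irregular traffic. Returns None when
    there are too few requests to judge.
    """
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None
    return pstdev(gaps) / mean(gaps)


# Hypothetical request times in seconds (illustration only).
beacon_like = [0, 60, 120, 181, 240, 300]  # roughly one request per minute
human_like = [0, 3, 5, 90, 92, 400]        # bursts separated by long pauses

beacon_score = interval_regularity(beacon_like)
human_score = interval_regularity(human_like)
```

On these samples the beacon-like series scores far lower than the bursty one, which is the kind of contrast a detection feature would aim to capture.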
Goals
The goal of the project is to build a machine learning system that detects beaconing cases and can process very large volumes of data.
This system will suggest potential beaconing domains for the security experts to check.
Our intervention
2 Data Scientists
- Extraction of data from proxy logs, followed by scanning and cleaning, to build the required features
- Computation of two types of features: daily aggregations by client/host/date, and aggregations by host over a given period, which serve as a history
- Use of these features as training data for several anomaly detection models
- Modeling
- Implementation of an evaluation system that simulates the real use case
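The two feature types listed above can be sketched as follows. This is a minimal, standard-library illustration under stated assumptions, not the project's PySpark pipeline: the record layout, the sample values in `LOGS`, the function names, and the chosen statistics (request counts, bytes, distinct clients, active days) are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical cleaned proxy-log records: (client, host, ISO timestamp, bytes sent).
LOGS = [
    ("10.0.0.5", "cdn.example.com", "2021-03-01T09:00:00", 1200),
    ("10.0.0.5", "cdn.example.com", "2021-03-01T09:05:00", 800),
    ("10.0.0.7", "cdn.example.com", "2021-03-02T14:30:00", 500),
    ("10.0.0.5", "updates.example.net", "2021-03-02T10:00:00", 300),
]


def daily_features(logs):
    """Aggregate request count and bytes per (client, host, date)."""
    feats = defaultdict(lambda: {"requests": 0, "bytes": 0})
    for client, host, timestamp, size in logs:
        day = datetime.fromisoformat(timestamp).date()
        entry = feats[(client, host, day)]
        entry["requests"] += 1
        entry["bytes"] += size
    return dict(feats)


def host_history(daily):
    """Aggregate the daily features per host over the whole period."""
    clients = defaultdict(set)
    days = defaultdict(set)
    requests = defaultdict(int)
    for (client, host, day), entry in daily.items():
        clients[host].add(client)
        days[host].add(day)
        requests[host] += entry["requests"]
    return {
        host: {
            "clients": len(clients[host]),
            "active_days": len(days[host]),
            "requests": requests[host],
        }
        for host in clients
    }


daily = daily_features(LOGS)
history = host_history(daily)
```

Here `daily` plays the role of the per-day features and `history` the per-host history over the period; in a Spark setting the same grouping would typically be expressed as group-by aggregations over the log table.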
Results
Based on the available data, the system achieves promising performance, but the number of false positives remains rather high. This point will need to be improved so that a team of experts can handle the alerts in a reasonable amount of time.
We are currently refining the feature engineering with the help of security experts to improve performance.
Technical environment
Python, Jupyter Hub
Spark (PySpark), H2O (PySparkling) for modeling
Git/GitHub
Scrum