C++ library for fast development of (heterogeneous) detection and classification modules for (Encrypted) Network Traffic Analysis - (E)NTA. The library contains the most commonly used methods for ENTA. Therefore, WIF ims to minimize the time between the detection of a new threat and the deployment of tailored module for its detection.
WIF contains following structure and objects:
- Classifiers
- Pattern-matching via Regex
- IP blocklists
- Machine Learning via scikit-learn
- Possible interconnection withALF
- Combinators
- Average
- Dempster-Shafer Theory
- Majority
- Sum
- Reporters
- Unirec
- Data storage classes
- Classification result (ClfResult)
- IP address
- Network IP flow (FlowFeatures)
- Utils
- IP prefix (range or subnet)
- Timer
Classifiers perform traffic classification and threat detection. Combinators are used to fuse weak results together to obtain more robust and accurate result. Reporters are used for additional data exfiltration from modules for increased explainability. Utils are mainky used by other parts of the library but are made available for others as well. For more info, we would kindly refer to the sectionUsing WIF for Development.
Note that Unirec Repoter and ALF Classifier are only available whenBUILD_WITH_UNIREC
is enabled, described in the sectionBuild & Installationbelow.
- Python3.6 or Python3.8 devel with numpy:
- python36-devel, python36-numpy
- python38-devel, python38-numpy
Optionally, for additional features (BUILD_WITH_UNIREC
option must be enabled manually):
- CESNET/Nemea,mainly libtrap, libunirec, and libunirec++
git clone https://github.com/CESNET/WIF.git
cd WIF
make
# For setting installation folders etc.
ccmake build
sudo make install
git clone https://github.com/CESNET/WIF.git
cd WIF
make rpm
sudo rpm -i <pathToRpmOutputtedByPreviousCommand>
Doxygen documentation can be generated by calling:
make docs
WIF-based module should firstly transform data intoFlowFeaturesobject as it is used as input to classifiers.FlowFeaturesis a wrapper overstd::vectorof allowed types, seeDataVariantclass. The recommended use is to receive network flows periodically in a loop and run it through the implemented classification method. Therefore,FlowFeaturesshould keep its layout: contain the same features on the same indexes throught the processing.
#include <wif/storage/flowFeatures.hpp>
WIF::FlowFeatures flow(NUMBER_OF_FLOW_ELEMENTS);
All classifiers share a common interface define by an abstractClassifierclass. Theclassify()takes eitherFlowFeaturesorstd::vector<WIF::FlowFeatures>and performs the classification. However,setSourceFeatureIDs()msut be called before the firstclassify()call. This method sets source indexes of theFlowFeatureswhich will be processed by the classifier.
The return type ofclassify()isClfResult- variant holding eitherdouble(result of IP-blocklist-based detection: 0 or 1) orstd::vector<double>(result of Machine Learning: array of probablities of each class). Each classifier defines what value type theClfResultholds and what it means. The value can be obtained by one of the following calls:
clfResult.get<double>();
clfResult.get<std::vector<double>>();
The code example below shows how to correctly useIpPrefixClassifier:
#include <wif/classifiers/ipPrefixClassifier.hpp>
constexpr WIF::FeatureID SRC_IP_ID = 0;
constexpr WIF::FeatureID DST_IP_ID = 1;
...
std::vector<WIF::IpPrefix> blocklist = { WIF::IpPrefix( "10.0.0.0/28" ) };
WIF::IpPrefixClassifier clf(blocklist);
clf.setSourceFeatureIDs({
SRC_IP_ID,
DST_IP_ID
});
...
while (record = receiveRecord()) {
WIF::FlowFeatures flow(2);
flow.set<WIF::IpAddress>(SRC_IP_ID, extractSrcIp(record));
flow.set<WIF::IpAddress>(DST_IP_ID, extractDstIp(record));
WIF::ClfResult res = clf.classify(flow);
if (res.get<double>() > 0) {
std::cout << "Blocklisted communication detected!" << std::endl;
}
}
Python C API is used for performing ML-based classification inScikitMlClassifierandAlfClassifier.Two files are required to perform this task. The first one is an actual ML model in apickleformat. The second one is calledbridge.This file must contain two functions:
init(model_path)
- Obtains a string with path to the ML model, loads it and returns it
classify(classifier, features)
- Obtains the loaded ML model and 2D array of features to classify, calls
predict_proba()
,and returns the output
- Obtains the loaded ML model and 2D array of features to classify, calls
Examplebridge.py
is shown below and can be used for many tasks:
import pickle
def init(model_path):
with open(model_path, 'rb') as f:
return pickle.load(f)
def classify(classifier, features):
try:
return classifier.predict_proba(features).tolist()
except Exception as e:
print(e)
return []
Combinators perform data combination and fusion. The interface of this object group is defined by abstractCombinatorclass. No prior method must be called before usage, all initialization is performed in constructors. Then,combine()method performs the actual combination.
#include <wif/combinators/averageCombinator.hpp>
constexpr double THRESHOLD_FOR_POSITIVE_DETECTION 0.65
WIF::AverageCombinator avgCom;
...
double averageScore = avgCom.combine({tlsSniScore, mlProba, blocklistScore});
if (averageScore >= THRESHOLD_FOR_POSITIVE_DETECTION) {
std::cout << "Positive detection!" << std::endl;
}
The common interface of Reporters is defined by abstractReporterobject. Currently, the only available reporter isUnirecRepoterwhich is built on top of thelibunirec.The function of reporters can be described as Finite State Automatons (FSM). Firstly,onRecordStart()must be called, to indicate a start of a new message. Then,report(DataVariant)can be called periodically. At the end,onRecordEnd()is called to indicate that the record can be sent to output. However, buffering may be used and record does not need to be sent right away. Useflush()to send out pending records.
In the case ofUnirecReporter,report(DataVariant)method extracts the value held by the passedDataVariant,and it is used as a value of the next unirec field. Then, field ID is incremented and the next call ofreport()will set value to the next field. Therefore, it is required to follow the correct field order.
The example code for usingUnirecReporteris shown below. For the documentation and usage ofNemea
,we would kindly refer the reader to theNemeaandlibunirecrepositories.
const std::string REPORTER_TEMPLATE = "double TLS_SNI_SCORE,uint8 DETECTION_RESULT";
Nemea::UnirecOutputInterface reporterIfc = unirec.buildOutputInterface();
reporterIfc.changeTemplate(REPORTER_TEMPLATE);
WIF::UnirecReporter unirecReporter(reporterIfc);
...
// Notice: the values are reported in the same order as defined in the REPORTER_TEMPLATE
unirecReporter.onRecordStart();
unirecReporter.report(tlsSniScore);
unirecReporter.report(detectionResult);
unirecReporter.onRecordEnd();
unirecReporter.flush();
...
If you have any questions or problems, you are welcomed to send an email to[email protected].
This project is distributed under theBSD-3-Clause license.
© 2024, CESNET z.s.p.o.