Skip to content
/ WIF Public

Weak Indication Framework: Efficient C++ library for (Encrypted) Network Traffic Analysis

License

Notifications You must be signed in to change notification settings

CESNET/WIF

Repository files navigation

WIF - Weak Indication Framework

Description

C++ library for fast development of (heterogeneous) detection and classification modules for (Encrypted) Network Traffic Analysis - (E)NTA. The library contains the most commonly used methods for ENTA. Therefore, WIF ims to minimize the time between the detection of a new threat and the deployment of tailored module for its detection.

WIF contains following structure and objects:

  • Classifiers
    • Pattern-matching via Regex
    • IP blocklists
    • Machine Learning via scikit-learn
      • Possible interconnection withALF
  • Combinators
    • Average
    • Dempster-Shafer Theory
    • Majority
    • Sum
  • Reporters
    • Unirec
  • Data storage classes
    • Classification result (ClfResult)
    • IP address
    • Network IP flow (FlowFeatures)
  • Utils
    • IP prefix (range or subnet)
    • Timer

Classifiers perform traffic classification and threat detection. Combinators are used to fuse weak results together to obtain more robust and accurate result. Reporters are used for additional data exfiltration from modules for increased explainability. Utils are mainky used by other parts of the library but are made available for others as well. For more info, we would kindly refer to the sectionUsing WIF for Development.

Note that Unirec Repoter and ALF Classifier are only available whenBUILD_WITH_UNIRECis enabled, described in the sectionBuild & Installationbelow.

Requirements

  • Python3.6 or Python3.8 devel with numpy:
    1. python36-devel, python36-numpy
    2. python38-devel, python38-numpy

Optionally, for additional features (BUILD_WITH_UNIRECoption must be enabled manually):

Build & Installation

Build from Source

git clone https://github.com/CESNET/WIF.git
cd WIF
make
# For setting installation folders etc.
ccmake build
sudo make install

Build and Install RPM

git clone https://github.com/CESNET/WIF.git
cd WIF
make rpm
sudo rpm -i <pathToRpmOutputtedByPreviousCommand>

Documentation

Doxygen documentation can be generated by calling:

make docs

Using WIF for Development

Preparation

WIF-based module should firstly transform data intoFlowFeaturesobject as it is used as input to classifiers.FlowFeaturesis a wrapper overstd::vectorof allowed types, seeDataVariantclass. The recommended use is to receive network flows periodically in a loop and run it through the implemented classification method. Therefore,FlowFeaturesshould keep its layout: contain the same features on the same indexes throught the processing.

#include <wif/storage/flowFeatures.hpp>
WIF::FlowFeatures flow(NUMBER_OF_FLOW_ELEMENTS);

Classification

All classifiers share a common interface define by an abstractClassifierclass. Theclassify()takes eitherFlowFeaturesorstd::vector<WIF::FlowFeatures>and performs the classification. However,setSourceFeatureIDs()msut be called before the firstclassify()call. This method sets source indexes of theFlowFeatureswhich will be processed by the classifier.

The return type ofclassify()isClfResult- variant holding eitherdouble(result of IP-blocklist-based detection: 0 or 1) orstd::vector<double>(result of Machine Learning: array of probablities of each class). Each classifier defines what value type theClfResultholds and what it means. The value can be obtained by one of the following calls:

clfResult.get<double>();
clfResult.get<std::vector<double>>();

The code example below shows how to correctly useIpPrefixClassifier:

#include <wif/classifiers/ipPrefixClassifier.hpp>

constexpr WIF::FeatureID SRC_IP_ID = 0;
constexpr WIF::FeatureID DST_IP_ID = 1;

...

std::vector<WIF::IpPrefix> blocklist = { WIF::IpPrefix( "10.0.0.0/28" ) };
WIF::IpPrefixClassifier clf(blocklist);
clf.setSourceFeatureIDs({
SRC_IP_ID,
DST_IP_ID
});

...

while (record = receiveRecord()) {
WIF::FlowFeatures flow(2);
flow.set<WIF::IpAddress>(SRC_IP_ID, extractSrcIp(record));
flow.set<WIF::IpAddress>(DST_IP_ID, extractDstIp(record));

WIF::ClfResult res = clf.classify(flow);
if (res.get<double>() > 0) {
std::cout << "Blocklisted communication detected!" << std::endl;
}
}

Scikit-learn Interconnection

Python C API is used for performing ML-based classification inScikitMlClassifierandAlfClassifier.Two files are required to perform this task. The first one is an actual ML model in apickleformat. The second one is calledbridge.This file must contain two functions:

  1. init(model_path)
    • Obtains a string with path to the ML model, loads it and returns it
  2. classify(classifier, features)
    • Obtains the loaded ML model and 2D array of features to classify, callspredict_proba(),and returns the output

Examplebridge.pyis shown below and can be used for many tasks:

import pickle


def init(model_path):
with open(model_path, 'rb') as f:
return pickle.load(f)


def classify(classifier, features):
try:
return classifier.predict_proba(features).tolist()
except Exception as e:
print(e)
return []

Combination

Combinators perform data combination and fusion. The interface of this object group is defined by abstractCombinatorclass. No prior method must be called before usage, all initialization is performed in constructors. Then,combine()method performs the actual combination.

#include <wif/combinators/averageCombinator.hpp>
constexpr double THRESHOLD_FOR_POSITIVE_DETECTION 0.65

WIF::AverageCombinator avgCom;
...

double averageScore = avgCom.combine({tlsSniScore, mlProba, blocklistScore});
if (averageScore >= THRESHOLD_FOR_POSITIVE_DETECTION) {
std::cout << "Positive detection!" << std::endl;
}

Reporters

The common interface of Reporters is defined by abstractReporterobject. Currently, the only available reporter isUnirecRepoterwhich is built on top of thelibunirec.The function of reporters can be described as Finite State Automatons (FSM). Firstly,onRecordStart()must be called, to indicate a start of a new message. Then,report(DataVariant)can be called periodically. At the end,onRecordEnd()is called to indicate that the record can be sent to output. However, buffering may be used and record does not need to be sent right away. Useflush()to send out pending records.

In the case ofUnirecReporter,report(DataVariant)method extracts the value held by the passedDataVariant,and it is used as a value of the next unirec field. Then, field ID is incremented and the next call ofreport()will set value to the next field. Therefore, it is required to follow the correct field order.

The example code for usingUnirecReporteris shown below. For the documentation and usage ofNemea,we would kindly refer the reader to theNemeaandlibunirecrepositories.

const std::string REPORTER_TEMPLATE = "double TLS_SNI_SCORE,uint8 DETECTION_RESULT";

Nemea::UnirecOutputInterface reporterIfc = unirec.buildOutputInterface();
reporterIfc.changeTemplate(REPORTER_TEMPLATE);
WIF::UnirecReporter unirecReporter(reporterIfc);

...

// Notice: the values are reported in the same order as defined in the REPORTER_TEMPLATE
unirecReporter.onRecordStart();
unirecReporter.report(tlsSniScore);
unirecReporter.report(detectionResult);
unirecReporter.onRecordEnd();
unirecReporter.flush();

...

Contact

If you have any questions or problems, you are welcomed to send an email to[email protected].

License

This project is distributed under theBSD-3-Clause license.
© 2024, CESNET z.s.p.o.

About

Weak Indication Framework: Efficient C++ library for (Encrypted) Network Traffic Analysis

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published