summaryrefslogtreecommitdiff
path: root/doc/modeling-method.md
blob: 58fe03b151f5c0c46f49252027cb4149706428e1 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
# Modeling Method Selection

Set `DFATOOL_MODEL` to an appropriate value, e.g. `DFATOOL_MODEL=cart`.

## CART (Regression Trees)

sklearn CART ("Decision Tree Regression") algorithm. Uses binary nodes and supports splits on scalar variables.

### Related Options

* `DFATOOL_PARAM_CATEGORICAL_TO_SCALAR=1` converts categorical parameters (which are not supported by CART) to numeric ones.

## DECART (Regression Trees)

sklearn CART ("Decision Tree Regression") algorithm. Ignores scalar parameters, thus emulating the DECART algorithm.

## LMT (Linear Model Trees)

[Linear Model Tree](https://github.com/cerlymarco/linear-tree) algorithm. Uses binary nodes and linear functions.

### Related Options

See the [LinearTreeRegressor documentation](lib/lineartree/lineartree.py) for details on training hyper-parameters.

* `DFATOOL_PARAM_CATEGORICAL_TO_SCALAR=1` converts categorical parameters (which are not supported by LMT) to numeric ones.
* `DFATOOL_LMT_MAX_DEPTH`
* `DFATOOL_LMT_MIN_SAMPLES_SPLIT`
* `DFATOOL_LMT_MIN_SAMPLES_LEAF`
* `DFATOOL_LMT_MAX_BINS`
* `DFATOOL_LMT_CRITERION`

## RMT (Regression Model Trees)

This is the default modeling method.
It first uses a heuristic to determine which parameters are relevant, and then builds a non-binary decision tree with function leaves to predict how those affect the performance attribute.
Depending on the set of relevant parameters and how well they can be modeled with tree and function structures, it may output a simple function (without tree structure), a tree with static leaves, or a single static value that does not take any parameters into account.
All of these are valid regression model trees.

### Related Options

* `--force-tree` builds a tree structure even if dfatool's heuristic indicates that no non-integer parameter affects the modeled performance attribute.
* `DFATOOL_DTREE_IGNORE_IRRELEVANT_PARAMS=0` disables the relevant parameter detection heuristic when building the tree structure. By default, irrelevant parameters cannot end up as decision nodes.
* `DFATOOL_FIT_LINEAR_ONLY=1` makes RMT behave more like LMT by only considering linear functions in leaf nodes.
* `DFATOOL_FIT_FOL=1`
* `DFATOOL_PARAM_CATEGORICAL_TO_SCALAR=1`
* `DFATOOL_ULS_SKIP_CODEPENDENT_CHECK=1`
* `DFATOOL_REGRESSION_SAFE_FUNCTIONS=1`

## XGB (Gradient-Boosted Forests / eXtreme Gradient boosting)

You should also specify `DFATOOL_XGB_N_ESTIMATORS`, `DFATOOL_XGB_MAX_DEPTH`, and possibly `OMP_NUM_THREADS`.

### Related Options

* `DFATOOL_PARAM_CATEGORICAL_TO_SCALAR=1` converts categorical parameters (which are not supported by XGB) to numeric ones.
* Anything prefixed with `DFATOOL_XGB_`.

## Least-Squares Regression

If dfatool determines that there is no need for a tree structure, or if `DFATOOL_DTREE_ENABLED=0` has beenset, it will go straight to least-squares regression.
By default, it still utilizes the RMT/ULS algorithms to find and fit a suitable function template.
If needed, `--function-override` can be used to set a function template manually.
For instance, in order to specify that NMC DPU allocation latency is a function of the number of DPUs (and nothing else), ue `--function-override 'NMC reconfiguration:latency_dpu_alloc_us:regression_arg(0) + regression_arg(1) * parameter(n_dpus)'`

* CART: Regression Trees
* DECART: Regression Trees with exclusively binary features/parameters
* XGB: Regression Forests
* LMT: Linear Model Trees
* RMT: [Regression Model Trees](https://ess.cs.uos.de/static/papers/Friesel-2022-CPSIoTBench.pdf) with [non-binary nodes](https://ess.cs.uos.de/static/papers/Friesel-2022-CAIN.pdf)
* Least-Squares Regression