author | Birte Kristina Friesel <birte.friesel@uos.de> | 2024-01-26 08:22:50 +0100 |
---|---|---|
committer | Birte Kristina Friesel <birte.friesel@uos.de> | 2024-01-26 08:22:50 +0100 |
commit | 8890151d028e383fbc878a3e58bdf67dc08c0d69 | |
tree | 003a26cdde77cf6b3aa0a1bb6a7254da04e36e3e | |
parent | 242d09e9e03cd630fb56a26cac5abcb212d75426 | |
LMT documentation: refer to lineartree.py for details
-rw-r--r-- | README.md | 4 |
-rw-r--r-- | doc/modeling-method.md | 7 |
2 files changed, 9 insertions, 2 deletions
```diff
@@ -119,8 +119,8 @@ The following variables may be set to alter the behaviour of dfatool components.
 | `DFATOOL_CART_MAX_DEPTH` | **0** .. *n* | maximum depth for sklearn CART. Default (0): unlimited. |
 | `DFATOOL_DTREE_LMT` | **0**, 1 | Use [Linear Model Tree](https://github.com/cerlymarco/linear-tree) algorithm for regression tree generation. Uses binary nodes and linear functions. Overrides `FUNCTION_LEAVES` (=0) and `NONBINARY_NODES` (=0). |
 | `DFATOOL_LMT_MAX_DEPTH` | **5** .. 20 | Maximum depth for LMT. |
-| `DFATOOL_LMT_MIN_SAMPLES_SPLIT` | 0.0 .. 1.0, **6** .. *n* | Minimum samples required to still perform an LMT split. |
-| `DFATOOL_LMT_MIN_SAMPLES_LEAF` | 0.0 .. **0.1** .. 1.0, 3 .. *n* | Minimum samples that each leaf of a split candidate must contain. |
+| `DFATOOL_LMT_MIN_SAMPLES_SPLIT` | 0.0 .. 1.0, **6** .. *n* | Minimum samples required to still perform an LMT split. A value below 1.0 sets the specified ratio of the total number of training samples as minimum. |
+| `DFATOOL_LMT_MIN_SAMPLES_LEAF` | 0.0 .. **0.1** .. 1.0, 3 .. *n* | Minimum samples that each leaf of a split candidate must contain. A value below 1.0 sets the specified ratio of the total number of training samples as minimum. |
 | `DFATOOL_LMT_MAX_BINS` | 10 .. **120** | Number of bins used to determine optimal split. LMT default: 25. |
 | `DFATOOL_LMT_CRITERION` | **mse**, rmse, mae, poisson | Error metric to use when selecting best split. |
 | `DFATOOL_ULS_ERROR_METRIC` | **ssr**, rmsd, mae, … | Error metric to use when selecting best-fitting function during unsupervised least squares (ULS) regression. Least squares regression itself minimzes root mean square deviation (rmsd), hence the equivalent (but partitioning-compatible) sum of squared residuals (ssr) is the default. Supports all metrics accepted by `--error-metric`. |
```
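The new wording for `DFATOOL_LMT_MIN_SAMPLES_SPLIT` and `DFATOOL_LMT_MIN_SAMPLES_LEAF` describes a ratio-to-count conversion. The sketch below shows one plausible reading of it. `effective_min_samples` is a hypothetical helper, not part of dfatool or linear-tree. It assumes that a fractional value is rounded up (as scikit-learn does for float `min_samples_split` / `min_samples_leaf`) and that the result is clamped to the lower bounds listed in the table (6 for splits, 3 for leaves). `lib/lineartree/lineartree.py` remains the authority on the exact behaviour.

```python
import math


def effective_min_samples(value, n_samples, lower_bound):
    """Translate a min_samples_* setting into an absolute sample count.

    Hypothetical helper, for illustration only:
    - values below 1.0 are a ratio of the number of training samples,
      rounded up (assumption, mirroring scikit-learn's float handling);
    - other values are taken as absolute counts;
    - the result is clamped to lower_bound (assumption: 6 for
      min_samples_split, 3 for min_samples_leaf, per the README ranges).
    """
    if value < 1.0:
        count = math.ceil(value * n_samples)
    else:
        count = int(value)
    return max(lower_bound, count)
```

For example, with the default `DFATOOL_LMT_MIN_SAMPLES_LEAF` of 0.1 and 1000 training samples, `effective_min_samples(0.1, 1000, 3)` yields 100, so under these assumptions each leaf of a split candidate would need at least 100 samples.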
```diff
diff --git a/doc/modeling-method.md b/doc/modeling-method.md
index 6357cd8..27cb334 100644
--- a/doc/modeling-method.md
+++ b/doc/modeling-method.md
@@ -25,7 +25,14 @@ They always use a maximum depth of 20.
 
 ### Related Options
 
+See the [LinearTreeRegressor documentation](lib/lineartree/lineartree.py) for details on training hyper-parameters.
+
 * `DFATOOL_PARAM_CATEGORIAL_TO_SCALAR=1` converts categorial parameters (which are not supported by LMT) to numeric ones.
+* `DFATOOL_LMT_MAX_DEPTH`
+* `DFATOOL_LMT_MIN_SAMPLES_SPLIT`
+* `DFATOOL_LMT_MIN_SAMPLES_LEAF`
+* `DFATOOL_LMT_MAX_BINS`
+* `DFATOOL_LMT_CRITERION`
 
 ## RMT (Regression Model Trees)
 
```
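The options listed under *Related Options* correspond to LinearTreeRegressor training hyper-parameters. As an illustration, the following sketch collects them from the environment into a keyword-argument dict. `lmt_kwargs` and `_parse_samples` are hypothetical names. The dict keys assume that the vendored `LinearTreeRegressor` keeps the upstream linear-tree parameter names (`max_depth`, `min_samples_split`, `min_samples_leaf`, `max_bins`, `criterion`). The defaults are the bold values from the README table.

```python
import os

# Defaults: the bold values from the README table. max_bins uses dfatool's
# default of 120 rather than the LMT default of 25.
LMT_DEFAULTS = {
    "DFATOOL_LMT_MAX_DEPTH": 5,
    "DFATOOL_LMT_MIN_SAMPLES_SPLIT": 6,
    "DFATOOL_LMT_MIN_SAMPLES_LEAF": 0.1,
    "DFATOOL_LMT_MAX_BINS": 120,
    "DFATOOL_LMT_CRITERION": "mse",
}


def _parse_samples(raw):
    """Hypothetical parser: values below 1.0 stay ratios (float), others become counts (int)."""
    value = float(raw)
    return value if value < 1.0 else int(value)


def lmt_kwargs(env=os.environ):
    """Collect LMT hyper-parameters from DFATOOL_LMT_* variables.

    Hypothetical sketch: the returned keys are assumed to match the
    LinearTreeRegressor keyword arguments of the same name.
    """

    def get(name):
        # Fall back to the README default when the variable is unset.
        return env.get(name, str(LMT_DEFAULTS[name]))

    return {
        "max_depth": int(get("DFATOOL_LMT_MAX_DEPTH")),
        "min_samples_split": _parse_samples(get("DFATOOL_LMT_MIN_SAMPLES_SPLIT")),
        "min_samples_leaf": _parse_samples(get("DFATOOL_LMT_MIN_SAMPLES_LEAF")),
        "max_bins": int(get("DFATOOL_LMT_MAX_BINS")),
        "criterion": get("DFATOOL_LMT_CRITERION"),
    }
```

Under those assumptions, the result could be passed on as `LinearTreeRegressor(LinearRegression(), **lmt_kwargs())`, with `LinearRegression` from `sklearn.linear_model` as the model fitted in each leaf. How dfatool itself reads these variables may differ.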