author	Birte Kristina Friesel <birte.friesel@uos.de>	2024-01-26 08:22:50 +0100
committer	Birte Kristina Friesel <birte.friesel@uos.de>	2024-01-26 08:22:50 +0100
commit	8890151d028e383fbc878a3e58bdf67dc08c0d69 (patch)
tree	003a26cdde77cf6b3aa0a1bb6a7254da04e36e3e
parent	242d09e9e03cd630fb56a26cac5abcb212d75426 (diff)
LMT documentation: refer to lineartree.py for details
-rw-r--r--	README.md	4
-rw-r--r--	doc/modeling-method.md	7
2 files changed, 9 insertions, 2 deletions
diff --git a/README.md b/README.md
index 757600c..b7c2492 100644
--- a/README.md
+++ b/README.md
@@ -119,8 +119,8 @@ The following variables may be set to alter the behaviour of dfatool components.
| `DFATOOL_CART_MAX_DEPTH` | **0** .. *n* | maximum depth for sklearn CART. Default (0): unlimited. |
| `DFATOOL_DTREE_LMT` | **0**, 1 | Use [Linear Model Tree](https://github.com/cerlymarco/linear-tree) algorithm for regression tree generation. Uses binary nodes and linear functions. Overrides `FUNCTION_LEAVES` (=0) and `NONBINARY_NODES` (=0). |
| `DFATOOL_LMT_MAX_DEPTH` | **5** .. 20 | Maximum depth for LMT. |
-| `DFATOOL_LMT_MIN_SAMPLES_SPLIT` | 0.0 .. 1.0, **6** .. *n* | Minimum samples required to still perform an LMT split. |
-| `DFATOOL_LMT_MIN_SAMPLES_LEAF` | 0.0 .. **0.1** .. 1.0, 3 .. *n* | Minimum samples that each leaf of a split candidate must contain. |
+| `DFATOOL_LMT_MIN_SAMPLES_SPLIT` | 0.0 .. 1.0, **6** .. *n* | Minimum samples required to still perform an LMT split. A value below 1.0 is interpreted as a ratio of the total number of training samples. |
+| `DFATOOL_LMT_MIN_SAMPLES_LEAF` | 0.0 .. **0.1** .. 1.0, 3 .. *n* | Minimum samples that each leaf of a split candidate must contain. A value below 1.0 is interpreted as a ratio of the total number of training samples. |
| `DFATOOL_LMT_MAX_BINS` | 10 .. **120** | Number of bins used to determine optimal split. LMT default: 25. |
| `DFATOOL_LMT_CRITERION` | **mse**, rmse, mae, poisson | Error metric to use when selecting best split. |
| `DFATOOL_ULS_ERROR_METRIC` | **ssr**, rmsd, mae, … | Error metric to use when selecting best-fitting function during unsupervised least squares (ULS) regression. Least squares regression itself minimizes root mean square deviation (rmsd), hence the equivalent (but partitioning-compatible) sum of squared residuals (ssr) is the default. Supports all metrics accepted by `--error-metric`. |
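
The two updated rows above describe the ratio-vs-count semantics of `DFATOOL_LMT_MIN_SAMPLES_SPLIT` and `DFATOOL_LMT_MIN_SAMPLES_LEAF`. The following is a minimal sketch of how these variables could be parsed and passed on to `LinearTreeRegressor`; it is not dfatool's actual code, and the helper `_env_samples` as well as the use of the upstream `lineartree` package (rather than the vendored copy) are assumptions for illustration.

```python
import os

from sklearn.linear_model import LinearRegression
from lineartree import LinearTreeRegressor  # upstream package; dfatool vendors lib/lineartree/lineartree.py


def _env_samples(env_var, default):
    """Return a float ratio (< 1.0) or an absolute sample count (>= 1)."""
    raw = os.getenv(env_var)
    if raw is None:
        return default
    value = float(raw)
    # Values below 1.0 are interpreted as a ratio of the training samples,
    # values of 1 or more as an absolute number of samples.
    return value if value < 1.0 else int(value)


lmt = LinearTreeRegressor(
    base_estimator=LinearRegression(),
    max_depth=int(os.getenv("DFATOOL_LMT_MAX_DEPTH", "5")),
    min_samples_split=_env_samples("DFATOOL_LMT_MIN_SAMPLES_SPLIT", 6),
    min_samples_leaf=_env_samples("DFATOOL_LMT_MIN_SAMPLES_LEAF", 0.1),
    max_bins=int(os.getenv("DFATOOL_LMT_MAX_BINS", "120")),
    criterion=os.getenv("DFATOOL_LMT_CRITERION", "mse"),
)
```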
diff --git a/doc/modeling-method.md b/doc/modeling-method.md
index 6357cd8..27cb334 100644
--- a/doc/modeling-method.md
+++ b/doc/modeling-method.md
@@ -25,7 +25,14 @@ They always use a maximum depth of 20.
### Related Options
+See the [LinearTreeRegressor documentation](lib/lineartree/lineartree.py) for details on training hyper-parameters.
+
* `DFATOOL_PARAM_CATEGORIAL_TO_SCALAR=1` converts categorial parameters (which are not supported by LMT) to numeric ones.
+* `DFATOOL_LMT_MAX_DEPTH`
+* `DFATOOL_LMT_MIN_SAMPLES_SPLIT`
+* `DFATOOL_LMT_MIN_SAMPLES_LEAF`
+* `DFATOOL_LMT_MAX_BINS`
+* `DFATOOL_LMT_CRITERION`
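+
+As a hedged usage sketch, the snippet below fits a Linear Model Tree on synthetic, piecewise-linear data and inspects the result. It is not a dfatool benchmark; the upstream `lineartree` package stands in for `lib/lineartree/lineartree.py`, and the data is made up for illustration.
+
+```python
+import numpy as np
+from sklearn.linear_model import LinearRegression
+from lineartree import LinearTreeRegressor
+
+rng = np.random.default_rng(0)
+X = rng.uniform(0, 100, size=(500, 2))                   # two numeric parameters
+y = 3.0 * X[:, 0] + np.where(X[:, 1] > 50, 40.0, 0.0)    # piecewise-linear target
+
+lmt = LinearTreeRegressor(base_estimator=LinearRegression(), max_depth=5)
+lmt.fit(X, y)
+
+print(lmt.predict(X[:5]))  # model output for the first five samples
+print(lmt.summary())       # split structure and per-leaf linear models
+```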
## RMT (Regression Model Trees)