LMT documentation: refer to lineartree.py for details

author: Birte Kristina Friesel <birte.friesel@uos.de> 2024-01-26 08:22:50 +0100
committer: Birte Kristina Friesel <birte.friesel@uos.de> 2024-01-26 08:22:50 +0100
commit: 8890151d028e383fbc878a3e58bdf67dc08c0d69 (patch)
tree: 003a26cdde77cf6b3aa0a1bb6a7254da04e36e3e
parent: 242d09e9e03cd630fb56a26cac5abcb212d75426 (diff)
2 files changed, 9 insertions, 2 deletions
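
Note on the README change in this commit: it documents that a value below 1.0 for `DFATOOL_LMT_MIN_SAMPLES_SPLIT` or `DFATOOL_LMT_MIN_SAMPLES_LEAF` is read as a ratio of the total number of training samples, while larger values are absolute sample counts. The following is a minimal sketch of that rule, not the dfatool or linear-tree implementation. The helper name `resolve_min_samples` is hypothetical, and rounding the ratio result up is an assumption; lineartree.py remains the authoritative reference.

```python
import math


def resolve_min_samples(raw: str, n_samples: int) -> int:
    """Hypothetical helper: turn a setting such as "0.1" or "6" into a sample count.

    Not part of dfatool or linear-tree; it only illustrates the documented rule.
    """
    value = float(raw)
    if value < 1.0:
        # Below 1.0: interpret as a ratio of the total number of training samples.
        # Rounding up is an assumption made for this sketch.
        return math.ceil(value * n_samples)
    # Otherwise: interpret as an absolute number of samples.
    return int(value)


# Example: with 200 training samples, a ratio of 0.25 and an absolute value of 6.
ratio_based = resolve_min_samples("0.25", 200)
absolute = resolve_min_samples("6", 200)
```

Under these assumptions, a ratio scales with the size of the training set, whereas an absolute value stays fixed regardless of how many samples are available.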
diff --git a/README.md b/README.md
index 757600c..b7c2492 100644
--- a/README.md
+++ b/README.md
@@ -119,8 +119,8 @@ The following variables may be set to alter the behaviour of dfatool components.
 | `DFATOOL_CART_MAX_DEPTH` | **0** .. *n* | maximum depth for sklearn CART. Default (0): unlimited. |
 | `DFATOOL_DTREE_LMT` | **0**, 1 | Use [Linear Model Tree](https://github.com/cerlymarco/linear-tree) algorithm for regression tree generation. Uses binary nodes and linear functions. Overrides `FUNCTION_LEAVES` (=0) and `NONBINARY_NODES` (=0). |
 | `DFATOOL_LMT_MAX_DEPTH` | **5** .. 20 | Maximum depth for LMT. |
-| `DFATOOL_LMT_MIN_SAMPLES_SPLIT` | 0.0 .. 1.0, **6** .. *n* | Minimum samples required to still perform an LMT split. |
-| `DFATOOL_LMT_MIN_SAMPLES_LEAF` | 0.0 .. **0.1** .. 1.0, 3 .. *n* | Minimum samples that each leaf of a split candidate must contain. |
+| `DFATOOL_LMT_MIN_SAMPLES_SPLIT` | 0.0 .. 1.0, **6** .. *n* | Minimum samples required to still perform an LMT split. A value below 1.0 sets the specified ratio of the total number of training samples as minimum. |
+| `DFATOOL_LMT_MIN_SAMPLES_LEAF` | 0.0 .. **0.1** .. 1.0, 3 .. *n* | Minimum samples that each leaf of a split candidate must contain. A value below 1.0 sets the specified ratio of the total number of training samples as minimum. |
 | `DFATOOL_LMT_MAX_BINS` | 10 .. **120** | Number of bins used to determine optimal split. LMT default: 25. |
 | `DFATOOL_LMT_CRITERION` | **mse**, rmse, mae, poisson | Error metric to use when selecting best split. |
 | `DFATOOL_ULS_ERROR_METRIC` | **ssr**, rmsd, mae, … | Error metric to use when selecting best-fitting function during unsupervised least squares (ULS) regression. Least squares regression itself minimzes root mean square deviation (rmsd), hence the equivalent (but partitioning-compatible) sum of squared residuals (ssr) is the default. Supports all metrics accepted by `--error-metric`. |
diff --git a/doc/modeling-method.md b/doc/modeling-method.md
index 6357cd8..27cb334 100644
--- a/doc/modeling-method.md
+++ b/doc/modeling-method.md
@@ -25,7 +25,14 @@ They always use a maximum depth of 20.
 
 ### Related Options
 
+See the [LinearTreeRegressor documentation](lib/lineartree/lineartree.py) for details on training hyper-parameters.
+
 * `DFATOOL_PARAM_CATEGORIAL_TO_SCALAR=1` converts categorial parameters (which are not supported by LMT) to numeric ones.
+* `DFATOOL_LMT_MAX_DEPTH`
+* `DFATOOL_LMT_MIN_SAMPLES_SPLIT`
+* `DFATOOL_LMT_MIN_SAMPLES_LEAF`
+* `DFATOOL_LMT_MAX_BINS`
+* `DFATOOL_LMT_CRITERION`
 
 ## RMT (Regression Model Trees)