Use LMT algorithm defaults for max depth and min samples leaf.

max bins remains at 120 (LMT default: 25), but that setting should only affect run-time, not accuracy or model complexity.
author: Birte Kristina Friesel <birte.friesel@uos.de> 2024-01-25 14:10:23 +0100
committer: Birte Kristina Friesel <birte.friesel@uos.de> 2024-01-25 14:10:23 +0100
commit: 242d09e9e03cd630fb56a26cac5abcb212d75426 (patch)
tree: 971a45cddb8f9cf864db8318422c4000f8f487b3
parent: 27e4210a7546cb72a0df1c0d54c7c09fed628e12 (diff)
2 files changed, 7 insertions, 7 deletions
diff --git a/README.md b/README.md
index 53e1eeb..757600c 100644
--- a/README.md
+++ b/README.md
@@ -118,11 +118,11 @@ The following variables may be set to alter the behaviour of dfatool components.
 | `DFATOOL_DTREE_SKLEARN_DECART` | **0**, 1 | Use sklearn CART ("Decision Tree Regression") algorithm for decision tree generation. Ignore scalar parameters, thus emulating the DECART algorithm. |
 | `DFATOOL_CART_MAX_DEPTH` | **0** .. *n* | maximum depth for sklearn CART. Default (0): unlimited. |
 | `DFATOOL_DTREE_LMT` | **0**, 1 | Use [Linear Model Tree](https://github.com/cerlymarco/linear-tree) algorithm for regression tree generation. Uses binary nodes and linear functions. Overrides `FUNCTION_LEAVES` (=0) and `NONBINARY_NODES` (=0). |
-| `DFATOOL_LMT_MAX_DEPTH` | 5 .. **20** | Maximum depth for LMT. LMT default: 5 |
-| `DFATOOL_LMT_MIN_SAMPLES_SPLIT` | 0.0 .. 1.0, **6** .. *n* | Minimum samples required to still perform an LMT split. LMT default: 6 |
-| `DFATOOL_LMT_MIN_SAMPLES_LEAF` | 0.0 .. 1.0, **3** .. *n* | Minimum samples that each leaf of a split candidate must contain. LMT default: 0.1 (10% of *n samples*) |
+| `DFATOOL_LMT_MAX_DEPTH` | **5** .. 20 | Maximum depth for LMT. |
+| `DFATOOL_LMT_MIN_SAMPLES_SPLIT` | 0.0 .. 1.0, **6** .. *n* | Minimum samples required to still perform an LMT split. |
+| `DFATOOL_LMT_MIN_SAMPLES_LEAF` | 0.0 .. **0.1** .. 1.0, 3 .. *n* | Minimum samples that each leaf of a split candidate must contain. |
 | `DFATOOL_LMT_MAX_BINS` | 10 .. **120** | Number of bins used to determine optimal split. LMT default: 25. |
-| `DFATOOL_LMT_CRITERION` | **mse**, rmse, mae, poisson | Error metric to use when selecting best split. LMT default: ssr |
+| `DFATOOL_LMT_CRITERION` | **mse**, rmse, mae, poisson | Error metric to use when selecting best split. |
 | `DFATOOL_ULS_ERROR_METRIC` | **ssr**, rmsd, mae, … | Error metric to use when selecting best-fitting function during unsupervised least squares (ULS) regression. Least squares regression itself minimizes root mean square deviation (rmsd), hence the equivalent (but partitioning-compatible) sum of squared residuals (ssr) is the default. Supports all metrics accepted by `--error-metric`. |
 | `DFATOOL_ULS_MIN_DISTINCT_VALUES` | 2 .. **3** .. *n* | Minimum number of unique values a parameter must take to be eligible for ULS |
 | `DFATOOL_ULS_SKIP_CODEPENDENT_CHECK` | **0**, 1 | Do not detect and remove co-dependent features in ULS. |
diff --git a/lib/parameters.py b/lib/parameters.py
index 86f2338..c3af6b0 100644
--- a/lib/parameters.py
+++ b/lib/parameters.py
@@ -1163,7 +1163,7 @@ class ModelAttribute:
             # max_depth : int, default=5
             #     The maximum depth of the tree considering only the splitting nodes.
             #     A higher value implies a higher training time.
-            max_depth = int(os.getenv("DFATOOL_LMT_MAX_DEPTH", "20"))
+            max_depth = int(os.getenv("DFATOOL_LMT_MAX_DEPTH", "5"))
 
             # min_samples_split : int or float, default=6
             #     The minimum number of samples required to split an internal node.
@@ -1189,10 +1189,10 @@ class ModelAttribute:
             #     - If float, then `min_samples_leaf` is a fraction and
             #       `ceil(min_samples_leaf * n_samples)` are the minimum
             #       number of samples for each node.
-            if "." in os.getenv("DFATOOL_LMT_MIN_SAMPLES_LEAF", ""):
+            if "." in os.getenv("DFATOOL_LMT_MIN_SAMPLES_LEAF", "0.1"):
                 min_samples_leaf = float(os.getenv("DFATOOL_LMT_MIN_SAMPLES_LEAF"))
             else:
-                min_samples_leaf = int(os.getenv("DFATOOL_LMT_MIN_SAMPLES_LEAF", "3"))
+                min_samples_leaf = int(os.getenv("DFATOOL_LMT_MIN_SAMPLES_LEAF"))
 
             # max_bins : int, default=25
             #     The maximum number of bins to use to search the optimal split in each
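
The `min_samples_leaf` hunk uses a "." in the variable's value to tell a fraction from an absolute count. As shown, the float branch re-reads `DFATOOL_LMT_MIN_SAMPLES_LEAF` without a default, so with the variable unset it would pass `None` to `float()`. The following is a minimal sketch of one way to apply the default only once. The helper names `parse_count_or_fraction` and `effective_min_samples` are hypothetical and not part of dfatool; the `ceil` rule is taken from the LMT docstring quoted in the hunk.

```python
import math
import os


def parse_count_or_fraction(name, default):
    """Read an env variable holding either an int count or a float fraction.

    Hypothetical helper: the default is applied once, so both branches
    parse the same string.
    """
    raw = os.getenv(name, default)
    # Same convention as the hunk above: a "." marks a fraction.
    return float(raw) if "." in raw else int(raw)


def effective_min_samples(value, n_samples):
    """Resolve a count-or-fraction setting to an absolute sample count."""
    if isinstance(value, float):
        # Fractions are scaled by the sample count and rounded up.
        return math.ceil(value * n_samples)
    return value


if __name__ == "__main__":
    # With the variable unset, the new default of 0.1 is used.
    os.environ.pop("DFATOOL_LMT_MIN_SAMPLES_LEAF", None)
    leaf = parse_count_or_fraction("DFATOOL_LMT_MIN_SAMPLES_LEAF", "0.1")
    print(leaf, effective_min_samples(leaf, 45))
```

With the new default of 0.1, the minimum leaf size scales with the data set instead of staying fixed at 3 samples, which is the behavioural change this commit introduces for `min_samples_leaf`.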