diff options
author | Birte Kristina Friesel <birte.friesel@uos.de> | 2024-01-12 09:24:23 +0100 |
---|---|---|
committer | Birte Kristina Friesel <birte.friesel@uos.de> | 2024-01-12 09:24:23 +0100 |
commit | c3043d8537e4dceb303929582dab92a6024924ce (patch) | |
tree | 952cf10ea377e45d56436c7282d6dd925774c720 | |
parent | 2cdc0ebc4a68d44dd6381d7fd473455f1d2f1b5d (diff) |
Expose DFATOOL_ULS_MIN_DISTINCT_VALUES training hyper-parameter
-rw-r--r-- | README.md | 1 | ||||
-rw-r--r-- | lib/parameters.py | 6 |
2 files changed, 6 insertions, 1 deletions
@@ -119,6 +119,7 @@ The following variables may be set to alter the behaviour of dfatool components. | `DFATOOL_DTREE_LMT` | **0**, 1 | Use [Linear Model Tree](https://github.com/cerlymarco/linear-tree) algorithm for regression tree generation. Uses binary nodes and linear functions. Overrides `FUNCTION_LEAVES` (=0) and `NONBINARY_NODES` (=0). | | `DFATOOL_CART_MAX_DEPTH` | **0** .. *n* | maximum depth for sklearn CART. Default (0): unlimited. | | `DFATOOL_ULS_ERROR_METRIC` | **rmsd**, mae, p50, p90 | Error metric to use when selecting best-fitting function during unsupervised least squares (ULS) regression. Least squares regression itself minimzes root mean square deviation (rmsd), hence rmsd is the default. | +| `DFATOOL_ULS_MIN_DISTINCT_VALUES` | 2 .. **3** .. *n* | Minimum number of unique values a parameter must take to be eligible for ULS | | `DFATOOL_USE_XGBOOST` | **0**, 1 | Use Extreme Gradient Boosting algorithm for decision forest generation. | | `DFATOOL_XGB_N_ESTIMATORS` | 1 .. **100** .. *n* | Number of estimators (i.e., trees) for XGBoost. Mandatory. | | `DFATOOL_XGB_MAX_DEPTH` | 2 .. **10** .. *n* | Maximum XGBoost tree depth. XGBoost default: 6 | diff --git a/lib/parameters.py b/lib/parameters.py index 74be565..3173784 100644 --- a/lib/parameters.py +++ b/lib/parameters.py @@ -604,7 +604,11 @@ class ModelAttribute: # There must be at least 3 distinct data values (≠ None) if an analytic model # is to be fitted. For 2 (or fewer) values, decision trees are better. - self.min_values_for_analytic_model = 3 + # Exceptions such as DFATOOL_FIT_LINEAR_ONLY=1 (2 values sufficient) + # can be handled via DFATOOL_ULS_MIN_DISTINCT_VALUES + self.min_values_for_analytic_model = int( + os.getenv("DFATOOL_ULS_MIN_DISTINCT_VALUES", "3") + ) def __repr__(self): mean = np.mean(self.data) |