author Birte Kristina Friesel <birte.friesel@uos.de> 2024-01-12 08:55:45 +0100
committer Birte Kristina Friesel <birte.friesel@uos.de> 2024-01-12 08:55:45 +0100
commit 2cdc0ebc4a68d44dd6381d7fd473455f1d2f1b5d (patch)
tree c4014525ed2fec640a3f697d8612e33065eea679
parent cd9515e788f3b7e6195d34266853248d84a6fb9c (diff)
It's THRESHOLD, not TRESHOLD. Oops.
-rw-r--r-- README.md 2
-rw-r--r-- lib/parameters.py 4
2 files changed, 3 insertions, 3 deletions
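
The README row touched by this commit describes the relevance heuristic only in words, so a small sketch may help when reviewing the rename. It is not dfatool's implementation: the helper names `relevance_threshold`, `mean_partition_std` and `is_relevant` are hypothetical, as is the handling of a zero denominator. Only the variable name `DFATOOL_PARAM_RELEVANCE_THRESHOLD` and its default of 0.5 are taken from the diff.

```python
import os

import numpy as np


def relevance_threshold(env=os.environ):
    # Variable name and default match the corrected os.getenv call in this commit.
    return float(env.get("DFATOOL_PARAM_RELEVANCE_THRESHOLD", 0.5))


def mean_partition_std(rows, values, key_columns):
    """Mean standard deviation of `values`, grouped by the given parameter columns."""
    groups = {}
    for row, value in zip(rows, values):
        key = tuple(row[c] for c in key_columns)
        groups.setdefault(key, []).append(value)
    return float(np.mean([np.std(group) for group in groups.values()]))


def is_relevant(rows, values, param_idx, threshold):
    """Ratio test from the README: stddev(all params) / stddev(all but i) < threshold."""
    all_columns = list(range(len(rows[0])))
    other_columns = [c for c in all_columns if c != param_idx]
    std_all = mean_partition_std(rows, values, all_columns)
    std_without = mean_partition_std(rows, values, other_columns)
    if std_without == 0:
        # Assumption of this sketch: with no spread left, treat the parameter as irrelevant.
        return False
    return std_all / std_without < threshold


# Toy data: the measured value depends on parameter 0 but not on parameter 1.
rows = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
values = [10.0, 10.2, 10.0, 10.2, 20.0, 20.2, 20.0, 20.2]
threshold = relevance_threshold({})  # empty mapping, so the default applies
param0_relevant = is_relevant(rows, values, 0, threshold)
param1_relevant = is_relevant(rows, values, 1, threshold)
```

One practical consequence of the rename: an environment that still sets only the misspelled `DFATOOL_PARAM_RELEVANCE_TRESHOLD` is no longer read after this commit, so the default of 0.5 applies until the variable is renamed there as well.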
diff --git a/README.md b/README.md
index e33d351..6da4fcc 100644
--- a/README.md
+++ b/README.md
@@ -134,7 +134,7 @@ The following variables may be set to alter the behaviour of dfatool components.
| `DFATOOL_REGRESSION_SAFE_FUNCTIONS` | **0**, 1 | Use safe functions only (e.g. 1/x returnning 1 for x==0) |
| `DFATOOL_DTREE_NONBINARY_NODES` | 0, **1** | Enable non-binary nodes (i.e., nodes with more than two children corresponding to enum variables) in decision trees |
| `DFATOOL_DTREE_IGNORE_IRRELEVANT_PARAMS` | 0, **1** | Ignore parameters deemed irrelevant by stddev heuristic during regression tree generation |
-| `DFATOOL_PARAM_RELEVANCE_TRESHOLD` | 0 .. **0.5** .. 1 | Threshold for relevant parameter detection: parameter *i* is relevant if mean standard deviation (data partitioned by all parameters) / mean standard deviation (data partition by all parameters but *i*) is less than threshold |
+| `DFATOOL_PARAM_RELEVANCE_THRESHOLD` | 0 .. **0.5** .. 1 | Threshold for relevant parameter detection: parameter *i* is relevant if mean standard deviation (data partitioned by all parameters) / mean standard deviation (data partition by all parameters but *i*) is less than threshold |
| `DFATOOL_DTREE_LOSS_IGNORE_SCALAR` | **0**, 1 | Ignore scalar parameters when computing the loss for split node candidates. Instead of computing the loss of a single partition for each `x_i == j`, compute the loss of partitions for `x_i == j` in which non-scalar parameters vary and scalar parameters are constant. This way, scalar parameters do not affect the decision about which non-scalar parameter to use for splitting. |
| `DFATOOL_PARAM_CATEGORIAL_TO_SCALAR` | **0**, 1 | Some models (e.g. FOL, sklearn CART, XGBoost) do not support categorial parameters. Ignore them (0) or convert them to scalar indexes (1). |
| `DFATOOL_FIT_FOL` | **0**, 1 | Build a first-order linear function (i.e., a * param1 + b * param2 + ...) instead of more complex functions or tree structures. Must not be combined with `--force-tree`. |
diff --git a/lib/parameters.py b/lib/parameters.py
index 740647e..74be565 100644
--- a/lib/parameters.py
+++ b/lib/parameters.py
@@ -209,7 +209,7 @@ def _compute_param_statistics(
np.seterr("raise")
- relevance_threshold = float(os.getenv("DFATOOL_PARAM_RELEVANCE_TRESHOLD", 0.5))
+ relevance_threshold = float(os.getenv("DFATOOL_PARAM_RELEVANCE_THRESHOLD", 0.5))
for param_idx, param in enumerate(param_names):
if param_idx < len(codependent_params) and codependent_params[param_idx]:
@@ -1154,7 +1154,7 @@ class ModelAttribute:
"build_dtree {self.name} {self.attr} called with loss_ignore_scalar=True, with_function_leaves=False. This does not make sense."
)
- relevance_threshold = float(os.getenv("DFATOOL_PARAM_RELEVANCE_TRESHOLD", 0.5))
+ relevance_threshold = float(os.getenv("DFATOOL_PARAM_RELEVANCE_THRESHOLD", 0.5))
self.model_function = self._build_dtree(
parameters,