author Daniel Friesel <derf@finalrewind.org> 2020-12-21 11:46:22 +0100
committer Daniel Friesel <derf@finalrewind.org> 2020-12-21 11:46:22 +0100
commit 9f4a764adfd68242582a40089d696e4b4f238699
tree cb3cb5f564b888a2d87e7e4c40cf23159162e5a5
parent 9822b5300bd676c92e24401ce0781c46e51c4913
document schema
-rw-r--r-- README.md 270
1 files changed, 258 insertions, 12 deletions
diff --git a/README.md b/README.md
index 0762cd7..28df897 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
-# db-wagenreihung-to-json - Convert wagon order PDFs to JSON
+# db-zugbildung-to-json - Convert wagon order PDFs to JSON
-db-wagenreihung-to-json converts a train composition (“Zugbildungsplan”) PDF
+db-zugbildung-to-json converts a train composition (“Zugbildungsplan”) PDF
obtained from
[data.deutschebahn.com](https://data.deutschebahn.com/dataset/zugbildungsplanzugbildungsplan-zpar)
to JSON. At the moment, conversion is limited to a map of train numbers to
@@ -11,28 +11,274 @@ The PDF-to-JSON conversion is somewhat fragile, so errors are expected. If you
find a bug or inconsistency in the JSON file, please first compare it with the
corresponding PDF on
[data.deutschebahn.com](https://data.deutschebahn.com/dataset/zugbildungsplanzugbildungsplan-zpar).
-If it is indeed a bug in db-wagenreihung-to-json, please [open an issue on
-GitHub](https://github.com/derf/db-wagenreihung-to-json/issues/new).
+If it is indeed a bug in db-zugbildung-to-json, please [open an issue on
+GitHub](https://github.com/derf/db-zugbildung-to-json/issues/new). The format
+specification (see below) details how trustworthy the JSON content is.
-The latest JSON produced by db-wagenreihung-to-json is available online at
-[dbdb/zugbildungsplan\_v0.json](https://lib.finalrewind.org/dbdb/zugbildungsplan_v0.json).
+## Online Services
+
+The latest JSON produced by db-zugbildung-to-json is available online at
+[dbdb/db\_zugbildung\_v0.json](https://lib.finalrewind.org/dbdb/db_zugbildung_v0.json).
+
+Graphical cycle maps are available on
+[dbdb/db\_umlauf](https://lib.finalrewind.org/dbdb/db_umlauf/).
## Data Format
-This README documents **version 0** of the format.
+This README documents **version 0** of the format. It is not stable yet; data
+layout or semantic changes may not be reflected in the version number.
+Starting with v1, schema and semantics will be stable.
```js
{
"deprecated": false,
"source": "2021_ZpAR_Wi_Endstück.pdf",
+ "valid": "2020-12-13/2021-06-12",
"train": {
- "4": { /* train details */ },
- "5": { /* train details */ }
+ "10": { /* train details */ },
+ "11": { /* train details */ }
// ...
},
- "valid": {
- "from": "YYYY-MM-DD",
- "through": "YYYY-MM-DD"
+}
+```
+
+### Deprecation Notice
+
+```js
+{
+ "deprecated": false,
+}
+```
+
+**true** iff this file uses a deprecated version of the db-zugbildung-to-json
+schema. Deprecated files may no longer be updated when DB releases a new train
+composition.
+
+### Source
+
+```js
+{
+ "source": "2021_ZpAR_Wi_Endstück.pdf"
+}
+```
+
+The PDF file used to generate the data set. Useful for reporting issues and
+for checking whether the content is up to date.
+
+### Validity
+
+```js
+{
+ "valid": "YYYY-MM-DD/YYYY-MM-DD"
+}
+```
+
+An ISO 8601 date range (start/end) describing the validity period of the train
+composition data as stated in the PDF file.
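For illustration, a consumer might check whether a given day falls within `valid` roughly as sketched below. `isValidOn` is a hypothetical helper, not part of db-zugbildung-to-json, and it assumes that both ends of the range are inclusive, which the schema does not state explicitly.

```javascript
// Hypothetical helper, not part of db-zugbildung-to-json.
// Checks whether `day` ("YYYY-MM-DD") lies within a "valid" range of the
// form "YYYY-MM-DD/YYYY-MM-DD". Both bounds are treated as inclusive
// (an assumption; the schema only says "date range").
function isValidOn(valid, day) {
  const [from, through] = valid.split("/");
  // Zero-padded YYYY-MM-DD dates sort chronologically as plain strings,
  // so string comparison is sufficient here.
  return from <= day && day <= through;
}

const valid = "2020-12-13/2021-06-12";
console.log(isValidOn(valid, "2021-01-15")); // true
console.log(isValidOn(valid, "2021-07-01")); // false
```

Comparing strings avoids `Date` parsing and time zone handling, at the cost of relying on the exact `YYYY-MM-DD/YYYY-MM-DD` layout.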
+
+### Train details
+
+```js
+{
+ "10": {
+ "rawType": "ICE-W",
+ "type": "ICE 3",
+ "shortType": "3" /* optional */,
+ "name": "ICE International" /* optional */,
+ "route": { /* scheduled route */ },
+ "schedules": [ /* scheduled service days and route deviations */ ] /* optional */,
+ "cycle": { /* cycle ("Umlauf") data */ } /* optional */,
+ "hasWagon": { /* wagon type map */ } /* optional */,
+ "wagons": [ /* wagon list */ ] /* optional */,
}
}
```
+
+Each train is identified by its number. Numbers are unique among DB
+long-distance trains, but other European operators may use them as well. For
+instance, the IC services Amsterdam – Berlin and Koebenhavns – Aarhus often use
+identical three-digit numbers.
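Since train numbers are object keys, they are strings in the JSON file. A lookup by number might look like the following sketch; `findTrain` and the abbreviated sample data are hypothetical and only mirror the layout documented above.

```javascript
// Hypothetical helper, not part of db-zugbildung-to-json.
// Keys of the "train" map are strings, so numeric input is converted first.
// A hit only identifies a DB long-distance train; other operators may use
// the same number for unrelated services.
function findTrain(data, number) {
  const key = String(number);
  return Object.prototype.hasOwnProperty.call(data.train, key)
    ? data.train[key]
    : undefined;
}

// Abbreviated sample in the documented layout (not real file contents).
const data = {
  deprecated: false,
  source: "2021_ZpAR_Wi_Endstück.pdf",
  valid: "2020-12-13/2021-06-12",
  train: {
    "10": { rawType: "ICE-W", type: "ICE 3" },
  },
};

console.log(findTrain(data, 10).type); // ICE 3
console.log(findTrain(data, "11")); // undefined
```

In Node.js, `data` would typically come from `JSON.parse(fs.readFileSync("db_zugbildung_v0.json", "utf8"))` after downloading the file.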
+
+#### rawType
+
+The train type as specified in the PDF file, e.g. **ICE-A**, **IC**, or
+**LNF**.
+
+#### type
+
+The scheduled train type, estimated from `rawType` and wagon data. This
+information is **mostly reliable**. If the type cannot be determined, either
+because the composition is not handled or because the train type differs
+between time ranges (which this schema does not support yet), `type` falls
+back to the value of `rawType`.
+
+Possible values:
+
+* ICE 1/2/4
+* ICE 1
+* ICE 2
+* ICE 3
+* ICE 3 Redesign
+* ICE 3 Velaro
+* ICE 4
+* ICE T
+* Metropolitan
+* IC
+* IC2
+* IC2 KISS
+* *anything present in rawType*
+
+#### shortType
+
+A short identifier which can be used to differentiate between ICE 3 and ICE 3
+Redesign or between IC1 and IC2. If the type cannot be estimated, this
+property is not present. Just like `type`, it is **mostly reliable**.
+
+Possible values:
+
+* **1** (ICE 1)
+* **2** (ICE 2 / IC2 / IC2 KISS)
+* **3** (ICE 3)
+* **3R** (ICE 3 Redesign)
+* **3V** (ICE 3 Velaro)
+* **4** (ICE 4)
+* **M** (Metropolitan)
+
+#### name
+
+Optional train name / line name / description provided by the PDF file. Not
+present if the train has no such entry.
+
+Examples include "ICE International", "Kieler Bucht", "Blauer Enzian", and
+"DB-ÖBB EuroCity". Several trains may share a common name / description.
+
+#### empty
+
+**true** if this is an empty train without passenger service ("Leerfahrt").
+Otherwise, the property is not present. This information is **reliable**.
+
+#### route
+
+```js
+{
+ "route": {
+ "end": "München (10:02)",
+ "middle": [
+ "Berlin",
+ "Berlin Südkreuz",
+ "Halle (Saale)",
+ "Erfurt (07:40/07:45)",
+ "Nürnberg"
+ ],
+ "postEnd": null,
+ "preStart": "Berlin-Rummelsburg (Triebzuganlage)",
+ "start": "Berlin-Gesundbrunnen (05:53)"
+ }
+}
+```
+
+Scheduled route. It is divided into five parts:
+
+* **preStart**: Station (without passenger service) where the train is prepared / provisioned. May be null.
+* **start**: First station(s) with passenger service. Often contains the scheduled departure time.
+* **middle**: List of noteworthy stations along the route. May be empty.
+ This is not the complete route. Individual stations may contain timestamps.
+* **end**: Terminal station(s) with passenger service. Often contains the scheduled arrival time.
+* **postEnd**: Station (without passenger service) where the train is parked. May be null.
+
+Station entries are taken as-is from the PDF file. They may differ from station
+names used in iris-tts or HAFAS. A null preStart / postEnd entry does not
+imply that the train is prepared / parked at the first / last station with
+passenger service. Station names may be surrounded by brackets.
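Because entries combine a station name with optional times, consumers may want to split them. The sketch below is a hypothetical parser, not part of db-zugbildung-to-json; it assumes times only ever appear as a trailing `(HH:MM)` or `(HH:MM/HH:MM)` suffix, as in the example above, and leaves other parenthesized text such as "Halle (Saale)" untouched.

```javascript
// Hypothetical helper, not part of db-zugbildung-to-json.
// Splits a route entry into a station name and the times in a trailing
// "(HH:MM)" or "(HH:MM/HH:MM)" suffix. Other parenthesized text, as in
// "Halle (Saale)" or "Berlin-Rummelsburg (Triebzuganlage)", is kept as part
// of the name. Whether a time means arrival or departure depends on the
// entry's position in the route, so the times are returned uninterpreted.
function parseRouteEntry(entry) {
  const match = entry.match(/^(.*?)\s*\((\d{2}:\d{2})(?:\/(\d{2}:\d{2}))?\)$/);
  if (match === null) {
    return { name: entry, times: [] };
  }
  const times = match[3] === undefined ? [match[2]] : [match[2], match[3]];
  return { name: match[1], times };
}

// Route excerpt from the example above.
const route = {
  start: "Berlin-Gesundbrunnen (05:53)",
  middle: ["Berlin", "Halle (Saale)", "Erfurt (07:40/07:45)"],
  end: "München (10:02)",
};

for (const entry of [route.start, ...route.middle, route.end]) {
  const { name, times } = parseRouteEntry(entry);
  console.log(name, times);
}
```

Names are returned exactly as found in the PDF, so matching them against iris-tts or HAFAS station names would still need a separate mapping.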
+
+#### schedules
+
+*work in progress*
+
+#### cycle
+
+```js
+{
+ "cycle": {
+ "from": [
+ "78546",
+ "1051",
+ "78526"
+ ],
+ "to": [
+ "1051"
+ ]
+ }
+}
+```
+
+Cycle ("Umlauf") data, i.e., which trains may make up this train and which
+trains it may end up in. **Reliability unknown**.
+
+**from** is a list of train numbers which may make up this train.
+
+**to** is a list of train numbers which this train may end up in.
+
+PDF annotations covering date ranges, wings, and segments (e.g. a two-segment
+train which becomes a one-segment train along the way) are not taken into
+account. Missing cycle data may be caused by known parser issues. Referenced
+train numbers may not be present in this JSON file; this is often the case for
+international connections. Referenced trains may be coupled to or detached from
+this train along the way, not just at the start / end station.
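Since referenced trains may be absent from the file, a consumer following `cycle` links should expect dangling references. The following hypothetical sketch collects them; the defensive handling of missing `from` / `to` lists is an assumption, as the schema does not say whether both are always present.

```javascript
// Hypothetical helper, not part of db-zugbildung-to-json.
// Collects train numbers referenced in "cycle.from" / "cycle.to" that have
// no entry of their own in the "train" map. Missing "from" / "to" lists
// are tolerated defensively.
function missingCycleReferences(trains) {
  const has = (number) => Object.prototype.hasOwnProperty.call(trains, number);
  const missing = new Set();
  for (const train of Object.values(trains)) {
    if (!train.cycle) continue;
    const refs = [...(train.cycle.from ?? []), ...(train.cycle.to ?? [])];
    for (const ref of refs) {
      if (!has(ref)) missing.add(ref);
    }
  }
  return [...missing].sort();
}

// Sample data: the cycle values from the example above, attached to a
// hypothetical train "10".
const trains = {
  "10": { cycle: { from: ["78546", "1051", "78526"], to: ["1051"] } },
  "1051": {},
};

console.log(missingCycleReferences(trains)); // [ '78526', '78546' ]
```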
+
+#### hasWagon
+
+```js
+{
+ "hasWagon": {
+ "147.5": true,
+ "DApza": true,
+ "DBpbzfa": true,
+ "DBpza": true
+ }
+}
+```
+
+Contains a `true` entry for each wagon and locomotive type scheduled for the
+train. Does not take time ranges into account: if a train is, for example,
+scheduled as IC1 on weekdays and IC2 on weekends, `hasWagon` will contain both
+IC1 and IC2 wagons. **Mostly reliable**. The wagon parser uses regular
+expressions and may miss rarely used wagon types.
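As a usage illustration, `hasWagon` makes it cheap to find all trains that may carry a given wagon type. `trainsWithWagon` and the sample train numbers below are hypothetical; only the property layout follows the schema.

```javascript
// Hypothetical helper, not part of db-zugbildung-to-json.
// Returns the numbers of all trains whose optional "hasWagon" map lists the
// given wagon or locomotive type. Because hasWagon ignores time ranges, a
// match means the type is scheduled on some days, not necessarily every day.
function trainsWithWagon(trains, wagonType) {
  return Object.entries(trains)
    .filter(([, train]) => train.hasWagon?.[wagonType] === true)
    .map(([number]) => number);
}

// Hypothetical sample data (train numbers invented for illustration).
const trains = {
  "2001": { hasWagon: { "147.5": true, DApza: true, DBpza: true } },
  "2002": { hasWagon: { DBpza: true } },
  "2003": {},
};

console.log(trainsWithWagon(trains, "DApza")); // [ '2001' ]
```

A train without a `hasWagon` property simply never matches, which mirrors the property being optional.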
+
+#### wagons
+
+```js
+{
+ "wagons": [
+ [
+ "DBpbzfa",
+ "1"
+ ],
+ [
+ "DBpza",
+ "2"
+ ],
+ [
+ "DBpza",
+ "3"
+ ],
+ [
+ "DBpza",
+ "4"
+ ],
+ [
+ "DApza",
+ "5"
+ ],
+ [
+ "147.5",
+ null
+ ]
+ ]
+}
+```
+
+Ordered list of wagons and (for ICE 1, ICE 2, and IC2) matched locomotives.
+Not present if the wagon order cannot be reliably determined or if it depends
+on the date. Does not contain IC1 locomotives.
+
+**Mostly reliable**. Just like `hasWagon`, rarely used wagon types may be
+missing.
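Each `wagons` entry is a `[type, number]` pair whose number may be null, as for the locomotive in the example above. A hypothetical sketch for locating a wagon by its number could look like this; which end of the train index 0 refers to is not specified by the schema, so the index is only a position within the list.

```javascript
// Hypothetical helper, not part of db-zugbildung-to-json.
// Maps each wagon number to its wagon type and its 0-based index in the
// ordered "wagons" list. Entries without a number (null, like the
// locomotive "147.5" in the example above) are skipped.
function wagonPositions(wagons) {
  const positions = new Map();
  wagons.forEach(([type, number], index) => {
    if (number !== null) {
      positions.set(number, { type, index });
    }
  });
  return positions;
}

// The "wagons" list from the example above.
const wagons = [
  ["DBpbzfa", "1"],
  ["DBpza", "2"],
  ["DBpza", "3"],
  ["DBpza", "4"],
  ["DApza", "5"],
  ["147.5", null],
];

console.log(wagonPositions(wagons).get("5")); // { type: 'DApza', index: 4 }
```

Since `wagons` is optional, a caller would check that `train.wagons` exists first and fall back to `hasWagon` otherwise.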