Commit 2775556

Merge pull request #1 from TiberiuD/master
Added train information. Made some improvements.
2 parents 043f7d4 + 137ec22 commit 2775556

5 files changed: +334 -32 lines changed

CHANGELOG.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# Change Log
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](http://keepachangelog.com/)
+and this project adheres to [Semantic Versioning](http://semver.org/).
+
+## [1.0.1] - 2017-04-26
+### Changed
+- The JSON format for the station information, for consistency. This format is not compatible with version 1.0.0.
+### Added
+- Better error management
+- Train information
+- This CHANGELOG file.
+### Fixed
+- Fixed a bug which was preventing the use of the node server from another directory
+- The server searches for PhantomJS and throws an error if it doesn't find it
+### Broke
+- Compatibility with the version 1.0.0 standard.
+
+## 1.0.0 - Initial release
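
The changelog says the 1.0.1 station JSON is incompatible with 1.0.0. Comparing the removed and added `output.push(...)` calls in the index.js diff of this commit, the change appears to be a rename of the keys, with the values unchanged. A consumer that still holds 1.0.0-shaped entries could translate them with something like the sketch below. `KEY_MAP` and `upgradeStationEntry` are hypothetical names that do not exist in the repository, and the sample values in the usage note are made up.

```javascript
// Hypothetical helper (not part of the repository): renames the keys of a
// 1.0.0 station-board entry to the 1.0.1 names seen in the index.js diff.
var KEY_MAP = {
    train: 'ID',
    type: 'Rank',
    operator: 'Operator',
    from: 'Origin',
    originatingdepart: 'OriginDeparture',
    to: 'Destination',
    finalarrive: 'DestinationArrival',
    arrive: 'Arrival',
    depart: 'Departure',
    line: 'Line',
    delay: 'Delay'
};

function upgradeStationEntry(oldEntry) {
    var upgraded = {};
    Object.keys(oldEntry).forEach(function (key) {
        // Keys without a known 1.0.1 name are copied through unchanged.
        var newKey = Object.prototype.hasOwnProperty.call(KEY_MAP, key) ? KEY_MAP[key] : key;
        upgraded[newKey] = oldEntry[key];
    });
    return upgraded;
}
```

For instance, `upgradeStationEntry({ train: '1651', type: 'IR' })` yields an object with the keys `ID` and `Rank`.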

README.md

Lines changed: 13 additions & 7 deletions
@@ -6,7 +6,7 @@ CFR S.A., the Romanian national railway infrastructure administration company of
 
 This NodeJS & ExpressJS based API controls the PhantomJS headless webkit engine to extract data from the CFR webapp as requested in the URL API Endpoint and outputs reusable JSON data.
 
-### Installation & Requirments
+## Installation & Requirments
 - You need NodeJS and NPM installed on your system: [See their website for details](https://nodejs.org/en/download/). You may or may not have luck with portable versions.
 - Clone the repository and install the module dependencies:
 ```sh
@@ -18,23 +18,29 @@ $ npm install
 ```sh
 $ node .
 ```
-### Usage
+## Usage
+### Station information
 Now you can point your browser to http://localhost:9090/station/ID to see the magic. ID is the unique station-unit code, you have [a list of those in this very nice repo](https://github.com/vasile/data.gov.ro-gtfs-exporter/blob/master/cfr.webgis.ro/stops.geojson). For example, to get a JSON object with the current departure/arrival board & delay information for the Bucharest North railway station (the main & biggest one in our country), you would point your browser or the URL variable for whatever app you are consuming the data with to: http://localhost:9090/station/10017.
-### Ideas
+### Train information
+In the same way you can get the current trains in a certain railway station, you can get the current information for a certain train. CFR provides information such as delays, the last station the train has passed (with a 7-minute delay), the next station and other useful information.
+
+Just point your browser to http://localhost:9090/train/ID, where ID is the train's unique number. You can get these IDs from the station information feed. For example, you can retrieve the information for train IR 1651 from Bucharest North to Suceava North (valid as of April 2017) by accessing http://localhost:9090/train/1651.
+
+## Ideas
 While the official apps themselves work but may not look so great, romanian developers did their best to create some really cool open source projects and online services related to transportation and infrastructure. See [this live map](http://cfr.webgis.ro/), [this proprietary to GTFS converter](https://github.com/vasile/data.gov.ro-gtfs-exporter), [this trip planner](https://www.acceleratul.ro), etc.
 
 Using this API and other public resources, you may create your own style of station departure board, delay-notification service, cool looking mobile app, while learning how to program and work with structured data ?
-### License, disclaimer and known limitations
+## License, disclaimer and known limitations
 This is a completely open source project, built on open source modules and libraries and licensed under [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0.html).
 
 Also, you are completely responsible for what you do with it - keep in mind that CFR S.A. and InfoFer (the state-railway owned IT firm which builds their software) are not particularly transparent or third party developer friendly. If you do mass-scraping or develop some publicly accessible service that generates loads of traffic from the same server to them or clones their data to a database for various reasons, you may run into some trouble, [as this fellow enthusiast did while making a web trip-planner using CFR Calatori's timetable from their website](http://legi-internet.ro/blogs/index.php/cfr-crede-ca-are-monopol-pe-mersul-trenurilor-pe-internet).
 
 But for tinkering, playing around and working with real-time data that clearly can't confuse anyone if the error is not from CFR themselves, you should be fine and on the right side of the law, at least from my experience. Maybe they'll offer their own API with proper rules and licensing at some point.
 
-Known limitations:
-- Currently you can only get the station departure & arrival (times, originating & destination station) board with delays and lines (where applicable) through the /station/ID endpoint, while detailed information about individual trips and trips between stations is provided by the CFR webpage. I may work on additional features.
+#### Known limitations:
 - Requests are not authenticated and no rate limiting is implemented, so it's in no way ready to be exposed on the web.
 - This is not particularly fast, because the CFR Webpage isn't either. You'll probably want background requests and caching. After the initial request is made, it'll wait 8 seconds before parsing the data. If data hasn't been displayed on the webpage, it will wait an additional 20 seconds. After this, the API will output a blank object - this may mean that the scraped web service is down, it is really slow to respond or there are really no current trips stopping at that particular station (at night or at a small stop, for example).
 - This is scraping and parsing, so any structural update to the CFR webpage, while highly unlikely in the near future may break this.
+- The train information feed does not provide the details regarding the train's delays and other useful information that Infofer offers with their service. This will be updated in the future.
 
-Public information web-service provided by CFR S.A. through Informatica Feroviara: http://appiris.infofer.ro/SosPlcRO.aspx, appiris.infofer.ro/MyTrainRO.aspx, http://appiris.infofer.ro/MersTrenRo.aspx. This is information from infrastructure administration and not a specific passenger carrier. Official passenger timetables are found here: http://mersultrenurilorcfr.ro, http://infofer.ro/ and static data source with timetables updated at the end of each year: http://data.gov.ro/organization/sc-informatica-feroviara-sa
+Public information web-service provided by CFR S.A. through Informatica Feroviara: http://appiris.infofer.ro/SosPlcRO.aspx, appiris.infofer.ro/MyTrainRO.aspx, http://appiris.infofer.ro/MersTrenRo.aspx. This is information from infrastructure administration and not a specific passenger carrier. Official passenger timetables are found here: http://mersultrenurilorcfr.ro, http://infofer.ro/ and static data source with timetables updated at the end of each year: http://data.gov.ro/organization/sc-informatica-feroviara-sa
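
The README's known limitations recommend background requests and caching, since one request can wait 8 seconds and then a further 20 before answering. A minimal sketch of a time-based cache that a consumer might wrap around its own request function follows. `createCachedFetcher`, `fetchFn`, `ttlMs` and `now` are hypothetical names, not part of this repository, and the right time-to-live is an assumption left to the caller.

```javascript
// Hypothetical consumer-side cache (not part of the repository).
// fetchFn(key) performs the slow request and returns a value or a Promise;
// repeated calls for the same key within ttlMs share one result.
function createCachedFetcher(fetchFn, ttlMs, now) {
    now = now || Date.now;   // injectable clock, handy for testing
    var cache = {};          // key -> { expires: Number, promise: Promise }

    return function (key) {
        var entry = cache[key];
        if (entry && entry.expires > now()) {
            return entry.promise;   // still fresh, or still in flight
        }
        var promise = Promise.resolve(fetchFn(key)).catch(function (err) {
            // Do not keep failures around; drop the entry if it is still ours.
            if (cache[key] && cache[key].promise === promise) {
                delete cache[key];
            }
            throw err;
        });
        cache[key] = { expires: now() + ttlMs, promise: promise };
        return promise;
    };
}
```

A consumer could pass a `fetchFn` that requests `http://localhost:9090/station/` plus the key. Because the API answers with an empty result when the scrape times out, a caller may prefer to give empty results a shorter lifetime than this sketch does.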

index.js

Lines changed: 154 additions & 20 deletions
@@ -1,31 +1,165 @@
 var exec = require('child_process').exec,
     cheerio = require('cheerio'),
     cheerioTableparser = require('cheerio-tableparser'),
-    data = [], output = [], newArray;
+    commandExistsSync = require('command-exists').sync,
+    path = require('path'),
     express = require('express')();
 
+// Check if PhantomJS can be ran on the system
+if (!commandExistsSync('phantomjs')) {
+    process.stderr.write(
+        "Could not find PhantomJS on this system. " +
+        "Please ensure that PhantomJS is installed and can be ran " +
+        "from the current path.\n" +
+        "http://phantomjs.org/\n"
+    );
+    process.exit(1);
+}
+
 express.get('/station/:id*', function(req, res, next) {
-    cmd = 'phantomjs station.js '+req.params['id'];
+    cmd = 'phantomjs ' + path.resolve(__dirname, 'station.js') + ' ' + req.params['id'];
+
+    exec(cmd, function(error, stdout, stderr) {
+        if (error !== null) {
+            console.log('Execution error: ' + error);
+
+            res.statusCode = 500;
+            res.send('Internal error.');
+            return;
+        }
+
+        res.setHeader('Access-Control-Allow-Origin', '*');
+        var output = [];
+
+        // Get the retrieved HTML table from PhantomJS
+        $ = cheerio.load(stdout);
+        cheerioTableparser($);
+        var data = $("table").parsetable(true, true, true);
+
+        if (data.length != 0) {
+            var newArray = data[0].map(function(col, i) {
+                return data.map(function(row) {
+                    return row[i];
+                });
+            });
+            newArray.shift();
+
+            // Push each entry in the table into the output array
+            for (var i = 0; i < newArray.length; i++) {
+                // Split the origin and final stations from their respective times
+                if (newArray[i][3]) {
+                    from = newArray[i][3].split(/[0-9]/)[0].replace(/\s\s*$/, '');
+                    originatingdepart = newArray[i][3].match(/([0-9\:]+)/)[0];
+                } else {
+                    from = ''; originatingdepart = '';
+                }
+
+                if (newArray[i][7]) {
+                    to = newArray[i][7].split(/[0-9]/)[0].replace(/\s\s*$/, '');
+                    finalarrive = newArray[i][7].match(/([0-9\:]+)/)[0]
+                } else {
+                    to = ''; finalarrive = '';
+                }
+
+                // Push the output
+                output.push({
+                    ID: newArray[i][1],
+                    Rank: newArray[i][0],
+                    Operator: newArray[i][2],
+
+                    Origin: from,
+                    OriginDeparture: originatingdepart,
+
+                    Destination: to,
+                    DestinationArrival: finalarrive,
+
+                    Arrival: newArray[i][5],
+                    Departure: newArray[i][6],
+
+                    Line: newArray[i][8],
+                    Delay: newArray[i][4]
+                });
+            }
+        }
+
+        // Send the output
+        res.send(JSON.stringify(output));
+    });
+});
 
-    exec(cmd, function(error, stdout, stderr) {
-        $ = cheerio.load(stdout);
-        cheerioTableparser($);
-        data = $("table").parsetable(true, true, true);
-        if (data.length != 0) { newArray = data[0].map(function(col, i) {
-            return data.map(function(row) {
-                return row[i]
-            })
-        });
-        newArray.shift();
-        output = [];
-        for (var i = 0; i < newArray.length; i++) {
-            if (newArray[i][3]) { from = newArray[i][3].split(/[0-9]/)[0].replace(/\s\s*$/, ''); originatingdepart = newArray[i][3].match(/([0-9\:]+)/)[0]; } else { from = ''; originatingdepart = ''; }
-            if (newArray[i][7]) { to = newArray[i][7].split(/[0-9]/)[0].replace(/\s\s*$/, ''); finalarrive = newArray[i][7].match(/([0-9\:]+)/)[0] } else { to = ''; finalarrive = ''; }
-            output.push({train: newArray[i][1], type: newArray[i][0], operator: newArray[i][2], from: from, originatingdepart: originatingdepart, to: to, finalarrive: finalarrive, arrive: newArray[i][5], depart: newArray[i][6], line: newArray[i][8], delay: newArray[i][4]});
-        } }
-        res.send(JSON.stringify(output));
-    });
+express.get('/train/:id*', function(req, res, next) {
+    cmd = 'phantomjs ' + path.resolve(__dirname, 'train.js') + ' ' + req.params['id'];
+
+    exec(cmd, function(error, stdout, stderr) {
+        if (error !== null) {
+            console.log('Execution error: ' + error);
+
+            res.statusCode = 500;
+            res.send('Internal error.');
+            return;
+        }
+
+        res.setHeader('Access-Control-Allow-Origin', '*');
+        var output = [];
+
+        // Get the retrieved HTML table from PhantomJS
+        $ = cheerio.load(stdout);
+        cheerioTableparser($);
+        var data = $("table").parsetable(true, true, true);
+
+        if (data.length != 0) {
+            /*
+             * Remove the last two elements of the data array,
+             * since the table does contain other UI elements
+             * we don't care about.
+             */
+            data.splice(2, 2);
+
+            TrainStatus = data[1][4];
+            TrainInCirculation = (TrainStatus === "In circulatie");
+            TrainArrived = (TrainStatus === "Sosit la destinatie");
+            TrainAwaitingDeparture = (TrainStatus === "Asteapta plecarea");
+
+            // Save the output
+            var TrainInfo = {
+                ID: data[1][1],
+                Rank: data[1][0],
+
+                Operator: data[1][2],
+
+                Route: data[1][3],
+                Distance: data[1][12],
+                RouteDuration: data[1][13],
+
+                TrainStatus: TrainStatus,
+                TrainInCirculation: TrainInCirculation,
+                TrainArrived: TrainArrived,
+                TrainAwaitingDeparture: TrainAwaitingDeparture,
+
+                LatestInfo: data[1][5],
+                LatestInfoTime: data[1][6],
+
+                Delay: data[1][7],
+
+                Destination: data[1][8],
+                DestinationArriveTime: data[1][9],
+
+                NextStop: data[1][10],
+                NextStopTime: data[1][11]
+            };
+
+            output = {
+                Info: TrainInfo,
+            };
+        }
+
+        // Send the output
+        res.send(JSON.stringify(output));
+    });
+});
 
+process.on('uncaughtException', function (err) {
+    console.log('Caught unhandled exception: ' + err);
 });
 
 express.listen(9090);
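
The station route above does two things to the parsed table that are easy to miss: it turns the column-oriented `data` array into one array per table row, and it splits cells that hold a station name followed by a time. The standalone sketch below isolates both steps. `transpose` and `splitStationAndTime` are hypothetical helper names, not functions from the repository, and the sample cell values in the usage note are made up. It also differs from the route in one respect: when a cell contains no digits it returns an empty time, whereas the `.match(...)[0]` in the diff would throw.

```javascript
// Hypothetical helpers (not part of the repository) mirroring the station route.

// Turns an array of columns into an array of rows, as the
// data[0].map(... data.map(...)) expression in the route does.
function transpose(columns) {
    if (columns.length === 0) {
        return [];
    }
    return columns[0].map(function (_, rowIndex) {
        return columns.map(function (column) {
            return column[rowIndex];
        });
    });
}

// Splits a cell such as "<station name> <hh:mm>" into its two parts,
// using the same regular expressions as the route.
function splitStationAndTime(cell) {
    if (!cell) {
        return { station: '', time: '' };
    }
    var timeMatch = cell.match(/([0-9:]+)/);
    return {
        // Everything before the first digit, with trailing whitespace removed.
        station: cell.split(/[0-9]/)[0].replace(/\s+$/, ''),
        // First run of digits and colons; empty when the cell has none.
        time: timeMatch ? timeMatch[0] : ''
    };
}
```

With made-up input, `splitStationAndTime('Suceava Nord 14:35')` gives `{ station: 'Suceava Nord', time: '14:35' }`. Note that a station name containing a digit would be cut at that digit, both here and in the route.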

package.json

Lines changed: 13 additions & 5 deletions
@@ -1,16 +1,24 @@
 {
   "name": "cfr-iris-scraper",
-  "version": "1.0.0",
-  "description": "Live train information: srape-powered API with JSON endpoints for the Romanian national railway infrastructure company CFR S.A's realtime information web interface IRIS (built by state-railway IT company InfoFer).",
+  "version": "1.0.1",
+  "description": "Live train information: scrape-powered API with JSON endpoints for the Romanian national railway infrastructure company CFR S.A's realtime information web interface IRIS (built by state-railway IT company InfoFer).",
   "main": "index.js",
   "scripts": {
     "test": "echo \"Error: no test specified\" && exit 1"
   },
-  "author": "",
-  "license": "Apache License 2.0",
+  "contributors": [
+    {"name": "Bogdan Minea"},
+    {"name": "Tiberiu Danciu"}
+  ],
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/BodoMinea/cfr-iris-scraper"
+  },
+  "license": "Apache-2.0",
   "dependencies": {
     "cheerio": "^0.22.0",
     "cheerio-tableparser": "^1.0.1",
-    "express": "^4.15.2"
+    "express": "^4.15.2",
+    "command-exists": "^1.2.2"
   }
 }
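
The `command-exists` dependency added here backs the startup check that PhantomJS is on the path. The routes in index.js then start PhantomJS by concatenating the request's `id` into a shell command string for `exec`, so any shell metacharacters in the URL would reach the shell. One possible hardening, sketched below, is to validate the identifier and use Node's `child_process.execFile`, which passes arguments as an array without invoking a shell. `isValidId` and `runPhantomScript` are hypothetical names, not part of this commit, and the digits-only rule is an assumption based on the README examples (`10017`, `1651`).

```javascript
var execFile = require('child_process').execFile;
var path = require('path');

// Assumption: station and train identifiers are purely numeric,
// as in the README examples. Adjust the pattern if that is not true.
function isValidId(id) {
    return typeof id === 'string' && /^[0-9]+$/.test(id);
}

// Hypothetical wrapper: runs one of the PhantomJS scripts for a given id
// and hands (error, stdout) to the callback.
function runPhantomScript(scriptName, id, callback) {
    if (!isValidId(id)) {
        callback(new Error('Invalid identifier: ' + id));
        return;
    }
    var scriptPath = path.resolve(__dirname, scriptName);
    // Arguments are passed as an array, so no shell interprets them.
    execFile('phantomjs', [scriptPath, id], function (error, stdout) {
        callback(error, stdout);
    });
}
```

A route handler could then call `runPhantomScript('station.js', req.params['id'], ...)` and answer with a 400 status for invalid identifiers instead of the generic 500.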
