Support for new data types when reading/writing networks #73

KasiaKoz · 2021-03-31T14:19:16Z

Adds support (read/write) for:

CSV (network nodes and links CSV tables, and GTFS-like tables for the PT schedule)
- Comprehensive export for network nodes and links
- Some data loss for the schedule (relations of stops, and routing on the network)
- Reading CSV for a schedule is equivalent to reading GTFS: it is a new method that adapts better to the data stored and is more forgiving of data missing from GTFS (the list of reindexing warnings shown when reading it into a Schedule object will disappear @Arupwkeu)
JSON
- The most comprehensive export for both network and schedule
GeoJSON
- Indexing is now consistent with link indexing for the network
- Reading is supported for networks only; the format does not accommodate all of the schedule data
- Writing GeoJSON works the same as before, but can now also be done via a class method: network.write_to_geojson(path)
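The GeoJSON indexing point can be illustrated without genet. Below is a minimal sketch that gives each GeoJSON feature the same ID as its link, so features can be joined back to the network by link ID. The link table and the `links_to_feature_collection` helper are hypothetical; this is not genet's `write_to_geojson` implementation.

```python
import json

# Made-up links keyed by link ID; coordinates are lon/lat (WGS84), as GeoJSON expects.
LINKS = {
    "0": {"from": "A", "to": "B", "modes": ["car"],
          "coords": [(-0.1276, 51.5072), (-0.1270, 51.5080)]},
    "1": {"from": "B", "to": "A", "modes": ["car", "bus"],
          "coords": [(-0.1270, 51.5080), (-0.1276, 51.5072)]},
}


def links_to_feature_collection(links: dict) -> dict:
    """Build a GeoJSON FeatureCollection whose feature IDs reuse the link IDs."""
    features = []
    for link_id, data in links.items():
        # Every attribute except the raw coordinates becomes a feature property.
        properties = {k: v for k, v in data.items() if k != "coords"}
        features.append({
            "type": "Feature",
            "id": link_id,  # same index as the link in the network
            "geometry": {
                "type": "LineString",
                "coordinates": [list(point) for point in data["coords"]],
            },
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


collection = links_to_feature_collection(LINKS)
# Because feature IDs match link IDs, joining back to the network is a dict lookup.
features_by_link = {feature["id"]: feature for feature in collection["features"]}
print(sorted(features_by_link))  # ['0', '1']
```

Keeping the feature ID identical to the link ID means no separate lookup table is needed when the GeoJSON is read back.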

Adds convenience class methods to both Network and Schedule objects for converting them into the formats mentioned above:

- to_json
- to_geodataframe

Changes the MATSim read methods (breaking change). All read methods now live in the read module, from which they can be imported, e.g.

from genet import read_matsim

The module offers a selection of read methods, each returning a genet object built from that data type. The syntax for reading a MATSim network changes from:

from genet import Network

n = Network('epsg:27700')
n.read_matsim_network(path_to_matsim_network)
n.read_matsim_schedule(path_to_matsim_schedule, path_to_matsim_vehicles)

to

from genet import read_matsim

n = read_matsim(
    path_to_network=path_to_matsim_network, 
    epsg='epsg:27700', 
    path_to_schedule=path_to_matsim_schedule, 
    path_to_vehicles=path_to_matsim_vehicles
)

The OSM read method has also moved to the read module, for consistency.

Adds a lot of new test/example data and Jupyter notebooks which show examples and describe the expected schema if you want to read data, as well as the limitations of what gets saved when you write it out.

# Conflicts: # tests/test_core_schedule_elements.py # tests/test_outputs_handler_geojson.py

# Conflicts: # tests/test_core_network.py # tests/test_outputs_handler_geojson.py # tests/test_use_schedule.py

mfitz

Looks good.

I've left a comment or two in the tests asking whether we could be clearer about why we expect the values we're asserting against. That question applies to a fair number of other tests too.

I also noticed we have an XFAIL test (tests/test_core_schedule_elements.py::test_splitting_service_edge_case_on_direction_results_in_two_directions XFAIL) and a lot of warnings:

669 passed, 1 xfailed, 440 warnings in 550.78s (0:09:10)

Might be worth cleaning those up in a separate story.

mfitz · 2021-04-23T01:09:47Z

genet/inputs_handler/gtfs_reader.py

-                        stop_times_db[row['trip_id']].append(dict(row))
-                    else:
-                        stop_times_db[row['trip_id']] = [dict(row)]
+            stop_times_db = pd.read_csv(file, dtype={'trip_id': str, 'stop_id': str}, low_memory=False)


I like this more flexible approach. A lot less code duplication.
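The old loop built a `{trip_id: [row, ...]}` dict by hand; the new line reads everything into a DataFrame in one call. Below is a minimal sketch, using a made-up `stop_times.txt`, of why forcing the ID columns to `str` matters (IDs with leading zeros would otherwise be parsed as integers) and how the old grouped shape could be rebuilt with `groupby` if any caller still needs it. Whether genet needs that grouped form at all is an assumption here.

```python
import io

import pandas as pd

# A tiny, made-up GTFS stop_times.txt; note the leading zeros in the IDs.
STOP_TIMES = io.StringIO(
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
    "001,08:00:00,08:00:00,0101,1\n"
    "001,08:05:00,08:05:00,0102,2\n"
    "002,09:00:00,09:00:00,0101,1\n"
)

# Forcing the ID columns to str keeps '001' from being read as the integer 1.
stop_times_db = pd.read_csv(
    STOP_TIMES, dtype={'trip_id': str, 'stop_id': str}, low_memory=False
)

# If the old {trip_id: [row, ...]} structure is still needed, groupby rebuilds it.
stop_times_by_trip = {
    trip_id: group.to_dict('records')
    for trip_id, group in stop_times_db.groupby('trip_id', sort=False)
}
print(sorted(stop_times_by_trip))  # ['001', '002']
```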

mfitz · 2021-04-23T01:19:33Z

genet/inputs_handler/read.py

+        json_data = json.load(json_file)
+    for node, data in json_data['nodes'].items():
+        try:
+            del data['geometry']


How often would we expect the geometry key/value pair to be missing? Is it normally there, normally missing, or 50/50?

Depending on where this JSON comes from, either all of the geometry might be missing (it's not required by the JSON network format) or none of it will be (genet exports node geometry to JSON, so if you read it back, it'll all be there).
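Since both cases are possible, the `try`/`del`/`except KeyError` pattern works, but `dict.pop` with a default says the same thing in one line. A minimal sketch, assuming a node layout like the made-up one below (the geometry string format is a placeholder, not necessarily what genet writes):

```python
import json

# Made-up JSON network: node "1" has geometry, node "2" does not
# (geometry is optional in the JSON network format).
RAW = '''
{"nodes": {
    "1": {"id": "1", "x": 528704.1, "y": 182068.8, "geometry": "POINT (528704.1 182068.8)"},
    "2": {"id": "2", "x": 528804.1, "y": 182168.8}
}}
'''

json_data = json.loads(RAW)
for node, data in json_data['nodes'].items():
    # pop with a default removes the key when present and is a no-op otherwise,
    # covering both the "all present" and the "all missing" case.
    data.pop('geometry', None)

print(sorted(json_data['nodes']))  # ['1', '2']
```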

mfitz · 2021-04-23T01:24:29Z

genet/outputs_handler/csv.py

+        expected schema
+    :return: genet.Network object
+    """
+    pass


Is this supposed to be an empty implementation?

lol, I put a placeholder there and then ended up putting the method somewhere else 😂

mfitz · 2021-04-23T01:41:47Z

notebooks/4.3. Using Network - Routing.ipynb

@@ -208,7 +217,7 @@
 ],
 "metadata": {
  "kernelspec": {
-   "display_name": "Python (genet)",
+   "display_name": "genet",
   "language": "python",
   "name": "genet"


That reminds me - the notebook smoke test should be discovering all of the kernel names dynamically by reading the notebooks themselves. At the moment, you have to pass the name manually. I'll nice up the smoke test soon to make it more like the one in PAM.
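Discovering kernel names only needs the notebook JSON itself: each `.ipynb` file stores its kernel under `metadata.kernelspec.name`, as in the diff above. A minimal sketch of the discovery step; `notebook_kernel_names` is a hypothetical helper, and how the smoke test then executes each notebook is left open.

```python
import json
from pathlib import Path


def notebook_kernel_names(notebook_dir) -> dict:
    """Map each .ipynb file under notebook_dir to the kernel name in its metadata."""
    kernels = {}
    for path in sorted(Path(notebook_dir).rglob('*.ipynb')):
        with open(path, encoding='utf-8') as f:
            notebook = json.load(f)
        # Notebooks without a kernelspec map to None rather than raising.
        kernelspec = notebook.get('metadata', {}).get('kernelspec', {})
        kernels[path.name] = kernelspec.get('name')
    return kernels


if __name__ == '__main__':
    # e.g. {'4.3. Using Network - Routing.ipynb': 'genet', ...}
    print(notebook_kernel_names('notebooks'))
```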

mfitz · 2021-04-23T01:42:03Z

notebooks/4.3. Using Network - Routing.ipynb

@@ -222,7 +231,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.8.6"
+   "version": "3.7.0"


Is this a deliberate version downgrade?

Probably not deliberate; I had to make a new virtual environment for genet at one point. I think this is better, though: genet requires Python >= 3.7, so I should probably use the lowest supported version?

I don't think it really matters which version you use, so long as you're deliberate and consistent.

mfitz · 2021-04-23T01:47:45Z

tests/test_core_network.py

+
+
+def test_transforming_network_to_json(network1, json_network):
+    assert_semantically_equal(network1.to_json(), json_network)


You could maybe return the network and its expected JSON representation from the same fixture, so that the fixture gives you the data and the expectation. So this test would look something like:

def test_transforming_network_to_json(network):
    assert_semantically_equal(network['network'].to_json(), network['expected_json'])

Tying the network and its representation as a dictionary together makes it clearer why we expect what we're asserting on, I think.

mfitz · 2021-04-23T01:53:22Z

tests/test_core_network.py

+
+def test_saving_network_to_json(network1, json_network, tmpdir):
+    network1.apply_attributes_to_links(
+        {'0': {


Why do we add the linestring to this node? It feels like this might be a test in its own right (i.e. when you add an attribute, do you see it persisted?).
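As a separate test, the attribute-persistence check could assert on nothing but the attribute it adds. A minimal sketch with a hypothetical `StubNetwork` whose `apply_attributes_to_links` and `save_json` only mimic the idea; they are not genet's implementations, and the geometry string is a placeholder.

```python
import json
import os
import tempfile


# Hypothetical, minimal stand-in for a network with link attributes; not genet's API.
class StubNetwork:
    def __init__(self, links):
        self.links = links

    def apply_attributes_to_links(self, new_attributes):
        # Merge each link's new attributes into its existing ones.
        for link_id, attribs in new_attributes.items():
            self.links[link_id].update(attribs)

    def save_json(self, output_dir):
        path = os.path.join(output_dir, 'network.json')
        with open(path, 'w', encoding='utf-8') as f:
            json.dump({'links': self.links}, f)
        return path


def test_added_link_attribute_is_persisted_in_json(tmpdir):
    n = StubNetwork({'0': {'id': '0', 'from': 'A', 'to': 'B'}})
    n.apply_attributes_to_links({'0': {'geometry': 'LINESTRING (1 2, 3 4)'}})
    with open(n.save_json(tmpdir), encoding='utf-8') as f:
        saved = json.load(f)
    # The only expectation is the attribute just added, so the intent is explicit.
    assert saved['links']['0']['geometry'] == 'LINESTRING (1 2, 3 4)'


with tempfile.TemporaryDirectory() as d:
    test_added_link_attribute_is_persisted_in_json(d)
```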

mfitz · 2021-04-23T01:54:29Z

tests/test_core_network.py

+    json_network['links']['0']['modes'] = 'car'
+    assert_semantically_equal(
+        output_json,
+        {'nodes': {


Is there a way to be clearer about why we expect the things we're asserting on? Would the pattern of tying the fixture data to its expected value make this clearer? If not, is there another way?

KasiaKoz · 2021-04-27T11:06:16Z

Thanks @mfitz, I fixed up a few things and added a card, LAB-1179, for a more thorough maintenance pass over the genet tests. The XFAIL is staying for now, though: it marks an edge case for a method that does most of the job, and there is currently no requirement to cover that edge case.

KasiaKoz added 30 commits March 25, 2021 10:26

add placeholders for methods

ff00c11

add json write methods

f2aa912

change+move gtfs read method

50f59ef

add test

f1f71ce

add transform to gts method

832585d

add csv/txt write methods for schedule gtfs

53e8b59

fix in gtfs read to not include routes with single stop

8f46dd0

update read gtfs notebook

8556a89

add notebook for writing gtfs outputs, small fixes to methods

a0d19ed

add to_geodataframe network method

e36e159

add logging, split encoding geometry dataframe

906c807

add CSV read notebook and update gtfs write nb

da17687

correct json outputs w.r.t. geometries

2157a1a

tidy up csv save for schedule

ca244ae

add read csv method

8687f17

add reading CSV notebook and logging

8618ab7

add method to read json

b95a6fe

tidy up geojson outs/geodataframe transform

c135ac6

update doc strings

7213bc4

geojson read method

aee2a32

reading json and geojson notebook and example data

0a3a894

move read matsim method

2262da1

update jupyter notebooks

b3c93a0

Merge branch 'master' into new-network-in-out-puts

e1ef857

# Conflicts: # tests/test_core_schedule_elements.py # tests/test_outputs_handler_geojson.py

Merge branch 'master' into new-network-in-out-puts

7a9ebe4

# Conflicts: # tests/test_core_schedule_elements.py # tests/test_outputs_handler_geojson.py

fix tests

a44a533

add encoding

6faa07f

specify dtypes

d5a6a2c

specify dtypes, fix test

4d75acc

move osm read method to read module

d64646a

Merge branch 'master' into new-network-in-out-puts

879dfaf

# Conflicts: # tests/test_core_network.py # tests/test_outputs_handler_geojson.py # tests/test_use_schedule.py

KasiaKoz requested review from mfitz and Arupwkeu April 22, 2021 14:45

mfitz approved these changes Apr 23, 2021


KasiaKoz added 5 commits April 26, 2021 13:50

refactor

05f5219

incorporate loopy gtfs fix

c0d44ac

incorporate loopy gtfs fix

d97d8b5

PR comments

c0c8d91

Merge branch 'master' into new-network-in-out-puts

cdac28f

KasiaKoz merged commit 63db793 into master Apr 27, 2021

KasiaKoz deleted the new-network-in-out-puts branch April 27, 2021 11:06

KasiaKoz mentioned this pull request Apr 29, 2021

Fix scripts #80

Merged
