Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for new data types when reading/writing networks #73

Merged
merged 36 commits into from
Apr 27, 2021

Conversation

KasiaKoz
Copy link
Collaborator

@KasiaKoz KasiaKoz commented Mar 31, 2021

Adds support (read/write) for:

  • CSV (network nodes and links csv tables and GTFS-like tables for the PT schedule)
    • Comprehensive export for network nodes and links
    • Some data loss (relations of stops and routing on the network) for schedule
    • read csv for schedule is equivalent to reading GTFS, it is a new method that is more responsive to the data stored and forgiving for data missing in GTFS (the list of reindexing warnings when reading it to schedule object will disappear @Arupwkeu )
  • JSON
    • The most comprehensive export for network and schedule
  • GeoJSON
    • The indexing is now consistent with link indexing for network
    • read only for network, format not very supportive for all of the schedule data
    • write for geojsons is the same as before, but can now be done via a class method network.write_to_geojson(path)

Adds nice class methods for both Network and Schedule objects to put them in the formats mentioned above

  • to_json
  • to_geodataframe

Changes MATSim read methods - breaking change. All read methods are now stored in the read module, which can be imported in the following way, e.g

from genet import read_matsim

where you have a selection of read methods, all return a genet object from that data type. The syntax to read a MATSim network changes from:

from genet import Network

n = Network('epsg:27700')
n.read_matsim_network(path_to_matsim_network)
n.read_matsim_schedule(path_to_matsim_schedule, path_to_matsim_vehicles)

to

from genet import read_matsim

n = read_matsim(
    path_to_network=path_to_matsim_network, 
    epsg='epsg:27700', 
    path_to_schedule=path_to_matsim_schedule, 
    path_to_vehicles=path_to_matsim_vehicles
)

OSM reading method also moved to the read module for consistence.

Adds a lot of new test/example data and jupyter notebooks which which show examples and talk about the schema if you want to read data and limitations of what gets saved if you're saving it out.

# Conflicts:
#	tests/test_core_network.py
#	tests/test_outputs_handler_geojson.py
#	tests/test_use_schedule.py
@KasiaKoz KasiaKoz requested review from mfitz and Arupwkeu April 22, 2021 14:45
Copy link
Contributor

@mfitz mfitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

I've made a comment or two in the tests asking if there are any ways we could be clearer about why we have some of the expectations we're asserting against, which is a question that applied to a fair number of different tests.

I also noticed we have an XFAIL test (tests/test_core_schedule_elements.py::test_splitting_service_edge_case_on_direction_results_in_two_directions XFAIL) and a lot of warnings:

669 passed, 1 xfailed, 440 warnings in 550.78s (0:09:10)

Might be worth cleaning those up in a separate story.

stop_times_db[row['trip_id']].append(dict(row))
else:
stop_times_db[row['trip_id']] = [dict(row)]
stop_times_db = pd.read_csv(file, dtype={'trip_id': str, 'stop_id': str}, low_memory=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this more flexible approach. A lot less code duplication.

json_data = json.load(json_file)
for node, data in json_data['nodes'].items():
try:
del data['geometry']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How often would we expect the geometry key/value pair to be missing? Is it normally there, normally missing, or 50/50?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on where this json comes from, all of the geometry might be missing (it's not a requirement for a json network format) or none of it will be missing (genet exports node geometry to json, so if you read it back, it'll all be there)

expected schema
:return: genet.Network object
"""
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be an empty implementation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, I put a placeholder there and then ended up putting the method somewhere else 😂

@@ -208,7 +217,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python (genet)",
"display_name": "genet",
"language": "python",
"name": "genet"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That reminds me - the notebook smoke test should be discovering all of the kernel names dynamically by reading the notebooks themselves. At the moment, you have to pass the name manually. I'll nice up the smoke test soon to make it more like the one in PAM.

@@ -222,7 +231,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.6"
"version": "3.7.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a deliberate version downgrade?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably not deliberate, I had to make a new virtual environment for genet at one point. I think this is better (genet's requirement is >=3.7 so I should probably use the lowest version ?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it really matters which version you use, so long as you're deliberate and consistent.



def test_transforming_network_to_json(network1, json_network):
assert_semantically_equal(network1.to_json(), json_network)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could maybe return the network and its expected JSON representation from the same fixture, so that the fixture gives you the data and the expectation. So this test would look something like:

def test_transforming_network_to_json(network):
    assert_semantically_equal(network['network'].to_json(), network['expected_json'])

Tying the network and it's representation as a dictionary together makes it clearer why we expect what we're asserting on, I think.


def test_saving_network_to_json(network1, json_network, tmpdir):
network1.apply_attributes_to_links(
{'0': {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we add the linestring to this node? It feels like this might be a test in its own right (when you add an attribute, do you see it persisted, I mean).

json_network['links']['0']['modes'] = 'car'
assert_semantically_equal(
output_json,
{'nodes': {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to be clearer about why we expect the things we're asserting on? Would the pattern of tying the fixture data to its expected value make this clearer? If not, is there another way?

@KasiaKoz
Copy link
Collaborator Author

Thanks @mfitz, I fixed up a few things and added a card LAB-1179 for a more serious maintenance of genet tests. The XFAIL is there to stay for now though, it's an edge case marker for method that does most of the job, but there is no requirement to cover that edge case right now.

@KasiaKoz KasiaKoz merged commit 63db793 into master Apr 27, 2021
@KasiaKoz KasiaKoz deleted the new-network-in-out-puts branch April 27, 2021 11:06
@KasiaKoz KasiaKoz mentioned this pull request Apr 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants