-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Improve worst case performance of GTFS geometry processing #6748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve worst case performance of GTFS geometry processing #6748
Conversation
…at a stop is always visited on the first double-back
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## dev-2.x #6748 +/- ##
=============================================
+ Coverage 71.76% 71.77% +0.01%
- Complexity 19317 19331 +14
=============================================
Files 2088 2088
Lines 78711 78742 +31
Branches 8020 8028 +8
=============================================
+ Hits 56487 56518 +31
- Misses 19367 19369 +2
+ Partials 2857 2855 -2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
leonardehrenfried
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do require tests for all business logic so please add some, in particular because this is a tricky algorithmic problem that we might need to tune.
You can use this as a starting point: a365b78
sorry I didn't realise that the original code was not tested - I will add some tests into it. |
|
I am marking this as draft now for #6752 , which changes some of the model classes needed for me to write tests. |
|
@miklcct @leonardehrenfried Just to note, we had this same issue and this pull request works for us so if you can please merge it in as soon as possible. |
|
Following on from PavaoZornija1 comment (we work together) - we were using the GTFS Scotland file here: https://data.bus-data.dft.gov.uk/timetable/download/gtfs-file/scotland/ - In case anyone else is using the same file and running into the same issue. A lot of debugging attempts over the last 4-6 weeks and many trials and errors @miklcct you have done amazing. Many thanks! |
|
I don't believe that you're a reviewer yet, @miklcct , so I will merge this. |
Summary
Don't use recursion for GeometryProcessor.getStopLocations. Instead, join the consecutive segments and choose the best match in the first consecutive segment.
This is to avoid exponential complexity when a route does multiple double-backs with multiple stops in them.
This is now waiting for #6752 for the purpose of tests.
Issue
Fixes #6744
Unit tests
It is a change of private method so no new tests are added.
It has been tested manually with the GTFS attached in the bug report, and on the staging server of Aubin which contains all schedules for the whole of Great Britain, by planning a journey from Stowaway to a later stop on the problematic trip.
Documentation
There may be a behavioural change. I have joined the discontinuities if the discontinuity does not enter the 150 m coverage area of the next stop (which should be adequate to prevent a worse match in the simple case of a bus running ahead to a turning loop in order to call at a stop on the opposite side of the road), but there may be a change in the match if a bus first goes within 150 m of stop 1, then exit the 150 m radius of stop 1 but enters within 100 m from stop 2, then goes back to within 75 m from stop 1, then goes back to within 50 m of stop 2. In the existing algorithm, the match will be 75 m from stop 1 and 50 m from stop 2, as it tries to start with the best match and recursively determine if it is possible or not, but in my algorithm, it will always return the first match to prevent the recursion, so it will return the 150 m and 100 m match in this case.
The description of the existing and my proposed algorithm
Existing: All the possible segment matches are gathered for each stop, sorted by the distance to the stop. Find a series of ordered segment matches by starting from the initial stop, trying from the best match first, and recurse to the next stop and try from the best match from that stop. If the recursion reaches the end of the trip, a valid sequence is produced. If not, try the next best match.My algorithm:
All the possible segment matches are gathered for each stop, ordered from the beginning of the route to the end of route. Starting from the initial stop, allocate the segment matches into continuous segments. Continuous segments are segments which either join up, or separated with the segments in between all not being possible matches for the next stop. Choose the best match within the first continuous segments. Iterate to the next stop and repeat, ignoring any segments which have already been passed by the last stop. If the iteration runs out of segments, a valid sequence is not possible; if it reaches the end of the trip, a valid sequence is produced.
Changelog
Should be added.
Bumping the serialization version id
Not needed.