Streaming GTFS stop times and shapes #6754

leonardehrenfried · 2025-07-16T07:55:05Z

Summary

This is a proof of concept for more efficient processing of stop times and shapes: rather than reading all of them into a huge list/array, they are streamed off the CSV source line by line.

This yields large memory savings: a typical graph build uses 30-40% less memory.

Combined with #6752 this saves about 60% of memory.
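To illustrate the streaming idea described above, here is a minimal sketch, assuming a simplified `stop_times.txt` with a header line and no quoted fields. The names `StopTimeStreamSketch`, `StopTimeRow` and `streamStopTimes` are hypothetical and are not the code in this PR; a real reader would use a proper CSV parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: all names are illustrative, not taken from the PR.
class StopTimeStreamSketch {

  // One parsed row; only a few stop_times.txt columns are kept.
  record StopTimeRow(String tripId, String stopId, int stopSequence) {}

  // Reads the source line by line and hands each row to the consumer.
  // Only the current line is held here, never the whole file.
  // Assumes a header line and no quoted or escaped fields.
  static int streamStopTimes(Reader source, Consumer<StopTimeRow> consumer) {
    try (BufferedReader reader = new BufferedReader(source)) {
      String header = reader.readLine();
      if (header == null) {
        return 0;
      }
      List<String> columns = Arrays.asList(header.split(",", -1));
      int tripIdx = columns.indexOf("trip_id");
      int stopIdx = columns.indexOf("stop_id");
      int seqIdx = columns.indexOf("stop_sequence");
      if (tripIdx < 0 || stopIdx < 0 || seqIdx < 0) {
        throw new IllegalArgumentException("Missing required column in header: " + header);
      }
      int count = 0;
      String line;
      while ((line = reader.readLine()) != null) {
        if (line.isBlank()) {
          continue;
        }
        String[] fields = line.split(",", -1);
        consumer.accept(
          new StopTimeRow(
            fields[tripIdx].trim(),
            fields[stopIdx].trim(),
            Integer.parseInt(fields[seqIdx].trim())
          )
        );
        count++;
      }
      return count;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String csv = "trip_id,stop_id,stop_sequence\nt1,s1,1\nt1,s2,2\n";
    int rows = streamStopTimes(new StringReader(csv), row -> System.out.println(row));
    System.out.println(rows + " rows streamed");
  }
}
```

The point of the consumer callback is that each row can be mapped and discarded immediately, so peak memory no longer grows with the size of the file.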

The downside is that we now have two ways of reading GTFS data: one streaming and one from the OBA library.

We need to discuss the various trade-offs, so this is a draft. (It also depends on a PR that isn't merged yet.)

cc @tkalvas @abyrd @jessicaKoehnke

optionsome

I guess one option would also be to add some sort of streaming reader mode to the OBA library and read the rows through it, but we would probably need to do it in a slightly more generic way, which might lead to more memory consumption.

optionsome · 2025-08-08T20:25:57Z

application/src/main/java/org/opentripplanner/gtfs/graphbuilder/GtfsModule.java

+    dao.setPackShapePoints(true);
+    dao.setPackStopTimes(true);


What do these do?

They instruct OBA to use a more compact way of representing these entities. But if we do the streaming approach, it is no longer necessary.
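As a rough illustration of what a more compact representation can look like in general, the sketch below stores shape points in parallel primitive arrays instead of one object per point. This is an assumption for illustration only and not OBA's actual implementation; `PackedShapePointsSketch` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical sketch of a "packed" layout; not how OBA implements it.
class PackedShapePointsSketch {

  // Two primitive arrays replace one heap object per shape point,
  // which avoids per-object headers and references.
  private double[] lats = new double[16];
  private double[] lons = new double[16];
  private int size = 0;

  void add(double lat, double lon) {
    if (size == lats.length) {
      // Grow both arrays together so the indexes stay aligned.
      lats = Arrays.copyOf(lats, size * 2);
      lons = Arrays.copyOf(lons, size * 2);
    }
    lats[size] = lat;
    lons[size] = lon;
    size++;
  }

  int size() {
    return size;
  }

  double lat(int index) {
    return lats[Objects.checkIndex(index, size)];
  }

  double lon(int index) {
    return lons[Objects.checkIndex(index, size)];
  }

  public static void main(String[] args) {
    PackedShapePointsSketch points = new PackedShapePointsSketch();
    points.add(60.17, 24.94);
    points.add(60.18, 24.95);
    System.out.println(points.size() + " points, first lat = " + points.lat(0));
  }
}
```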

leonardehrenfried · 2025-08-10T08:48:12Z

> I guess one option would also be to add some sort of a streaming reader mode to the OBA library and read the rows through it, but we probably would need to do it a slightly more generic way which might lead to more memory consumption.

I had the same idea. The problem is that streaming the entities gives up the referential integrity checks in the library: for example, StopTime.trip is no longer a full Trip but a trip id, which the consumer has to resolve.

This means that we need a new data model. So with a new way of reading data and a new data model there isn't much left of OBA. Also, now that I've maintained OBA for a while, I see that there is a huge amount of complicated indirection in there which to me doesn't make a lot of sense.
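The id resolution mentioned above could look roughly like the following sketch. All types here (`Trip`, `StreamedStopTime`, `ResolvedStopTime`) are hypothetical stand-ins, not the OBA or OTP classes, and the lookup map is assumed to have been built from `trips.txt` beforehand.

```java
import java.util.Map;

// Hypothetical sketch: simplified stand-in types, not the OBA/OTP model.
class TripResolutionSketch {

  record Trip(String id, String routeId) {}

  // A streamed row carries only the trip id, not a Trip reference.
  record StreamedStopTime(String tripId, String stopId, int stopSequence) {}

  record ResolvedStopTime(Trip trip, String stopId, int stopSequence) {}

  // The consumer does the integrity check that the library used to do:
  // an unknown trip id is rejected here instead of at load time.
  static ResolvedStopTime resolve(StreamedStopTime stopTime, Map<String, Trip> tripsById) {
    Trip trip = tripsById.get(stopTime.tripId());
    if (trip == null) {
      throw new IllegalArgumentException("Unknown trip_id: " + stopTime.tripId());
    }
    return new ResolvedStopTime(trip, stopTime.stopId(), stopTime.stopSequence());
  }

  public static void main(String[] args) {
    Map<String, Trip> tripsById = Map.of("t1", new Trip("t1", "r1"));
    ResolvedStopTime resolved = resolve(new StreamedStopTime("t1", "s1", 1), tripsById);
    System.out.println(resolved);
  }
}
```

Whether an unknown id should fail the build or only be logged as a data issue is a design choice that this sketch leaves open.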

My favourite solution is this: we create a new module in this repo where we develop a new streaming library. Once we are satisfied with it we can consider moving it to another repo either in the OBA or the OTP orgs.

# Conflicts: # application/src/main/java/org/opentripplanner/gtfs/mapping/ShapePointMapper.java

leonardehrenfried force-pushed the efficient-stop-times branch from 5aacb43 to 67a963b Compare July 18, 2025 13:22

leonardehrenfried requested a review from optionsome July 21, 2025 09:53

optionsome reviewed Aug 8, 2025

t2gran added this to the 2.8 (next release) milestone Aug 11, 2025

leonardehrenfried added 9 commits August 13, 2025 21:29

Process stop times more efficiently

f155cff

Use packed stop times

71a4911

Stream stop times

e26df27

Clean up

f0c7eda

# Conflicts: # application/src/main/java/org/opentripplanner/gtfs/mapping/ShapePointMapper.java

Flesh out stop time mapper

5908d30

Add tests

8af121d

Fix tests

a9e2ce4

Clean up

6290675

Extract framework code

4dc8e10

leonardehrenfried force-pushed the efficient-stop-times branch from 67a963b to 4dc8e10 Compare August 13, 2025 19:30

t2gran modified the milestones: 2.8, 2.9 (next release) Sep 10, 2025

leonardehrenfried mentioned this pull request Nov 7, 2025

Check for NO_VALUE when mapping GTFS booking rules #7029

Merged
