Initial implementation of inner DbDataReader#3604
Conversation
roji
left a comment
There was a problem hiding this comment.
Looks really good - this makes a lot of sense and indeed fits well with the existing ADO.NET APIs. I also really like this as an API for (efficiently) accessing unmapped composites, it makes more sense than the previous approach we had.
| /// Used to keep track of every unique row this reader object ever traverses. | ||
| /// This is used to detect whether inner DbDataReaders are still valid. | ||
| /// </summary> | ||
| internal ulong UniqueRowId; |
There was a problem hiding this comment.
Take a look at how manage this problem for GetStream/GetTextReader - NpgsqlDataReader has a _columnStream which gets set when GetStream is called, and disposed when moving to the next row.
I don't think there's a particular advantage in any of the two techniques, but we should unify and have just one mechanism for this lifecycle management.
There was a problem hiding this comment.
Note that an outer NpgsqlDataReader could be referenced by multiple inner data readers, and multiple levels of inner data readers. I also want this to work as good as possible in case we recycle data readers. So not sure how the stream approach would work in this case.
There was a problem hiding this comment.
You're right - since we need to handle multiple "dependents", the stream approach doesn't work. But then we can convert the stream to look at the new UniqueRowId mechanism.
There was a problem hiding this comment.
BTW that's also maybe a thing to consider... NpgsqlDataReader currently doesn't support multiple streams even when in non-sequential mode (where there's no reason not too). IIRC this was intentional - I wanted users to not depend on the ability to have multiple ones, so that if they switch to sequential (which generally makes a lot of sense where column streams are used!) they have no issues.
The same logic may apply for the inner data reader, i.e. we may want to restrict to only one active at a given time, just like stream. I doubt this would be restrictive for actual users. What do you think?
There was a problem hiding this comment.
The "stream approach" wouldn't work for the multiple depth case I think, since you have multiple inner data readers then. If there is a simple way to support multiple streams, I would say, why restrict it just because we haven't seen a good use case so far? But to be consistent with sequential mode, maybe we should still restrict it... I'm not sure.
There was a problem hiding this comment.
Yeah, it's a bit trickier... I think we need to make sure there's no more than one nested reader open for a given reader (nested or top-level) - that should provide the right protection for sequential. BTW this protection also applies between streams and readers - you can't have both a stream and a reader open at the same time in sequential.
As to restricting in non-sequential as well... I thought about it again, and I guess it's not a problem if working code starts throwing when you switch on sequential - after all this is already what happens if you're not accessing fields in order; sequential simply does impose further constraints on how you interact with the reader. So I'd be fine with removing all restrictions for non-sequential (including the existing stream one).
There was a problem hiding this comment.
Ok so can we leave "allow multiple streams to be opened for non-sequential mode on a particular row" for a separate PR?
|
Some ideas for increasing test coverage:
|
|
Ok so I made a few changes that affects the api:
And now when caching a |
I see that GetData is a non-virtual method that just calls GetDbDataReader (which is protected virtual), so it's probably better for GetDbDataReader to contain the implementation. However, although declaring a new GetData just to return a provider-specific implementation (NpgsqlNestedDataReader) is in line with ADO.NET practices, I don't really see the point here... Provider implementations are typically exposed this way because they provide some non-standard API, but NpgsqlNestedDataReader doesn't provide any such API (well, beyond GetData 🤣). So I'm not sure there's a reason to expose a new type at this point - unless there's something I'm missing (beyond consistency with OleDbDataReader).
Uhh... that sounds pretty horrible :) I'm surprised that BufferedDataReader requires GetName to work, we may want to see about fixing that there rather than hacking GetName? I can try to take a look at it at some point.
Yep - but I'm only proposing caching during the lifetime of the reader (the command execution), during which ReloadTypes cannot be called. Note also that type handlers and their OIDs are cached in various other places in Npgsql; ReloadTypes is really a very edge-case API, typically done at program startup after a new PostgreSQL type is created programmatically. It's not expected to work in the middle of a working application when arbitrary types are being deletes/changed/whatever. |
GetDbDataReader is protected so couldn't override this to make it public. Note that GetData in DbDataReader is listed as non-browsable, so to make it appear in code completion it must be overridden. To then make nested GetData to appear in code completion, we need a new exposed type that also overrides GetData. Does this make any sense?
Could also just return
Ok, I'll do that. |
roji
left a comment
There was a problem hiding this comment.
Back to this after a while :)
GetDbDataReader is protected so couldn't override this to make it public.
Right. I just meant that the standard ADO.NET "pattern" is to have a protected virtual (or abstract) method in the base class which is overridden by providers (GetDbDataReader in this case), and a public, non-virtual method which calls it, and which is optionally replaced with a new implementation (this is done in case the replacing method wants to return a provider-specific implementation). For example, DbCommand has ExecuteDbDataReader (abstract, overridden by providers), and non-virtual ExecuteReader which calls it. Providers typically have a new ExecuteReader so that it returns NpgsqlDataReader and not DbDataReader.
So to summarize, it would be a bit more "standard" to have the implementation in the override of GetDbDataReader, rather than in GetData (but this is pretty much nitpicking).
To then make nested GetData to appear in code completion, we need a new exposed type that also overrides GetData. Does this make any sense?
That's a valid point - why GetData has EditorBrowsableState.Never is probably lost in the mysteries of time by now :) I think it's quite heavy to expose a whole NpgsqlNestedDataReader type publicly just so that its EditorBrowsableState can override this - I'd personally just leave it (Intellisense doesn't seem that important in this particular case). But if you feel strongly about it I'm fine with NpgsqlNestedDataReader too.
Could also just return ?column? like postgresql and leave GetOrdinal not supported. Let me know which decision we should take.
Another possibility is an empty string... Ideally we could continue throwing and fix BufferedDataReader to not require this, but if that's not feasible I'm OK with whatever here.
| /// </summary> | ||
| internal ulong UniqueRowId; | ||
|
|
||
| internal NpgsqlNestedDataReader? CachedFreeNestedDataReader; |
There was a problem hiding this comment.
I thought it was a good idea to signify that the reader is free.
| { | ||
| NpgsqlDataReader _outermostReader; | ||
| ulong _uniqueOutermostReaderRowId; | ||
| NpgsqlNestedDataReader? _outerNestedReader; |
There was a problem hiding this comment.
This doesn't seem to get assigned anywhere...
Also, we can just have an object _parentReader which points to either a NpgsqlNestedDataReader or NpgsqlDataReader, and switch on the type in Dispose.
There was a problem hiding this comment.
Not really, we need to have a reference to the outermost reader, so that we can fetch the Buffer and the Connector.
There was a problem hiding this comment.
I think I meant one object _parentReader in addition to _outermostReader, which then you can switch on for type in Dispose - but on second look not very significant.
I just did the same approach as in OleDbDataReader. One good thing here is that we compared to the other variant avoid an (ugly) cast from DbDataReader to NpgsqlNestedDataReader.
Yeah I thought so first too that just leave it, but when trying the feature out, it felt very strange that IntelliSense didn't list it. It's also good for documentation purposes so that users know that GetData is available, what it does and in what cases it works.
I was thinking of these options too, but am not sure what is the best way. I thought throwing a NonSupportedException was the most reasonable, until I noticed that BufferedDataReader requires GetName to work. Is this something you can take care of in the efcore code, to relax this? Not sure if it's possible, otherwise I think I'll switch to ?column? and throw for GetOrdinal. Note that in case GetData is used on a composite type that is known (and therefore has column names), do you think we should populate the column names from that info? |
The main use seems to be here - when the user uses raw SQL queries (FromSql*), and hasn't specified the proper columns needed by EF to materialize the entity, this helps detect and report which columns are missing. There also doesn't seem to be a way in the DbDataReader API to ask whether names are available without calling GetName (and catching an exception). So maybe we should just leave it as-is and return empty string or Just to make sure I remember right... This concerns us because you plan to (recursively) wrap nested readers with BufferedDataReader, right?
Good point - yeah, that would be ideal if possible... It would be a full, high-perf API to unmapped composites. But if you run into trouble I don't think it's vital. |
|
@Emill this looks ready to me - let me know when you're done and are OK with me merging! |
Yes... I'll change to |
Done! |
|
Nice to see the composite support too... Very cool. |
| { | ||
| CheckClosedOrDisposed(); | ||
|
|
||
| UniqueRowId++; |
There was a problem hiding this comment.
Is there a reason why we do this only for sync Read?
Fixes #3558.