Skip to content

Initial implementation of inner DbDataReader#3604

Merged
roji merged 7 commits into
npgsql:mainfrom
Emill:RecordDbDataReader
Apr 14, 2021
Merged

Initial implementation of inner DbDataReader#3604
roji merged 7 commits into
npgsql:mainfrom
Emill:RecordDbDataReader

Conversation

@Emill

@Emill Emill commented Mar 17, 2021

Copy link
Copy Markdown
Contributor

Fixes #3558.

@roji roji left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good - this makes a lot of sense and indeed fits well with the existing ADO.NET APIs. I also really like this as an API for (efficiently) accessing unmapped composites, it makes more sense than the previous approach we had.

Comment thread src/Npgsql/NpgsqlRecordDataReader.cs Outdated
Comment thread src/Npgsql/NpgsqlRecordDataReader.cs Outdated
/// Used to keep track of every unique row this reader object ever traverses.
/// This is used to detect whether inner DbDataReaders are still valid.
/// </summary>
internal ulong UniqueRowId;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look at how manage this problem for GetStream/GetTextReader - NpgsqlDataReader has a _columnStream which gets set when GetStream is called, and disposed when moving to the next row.

I don't think there's a particular advantage in any of the two techniques, but we should unify and have just one mechanism for this lifecycle management.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that an outer NpgsqlDataReader could be referenced by multiple inner data readers, and multiple levels of inner data readers. I also want this to work as good as possible in case we recycle data readers. So not sure how the stream approach would work in this case.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right - since we need to handle multiple "dependents", the stream approach doesn't work. But then we can convert the stream to look at the new UniqueRowId mechanism.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW that's also maybe a thing to consider... NpgsqlDataReader currently doesn't support multiple streams even when in non-sequential mode (where there's no reason not too). IIRC this was intentional - I wanted users to not depend on the ability to have multiple ones, so that if they switch to sequential (which generally makes a lot of sense where column streams are used!) they have no issues.

The same logic may apply for the inner data reader, i.e. we may want to restrict to only one active at a given time, just like stream. I doubt this would be restrictive for actual users. What do you think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "stream approach" wouldn't work for the multiple depth case I think, since you have multiple inner data readers then. If there is a simple way to support multiple streams, I would say, why restrict it just because we haven't seen a good use case so far? But to be consistent with sequential mode, maybe we should still restrict it... I'm not sure.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's a bit trickier... I think we need to make sure there's no more than one nested reader open for a given reader (nested or top-level) - that should provide the right protection for sequential. BTW this protection also applies between streams and readers - you can't have both a stream and a reader open at the same time in sequential.

As to restricting in non-sequential as well... I thought about it again, and I guess it's not a problem if working code starts throwing when you switch on sequential - after all this is already what happens if you're not accessing fields in order; sequential simply does impose further constraints on how you interact with the reader. So I'd be fine with removing all restrictions for non-sequential (including the existing stream one).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so can we leave "allow multiple streams to be opened for non-sequential mode on a particular row" for a separate PR?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

Comment thread src/Npgsql/NpgsqlDataReader.cs Outdated
Comment thread src/Npgsql/NpgsqlRecordDataReader.cs Outdated
Comment thread src/Npgsql/NpgsqlRecordDataReader.cs Outdated
Comment thread src/Npgsql/NpgsqlRecordDataReader.cs Outdated
Comment thread src/Npgsql/NpgsqlRecordDataReader.cs Outdated
Comment thread test/Npgsql.Tests/RecordDbDataReaderTests.cs Outdated
Comment thread test/Npgsql.Tests/RecordDbDataReaderTests.cs Outdated
Comment thread test/Npgsql.Tests/RecordDbDataReaderTests.cs Outdated
@roji

roji commented Mar 19, 2021

Copy link
Copy Markdown
Member

Some ideas for increasing test coverage:

  • Heterogeneous rows (different number and types of fields)
  • Composite type and array of composite type (right now only records are tested)

@Emill

Emill commented Mar 21, 2021

Copy link
Copy Markdown
Contributor Author

Ok so I made a few changes that affects the api:

  1. Use the name NpgsqlNestedDataReader.
  2. Made the class public as well as added a "new" method GetData with this type as the return type, rather than a generic DbDataReader. This allows better documentation, makes it browsable and more in-line with the OleDbDataReader approach (maybe not so relevant but whatever). Also allows for extensibility in the future with custom methods.
  3. Made a mapping to column name that just uses the integer converted to/from a string. Just throwing NotSupportedException or similar made wrapping inside an EFCore BufferedDataReader fail, since it expects GetName to work. Not sure if this is the best way though since who knows if we will find a better naming strategy in the future.

And now when caching a TypeHandler using oid, will it be a problem that ReloadTypes not currently flushes this oid cache? (Might get hit by this since now NpgsqlDataReader, inner NpgsqlNestedDataReader are cached, for a connector).

@roji

roji commented Mar 21, 2021

Copy link
Copy Markdown
Member

Made the class public as well as added a "new" method GetData with this type as the return type, rather than a generic DbDataReader. This allows better documentation, makes it browsable and more in-line with the OleDbDataReader approach (maybe not so relevant but whatever). Also allows for extensibility in the future with custom methods.

I see that GetData is a non-virtual method that just calls GetDbDataReader (which is protected virtual), so it's probably better for GetDbDataReader to contain the implementation.

However, although declaring a new GetData just to return a provider-specific implementation (NpgsqlNestedDataReader) is in line with ADO.NET practices, I don't really see the point here... Provider implementations are typically exposed this way because they provide some non-standard API, but NpgsqlNestedDataReader doesn't provide any such API (well, beyond GetData 🤣). So I'm not sure there's a reason to expose a new type at this point - unless there's something I'm missing (beyond consistency with OleDbDataReader).

Made a mapping to column name that just uses the integer converted to/from a string. Just throwing NotSupportedException or similar made wrapping inside an EFCore BufferedDataReader fail, since it expects GetName to work. Not sure if this is the best way though since who knows if we will find a better naming strategy in the future.

Uhh... that sounds pretty horrible :) I'm surprised that BufferedDataReader requires GetName to work, we may want to see about fixing that there rather than hacking GetName? I can try to take a look at it at some point.

And now when caching a TypeHandler using oid, will it be a problem that ReloadTypes not currently flushes this oid cache? (Might get hit by this since now NpgsqlDataReader, inner NpgsqlNestedDataReader are cached, for a connector).

Yep - but I'm only proposing caching during the lifetime of the reader (the command execution), during which ReloadTypes cannot be called. Note also that type handlers and their OIDs are cached in various other places in Npgsql; ReloadTypes is really a very edge-case API, typically done at program startup after a new PostgreSQL type is created programmatically. It's not expected to work in the middle of a working application when arbitrary types are being deletes/changed/whatever.

@Emill

Emill commented Mar 21, 2021

Copy link
Copy Markdown
Contributor Author

Made the class public as well as added a "new" method GetData with this type as the return type, rather than a generic DbDataReader. This allows better documentation, makes it browsable and more in-line with the OleDbDataReader approach (maybe not so relevant but whatever). Also allows for extensibility in the future with custom methods.

I see that GetData is a non-virtual method that just calls GetDbDataReader (which is protected virtual), so it's probably better for GetDbDataReader to contain the implementation.

However, although declaring a new GetData just to return a provider-specific implementation (NpgsqlNestedDataReader) is in line with ADO.NET practices, I don't really see the point here... Provider implementations are typically exposed this way because they provide some non-standard API, but NpgsqlNestedDataReader doesn't provide any such API (well, beyond GetData 🤣). So I'm not sure there's a reason to expose a new type at this point - unless there's something I'm missing (beyond consistency with OleDbDataReader).

GetDbDataReader is protected so couldn't override this to make it public. Note that GetData in DbDataReader is listed as non-browsable, so to make it appear in code completion it must be overridden. To then make nested GetData to appear in code completion, we need a new exposed type that also overrides GetData. Does this make any sense?

Made a mapping to column name that just uses the integer converted to/from a string. Just throwing NotSupportedException or similar made wrapping inside an EFCore BufferedDataReader fail, since it expects GetName to work. Not sure if this is the best way though since who knows if we will find a better naming strategy in the future.

Uhh... that sounds pretty horrible :) I'm surprised that BufferedDataReader requires GetName to work, we may want to see about fixing that there rather than hacking GetName? I can try to take a look at it at some point.

Could also just return ?column? like postgresql and leave GetOrdinal not supported. Let me know which decision we should take.

And now when caching a TypeHandler using oid, will it be a problem that ReloadTypes not currently flushes this oid cache? (Might get hit by this since now NpgsqlDataReader, inner NpgsqlNestedDataReader are cached, for a connector).

Yep - but I'm only proposing caching during the lifetime of the reader (the command execution), during which ReloadTypes cannot be called. Note also that type handlers and their OIDs are cached in various other places in Npgsql; ReloadTypes is really a very edge-case API, typically done at program startup after a new PostgreSQL type is created programmatically. It's not expected to work in the middle of a working application when arbitrary types are being deletes/changed/whatever.

Ok, I'll do that.

@Emill Emill marked this pull request as ready for review March 22, 2021 01:58
@Emill Emill requested a review from vonzshik as a code owner March 22, 2021 01:58

@roji roji left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back to this after a while :)

GetDbDataReader is protected so couldn't override this to make it public.

Right. I just meant that the standard ADO.NET "pattern" is to have a protected virtual (or abstract) method in the base class which is overridden by providers (GetDbDataReader in this case), and a public, non-virtual method which calls it, and which is optionally replaced with a new implementation (this is done in case the replacing method wants to return a provider-specific implementation). For example, DbCommand has ExecuteDbDataReader (abstract, overridden by providers), and non-virtual ExecuteReader which calls it. Providers typically have a new ExecuteReader so that it returns NpgsqlDataReader and not DbDataReader.

So to summarize, it would be a bit more "standard" to have the implementation in the override of GetDbDataReader, rather than in GetData (but this is pretty much nitpicking).

To then make nested GetData to appear in code completion, we need a new exposed type that also overrides GetData. Does this make any sense?

That's a valid point - why GetData has EditorBrowsableState.Never is probably lost in the mysteries of time by now :) I think it's quite heavy to expose a whole NpgsqlNestedDataReader type publicly just so that its EditorBrowsableState can override this - I'd personally just leave it (Intellisense doesn't seem that important in this particular case). But if you feel strongly about it I'm fine with NpgsqlNestedDataReader too.

Could also just return ?column? like postgresql and leave GetOrdinal not supported. Let me know which decision we should take.

Another possibility is an empty string... Ideally we could continue throwing and fix BufferedDataReader to not require this, but if that's not feasible I'm OK with whatever here.

Comment thread test/Npgsql.Tests/NestedDataReaderTests.cs Outdated
Comment thread src/Npgsql/NpgsqlNestedDataReader.cs Outdated
Comment thread src/Npgsql/NpgsqlNestedDataReader.cs Outdated
Comment thread src/Npgsql/NpgsqlNestedDataReader.cs Outdated
/// </summary>
internal ulong UniqueRowId;

internal NpgsqlNestedDataReader? CachedFreeNestedDataReader;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: CachedNestedDataReader?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was a good idea to signify that the reader is free.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK by me

Comment thread src/Npgsql/NpgsqlNestedDataReader.cs Outdated
{
NpgsqlDataReader _outermostReader;
ulong _uniqueOutermostReaderRowId;
NpgsqlNestedDataReader? _outerNestedReader;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to get assigned anywhere...

Also, we can just have an object _parentReader which points to either a NpgsqlNestedDataReader or NpgsqlDataReader, and switch on the type in Dispose.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, we need to have a reference to the outermost reader, so that we can fetch the Buffer and the Connector.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I meant one object _parentReader in addition to _outermostReader, which then you can switch on for type in Dispose - but on second look not very significant.

Comment thread src/Npgsql/NpgsqlNestedDataReader.cs Outdated
Comment thread src/Npgsql/NpgsqlNestedDataReader.cs Outdated
@Emill

Emill commented Apr 11, 2021

Copy link
Copy Markdown
Contributor Author

So to summarize, it would be a bit more "standard" to have the implementation in the override of GetDbDataReader, rather than in GetData (but this is pretty much nitpicking).

I just did the same approach as in OleDbDataReader. One good thing here is that we compared to the other variant avoid an (ugly) cast from DbDataReader to NpgsqlNestedDataReader.

To then make nested GetData to appear in code completion, we need a new exposed type that also overrides GetData. Does this make any sense?

That's a valid point - why GetData has EditorBrowsableState.Never is probably lost in the mysteries of time by now :) I think it's quite heavy to expose a whole NpgsqlNestedDataReader type publicly just so that its EditorBrowsableState can override this - I'd personally just leave it (Intellisense doesn't seem that important in this particular case). But if you feel strongly about it I'm fine with NpgsqlNestedDataReader too.

Yeah I thought so first too that just leave it, but when trying the feature out, it felt very strange that IntelliSense didn't list it. It's also good for documentation purposes so that users know that GetData is available, what it does and in what cases it works.

Could also just return ?column? like postgresql and leave GetOrdinal not supported. Let me know which decision we should take.
Another possibility is an empty string... Ideally we could continue throwing and fix BufferedDataReader to not require this, but if that's not feasible I'm OK with whatever here.

I was thinking of these options too, but am not sure what is the best way. I thought throwing a NonSupportedException was the most reasonable, until I noticed that BufferedDataReader requires GetName to work. Is this something you can take care of in the efcore code, to relax this? Not sure if it's possible, otherwise I think I'll switch to ?column? and throw for GetOrdinal. Note that in case GetData is used on a composite type that is known (and therefore has column names), do you think we should populate the column names from that info?

@roji

roji commented Apr 12, 2021

Copy link
Copy Markdown
Member

I thought throwing a NonSupportedException was the most reasonable, until I noticed that BufferedDataReader requires GetName to work. Is this something you can take care of in the efcore code, to relax this?

The main use seems to be here - when the user uses raw SQL queries (FromSql*), and hasn't specified the proper columns needed by EF to materialize the entity, this helps detect and report which columns are missing. There also doesn't seem to be a way in the DbDataReader API to ask whether names are available without calling GetName (and catching an exception). So maybe we should just leave it as-is and return empty string or ?column?...

Just to make sure I remember right... This concerns us because you plan to (recursively) wrap nested readers with BufferedDataReader, right?

Note that in case GetData is used on a composite type that is known (and therefore has column names), do you think we should populate the column names from that info?

Good point - yeah, that would be ideal if possible... It would be a full, high-perf API to unmapped composites. But if you run into trouble I don't think it's vital.

@roji

roji commented Apr 12, 2021

Copy link
Copy Markdown
Member

@Emill this looks ready to me - let me know when you're done and are OK with me merging!

@Emill

Emill commented Apr 12, 2021

Copy link
Copy Markdown
Contributor Author

Just to make sure I remember right... This concerns us because you plan to (recursively) wrap nested readers with BufferedDataReader, right?

Yes...

I'll change to ?column? and throw for GetOrdinal.
I'll also try to see if it's possible to use the loaded composite field names, if present.
After that, I think we're done :)

@Emill

Emill commented Apr 13, 2021

Copy link
Copy Markdown
Contributor Author

I'll change to ?column? and throw for GetOrdinal.
I'll also try to see if it's possible to use the loaded composite field names, if present.

Done!

@roji

roji commented Apr 14, 2021

Copy link
Copy Markdown
Member

Nice to see the composite support too... Very cool.

@roji roji merged commit cf31a28 into npgsql:main Apr 14, 2021
{
CheckClosedOrDisposed();

UniqueRowId++;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we do this only for sync Read?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No :/
My mistake

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Read arbitrary trees of arrays of composite types, with specific type handlers for inner elements

3 participants