test: fix legacy acceptance tests #69152
base: feature/destination-postgres
Conversation
When using `toString()` we don't output seconds, which we want to include.
```diff
  constructor(timestamp: String) : this(OffsetDateTime.parse(timestamp))
  override fun compareTo(other: TimestampWithTimezoneValue): Int = value.compareTo(other.value)
- @JsonValue fun toJson() = value.toString()
+ @JsonValue fun toJson() = value.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME)
```
Otherwise the string we output doesn't include seconds, while older versions of the connector did include them.
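To illustrate the difference (a standalone sketch, not connector code): `OffsetDateTime.toString()` drops the seconds field when it is zero, whereas formatting with `DateTimeFormatter.ISO_OFFSET_DATE_TIME` always emits it, because the formatter's optional seconds section is printed whenever the field is available.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class TimestampFormatDemo {
    public static void main(String[] args) {
        OffsetDateTime ts = OffsetDateTime.parse("2024-01-01T12:34:00Z");

        // toString() drops the seconds (and nanos) when they are zero
        System.out.println(ts);  // 2024-01-01T12:34Z

        // ISO_OFFSET_DATE_TIME always emits the seconds field
        System.out.println(ts.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME));  // 2024-01-01T12:34:00Z
    }
}
```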
```diff
      return false
  }

+ protected open fun disableRawTableComparison(): Boolean {
```
the default scenario is that there is a final table and there isn't a raw table.
```diff
  @ValueSource(longs = [0L, 42L])
  @Throws(Exception::class)
- fun incrementalDedup(inputGenerationId: Long) {
+ open fun incrementalDedup(inputGenerationId: Long) {
```
Made it `open` so that subclasses can disable it.
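The pattern in the diff can be sketched as follows (a simplified, hypothetical base/subclass pair, not the actual test classes): the base suite exposes an overridable hook, and a connector-specific subclass flips it to skip the raw-table comparison.

```java
// Hypothetical, simplified sketch of the override pattern used in the diff.
class BaseSqlGeneratorIntegrationTest {
    // Default: compare raw tables (connectors without raw tables override this).
    protected boolean disableRawTableComparison() {
        return false;
    }

    void verifySyncResult() {
        if (!disableRawTableComparison()) {
            // ... diff the raw table against expected records ...
        }
        // ... always diff the final table ...
    }
}

class RawTablesSkippedTest extends BaseSqlGeneratorIntegrationTest {
    @Override
    protected boolean disableRawTableComparison() {
        return true;  // this connector variant has no raw table to compare
    }
}
```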
```diff
  String finalTableName = getStreamNamespace() + "." + Names.toAlphanumericAndUnderscore(getStreamName());
- getDatabase().execute("CREATE VIEW " + getStreamNamespace() + ".v1 AS SELECT * FROM " + rawTableName);
+ if (!disableRawTableComparison()) {
+     getDatabase().execute("CREATE VIEW " + getRawSchema() + ".v1 AS SELECT * FROM " + rawTableName);
```
Note that I changed the schema. I think this is the right thing to do: before, this only worked because the final table schema existed. Now we can't assume that, since a final table might not exist.
```diff
  @Disabled @ParameterizedTest @ValueSource(longs = []) override fun testIncrementalSyncDropOneColumn(inputGenerationId: Long) {}

+ // this test assumes that the fully qualified raw table name is lowercased.
+ // This was only a restriction in older versions of the connector.
```
I don't really know why we had this restriction though
this makes me nervous - is this saying that where previously we did `create table "airbyte_internal"."blah_raw__stream_the_table"`, we're now doing `airbyte_internal."blah_raw__stream_The_Table"`? (assuming the stream name is `The_Table`)
wouldn't this require users to rewrite their SQL queries to target the mixed-case table?
(these stats are about a year old, but it looks like around 1/3 of cloud destination-postgres users have T+D disabled today, so that's pretty significant impact)
yes, you read that right. This is in the current docs:
Airbyte Postgres destination will create raw tables and schemas using the Unquoted identifiers by replacing any special characters with an underscore. All final tables and their corresponding columns are created using Quoted identifiers preserving the case sensitivity. Special characters in final tables are replaced with underscores.
So in the current state of the new connector this would be changed to something like:
Both final and raw tables, and their corresponding columns, are created using Quoted identifiers preserving the case sensitivity.
So you are right, this will cause issues to existing queries. It sounds like we want to avoid this, I'll change it.
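For context, the legacy "unquoted identifier" behavior discussed above can be illustrated with a tiny helper (a hypothetical stand-in, not the connector's actual `toPostgresCompatibleName`): Postgres folds unquoted identifiers to lowercase, and the legacy connector additionally replaced special characters with underscores, while quoted identifiers like `"The_Table"` preserve case exactly.

```java
// Hypothetical helper mimicking the documented legacy raw-table naming:
// lowercase the name and replace special characters with underscores,
// matching what an unquoted Postgres identifier would resolve to.
public class IdentifierDemo {
    static String asUnquotedStyle(String name) {
        return name.replaceAll("[^A-Za-z0-9_]", "_").toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(asUnquotedStyle("The_Table"));   // the_table
        System.out.println(asUnquotedStyle("my-stream!"));  // my_stream_
    }
}
```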
ok I might need some help here. I thought this commit was going to make it so that both the raw table name and namespace were always lowercased, but the change didn't really do anything. It seems that the raw table names and namespaces are still created using the PostgresFinalTableNameGenerator. Any thoughts on what I could be missing?
hm, I think this destination is doing the same thing as snowflake (where the "final table name generator" is also used for the raw tables) - you probably should just edit this thing
Lines 48 to 52 in 3a89ba0:

```kotlin
TypingDedupingUtil.concatenateRawTableName(
    streamDescriptor.namespace ?: config.schema,
    streamDescriptor.name
)
    .toPostgresCompatibleName()
```
(also - the condition in the name generator is wrong. Compare snowflake's implementation - it should be if (!config.legacyRawTablesOnly), not if (config.internalTableSchema.isNullOrBlank()))
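The suggested condition fix can be sketched like this (field and method names are assumptions based on the comment, not the actual connector code): the legacy-lowercasing path should be gated on `legacyRawTablesOnly`, not on whether an internal table schema happens to be configured.

```java
// Simplified, hypothetical sketch of the suggested condition fix.
class RawTableNameGenerator {
    final boolean legacyRawTablesOnly;

    RawTableNameGenerator(boolean legacyRawTablesOnly) {
        this.legacyRawTablesOnly = legacyRawTablesOnly;
    }

    String rawName(String namespace, String name) {
        String concatenated = namespace + "_raw__stream_" + name;
        // Suggested: decide based on legacyRawTablesOnly, not on whether
        // config.internalTableSchema happens to be set.
        if (!legacyRawTablesOnly) {
            // new-style path: preserve case (quoted later)
            return concatenated;
        }
        // legacy path: lowercased, underscore-only name
        return concatenated.replaceAll("[^A-Za-z0-9_]", "_").toLowerCase();
    }
}
```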
```kotlin
// This was only a restriction in older versions of the connector.
@Disabled @Test override fun testMixedCasedSchema() {}

// disabling dedup tests since dedup not supported when setting `disable_type_dedupe` to true.
```
Note that even though running dedup when disable_type_dedupe is set to true doesn't make a ton of sense, it's a config that wouldn't error out in older versions of the connector, but it will error out on the new version of the connector.
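One way to surface this incompatibility is explicit up-front config validation rather than a mid-sync failure. This is a hypothetical sketch (names and structure are illustrative, not the connector's actual validation code):

```java
// Hypothetical validation: reject dedup sync modes up front when
// disable_type_dedupe is set, instead of erroring later in the sync.
class ConfigCheck {
    static void validate(boolean disableTypeDedupe, boolean dedupRequested) {
        if (disableTypeDedupe && dedupRequested) {
            throw new IllegalArgumentException(
                "Dedup sync modes are not supported when disable_type_dedupe is true");
        }
    }
}
```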
```kotlin
// older versions of the connector used to error out in this scenario. This is not
// true for newer versions of the connector
@Disabled @Test override fun interruptedTruncateWithPriorData() {}
```
truth be told, I don't 100% understand what this test is doing, I just know it used to fail and doesn't fail anymore. I'm assuming this is the behavior we want.
👍 this is covered by new CDK tests
(specifically: the old CDK threw an error if we reached the end of stdin without seeing a "stream status" message; the new CDK only throws an error if we get an explicit INCOMPLETE stream status. In practice, platform always sends an explicit complete/incomplete status, so there's no behavior change at runtime)
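The behavior difference described above can be summarized in a small sketch (names are illustrative; this is not CDK code):

```java
// Sketch of the described end-of-stdin policies:
// old CDK: any missing status at end of stdin => error.
// new CDK: only an explicit INCOMPLETE status => error.
enum StreamStatus { COMPLETE, INCOMPLETE, MISSING }

class EndOfStdinPolicy {
    static boolean oldCdkFails(StreamStatus s) {
        return s != StreamStatus.COMPLETE;  // MISSING also fails
    }

    static boolean newCdkFails(StreamStatus s) {
        return s == StreamStatus.INCOMPLETE;  // MISSING is tolerated
    }
}
```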
```diff
@@ -1,4 +1,4 @@
{"_airbyte_extracted_at": "1970-01-01T00:00:01.000000Z", "_airbyte_data": {"id1": 1, "id2": 200, "old_cursor": 0, "_ab_cdc_deleted_at": null, "name" :"Alice", "address": {"city": "San Francisco", "state": "CA"}}, "_airbyte_meta": {"changes":[],"sync_id":42}, "_airbyte_generation_id": 43}
```
null values are not present on the raw table anymore. This is built into the bulk cdk
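The effect on the fixture (fields like `_ab_cdc_deleted_at: null` disappearing from the raw record) can be reproduced with a plain-Java stand-in; the bulk CDK's actual mechanism may differ, this just illustrates the observable result:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: drop null-valued fields from a record before it is
// written to the raw table (the bulk CDK does this internally).
public class NullFieldFilter {
    static Map<String, Object> withoutNulls(Map<String, Object> record) {
        Map<String, Object> out = new LinkedHashMap<>();
        record.forEach((k, v) -> {
            if (v != null) {
                out.put(k, v);
            }
        });
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> rec = new LinkedHashMap<>();
        rec.put("id1", 1);
        rec.put("_ab_cdc_deleted_at", null);
        rec.put("name", "Alice");
        System.out.println(withoutNulls(rec));  // {id1=1, name=Alice}
    }
}
```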
```diff
  private val postgresColumnUtils: PostgresColumnUtils,
  private val postgresConfiguration: PostgresConfiguration) {

+ private val dropTableSuffix: String = if (postgresConfiguration.dropCascade == true) "CASCADE" else ""
```
dropCascade is part of the connector config. I think we just forgot to add support for this.
nice catch! We should probably eventually port the relevant test out of the legacy test suite, but not a blocker for this PR
(having the option to drop cascade is something that users really want, so we definitely should preserve it)
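The config-driven suffix from the diff amounts to the following (a simplified Java sketch of the Kotlin original; names are adapted): when `dropCascade` is enabled, DROP statements get a CASCADE suffix so dependent objects such as views don't block the drop.

```java
// Simplified sketch of the dropCascade behavior from the diff.
class DropTableSql {
    static String dropStatement(String tableName, boolean dropCascade) {
        // Postgres: CASCADE also drops objects that depend on the table.
        String suffix = dropCascade ? " CASCADE" : "";
        return "DROP TABLE IF EXISTS " + tableName + suffix;
    }
}
```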
```diff
      jsonObject.replace(key, value.toJson())
  }
- val jsonData = jsonObject.serializeToString()
+ val jsonData = Jsons.writeValueAsString(filteredRecord)
```
LGTM, but maybe @edgao should take a look as he has more context.
force-pushed from 06be7bc to 7be35f5
one question on the naming thing - sounds kind of similar to the snowflake uppercasing fire drill, but maybe I'm misunderstanding. Otherwise lgtm!
What
Legacy Acceptance Tests were not working, so this PR contains a few different changes to make these tests green. Changes are either:
...etc
There's a more thorough description of each of the issues addressed in this PR in this doc