-
-
Notifications
You must be signed in to change notification settings - Fork 831
Update Event Platform Client library to utilize Codable and concrete types where possible, fix minor issues #3682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This is a major enough patch that we shouldn't rush through it, there are also fundamentally different ideas being introduced here that need to be deliberated on. I've added this to discussion for v2 (T261987). |
…a-ios into codable_epc
…ad, fix timestamp format
WMF Framework/EventStreams.swift
Outdated
| * analytics-related schemas are collected. | ||
| */ | ||
| public enum Schema: String, Codable { | ||
| case editHistoryCompare = "/analytics/mobile_apps/ios_edit_history_compare/1.0.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit apprehensive about this.
The schema is a contract between the event producer and the event ingester. Without this, each piece of instrument declares explicitly which schema and which version of that schema it conforms to by including $schema in the event data.
When a schema is updated to have more fields or more allowed values in enum fields and the instrumentation is updated to match, I worry that having the schema+version centralized here, separate from the instrumentation, will make it too easy to forget to update. Of course, if the schema is declared explicitly in the same file as the instrumentation code it's also entirely possible to miss in that case too, but not as easy as if it's different locations.
Also, I really really hope that this isn't taken as me underestimating anybody!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point! It would likely help with maintainability to have the schema and the event structure in the same place. I will push a patch that moves the schema definition inside the structure of the event so we can see how it feels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed this change and I agree it helps. Here's what an event looks like after the update:
private struct Event: EventInterface {
static let schema = "/analytics/mobile_apps/ios_edit_history_compare/1.0.0"
let action: Action
let primary_language: String
let is_anon: Bool
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bearloga would we ever need to share schema identifiers across multiple types of event logs? That's the only potential problem I see with this approach, multiple string literals of the same identifier peppered throughout as we move forward. But I could definitely be misremembering how the backend works and streams vs. schemas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another potential option would be to keep the static definition in the struct but use EPC.Schema for the type instead of String:
private struct Event: EventInterface {
static let schema: EPC.Schema = .editHistoryCompareV1
let action: Action
let primary_language: String
let is_anon: Bool
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick recap of MEP to clarify some of this stuff: In the legacy system stream=table=schema (although "streams" don't exist as a concept in the legacy system). In the modern system schemas and streams have a 1:many relationship. Multiple streams can declare the same schema, and that schema specifies what the incoming the data should look like and how the table in the database looks like. Each stream gets its own table.
When you produce an event to the stream there's 3 conditions which are checked: (1) that the stream exists in the stream config, (2) that the title of the schema in the stream config matches the schema declared by the event, and (3) that the event data conforms to the schema it claims to be conforming to.
I can imagine having a more generic schema that is re-used by multiple instruments that send data to different streams. For example, we can generalize ios_edit_history_compare schema by making the action field a free-form string instead of an enum like it currently is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, thanks for the recap! Given that we might re-use schemas, I pushed the change that restores the central definition of schemas, but keeps the per-event static schema. This way we could re-use them but would also have them declared on the event struct so it's easier to remember it should be updated when changing the event's structure.
| private struct StreamConfiguration: Codable { | ||
| let sampling: Sampling? | ||
| struct Sampling: Codable { | ||
| let rate: Double? | ||
| let identifier: String? | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would happen when deserializing to this if a stream's config included additional fields besides sampling?
Let's say the returned stream configurations suddenly look like this:
{
"streams": {
"ios.edit_history_compare": {},
"ios.daily_checkin": {
"tags": { "anonymous": true }
},
"ios.some_old_stream": {
"tags": { "active": false }
}
}
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your example, the decoding would succeed, tags would be ignored and sampling would be nil.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! :D
…ined in the same place as the structure of the Event
tonisevener
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm seeing the POSTs fail on this branch with 400 in Charles, though the completion block doesn't return an error so nothing is printed to the log. I haven't figured out what is causing the failure yet.
@tonisevener thanks for finding this! Should be fixed now. |
… conforming to the schema
|
|
||
| if !inSample(stream: stream) { | ||
| /// Private, synchronous version of `submit`. | ||
| private func _submit<E: EventInterface>(stream: Stream, event: E, domain: String? = nil, date: Date? = nil){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be more of a comment with the original PR and less with the refactor, but is there any need anymore for the date parameter? I don't see how it could be used, even if events are queued in the inputBuffer first and posted after the config fetches (which I think is how it was originally going to be used).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding of the original PR was that it's for providing a custom client_dt but if we don't plan to utilize that, it could be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
client_dt needs to be the timestamp of when the event was originally submitted. The original PR checks for its presence in the event data and if it's not already there (meaning the event is being submitted for the first time and not re-submitted after stream config is available) it sets it. I'm not entirely sure where this is done in this PR.
|
From my perspective this with the original library can be merged into |
|
From the meeting: We decided to move forward with this PR and merge the original PR (once this is merged into that) |
One of the challenges we had in the past with event logging was that utilizing unstructured dictionaries made it easy to mistype a key or have the structure of the dictionary wrong. This proposal utilizes Swift's built in
Codableprotocol to allow events to be defined as structs so the types and structure are enforced. It also creates enums forStreams andSchemas so they're defined from a concrete set and can be re-used without the potential for mistyping.This change removes the ability to log from Objective-C. Any place we currently log from Objective-C should be converted to Swift when updating to the MEP.
There were a few minor fixes made here as well:
doPeriodicWorkthat was causing a background task to timeoutisEmptyandpopFirst- both checks could be combined into one