Skip to content

Conversation

@joewalsh
Copy link
Contributor

@joewalsh joewalsh commented Sep 9, 2020

One of the challenges we had in the past with event logging was that utilizing unstructured dictionaries made it easy to mistype a key or have the structure of the dictionary wrong. This proposal utilizes Swift's built in Codable protocol to allow events to be defined as structs so the types and structure are enforced. It also creates enums for Streams and Schemas so they're defined from a concrete set and can be re-used without the potential for mistyping.

This change removes the ability to log from Objective-C. Any place we currently log from Objective-C should be converted to Swift when updating to the MEP.

There were a few minor fixes made here as well:

  • Fixed a missing completion call in doPeriodicWork that was causing a background task to timeout
  • Fixed a potential race condition between isEmpty and popFirst - both checks could be combined into one

@bearloga
Copy link
Member

bearloga commented Sep 9, 2020

This is a major enough patch that we shouldn't rush through it, there are also fundamentally different ideas being introduced here that need to be deliberated on. I've added this to discussion for v2 (T261987).

* analytics-related schemas are collected.
*/
public enum Schema: String, Codable {
case editHistoryCompare = "/analytics/mobile_apps/ios_edit_history_compare/1.0.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit apprehensive about this.

The schema is a contract between the event producer and the event ingester. Without this, each piece of instrument declares explicitly which schema and which version of that schema it conforms to by including $schema in the event data.

When a schema is updated to have more fields or more allowed values in enum fields and the instrumentation is updated to match, I worry that having the schema+version centralized here, separate from the instrumentation, will make it too easy to forget to update. Of course, if the schema is declared explicitly in the same file as the instrumentation code it's also entirely possible to miss in that case too, but not as easy as if it's different locations.

Also, I really really hope that this isn't taken as me underestimating anybody!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point! It would likely help with maintainability to have the schema and the event structure in the same place. I will push a patch that moves the schema definition inside the structure of the event so we can see how it feels.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed this change and I agree it helps. Here's what an event looks like after the update:

    private struct Event: EventInterface {
        static let schema = "/analytics/mobile_apps/ios_edit_history_compare/1.0.0"
        let action: Action
        let primary_language: String
        let is_anon: Bool
    }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bearloga would we ever need to share schema identifiers across multiple types of event logs? That's the only potential problem I see with this approach, multiple string literals of the same identifier peppered throughout as we move forward. But I could definitely be misremembering how the backend works and streams vs. schemas.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another potential option would be to keep the static definition in the struct but use EPC.Schema for the type instead of String:

private struct Event: EventInterface {
        static let schema: EPC.Schema = .editHistoryCompareV1
        let action: Action
        let primary_language: String
        let is_anon: Bool
    }

Copy link
Member

@bearloga bearloga Sep 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick recap of MEP to clarify some of this stuff: In the legacy system stream=table=schema (although "streams" don't exist as a concept in the legacy system). In the modern system schemas and streams have a 1:many relationship. Multiple streams can declare the same schema, and that schema specifies what the incoming the data should look like and how the table in the database looks like. Each stream gets its own table.

When you produce an event to the stream there's 3 conditions which are checked: (1) that the stream exists in the stream config, (2) that the title of the schema in the stream config matches the schema declared by the event, and (3) that the event data conforms to the schema it claims to be conforming to.

I can imagine having a more generic schema that is re-used by multiple instruments that send data to different streams. For example, we can generalize ios_edit_history_compare schema by making the action field a free-form string instead of an enum like it currently is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks for the recap! Given that we might re-use schemas, I pushed the change that restores the central definition of schemas, but keeps the per-event static schema. This way we could re-use them but would also have them declared on the event struct so it's easier to remember it should be updated when changing the event's structure.

Comment on lines +132 to +138
private struct StreamConfiguration: Codable {
let sampling: Sampling?
struct Sampling: Codable {
let rate: Double?
let identifier: String?
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen when deserializing to this if a stream's config included additional fields besides sampling?

Let's say the returned stream configurations suddenly look like this:

{
  "streams": {
    "ios.edit_history_compare": {},
    "ios.daily_checkin": {
      "tags": { "anonymous": true }
    },
    "ios.some_old_stream": {
      "tags": { "active": false }
    }
  }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your example, the decoding would succeed, tags would be ignored and sampling would be nil.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! :D

…ined in the same place as the structure of the Event
Copy link
Collaborator

@tonisevener tonisevener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing the POSTs fail on this branch with 400 in Charles, though the completion block doesn't return an error so nothing is printed to the log. I haven't figured out what is causing the failure yet.

@joewalsh
Copy link
Contributor Author

joewalsh commented Sep 10, 2020

I'm seeing the POSTs fail on this branch with 400 in Charles, though the completion block doesn't return an error so nothing is printed to the log. I haven't figured out what is causing the failure yet.

@tonisevener thanks for finding this! Should be fixed now.


if !inSample(stream: stream) {
/// Private, synchronous version of `submit`.
private func _submit<E: EventInterface>(stream: Stream, event: E, domain: String? = nil, date: Date? = nil){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be more of a comment with the original PR and less with the refactor, but is there any need anymore for the date parameter? I don't see how it could be used, even if events are queued in the inputBuffer first and posted after the config fetches (which I think is how it was originally going to be used).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of the original PR was that it's for providing a custom client_dt but if we don't plan to utilize that, it could be removed.

Copy link
Member

@bearloga bearloga Sep 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

client_dt needs to be the timestamp of when the event was originally submitted. The original PR checks for its presence in the event data and if it's not already there (meaning the event is being submitted for the first time and not re-submitted after stream config is available) it sets it. I'm not entirely sure where this is done in this PR.

@tonisevener
Copy link
Collaborator

From my perspective this with the original library can be merged into main. But first we can confirm how other folks feel about this later today.

@bearloga
Copy link
Member

From the meeting: We decided to move forward with this PR and merge the original PR (once this is merged into that)

@joewalsh joewalsh merged commit 0299b52 into event-platform-client Sep 14, 2020
@joewalsh joewalsh deleted the codable_epc branch September 14, 2020 20:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants