Add Structured Audit Logging#891
Add Structured Audit Logging#891woop merged 1 commit intofeast-dev:masterfrom mrzzy:feast-audit-logs
Conversation
There was a problem hiding this comment.
@woop Does the current E2E auth tests verify that there are no false positives for authentication? (ie should fail to authenticate due to wrong credentials. )
There was a problem hiding this comment.
#892 These tests will fail your authentication if you try to sign in with a JWT that cannot be verified.
There was a problem hiding this comment.
No longer relevant as this was deemed out of scope for this PR.
|
/retest |
There was a problem hiding this comment.
Shouldn't the format be a part of the logger configuration xml instead of a part of the code base?
There was a problem hiding this comment.
The idea was to expose the ability to configure the logger with the most common use cases via application.yml without interacting with the logging configuration XML as it inaccessible from our Helm and Docker-Compose setup.
There was a problem hiding this comment.
As mentioned in the response above, I think we can use the application.yml for configuring logging without writing our own logformat configuration.
There was a problem hiding this comment.
I cant make out this comment. What does the direction represent?
There was a problem hiding this comment.
Direction represents the flow of messages relative to the service:
- incoming - matches requests messages
- outgoing - matches response messages.
Users can use this to configure for which request "direction" message audit logging is enabled. (ie leave in only incoming to enable request logging)
Will change this to request and response as that makes it easier to understand.
There was a problem hiding this comment.
Does this work for you? I have been pulling the identity from claims.
There was a problem hiding this comment.
Is this supposed to apply to GetOnlineFeatures as well, because in that case we will probably experience a serious performance hit. Audit logging is normally distinct from request/response logging, since with the former you expect durability and consistency but with the latter you want availability and performance. Logging the complete request/response is still useful as part of the audit log, but we need to have a way to deal with the latency sensitive case.
There was a problem hiding this comment.
The AuditLogger can be configured via application.yml via AuditLogProperties and thus can be disabled for the Online Serving instance from application.yml for cases where latency is more important that visibility.
There was a problem hiding this comment.
Yep, but we need request/response logging in online serving as well. We just want to make sure that this logging happens in async fashion and doesn't affect latency of the method
There was a problem hiding this comment.
My initial reaction when seeing this code is that it seems quite overwhelming. A lot of decisions have been made here in terms of the structure of our audit logging and I am not sure if we are being too ambitious here. I am quite worried that these abstractions will end up changing quite soon once people start trying to use the audit logger. I was hoping we could start with a simpler key/value API.
Did you follow some kind of convention when structuring the audit logger and its associated types, or was everything custom made?
There was a problem hiding this comment.
Did you follow some kind of convention when structuring the audit logger and its associated types, or was everything custom made?
No, Yes.
I was hoping we could start with a simpler key/value API.
Reasons that we should go with a Key Value API.
- Infinitely more flexible and extensible as no constraints are applied.
Reasons that we should not go with a Key Value API:
- Java is statically typed. Going with a generic Key Value would mean
Map<String, Object>which implies lots of ugly casting. - With no enforced structure in place it will be difficult to parse the logs in upstream systems as there is no expected structure.
- Enforces some rigidity to logging so that we don't break upstream systems unknowing built upon audit logging.
The structured log will be depended on by upstream logging systems and should be treated as a user facing API that should be clearly defined, the same way Core and Serving Service API are clearly defined.
I am quite worried that these abstractions will end up changing quite soon once people start trying to use the audit logger
The important thing is that we are fully aware of changes instead of being blind sighted by a map.put()
There was a problem hiding this comment.
My suggestion is not to throw away all static typing and just accept arbitrary string/object keys and values. Parts of what you have implemented makes sense, but other parts do not.
From a functional perspective what we want is for specific parts of the Feast code base to create a message that can be serialized and printed/sent to the audit log. so having static methods like ofMessage(MessageLogEvent.Kind kind, String method, Message message, String identity) makes sense.
However, I question the value of this LogEvent as a container. It seems like a catchall class that provides little in terms of a contract. When would you have both a MessageLogEvent and ActionLogEvent in the same object? It doesnt seem like that should be possible and so its not clear what the scope of this LogEvent class should be. Will it just grow infinitely as a catchall class that can contain any subclass?
If we agree that maintaining state is a bad thing, then do we need to maintain LogEvents as an object? What is the SLF4J contract that we need to meet. Does it just expect strings?
What prevents us from having an AuditLogger class with static methods like ofMessage that converts a tight contract of objects into a Map or string that can be serialized?
There was a problem hiding this comment.
Refactored away LogEvent into MessageAuditLogEntry, ActionAuditLogEntry, TransitionAuditLogEntry subclasses.
There was a problem hiding this comment.
What is this testing? Wouldnt toString() always return something?
There was a problem hiding this comment.
Yeah, I guess I was just trying to exercise the code path. Would remove.
There was a problem hiding this comment.
This code looks much better with the exception handling in the interceptor, but I don't think its in scope for this PR is it?
There was a problem hiding this comment.
Yeah, moving out of this PR.
There was a problem hiding this comment.
Originally this used the @Builder constructor from lombok and had fields duplicated across all 4 Job Task implementations. I move the fields up to JobTask abstract class however the @builder annotation has problems with inherited fields, hence the added constructor.
There was a problem hiding this comment.
It bothers me a bit that we are pulling in logging dependencies directly into our code. Shouldn't this be abstracted away?
There was a problem hiding this comment.
Is it possible for us to keep things simpler by using SLF4J for logging instead of having our own AuditLogger? Example: https://github.com/yidongnan/grpc-spring-boot-starter/blob/master/examples/local-grpc-server/src/main/java/net/devh/boot/grpc/examples/local/server/LogGrpcInterceptor.java#L34
I can see some value in having a way to standardize the message format through static methods, especially if we need to have actions/resources/identities, but having to pass around an audit logger when SLF4J is already available seems like an unnecessary overhead to me.
There was a problem hiding this comment.
Is it possible for us to keep things simpler by using SLF4J for logging instead of having our own AuditLogger? Example: https://github.com/yidongnan/grpc-spring-boot-starter/blob/master/examples/local-grpc-server/src/main/java/net/devh/boot/grpc/examples/local/server/LogGrpcInterceptor.java#L34
Why we have our own Audit Logger:
- Allow us to configure the audit logger from
application.ymlwhich is valuable as:- Allows us to configure the logger via
application.yml(ie disable request logging for Online Serving without hard coding something.) - We can inject Feast specific elements into the Audit Log (ie Feast Component, Feast Version, or in the future git hash.)
- Allows us to configure the logger via
There was a problem hiding this comment.
Allows us to configure the logger via application.yml (ie disable request logging for Online Serving without hard coding something.)
What have we implemented that can't be implemented using the normal logging configuration https://howtodoinjava.com/spring-boot2/logging/configure-logging-application-yml/
Also, it seems like your answer focuses on configuration but the AuditLogger is more than just a configuration holder right? We can have custom configuration in the FeastProperties for enabling/disabling request/response logging if needed, but that is a separate question to whether we need to maintain the AuditLogger and pass it around.
We can inject Feast specific elements into the Audit Log
Since we have an application context just floating around inside of Feast, we can easily grab these settings at any time without needing to pass around an AuditLogger right? I am just trying to make sure we arent reinventing something that already exists.
There was a problem hiding this comment.
Updated AuditLogger's logging methods to static to remove the need of passing around the AuditLogger instance.
serving/pom.xml
Outdated
There was a problem hiding this comment.
Do we really need to set this? Shouldnt this be set in the parent-pom?
There was a problem hiding this comment.
This was set as the interceptors do not work properly under the lognet grpc starter.
There was a problem hiding this comment.
Removed version part to be set in the parent POM.
|
I think it is unlikely that we will get this PR merged in any time soon in its current state. It's trying to do too much.
The MVP we need for audit logging is not much different from what we have in our existing AuditLogger. We need a way to log incoming requests to track user actions. Tracking internal state changes of jobs are not necessary yet, and even in that case I think it can be tracked at the entity level. I would recommend that you try and separate these into different PRs. The first and most important PR implements a logging interceptor that logs authenticated user requests and the incoming payload using a standardized audit log format (hopefully with SLF4J). This might serve as inspiration https://cloud.google.com/logging/docs/reference/audit/auditlog/rest/Shared.Types/AuditLog |
This already exists with the current Next steps:
|
|
/test test-end-to-end-batch |
1 similar comment
|
/test test-end-to-end-batch |
|
/test test-end-to-end-batch-dataflow |
There was a problem hiding this comment.
Can we call this something more descriptive? enableMessageLogging?
|
/test test-end-to-end |
|
/test test-end-to-end-redis-cluster |
|
/test test-end-to-end-batch-dataflow |
1 similar comment
|
/test test-end-to-end-batch-dataflow |
There was a problem hiding this comment.
Still not sure why we need to persist with FeastInstance.
|
/test test-end-to-end-batch-dataflow |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mrzzy, woop The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test test-end-to-end-batch-dataflow |
1 similar comment
|
/test test-end-to-end-batch-dataflow |
|
New changes are detected. LGTM label has been removed. |
Why we need this PR
Feast behavior and usage be opaque due to a lack of consistent logging of:
Logging in Feast is also inconsistent making it difficult to parsed by third party logging systems.
What this PR does:
Adds Audit Logging to Feast Core and Feast Serving:
AuditLoggerthat exposes structured logging methodslogMessage(), logAction()etc.AuditLoggerdisable/configurable from Core/Serving'sapplication.ymlAuditLoggerare structured JSON and machine parsable.AuditLogEntryand subclasses that define the structure of each log entries and provides JSON conversion.GrpcMessageInterceptorto intercept incoming/request or outgoing/response and log them to the Audit Log for both Core and Serving Services.Refactor/Tech debt cleanups:
JobServiceperform actions on jobs viaJobTaskinstead of directly withJobManagerto be consistent withJobCoordinatorServiceWhich issue(s) this PR fixes:
Fixes #
Does this PR introduce a user-facing change?: