Improve experience for users of the log aggregation feature

## Description

In the past, customers have run into the size limit of our logging volumes, which caused Pods to be killed with the error `Usage of EmptyDir volume "log" exceeds the limit`.
This issue covers fixing the underlying problem, documenting the behavior, and explaining how users can adjust the settings themselves if needed.

## Value

We want this feature so that Pods do not crash arbitrarily under normal circumstances, especially when the load is high and lots of logs are produced. This will lead to fewer support cases for us.
         
## Dependencies

- I (@lfrancke) am unsure about the exact dependencies, but I assume this will require changes to the documentation as well as a change in operator-rs.
- Depending on the exact implementation, it might also require changes to each operator.

## Tasks

```[tasklist]
- [x] Agree on a new default size for log volumes
- [x] Investigate whether we want to lower the [`checkIncrement`](https://logback.qos.ch/manual/appenders.html#checkIncrement) setting (and equivalent for other logging implementations)
- [x] Potentially implement the changes from the investigation
- [x] Document what customers need to change if they still run into problems, and include the exact error message in the documentation
- [ ] https://github.com/stackabletech/operator-rs/pull/853
- [ ] https://github.com/stackabletech/nifi-operator/pull/671
```

## Acceptance Criteria

- Users can search our documentation for the above-mentioned error and will find a page or section explaining where it comes from and how to solve it.
- NiFi (and other tools) will fail less often under load because of log volume size restrictions.

## (Information Security) Risk Assessment
      
I cannot identify any additional significant risks this would introduce.
If we increase the default volume size for logs, customers will need to provide more resources, which could lead to resource constraints.

### Quality

- This should be tested by getting one of our tools to emit a lot of log statements in a short amount of time (e.g. by increasing the log level of any component to TRACE) and making sure it doesn't fail under these "normal" high load conditions.
  - As this was originally reported for NiFi, that would be a good candidate.

## Release Notes

Apache NiFi: The ephemeral `EmptyDir` Volumes used to store log files before they are aggregated have had their default size increased from 33 MiB to 500 MiB. Additionally, the interval at which Logback checks whether the maximum log file size has been reached was lowered from 60 seconds to 5 seconds.
Previously, NiFi log files could grow larger than the `log` Volume size between file size checks, resulting in a `Usage of EmptyDir volume "log" exceeds the limit` error and the NiFi Pod being evicted.
This change does not rule out the error entirely, but it makes it less likely.
We have also [documented](https://docs.stackable.tech/home/nightly/nifi/troubleshooting/) this behavior and how to adjust the size of the volumes further if needed.
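
As a rough illustration of why both changes matter, the sketch below computes the sustained write rate at which an empty `log` Volume would fill up within a single file size check interval. It uses only the figures stated above (33 MiB with a 60 second interval, 500 MiB with a 5 second interval). The function name `fill_rate_mib_per_s` is hypothetical, and the model is a simplifying assumption: it ignores log files already present on the Volume and any rollover behavior, so the rates tolerated in practice will be lower.

```python
def fill_rate_mib_per_s(volume_mib: float, check_interval_s: float) -> float:
    """Write rate (MiB/s) at which an empty volume fills within one check interval.

    Simplified model (assumption): nothing else is stored on the volume and
    no rollover or cleanup happens before the next file size check.
    """
    if check_interval_s <= 0:
        raise ValueError("check_interval_s must be positive")
    return volume_mib / check_interval_s


# Figures taken from the release note: old and new defaults.
old_rate = fill_rate_mib_per_s(volume_mib=33, check_interval_s=60)
new_rate = fill_rate_mib_per_s(volume_mib=500, check_interval_s=5)

print(f"old defaults: {old_rate:.2f} MiB/s")
print(f"new defaults: {new_rate:.2f} MiB/s")
```

Under these assumptions the old defaults could be exhausted by a burst of roughly 0.55 MiB/s, while the new defaults would need about 100 MiB/s, which matches the note that the error becomes less likely rather than impossible.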

## Remarks

See our internal Slack discussion for more details: https://stackable-workspace.slack.com/archives/C031NP72H7T/p1710514724269639

