Merged (37 commits)
59bfc5d
Introduce Compliance Full Node Scan message
Maddosaurus Oct 4, 2022
cd92c35
Improve formatting
Maddosaurus Oct 4, 2022
cf590d9
Remodel message to contain the correct set of fields
Maddosaurus Oct 5, 2022
ef6f4b7
Add missing newline
Maddosaurus Oct 5, 2022
9b63fa6
Add updated gen proto
Maddosaurus Oct 5, 2022
7d09adb
Address review: Reorganize
Maddosaurus Oct 10, 2022
c891b99
WIP: Introduce fake message channel, update Scanner lib dep
Maddosaurus Sep 29, 2022
62623dc
WIP: Build loop
Maddosaurus Sep 29, 2022
2c16306
WIP: First successful message
Maddosaurus Sep 29, 2022
64f315e
WIP: cleanup
Maddosaurus Sep 29, 2022
f4df5cb
WIP: prepare gens
Maddosaurus Oct 4, 2022
e165360
WIP: Prepare interface and testing
Maddosaurus Oct 5, 2022
664568f
WIP: Rename package
Maddosaurus Oct 5, 2022
a88a5e9
WIP: How to add service
Maddosaurus Oct 7, 2022
aa4e23a
WIP: Reroll to existing bidi stream
Maddosaurus Oct 10, 2022
5d2f17c
WIP: Back to reusing existing Sensor/Compliance bidi stream
Maddosaurus Oct 10, 2022
813c14c
Introduce configurable interval for node scanning
Maddosaurus Oct 12, 2022
8a28d1c
Improve log message and add error handling
Maddosaurus Oct 12, 2022
bc01dd7
Dev cleanup
Maddosaurus Oct 12, 2022
9da5ee6
Address style issues
Maddosaurus Oct 12, 2022
aaa5f09
Update message comment in NodeScan struct
Maddosaurus Oct 21, 2022
9f57e60
Move log message in compliance main
Maddosaurus Oct 21, 2022
2406223
Review comments
Maddosaurus Oct 21, 2022
f4959f0
Address review, part II
Maddosaurus Oct 21, 2022
69aee63
Extend fake message with more components
Maddosaurus Oct 21, 2022
e4028e2
Fix TODO command format so CI understands
Maddosaurus Oct 24, 2022
9dcac92
Move logger to fake nodescan implementation
Maddosaurus Oct 24, 2022
b2e0a56
Introduce duration check for Node Scan interval
Maddosaurus Oct 24, 2022
8c621b4
Rename NodeScanInterval setting env and fix linter warnings
Maddosaurus Oct 25, 2022
bc70585
Change to two separate channels
Maddosaurus Oct 25, 2022
7c7eb2e
WIP: Address comments
Maddosaurus Oct 26, 2022
940b0f6
Rework sending NodeScan messages from compliance to sensor
Maddosaurus Oct 26, 2022
cb9abb7
WIP: Changes by prygiels
Maddosaurus Oct 26, 2022
ed7e56d
Updated FakeNodeScan and slimmed message
Maddosaurus Oct 26, 2022
0a0095b
Set NodeScanInterval default value visibility to private
Maddosaurus Oct 26, 2022
624fb1a
Unexport nodeRescanInterval
vikin91 Oct 27, 2022
707ccf2
Remove duplication of initializeStream
vikin91 Oct 27, 2022
92 changes: 89 additions & 3 deletions compliance/collection/main.go
@@ -10,11 +10,13 @@ import (
"github.com/cenkalti/backoff/v3"
"github.com/pkg/errors"
"github.com/stackrox/rox/compliance/collection/auditlog"
"github.com/stackrox/rox/compliance/collection/nodescanv2"
"github.com/stackrox/rox/generated/internalapi/sensor"
"github.com/stackrox/rox/generated/storage"
"github.com/stackrox/rox/pkg/clientconn"
"github.com/stackrox/rox/pkg/concurrency"
"github.com/stackrox/rox/pkg/env"
"github.com/stackrox/rox/pkg/features"
"github.com/stackrox/rox/pkg/k8sutil"
"github.com/stackrox/rox/pkg/logging"
"github.com/stackrox/rox/pkg/mtls"
@@ -80,7 +82,7 @@ func runRecv(ctx context.Context, client sensor.ComplianceService_CommunicateCli
}
}
default:
utils.Should(errors.Errorf("Unhandled msg type: %T", t))
_ = utils.Should(errors.Errorf("Unhandled msg type: %T", t))
}
}
}
@@ -105,7 +107,26 @@ func startAuditLogCollection(ctx context.Context, client sensor.ComplianceServic
return auditReader
}

func manageStream(ctx context.Context, cli sensor.ComplianceServiceClient, sig *concurrency.Signal) {
// manageSendingToSensor sends everything from sensorC channel to sensor
func manageSendingToSensor(ctx context.Context, cli sensor.ComplianceServiceClient, sensorC <-chan *sensor.MsgFromCompliance) {
for {
select {
case <-ctx.Done():
return
case sc := <-sensorC:
client, _, err := initializeStream(ctx, cli)
if err != nil && ctx.Err() == nil {
// error even after retries
log.Fatalf("unable to establish send stream to sensor: %v", err)
}
if err := client.Send(sc); err != nil {
log.Errorf("failed sending nodeScanV2 to sensor: %v", err)
}
Comment on lines +117 to +124
Contributor:

I read up about gRPC yesterday and got a bit more dangerous ;-) although still not very educated.

And I'm definitely here to spoil the party that started (even though I haven't used the opportunity to do that earlier). It seems this code will open a stream each time there's something to send. I think gRPC maintains a pool of connections on the client, plus streams get multiplexed over HTTP/2 (but only when that's used), so opening a stream doesn't mean establishing a new connection, but I think it still has some overhead.

What is worse, I believe that we reuse the same rpc Communicate as the ordinary Compliance does. An effect of that might be that Sensor will push its messages over that stream too, but on the Compliance side there's nothing receiving and acting on them for this stream. In case Sensor pushes a message to perform Compliance checks (message TriggerRun) through this stream, it will get lost.

From looking at the Sensor code, it seems to me that a more recent connection (actually stream) replaces the previous one in the map keyed by node name. Therefore, I strongly suspect that what I described actually happens if the feature flag for node scans is turned on, i.e. compliance checks stop working properly.

I don't think it's an issue for now, as the feature is gated, but I think the logic should be reworked before we are ready to release.

WDYT?

Contributor:

That is a very good find. We should definitely improve this before releasing - it deserves a ticket.

Applying the connection factory with a shared pointer to the stream (as seen in Sensor-Central) should fix the problem you described, but I am not saying that this is the only viable solution to it. I will open a ticket.

Contributor:

A thought about this:

In case Sensor pushes a message to perform Compliance checks (message TriggerRun) through this stream, it will get lost.

we may close the stream immediately after sending the scan results to notify sensor that this connection should not be used anymore.

Contributor @msugakov (Nov 1, 2022):

we may close the stream immediately after sending the scan results to notify sensor that this connection should not be used anymore.

I'm afraid the older active connection established by regular compliance logic may not get restored in Sensor's connection map. It deserves some investigation.

I think the problem is that currently Sensor first calls Compliance to do something (after the initial handshake), whereas in our new flow Compliance will first call Sensor. I don't think gRPC bidirectional streams allow receiving and sending at the same time (i.e. full duplex). As a peer in the stream you can either receive or send (or do nothing), but not both concurrently. Therefore the existing stream can only be reused if Sensor initiates scans (going back to one of your proposals, Piotr). We will need a different rpc if we instead want Compliance to initiate sending scan results (and Sensor ack or nack it).
This may be confusing to read; it would probably be clearer to show what I mean in code.

I think your idea about Sensor initiating scans deserves to be reconsidered.

Contributor:

As a peer in the stream you can either receive or send (or do nothing) but not receive and send concurrently.

This would be a limitation. I got the impression from Sensor<->Central that sending and receiving work concurrently, but I didn't confirm this experimentally.

Having a separate rpc in the proto file would also be an option (maybe even a much cleaner one), but I do not think that this is necessary. The way we initiate the stream clients gives us the possibility to send without depending on whether something was received (it is not in the final PR, but we briefly had this variant in the branch).

I am thinking how to design a manual test to confirm or deny our theories 🤔

Contributor:

Created https://issues.redhat.com/browse/ROX-13383 to not forget about the issue we are discussing here.

Contributor @msugakov (Nov 1, 2022):

It turns out I was wrong about there being no duplex support. Looking at the Sensor code, one stream is used for both sending and receiving concurrently.

s.receiver.Start(stream, s.Stop, s.sender.Stop)
s.sender.Start(stream, s.Stop, s.receiver.Stop)

Contributor (Author) @Maddosaurus (Nov 2, 2022):

In case Sensor pushes a message to perform Compliance checks (message TriggerRun) through this stream, it will get lost. (@msugakov)

A single manual test is nothing to go by, but I was able to successfully run a compliance check with the FF turned on, so it may be that the message isn't necessarily lost.
But we should definitely investigate this further! Thanks for creating the ticket @vikin91

}
}
}

func manageReceiveStream(ctx context.Context, cli sensor.ComplianceServiceClient, sig *concurrency.Signal) {
for {
select {
case <-ctx.Done():
@@ -127,6 +148,49 @@ func manageStream(ctx context.Context, cli sensor.ComplianceServiceClient, sig *
}
}

func manageNodeScanLoop(ctx context.Context, rescanInterval time.Duration, scanner nodescanv2.NodeScanner) <-chan *sensor.MsgFromCompliance {
sensorC := make(chan *sensor.MsgFromCompliance)
nodeName := getNode()
go func() {
defer close(sensorC)
t := time.NewTicker(rescanInterval)

// first scan should happen on start
msg, err := scanNode(nodeName, scanner)
if err != nil {
log.Errorf("error running scanNode: %v", err)
} else {
sensorC <- msg
}

for {
select {
case <-ctx.Done():
return
case <-t.C:
msg, err := scanNode(nodeName, scanner)
if err != nil {
log.Errorf("error running scanNode: %v", err)
} else {
sensorC <- msg
}
}
}
}()
return sensorC
}

func scanNode(nodeName string, scanner nodescanv2.NodeScanner) (*sensor.MsgFromCompliance, error) {
result, err := scanner.Scan(nodeName)
if err != nil {
return nil, err
}
return &sensor.MsgFromCompliance{
Node: nodeName,
Msg: &sensor.MsgFromCompliance_NodeScanV2{NodeScanV2: result},
}, nil
}

func initialClientAndConfig(ctx context.Context, cli sensor.ComplianceServiceClient) (sensor.ComplianceService_CommunicateClient, *sensor.MsgToCompliance_ScrapeConfig, error) {
client, err := cli.Communicate(ctx)
if err != nil {
@@ -201,7 +265,29 @@ func main() {
ctx = metadata.AppendToOutgoingContext(ctx, "rox-compliance-nodename", getNode())

stoppedSig := concurrency.NewSignal()
go manageStream(ctx, cli, &stoppedSig)

go manageReceiveStream(ctx, cli, &stoppedSig)

if features.RHCOSNodeScanning.Enabled() {
log.Infof("Node Rescan interval: %v", env.GetNodeRescanInterval())
sensorC := make(chan *sensor.MsgFromCompliance)
defer close(sensorC)
go manageSendingToSensor(ctx, cli, sensorC)

// TODO(ROX-12971): Replace with real scanner
scanner := nodescanv2.FakeNodeScanner{}
nodeScansC := manageNodeScanLoop(ctx, env.GetNodeRescanInterval(), &scanner)
// multiplex producers (nodeScansC) into the output channel (sensorC)
go func() {
for {
select {
case <-ctx.Done():
return
case sensorC <- <-nodeScansC:
}
}
}()
}

signalsC := make(chan os.Signal, 1)
signal.Notify(signalsC, syscall.SIGINT, syscall.SIGTERM)
51 changes: 51 additions & 0 deletions compliance/collection/nodescanv2/fake_nodescan.go
@@ -0,0 +1,51 @@
package nodescanv2

import (
timestamp "github.com/gogo/protobuf/types"
"github.com/stackrox/rox/generated/storage"
"github.com/stackrox/rox/pkg/logging"
scannerV1 "github.com/stackrox/scanner/generated/scanner/api/v1"
)

var (
log = logging.LoggerForModule()
)

// FakeNodeScanner can be used to send fake messages that would be emitted by NodeScanV2
type FakeNodeScanner struct {
}

// Scan returns a fake message in the same format a real NodeScanV2 would produce
func (f *FakeNodeScanner) Scan(nodeName string) (*storage.NodeScanV2, error) {
log.Infof("Generating fake scan result message...")
msg := &storage.NodeScanV2{
NodeId: "",
NodeName: nodeName,
ScanTime: timestamp.TimestampNow(),
Components: &scannerV1.Components{
Namespace: "Testme OS",
RhelComponents: []*scannerV1.RHELComponent{
{
Name: "vim-minimal",
Namespace: "rhel:8",
Version: "2:7.4.629-6.el8.x86_64",
Arch: "x86_64",
Module: "FakeMod",
Cpes: []string{"cpe:/a:redhat:enterprise_linux:8::baseos"},
AddedBy: "FakeLayer",
},
{
Name: "libsolv",
Namespace: "rhel:8",
Version: "0.7.7-1.el8.x86_64",
Arch: "x86_64",
Module: "FakeMod",
AddedBy: "FakeLayer",
},
},
LanguageComponents: nil,
},
Notes: nil,
}
return msg, nil
}
20 changes: 20 additions & 0 deletions compliance/collection/nodescanv2/nodescan.go
@@ -0,0 +1,20 @@
package nodescanv2

import (
"github.com/pkg/errors"
"github.com/stackrox/rox/generated/storage"
)

// NodeScanner defines an interface for V2 NodeScanning
type NodeScanner interface {
Scan(nodeName string) (*storage.NodeScanV2, error)
}

// NodeScan is the V2 NodeScanning implementation
type NodeScan struct {
}

// Scan scans the current node and returns the results as storage.NodeScanV2 object
func (n *NodeScan) Scan(nodeName string) (*storage.NodeScanV2, error) {
return nil, errors.New("Not implemented")
}
23 changes: 23 additions & 0 deletions compliance/collection/nodescanv2/nodescan_test.go
@@ -0,0 +1,23 @@
package nodescanv2

import (
"testing"

"github.com/stackrox/rox/generated/storage"
"github.com/stretchr/testify/suite"
)

func TestNodeScan(t *testing.T) {
suite.Run(t, &NodeScanSuite{})
}

type NodeScanSuite struct {
suite.Suite
}

func (n *NodeScanSuite) TestMessageFormat() {
fns, err := (&FakeNodeScanner{}).Scan("someNode")
n.Nil(err)
n.NotNil(fns)
n.IsType(&storage.NodeScanV2{}, fns)
}
18 changes: 18 additions & 0 deletions pkg/env/node_scan.go
@@ -0,0 +1,18 @@
package env

import "time"

var (
nodeRescanIntervalDefault = 4 * time.Hour
// nodeRescanInterval will set the duration for when to scan nodes for vulnerabilities (NodeScanV2)
nodeRescanInterval = registerDurationSetting("ROX_NODE_RESCAN_INTERVAL", nodeRescanIntervalDefault)
)

// GetNodeRescanInterval returns NodeRescanInterval if positive, otherwise returns the default
func GetNodeRescanInterval() time.Duration {
if nodeRescanInterval.DurationSetting() <= 0 {
log.Warnf("Negative or zero duration found. Setting to %s.", nodeRescanIntervalDefault.String())
return nodeRescanIntervalDefault
}
return nodeRescanInterval.DurationSetting()
}
2 changes: 1 addition & 1 deletion proto/internalapi/sensor/compliance_iservice.proto
@@ -59,7 +59,7 @@ message MsgToCompliance {
}
}

// A Sensor service that allows Compliance to report node scrapes results and audit events.
// A Sensor service that allows Compliance to report node scrapes results, audit events, and node scans v2.
service ComplianceService {
rpc Communicate (stream MsgFromCompliance) returns (stream MsgToCompliance);
}
2 changes: 2 additions & 0 deletions sensor/common/compliance/service_impl.go
@@ -142,6 +142,8 @@ func (s *serviceImpl) Communicate(server sensor.ComplianceService_CommunicateSer
case *sensor.MsgFromCompliance_AuditEvents:
s.auditEvents <- t.AuditEvents
s.auditLogCollectionManager.AuditMessagesChan() <- msg
case *sensor.MsgFromCompliance_NodeScanV2:
log.Infof("NodeScanV2 message received: %v", msg)
}
}
}