Skip to content

Conversation

@jsarha
Copy link
Contributor

@jsarha jsarha commented Jun 19, 2023

In some error situations the configuration init_data may be NULL, and in such a situations we should fail gracefully and not crash.

Copy link
Contributor

@btian1 btian1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another question is how to make sure cfg is not possible to be NULL?

@aiChaoSONG
Copy link
Collaborator

another question is how to make sure cfg is not possible to be NULL?

see module_adapter_new()

@kv2019i
Copy link
Collaborator

kv2019i commented Jun 20, 2023

Looks ready to go, but we need a clean run for the mandatory CI checks.

@lyakh
Copy link
Collaborator

lyakh commented Jun 20, 2023

what logs do we have and how did we identify this?

Copy link
Collaborator

@lyakh lyakh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, so far I don't see how this can happen

@cujomalainey
Copy link
Contributor

sorry, so far I don't see how this can happen

init data is only initialized if the topology has a non 0 payload. Therefore it can be null. See https://github.com/thesofproject/sof/blob/main/src/audio/module_adapter/module_adapter.c#L124

@cujomalainey
Copy link
Contributor

sorry, so far I don't see how this can happen

init data is only initialized if the topology has a non 0 payload. Therefore it can be null. See https://github.com/thesofproject/sof/blob/main/src/audio/module_adapter/module_adapter.c#L124

Also for context its a fuzzer find

@lyakh
Copy link
Collaborator

lyakh commented Jun 22, 2023

sorry, so far I don't see how this can happen

init data is only initialized if the topology has a non 0 payload. Therefore it can be null. See https://github.com/thesofproject/sof/blob/main/src/audio/module_adapter/module_adapter.c#L124

@cujomalainey but size cannot be 0 for SRC. As far as I understand (cannot easily check IPC3 SRC flow now, so, just reading the code) it goes like this:

ipc_comp_new(); // this one is called with an SRC component from IPC, i.e. from topology, so type is set to SRC
comp_new(); // same
comp_common_builder();
	config->type = comp->type;
module_adapter_new();
	switch (config->type) {
	case SOF_COMP_SRC:
		size = sizeof(*ipc_src);

so, I still don't understand, sorry

@andyross
Copy link
Contributor

FWIW, I think this might be a special case of the issue fixed in #7854

That would also produce crashes on the module_adapter initialization path, just like this. The root cause was (what I think is) a protocol goof. There was a case (unexercised by the host) where a pointer field in the initialization data block for a new component creation would be filled in from external data, which is clearly not OK. In the failure I was looking at it wasn't NULL but garbage, but that's going to depend on the specifics of the fuzzer input.

@lyakh
Copy link
Collaborator

lyakh commented Jun 26, 2023

FWIW, I think this might be a special case of the issue fixed in #7854

That would also produce crashes on the module_adapter initialization path, just like this. The root cause was (what I think is) a protocol goof. There was a case (unexercised by the host) where a pointer field in the initialization data block for a new component creation would be filled in from external data, which is clearly not OK. In the failure I was looking at it wasn't NULL but garbage, but that's going to depend on the specifics of the fuzzer input.

@andyross I think your fix in #7854 is wrong too

@jsarha
Copy link
Contributor Author

jsarha commented Jun 27, 2023

@lyakh , if I have understood properly the purpose of the fuzzer tests, they trying to break the FW on purpose by feeding it with broken input. I am not sure we can easily find all possible call paths from these broken messages, but still we should make all we can not to crash on them. I am still saying that a single NULL check is not a big price to pay for security. I can make one more attempt to understand how NULL ended up to SRC init_data. But if I fail, would you like the change better if the check would be an assert?

@lyakh
Copy link
Collaborator

lyakh commented Jun 28, 2023

@lyakh , if I have understood properly the purpose of the fuzzer tests, they trying to break the FW on purpose by feeding it with broken input. I am not sure we can easily find all possible call paths from these broken messages, but still we should make all we can not to crash on them. I am still saying that a single NULL check is not a big price to pay for security. I can make one more attempt to understand how NULL ended up to SRC init_data. But if I fail, would you like the change better if the check would be an assert?

@jsarha that's also my understanding of how fuzzers work, but - where do they feed that random input into? This probably depends on the project, but I think you only feed that random input to software exposed interfaces. E.g. on network servers you send "random" packets to them, on filesystems you feed random corrupted storage data, in our case it would be random IPC contents. Fuzzing does not slit into your software and feed into some random internal interface. So, I still think we can very well understand where data comes from and what data can appear at this point. And further, I still haven't seen any evidence of why we think that this "fix" helps - why do we think that we really get to this point with a NULL pointer. So:

  1. we don't see how this can happen
  2. we have no evidence that this is happening

the above makes me strongly believe that this isn't needed.

@btian1
Copy link
Contributor

btian1 commented Jun 29, 2023

I double checked module_adapter_new, cfg->init_data was specially designed and it must not empty and always valid before enter module_init, and after module_init, this was set to NULL again, so @jsarha , Guennadi is correct, no need this extra check, it is superfluous,no need.

@jsarha
Copy link
Contributor Author

jsarha commented Jun 29, 2023

So the root cause is not yet found. I'll mark this PR as draft for the time being.

@jsarha jsarha marked this pull request as draft June 29, 2023 12:25
@cujomalainey
Copy link
Contributor

cujomalainey commented Jun 29, 2023

I think I found the root cause.

tldr: Its a left over issue from the enum system to identify components that the fuzzer is exploiting by setting the UUID to SRC but the component type to something else.

Long explanation:
IPC3 contains two main identifiers: UUIDs (current) and enums (legacy). We do not check the enum if we use the UUID

Blind copy from IPC into config of the toxic enum.

As a result we are probably hitting the default case in the module_adapter_init where the IPC specifies the size which the fuzzer is stating is 0.

Recommended possible solutions:

  • get_drv to return nothing if type and UUID don't match
  • overwrite type on a UUID match
  • fix the separation of concerns issue and remove the type check from the module adapter and into their respective components with checks

@cujomalainey cujomalainey self-requested a review June 29, 2023 22:44
@cujomalainey
Copy link
Contributor

cujomalainey commented Jun 30, 2023

Turns out I was slightly off in my analysis but on the right track. We do indeed do a enum type check in get_drv

BUT with the module adapter that comp type is not set to COMP_SRC anymore so this switch case is actually completely broken.

@cujomalainey
Copy link
Contributor

@ranj063 might need some help here, looks like we are in a sticky wicket with regards to a security issue in IPC3 for the module adapter transition

@cujomalainey
Copy link
Contributor

Aha and it was hidden in a sub window in OSS fuzz but volume is hitting this bug as well as expected

@lyakh
Copy link
Collaborator

lyakh commented Jul 3, 2023

Turns out I was slightly off in my analysis but on the right track. We do indeed do a enum type check in get_drv

BUT with the module adapter that comp type is not set to COMP_SRC anymore so this switch case is actually completely broken.

@cujomalainey great, you're right, now we know how this happens! I think with the Linux driver we're saved by this: https://github.com/thesofproject/linux/blob/34b7f81747f8dbb82a39d46b1234b157d4a7e38c/sound/soc/sof/ipc3-topology.c#L607 which of course isn't done with fuzzing. Then .type can be anything while UUID still lets us select a driver. So, yes, one way of fixing this would be to also check the type when selecting the driver, another way would be to extend comp_specific_builder() to return an error code when no valid type is recognised. We also can call those configuration builders before getting the driver. Either of these ways would provide a generic fix for this problem, not specific to SRC.

@jsarha
Copy link
Contributor Author

jsarha commented Jul 5, 2023

There is now a generic fix: #7891

@jsarha jsarha closed this Jul 5, 2023
@ranj063
Copy link
Collaborator

ranj063 commented Jul 10, 2023

@ranj063 might need some help here, looks like we are in a sticky wicket with regards to a security issue in IPC3 for the module adapter transition

@cujomalainey @jsarha sorry I;m a bit late to the party but I think the real problem is we're masking the erroneous init_data because of this check in module_adapter_new()

If you look into module_load_config() , we call out the error when size if 0. So if we remove this check in module_adapter_new(), we'd fail gracefully

@lyakh
Copy link
Collaborator

lyakh commented Jul 10, 2023

So if we remove this check in module_adapter_new(), we'd fail gracefully

@ranj063 does that mean, that 0 configuration size is invalid in general by definition for all modules? Shouldn't this be up to individual modules to decide? Cannot it be valid for some modules?

@jsarha jsarha reopened this Jul 10, 2023
@jsarha
Copy link
Contributor Author

jsarha commented Jul 10, 2023

Let's go back to module specific fix since there is no simple way to make a generic fix.

@jsarha jsarha marked this pull request as ready for review July 10, 2023 09:13
@ranj063
Copy link
Collaborator

ranj063 commented Jul 10, 2023

So if we remove this check in module_adapter_new(), we'd fail gracefully

@ranj063 does that mean, that 0 configuration size is invalid in general by definition for all modules? Shouldn't this be up to individual modules to decide? Cannot it be valid for some modules?

@lyakh for IPC3 the init_data points to the entire IPC payload and I dont think we can create a module with a 0 sized payload at all right?

@cujomalainey
Copy link
Contributor

So if we remove this check in module_adapter_new(), we'd fail gracefully

@ranj063 does that mean, that 0 configuration size is invalid in general by definition for all modules? Shouldn't this be up to individual modules to decide? Cannot it be valid for some modules?

@lyakh for IPC3 the init_data points to the entire IPC payload and I dont think we can create a module with a 0 sized payload at all right?

If this is true then indeed the simple fix is to abort 0 sized ops. That being said it still leaves a hole for the kernel to give the wrong size and cause an out of bounds access unless the module checks the size.

@lyakh
Copy link
Collaborator

lyakh commented Jul 11, 2023

So if we remove this check in module_adapter_new(), we'd fail gracefully

@ranj063 does that mean, that 0 configuration size is invalid in general by definition for all modules? Shouldn't this be up to individual modules to decide? Cannot it be valid for some modules?

@lyakh for IPC3 the init_data points to the entire IPC payload and I dont think we can create a module with a 0 sized payload at all right?

@ranj063 I think the .size field in

size = ipc_module_adapter->size;
originates from
uint32_t size; /**< size of bespoke data section in bytes */
and I think the "size of bespoke data section in bytes" comment refers to the size of the data[] array, so, it can be 0?

@jsarha
Copy link
Contributor Author

jsarha commented Jul 11, 2023

sorry, so far I don't see how this can happen

@lyakh Now we know how this happens, so can I dismiss this comment? How about merging?

@lyakh
Copy link
Collaborator

lyakh commented Jul 12, 2023

As a result we are probably hitting the default case in the module_adapter_init where the IPC specifies the size which the fuzzer is stating is 0.

Recommended possible solutions:

* `get_drv` to return nothing if type and UUID don't match

@jsarha following analysis from @cujomalainey I think the above proposal is almost good - we cannot match the type for module-adapter drivers - the driver type is SOF_COMP_MODULE_ADAPTER (is that used? Maybe we could change that to use specific types?) but the type in topology is the actual component type (e.g. SOF_COMP_SRC). But I still think we should check that in comp_specific_builder() and return an error when the type is unrecognised. And also swap comp_specific_builder() and comp_common_builder().

@lyakh
Copy link
Collaborator

lyakh commented Jul 12, 2023

Note, #7854 is now handling the same issue...

@jsarha
Copy link
Contributor Author

jsarha commented Jul 13, 2023

@jsarha following analysis from @cujomalainey I think the above proposal is almost good - we cannot match the type for module-adapter drivers - the driver type is SOF_COMP_MODULE_ADAPTER (is that used? Maybe we could change that to use specific types?) but the type in topology is the actual component type (e.g. SOF_COMP_SRC). But I still think we should check that in comp_specific_builder() and return an error when the type is unrecognised. And also swap comp_specific_builder() and comp_common_builder().

Topology does not help us here. FW itself doesn't now anything about the topology, it only receives messages about it, but here we are trying to make the FW robust enough to not to crash on any malformed message. So the comp type must be set in the FW source. Should be doable thou.

In some error situations the configuration init_data may be NULL, and
in such a situations we should fail gracefully and not crash. Also adds
check that the IPC message is of correct size and of correct type.

Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants