Skip to content

Conversation

@juchem
Copy link
Contributor

@juchem juchem commented Sep 4, 2024

mrb_hash_foreach, provides a foreach implementation on hashes using callbacks.

Although sufficient for ruby code, some C and C++ idioms can be made much simpler and easier with a good old for loop.

This pull request implements an iterator that allows user-controlled loops over hash elements in C code, as follows:

for (mrb_hash_iterator i = mrb_hash_iterator_new(h);
  mrb_hash_iterator_remaining(&i);
  mrb_hash_iterator_move_next(&i)
) {
  mrb_value key = mrb_hash_iterator_key(&i);
  mrb_value value = mrb_hash_iterator_value(&i);
  /* ... */
}

The code is comprised of struct mrb_hash_iterator, 4 accompanying functions and 1 macro with inline documentation.

MRB_API mrb_hash_iterator mrb_hash_iterator_new(struct RHash *h);
MRB_API mrb_bool mrb_hash_iterator_move_next(mrb_hash_iterator *it);
MRB_API mrb_value mrb_hash_iterator_key(mrb_hash_iterator *it);
MRB_API mrb_value mrb_hash_iterator_value(mrb_hash_iterator *it);
/* macro */ uint32_t mrb_hash_iterator_remaining(mrb_hash_iterator *it);

@juchem juchem requested a review from matz as a code owner September 4, 2024 00:54
@github-actions github-actions bot added the core label Sep 4, 2024
@juchem
Copy link
Contributor Author

juchem commented Sep 4, 2024

I've created a test under test/t/hash_iterator.c but I'm not sure what's the best way to ensure it runs as part of the test suite.

@juchem juchem force-pushed the mj-hash-iterator branch 3 times, most recently from 1566e3d to fe02ed7 Compare September 4, 2024 16:49
@matz
Copy link
Member

matz commented Sep 5, 2024

If the hash is modified in the loop, the code will crash. Is this fragility OK?

@juchem
Copy link
Contributor Author

juchem commented Sep 5, 2024

@matz isn't the same true for mrb_hash_foreach? I mostly copied the logic used in mrb_hash_foreach to implement this.

The quirk of invalidation upon modification is inherent to most enumerables.
Some libraries provide a iterator-safe version of modifying operations for this specific reason, like C++'s std::remove_if or std::vector::erase that implicitly does a move_next.

I wholeheartedly believe this to be a useful idiom, isomorphic to foreach and, therefore, inheriting the same caveats and limitations.

That being said, I respect your authority over mruby and understand it is your prerogative to whether accept or reject this proposal.

If there are specific cases that you believe the current implementation doesn't take into account, I'd be more than happy to implement tests for it and fix the implementation if necessary.

If your decision is to not include this at all, I can understand and accept it.

Any any case, thank you for the consideration offered so far.

@dearblue
Copy link
Contributor

dearblue commented Sep 6, 2024

I think mrb_hash_foreach() will raise an exception if there is a major change during the loop, because the h_check_modified() macro function is monitoring it.

In the Hash Iterator API, how about putting a prohibition in the documentation against making changes in the middle of the loop?

In my opinion, the API can be left as it is, but depending on the internal implementation, it may be possible to continue processing after the hash object has been modified without crashing.
However, skipped elements and reporting of the same elements are expected. Not immediately, but I would like to experiment a bit.

@dearblue
Copy link
Contributor

dearblue commented Sep 6, 2024

I don't think there is a documented way to write tests for the C API provided by mruby core.

I don't know if this will help you, but I'll give you two examples anyway:

@juchem
Copy link
Contributor Author

juchem commented Sep 6, 2024

@dearblue I appreciate the comprehensive comments. I'll dig some more in the references you provided and will get back here either with changes to the PR or further questions.

@juchem juchem force-pushed the mj-hash-iterator branch 2 times, most recently from a138536 to 27342b0 Compare September 6, 2024 20:39
src/hash.c Outdated
Comment on lines 1139 to 1140
if (0 < it->rem) {
while (entry_deleted_p(++it->entry));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it->rem is 1, then ++it->entry will cause a buffer overrun.
This may be the reason why all VisualC++ 32-bit versions by appveyor are failing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into that. I may have to change the API slightly to make it safer. Will report back soon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure there are other ways, but as a point of reference, I made the following changes and valgrind no longer reports problems here.

diff --git a/src/hash.c b/src/hash.c
index eb049d0e7..12b5e0215 100644
--- a/src/hash.c
+++ b/src/hash.c
@@ -1136,10 +1136,9 @@ mrb_hash_iterator_new(struct RHash *h)
 MRB_API mrb_bool
 mrb_hash_iterator_move_next(mrb_hash_iterator *it)
 {
-  if (0 < it->rem) {
+  if (0 < it->rem && 0 < --it->rem) {
     while (entry_deleted_p(++it->entry));
-    --it->rem;
-    return it->rem > 0;
+    return TRUE;
   }

   return FALSE;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you, I'll apply and re-submit

Copy link
Contributor

@dearblue dearblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know this before, but using mrb_gc_unregister() inside the RData::dfree function may cause use-after-free.

Therefore, instead of using the combination of mrb_gc_register() and mrb_gc_unregister() in RData objects, replacing it with mrb_iv_set() is required.

The scenario for when use-after-free occurs begins with a call to mrb_close().

  • From mrb_close(), mrb_gc_destroy() is called.
  • The array object used in mrb_gc_register() is destroyed.
  • Heap pages containing array objects used in mrb_gc_register() are released.
  • The RData::dfree function is called, and mrb_gc_unregister() is called.
  • Since the array object for the gc register does not already exist, any attempt to reference it will result in use-after-free.

static void
hash_iterator_free(mrb_state *mrb, void *p) {
struct hash_iterator *iterator = (struct hash_iterator *)p;
mrb_gc_unregister(mrb, iterator->hash);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this mrb_gc_unregister().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dearblue shouldn't the corresponding mrb_gc_register also be removed to avoid leaks?
Otherwise the hash will always be part of the root objects list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the bigger question is: what's the best idiom to keep ruby objects alive when they're members of a C object. Is mrb_iv_set it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is mrb_iv_set it?

Yes, that is the only way I remember.

mrb_data_init(self, iterator, &hash_iterator_type);

iterator->hash = mrb_get_arg1(mrb);
mrb_gc_register(mrb, iterator->hash);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace this mrb_gc_register() with mrb_iv_set().
If the name of the instance variable starts with, say, a lower case letter, it can be referenced from the C side but cannot be referenced or manipulated from the Ruby side.

@juchem juchem force-pushed the mj-hash-iterator branch 4 times, most recently from 4600aab to 1d8e82a Compare September 10, 2024 15:14
@juchem
Copy link
Contributor Author

juchem commented Sep 10, 2024

@dearblue the 32 bit targets are still having issues, even after the changes above

Is there any way I can get more details on that run?

@dearblue
Copy link
Contributor

To see where the test is failing, run rake -v test.
Or directly as bin/mrbtest -v.

@dearblue
Copy link
Contributor

I found the cause.
The problem I tried to fix in #6338 was also occurring in this PR.
https://ci.appveyor.com/project/dearblue/mruby/builds/50581024/job/7gp0jj98uh01ja0n#L1533

I merged #6338 at my hand and tried it, and the problem no longer occurs.
https://ci.appveyor.com/project/dearblue/mruby/builds/50581058

Please wait until #6338 is accepted or another solution is found.

@juchem
Copy link
Contributor Author

juchem commented Sep 11, 2024

To see where the test is failing, run rake -v test. Or directly as bin/mrbtest -v.

I meant the appveyor CI tests. I don't have the same environment in my machine to repro.

@dearblue
Copy link
Contributor

Even with appveyor, I think the only way is to change the configuration file and run rake -v test.

@juchem
Copy link
Contributor Author

juchem commented Sep 11, 2024

I found the cause. The problem I tried to fix in #6338 was also occurring in this PR. https://ci.appveyor.com/project/dearblue/mruby/builds/50581024/job/7gp0jj98uh01ja0n#L1533

I merged #6338 at my hand and tried it, and the problem no longer occurs. https://ci.appveyor.com/project/dearblue/mruby/builds/50581058

Please wait until #6338 is accepted or another solution is found.

I've rebased on top of your PR's branch to re-run CI with the fix.

`mrb_hash_foreach`, provides a foreach implementation on hashes using
callbacks.

Although sufficient for ruby code, some C and C++ idioms can be made
much simpler and easier with a good old `for` loop.

This pull request implements an iterator that allows user-controlled
loops over hash elements in C code, as follows:
```
for (mrb_hash_iterator i = mrb_hash_iterator_new(h);
  mrb_hash_iterator_remaining(&i);
  mrb_hash_iterator_move_next(&i)
) {
  mrb_value key = mrb_hash_iterator_key(&i);
  mrb_value value = mrb_hash_iterator_value(&i);
  /* ... */
}
```

The code is comprised of `struct mrb_hash_iterator`, 4 accompanying
functions and 1 macro with inline documentation.
```
MRB_API mrb_hash_iterator mrb_hash_iterator_new(struct RHash *h);
MRB_API mrb_bool mrb_hash_iterator_move_next(mrb_hash_iterator *it);
MRB_API mrb_value mrb_hash_iterator_key(mrb_hash_iterator *it);
MRB_API mrb_value mrb_hash_iterator_value(mrb_hash_iterator *it);
/* macro */ uint32_t mrb_hash_iterator_remaining(mrb_hash_iterator *it);
```
@juchem
Copy link
Contributor Author

juchem commented Sep 13, 2024

rebased to origin/master since #6338 has been merged

dearblue added a commit to dearblue/mruby that referenced this pull request Oct 22, 2024
Calling `mrb_gc_unregistor()` from `mrb_data_type::dfree` caused a use-after-free deep inside `mrb_close()`.
The impetus to investigate was <mruby#6342 (review)>.

Currently, when `mrb_close()` is called, all objects are destroyed first.
The process is done heap page by heap page, and when all objects belonging to a heap page are destroyed, the heap page is released.
If the next heap page contains `RData` objects, the `mrb_gc_unregistor()` function may be called from the `mrb_data_type::dfree` function.
At this time, the `mrb_gc_unregistor()` function gets an array object from a Ruby global variable.
If the array object belongs to a freed heap page, use-after-free is established by referencing this array object.

About the fixes.

First of all, there is the fact that the `mrb_gv_get()` function returns `nil` if `mrb->globals` is `NULL`.
Therefore, before destroying all objects, free `mrb->globals` and set `mrb->globals` to `NULL` at the same time.
Now the `mrb_gv_get()` function will return `nil` to the calling `mrb_gc_unregistor()` function and `mrb_gc_unregistor()` will do nothing more.

ref. mruby#4618
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants