-
Notifications
You must be signed in to change notification settings - Fork 825
Implementing hash iterator. #6342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
0a01327 to
a7a9455
Compare
|
I've created a test under |
1566e3d to
fe02ed7
Compare
|
If the hash is modified in the loop, the code will crash. Is this fragility OK? |
|
@matz isn't the same true for The quirk of invalidation upon modification is inherent to most enumerables. I wholeheartedly believe this to be a useful idiom, isomorphic to That being said, I respect your authority over If there are specific cases that you believe the current implementation doesn't take into account, I'd be more than happy to implement tests for it and fix the implementation if necessary. If your decision is to not include this at all, I can understand and accept it. Any any case, thank you for the consideration offered so far. |
|
I think In the Hash Iterator API, how about putting a prohibition in the documentation against making changes in the middle of the loop? In my opinion, the API can be left as it is, but depending on the internal implementation, it may be possible to continue processing after the hash object has been modified without crashing. |
|
I don't think there is a documented way to write tests for the C API provided by mruby core. I don't know if this will help you, but I'll give you two examples anyway:
|
|
@dearblue I appreciate the comprehensive comments. I'll dig some more in the references you provided and will get back here either with changes to the PR or further questions. |
a138536 to
27342b0
Compare
src/hash.c
Outdated
| if (0 < it->rem) { | ||
| while (entry_deleted_p(++it->entry)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it->rem is 1, then ++it->entry will cause a buffer overrun.
This may be the reason why all VisualC++ 32-bit versions by appveyor are failing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into that. I may have to change the API slightly to make it safer. Will report back soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sure there are other ways, but as a point of reference, I made the following changes and valgrind no longer reports problems here.
diff --git a/src/hash.c b/src/hash.c
index eb049d0e7..12b5e0215 100644
--- a/src/hash.c
+++ b/src/hash.c
@@ -1136,10 +1136,9 @@ mrb_hash_iterator_new(struct RHash *h)
MRB_API mrb_bool
mrb_hash_iterator_move_next(mrb_hash_iterator *it)
{
- if (0 < it->rem) {
+ if (0 < it->rem && 0 < --it->rem) {
while (entry_deleted_p(++it->entry));
- --it->rem;
- return it->rem > 0;
+ return TRUE;
}
return FALSE;There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you, I'll apply and re-submit
dearblue
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know this before, but using mrb_gc_unregister() inside the RData::dfree function may cause use-after-free.
Therefore, instead of using the combination of mrb_gc_register() and mrb_gc_unregister() in RData objects, replacing it with mrb_iv_set() is required.
The scenario for when use-after-free occurs begins with a call to mrb_close().
- From
mrb_close(),mrb_gc_destroy()is called. - The array object used in
mrb_gc_register()is destroyed. - Heap pages containing array objects used in
mrb_gc_register()are released. - The RData::dfree function is called, and
mrb_gc_unregister()is called. - Since the array object for the gc register does not already exist, any attempt to reference it will result in use-after-free.
| static void | ||
| hash_iterator_free(mrb_state *mrb, void *p) { | ||
| struct hash_iterator *iterator = (struct hash_iterator *)p; | ||
| mrb_gc_unregister(mrb, iterator->hash); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this mrb_gc_unregister().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dearblue shouldn't the corresponding mrb_gc_register also be removed to avoid leaks?
Otherwise the hash will always be part of the root objects list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the bigger question is: what's the best idiom to keep ruby objects alive when they're members of a C object. Is mrb_iv_set it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is
mrb_iv_setit?
Yes, that is the only way I remember.
| mrb_data_init(self, iterator, &hash_iterator_type); | ||
|
|
||
| iterator->hash = mrb_get_arg1(mrb); | ||
| mrb_gc_register(mrb, iterator->hash); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please replace this mrb_gc_register() with mrb_iv_set().
If the name of the instance variable starts with, say, a lower case letter, it can be referenced from the C side but cannot be referenced or manipulated from the Ruby side.
4600aab to
1d8e82a
Compare
|
@dearblue the 32 bit targets are still having issues, even after the changes above Is there any way I can get more details on that run? |
|
To see where the test is failing, run |
|
I found the cause. I merged #6338 at my hand and tried it, and the problem no longer occurs. Please wait until #6338 is accepted or another solution is found. |
I meant the appveyor CI tests. I don't have the same environment in my machine to repro. |
|
Even with appveyor, I think the only way is to change the configuration file and run |
1d8e82a to
071317d
Compare
I've rebased on top of your PR's branch to re-run CI with the fix. |
`mrb_hash_foreach`, provides a foreach implementation on hashes using
callbacks.
Although sufficient for ruby code, some C and C++ idioms can be made
much simpler and easier with a good old `for` loop.
This pull request implements an iterator that allows user-controlled
loops over hash elements in C code, as follows:
```
for (mrb_hash_iterator i = mrb_hash_iterator_new(h);
mrb_hash_iterator_remaining(&i);
mrb_hash_iterator_move_next(&i)
) {
mrb_value key = mrb_hash_iterator_key(&i);
mrb_value value = mrb_hash_iterator_value(&i);
/* ... */
}
```
The code is comprised of `struct mrb_hash_iterator`, 4 accompanying
functions and 1 macro with inline documentation.
```
MRB_API mrb_hash_iterator mrb_hash_iterator_new(struct RHash *h);
MRB_API mrb_bool mrb_hash_iterator_move_next(mrb_hash_iterator *it);
MRB_API mrb_value mrb_hash_iterator_key(mrb_hash_iterator *it);
MRB_API mrb_value mrb_hash_iterator_value(mrb_hash_iterator *it);
/* macro */ uint32_t mrb_hash_iterator_remaining(mrb_hash_iterator *it);
```
071317d to
a2a2349
Compare
|
rebased to |
Calling `mrb_gc_unregistor()` from `mrb_data_type::dfree` caused a use-after-free deep inside `mrb_close()`. The impetus to investigate was <mruby#6342 (review)>. Currently, when `mrb_close()` is called, all objects are destroyed first. The process is done heap page by heap page, and when all objects belonging to a heap page are destroyed, the heap page is released. If the next heap page contains `RData` objects, the `mrb_gc_unregistor()` function may be called from the `mrb_data_type::dfree` function. At this time, the `mrb_gc_unregistor()` function gets an array object from a Ruby global variable. If the array object belongs to a freed heap page, use-after-free is established by referencing this array object. About the fixes. First of all, there is the fact that the `mrb_gv_get()` function returns `nil` if `mrb->globals` is `NULL`. Therefore, before destroying all objects, free `mrb->globals` and set `mrb->globals` to `NULL` at the same time. Now the `mrb_gv_get()` function will return `nil` to the calling `mrb_gc_unregistor()` function and `mrb_gc_unregistor()` will do nothing more. ref. mruby#4618
mrb_hash_foreach, provides a foreach implementation on hashes using callbacks.Although sufficient for ruby code, some C and C++ idioms can be made much simpler and easier with a good old
forloop.This pull request implements an iterator that allows user-controlled loops over hash elements in C code, as follows:
The code is comprised of
struct mrb_hash_iterator, 4 accompanying functions and 1 macro with inline documentation.