
byroot commented Dec 20, 2025

There are numerous Ruby tools that need to recursively scan the project directory, such as Zeitwerk or RuboCop.

All of them end up listing the children of a directory and then issuing a `stat` call for each child to check whether it's a directory.

This pattern is common enough that on most operating systems, `struct dirent` includes a `d_type` member that allows checking the file type without issuing any extra system calls.

By yielding that type, we can make these routines roughly twice as fast (2.12× in the benchmark below).

```
$ hyperfine './miniruby --disable-all --yjit ../test.rb' 'OPT=1 ./miniruby --disable-all --yjit ../test.rb'
Benchmark 1: ./miniruby --disable-all --yjit ../test.rb
  Time (mean ± σ):      1.428 s ±  0.062 s    [User: 0.342 s, System: 1.070 s]
  Range (min … max):    1.396 s …  1.601 s    10 runs

Benchmark 2: OPT=1 ./miniruby --disable-all --yjit ../test.rb
  Time (mean ± σ):     673.8 ms ±   5.8 ms    [User: 146.0 ms, System: 527.3 ms]
  Range (min … max):   659.7 ms … 679.6 ms    10 runs

Summary
  OPT=1 ./miniruby --disable-all --yjit ../test.rb ran
    2.12 ± 0.09 times faster than ./miniruby --disable-all --yjit ../test.rb
```

```ruby
# ../test.rb: recursively count .rb files under this script's directory,
# with (OPT=1) and without the proposed type argument.
if ENV['OPT']
  def count_ruby_files
    count = 0
    queue = [File.expand_path(__dir__)]
    while dir = queue.pop
      # Proposed API: the entry type comes from d_type, with no extra stat.
      Dir.each_child(dir) do |name, type|
        next if name.start_with?(".")

        case type
        when :directory
          queue << File.join(dir, name)
        when :file
          count += 1 if name.end_with?(".rb")
        end
      end
    end
    count
  end
else
  def count_ruby_files
    count = 0
    queue = [File.expand_path(__dir__)]
    while dir = queue.pop
      Dir.each_child(dir) do |name|
        next if name.start_with?(".")

        abspath = File.join(dir, name)
        # Current approach: one stat(2) call per child.
        if File.directory?(abspath)
          queue << abspath
        else
          count += 1 if name.end_with?(".rb")
        end
      end
    end
    count
  end
end

10.times do
  count_ruby_files
end
```
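The `OPT` branch above only acts on `:directory` and `:file`, so an entry reported with any other type is silently skipped. That matters because `d_type` is not always filled in: the Linux readdir(3) man page notes that only some filesystems fully support it and that applications must handle `DT_UNKNOWN`. A minimal sketch of a defensive fallback, assuming the proposed API would surface a missing type as `nil` or `:unknown` (neither is confirmed by this PR); `entry_type` is a hypothetical helper name, not part of Ruby:

```ruby
require "tmpdir"

# Hypothetical helper (not part of Ruby or of this PR): resolve an entry's
# type, trusting a type hint when one is available and falling back to
# lstat(2) when the hint is missing or :unknown (assumed encodings of a
# DT_UNKNOWN d_type).
def entry_type(dir, name, hint = nil)
  return hint unless hint.nil? || hint == :unknown

  # lstat, not stat: d_type describes the entry itself, so a symlink is
  # reported as a link rather than as its target. File::Stat#ftype returns
  # strings such as "file", "directory" or "link".
  File.lstat(File.join(dir, name)).ftype.to_sym
end

# Usage with today's Dir.each_child, which yields no hint, so every entry
# costs one lstat call; the order of the printed lines is not guaranteed.
Dir.mktmpdir do |root|
  File.write(File.join(root, "a.rb"), "")
  Dir.mkdir(File.join(root, "lib"))
  Dir.each_child(root) do |name|
    puts "#{name}: #{entry_type(root, name)}"
  end
end
```

Using `lstat` keeps the fallback consistent with what `d_type` reports, whereas the non-`OPT` branch's `File.directory?` follows symlinks.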

byroot force-pushed the expose-dtype branch 2 times, most recently from a0d6da6 to 3a0eb14, on December 20, 2025 13:02

nobu commented Dec 20, 2025

I don't think that the type is only preferable, why not `File::Stat` objects?
And just calling `rb_yield_values` may be enough?
I don't think `dir.each {|*child| ...}` is common code.
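For context on the `rb_yield_values` suggestion: yielding two values from C behaves like `yield name, type` in Ruby, so ordinary one-parameter blocks simply ignore the second value, while the rare `|*child|` block (and any one-argument lambda passed with `&`) would observe the change. A small pure-Ruby illustration; `each_child_with_type` is a hypothetical stand-in, not code from the PR:

```ruby
# Hypothetical stand-in for a C iterator calling rb_yield_values(2, name, type);
# from Ruby, that is equivalent to `yield name, type`.
def each_child_with_type
  yield "foo.rb", :file
  yield "lib", :directory
end

# Existing one-parameter blocks keep working: the extra value is dropped.
names = []
each_child_with_type { |name| names << name }
p names # => ["foo.rb", "lib"]

# Blocks that splat their parameters now receive the extra value.
children = []
each_child_with_type { |*child| children << child }
p children # => [["foo.rb", :file], ["lib", :directory]]

# Lambdas enforce arity, so a one-argument lambda passed with & breaks.
begin
  each_child_with_type(&->(name) { name })
rescue ArgumentError => e
  puts "lambda raised #{e.class}"
end
```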



byroot commented Dec 20, 2025

> I don't think that the type is only preferable, why not `File::Stat` objects?

That was my initial thought, but as far as I know it's not possible to build a complete `File::Stat` from a `struct dirent`. We could introduce a `Dir::Entry` object, though, and expose more than just the type.

@nobu, what do you think? This is a quick prototype to see whether it's worth it, but I do plan to open a feature request.
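The `Dir::Entry` idea could look roughly like the following pure-Ruby sketch, in which the cheap type hint answers `directory?` and `file?`, and a full `File::Stat` is only fetched (and memoized) when more than the type is needed. This is an illustration under assumptions, not the proposed implementation: `DirEntry` and all of its methods are hypothetical, and `nil`/`:unknown` standing for a missing `d_type` is assumed.

```ruby
require "tmpdir"

# Hypothetical value object sketching the "Dir::Entry" idea; it is NOT an
# existing Ruby class. Named DirEntry to avoid implying it lives in Dir.
DirEntry = Struct.new(:dir, :name, :type) do
  def path
    File.join(dir, name)
  end

  def directory?
    resolved_type == :directory
  end

  def file?
    resolved_type == :file
  end

  # Full metadata still needs a system call; memoize it so at most one
  # lstat(2) happens per entry.
  def stat
    @stat ||= File.lstat(path)
  end

  # Prefer the cheap type hint; only stat when it is missing or unknown.
  def resolved_type
    return type unless type.nil? || type == :unknown

    stat.ftype.to_sym
  end
  private :resolved_type
end

Dir.mktmpdir do |root|
  Dir.mkdir(File.join(root, "lib"))
  # Without a hint, directory? falls back to lstat.
  p DirEntry.new(root, "lib", nil).directory? # => true
  # With a hint, no system call is needed (the file does not even exist).
  p DirEntry.new(root, "missing.rb", :file).file? # => true
end
```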
