
I was trying to get all words (sequences of non-whitespace characters) out of a file. In doing so, I accidentally created an infinite loop: at the end of the file no more words are extracted, but the stream is apparently not exhausted. I realize that just using std::views::istream<std::string>(file_stream) would have solved my problem, but I am interested in why.

My code (compiler: Clang 18.1.0, flags: -std=c++23 -stdlib=libc++):

#include <cctype>
#include <format>
#include <iostream>
#include <ranges>
#include <sstream>
#include <string>
#include <vector>

constexpr auto is_white_space = [](char ch) constexpr {
    return std::isspace(static_cast<unsigned char>(ch));
};

struct word_extractor {
    std::string word;

    friend std::istream &operator>>(std::istream &s, word_extractor &we) {
        std::string buff = std::ranges::subrange(std::istreambuf_iterator{s},
                                                 std::istreambuf_iterator<char>{}) 
                         | std::views::drop_while(is_white_space)
                         | std::views::take_while([](auto x) { 
                               return !is_white_space(x); 
                           })
                         | std::ranges::to<std::string>();

        //if (s.peek() == EOF) s.get(); // uncommenting this code makes it work
        we.word = buff;
        return s;
    }
};

int main() {
    std::istringstream file_stream("lorem ipsum dolor sit amet ");

    auto parsed_words = std::views::istream<word_extractor>(file_stream)
                      | std::views::transform([](const word_extractor &we)
                        {
                            return we.word;
                        })
                      | std::ranges::to<std::vector<std::string>>();

    for (const auto &w : parsed_words) {
        std::cout << std::format("{{{}}}\n", w);
    }
}

Output with the line if (s.peek() == EOF) s.get(); uncommented:

{lorem}
{ipsum}
{dolor}
{sit}
{amet}

Without that line there is no output, because the program is stuck in an infinite loop.

Without the commented-out line that manually consumes EOF, the code gets stuck in an infinite loop, because std::views::istream<word_extractor>(file_stream) keeps calling operator>> forever. Why is the stream never exhausted, given that I first consume all whitespace characters and then all non-whitespace ones?

Question: Is there a way to make this kind of extraction work with C++ ranges, or is the (ugly) manual check for EOF needed?

  • Please try to provide a minimal reproducible example. Commented Feb 7, 2024 at 16:01
  • Please provide the input file and the expected result, as well as the compiler and its version. Commented Feb 7, 2024 at 16:08
  • Note also that defining a function in the place where friendship is declared can lead to strange errors. Move the definition of this function outside of the class/struct. Commented Feb 7, 2024 at 16:09
  • @MarekR That's a very typical way of implementing an extraction operator. I'm not sure what strange errors you mean, but that's definitely unrelated to the issue here. Commented Feb 7, 2024 at 16:10
  • "uncommenting this code makes it work" - So, you expect everything but the last word to be extracted? Commented Feb 7, 2024 at 16:37

2 Answers

Streambuf iterators iterate over the streambuf (even when constructed from the stream as a convenience), so they never touch the state of the stream. Without the commented-out code, nothing in operator>> sets failbit to tell istream_view that an extraction failed and that iteration should stop.


As pointed out by T.C., streambuf iterators do not change the state of the stream. To keep using a range pipeline, another std::views::istream<char> can be used instead of std::ranges::subrange():

struct word_extractor {
    std::string word;

    friend std::istream &operator>>(std::istream &s, word_extractor &we) {
        s >> std::noskipws;                                // don't skip ws
        std::string buff = std::views::istream<char>(s) |  // read single char from stream
                           std::views::drop_while(is_white_space) |
                           std::views::take_while([](auto x) { return !is_white_space(x); }) |
                           std::ranges::to<std::string>();
        we.word = buff;
        return s;
    }
};

Note, however, that the whitespace is still needed to separate the words. Therefore s >> std::noskipws is used to prevent operator>> for char from skipping it.
