I am writing my own strtok function. How do I make it so that it will return the remaining string as an output parameter?

Here is what I made so far.

char *mystringtokenize(char **string, char delimiter) {
    static char *str = NULL;
    int stringIndex = 0;
    if (string != NULL) { //check if string is NULL, if its not null set str to string
        str = string;
    }
    if (string == NULL) { //return NULL if string is empty
        return NULL;
    }
    do { //traverse through string
        if (!str[stringIndex]) { //if str at string index is null character, stop while loop
            break;
        }
        stringIndex++;//index through string
    } while (str[stringIndex] != delimiter);
    
    str[stringIndex] = '\0'; //cut the string
    char *lastToken = str; //set last token to the cut off part
    
    return lastToken;
}

When I call it in main, and try to pass in the file that needs to be tokenized, I get a bad exception error.

int main(int argc, char const *argv[])
{
    FILE *inputStream = fopen("FitBitData.csv", "r");
    int index = 0;
    int fitbitindex = 0;
    char testline[100];
    char minute[10] = "0:00:00";
    FitbitData fitBitUser[1446];
    if (inputStream != NULL) {
        while (fgets(testline, sizeof(testline), inputStream) != NULL) {
            strcpy(fitBitUser[fitbitindex].patient, mystringtokenize(testline, ','));
            strcpy(fitBitUser[fitbitindex].minute, mystringtokenize(NULL, ','));
            printf("%s %s\n", fitBitUser[fitbitindex].patient, fitBitUser[fitbitindex].minute);
            printf("%s\n", fitBitUser[fitbitindex].patient);
            fitbitindex++;
        }
    }

    return 0;
}

For example, when I tokenize the line Hello, World, the first call returns Hello. But if I call it again with mystringtokenize(NULL, ','), I get a bad exception error.

Comments

  • Always read compiler messages. The compiler should issue a message for this code snippet: if (string != NULL) { str = string; } Commented Jan 22, 2024 at 4:53
  • My advice, emulate strtok exactly first. Then modify your implementation to add functionality. Commented Jan 22, 2024 at 4:58

2 Answers

There are multiple problems in your code:

  • the argument string should have type char *, not char **.

  • stringIndex should have type size_t.

  • the initial tests on str and string are incorrect: whenever the argument is null, the function returns NULL, and main then passes that NULL straight to strcpy (this explains your crash). You should instead fall back to the static state str when the argument is null:

      if (string == NULL)
          string = str;
    
  • the loop to find the delimiter is confusing too. You should use strchr or write a simple while loop:

      while (string[stringIndex] && string[stringIndex] != delimiter)
          stringIndex++;
    
  • at the end of this loop, if you found a delimiter, store string + stringIndex + 1 to str for the next call, otherwise store NULL to str so the next call returns NULL.

Here is a modified version:

char *mystringtokenize(char *str, char delimiter) {
    static char *state = NULL;
    
    if (str == NULL) {
        str = state;
        if (str == NULL)
            return NULL;
    }
    for (size_t i = 0; str[i] != '\0'; i++) {
        if (str[i] == delimiter) {
            str[i] = '\0';
            state = str + i + 1;
            return str;
        }
    }
    state = NULL;
    return str;
}

Note that this function does not implement the semantics of strtok:

  • it takes a single delimiter character instead of a string of delimiters.
  • it can return empty tokens: ",," will produce 3 empty tokens whereas strtok would return none.

Yet it still has the major flaw of strtok, i.e. the hidden static state variable, which prevents nested use of the function and makes it non-thread-safe. You should accept an extra argument of type char ** so the caller provides this state variable, as in strtok_r.

Here is a modified version using an external state and strchr:

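#include <string.h>

char *mystringtokenize(char *str, char delimiter, char **state) {
    /* no new string given: resume from the caller's saved state */
    if (str == NULL && (str = *state) == NULL) {
        return NULL;
    }
    char *p = strchr(str, delimiter);
    if (p) {
        *p++ = '\0';    /* terminate the token and step past the delimiter */
    }
    *state = p;         /* remainder of the string, or NULL if none is left */
    return str;
}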

There are several problems with your function. The first one is that the first parameter shall have type char * instead of char **.

char *mystringtokenize(char *string, char delimiter);

If the first parameter is a null pointer then the function always returns NULL:

if(string == NULL){//return NULL if string is empty
    return NULL;
}

So for example such a call of the function

mystringtokenize(NULL,',')

does not extract a string.

The pointer str with static storage duration does not keep the last position of the extracted string between function calls.

The passed string can start with the delimiter. In this case your function returns an empty string.

Note that fgets can store the newline character '\n' at the end of the string it reads. To remove it, you can write after the call to fgets

testline[strcspn( testline, "\n" )] = '\0';

If you are going to make your function similar to the standard C string function strtok then it will be useful to read the description of the function.

From the C17 Standard (7.24.5.8 The strtok function)

3 The first call in the sequence searches the string pointed to by s1 for the first character that is not contained in the current separator string pointed to by s2. If no such character is found, then there are no tokens in the string pointed to by s1 and the strtok function returns a null pointer. If such a character is found, it is the start of the first token.

and

4 The strtok function then searches from there for a character that is contained in the current separator string. If no such character is found, the current token extends to the end of the string pointed to by s1, and subsequent searches for a token will return a null pointer. If such a character is found, it is overwritten by a null character, which terminates the current token. The strtok function saves a pointer to the following character, from which the next search for a token will start.

In particular, all leading delimiter characters are skipped until a non-delimiter character is encountered; if there is none, the function returns a null pointer. For example, if the passed string is ",," and the delimiter is ',', the first call shall return a null pointer. If the original string is ",,,Hello,,,", the first call shall return the substring "Hello" and the second call shall return a null pointer.

It is important to note that the function shall never return an empty string.

Thus your function can be implemented, for example, as shown in the demonstration program below.

#include <stdio.h>

char * mystringtokenize( char *s, char delimiter )
{
    static char *p = NULL;

    char *substr = NULL;

    if (s != NULL) p = s;

    if ( p != NULL )
    {
        while (*p == delimiter) ++p;

        if (*p == '\0')
        {
            p = NULL;
        }
        else
        {
            char *current = p;
            while (*current && *current != delimiter) ++current;

            substr = p;
            p = current;

            if (*p != '\0')
            {
                *p++ = '\0';
            }
        }
    }

    return substr;
}

int main( void )
{
    char s[] = ",,,Hello,,, World!,,,";

    for (char *p = mystringtokenize( s, ',' ); p != NULL; p = mystringtokenize( NULL, ',' ))
    {
        puts( p );
    }

}

The program output is

Hello
 World!
