I just started learning C 3 days ago, and I am trying to make a program that takes in a file, reads the characters, and deletes all spaces and tabs from it. I am a beginning programmer who has only learned a little bit of MATLAB prior to starting to learn C. I am using Ubuntu 12.04.

Here is my code:

#include <stdio.h>
int main ()
{
    FILE * pFile;
    FILE * pFile2;
    int c;
    pFile = fopen ("spaces.txt","r");
    pFile2 = fopen ("nospaces.txt","w");

    if (pFile==NULL) perror ("Error opening file");
    else
    {
       while (c != EOF)
       {
           c = fgetc (pFile);
           if (!(c == ' ' || c == '    '))
           {
               fputc (c, pFile2);
           }
       } 
       fclose (pFile2);
       fclose (pFile);
    }
    return 0;
}

When I open the new file "nospaces.txt", everything is fine except there is a weird character at the end. Gedit says it is /FF or /00 with a red background and complains that I shouldn't edit the file because I could corrupt it. No matter what I try, I cannot get rid of that strange character at the end. Here are examples of things I have tried:

- Adding EOF, '\0', and other random characters to the end of pFile2 using fputc (EOF, pFile2)
- Putting random constraints on the value of c so that it doesn't pick characters that aren't letters or numbers (like 40 < c < 125)
- Making it a do while instead of a while loop (I don't know the difference)

Please help. Thank you.

1
  • while (c != EOF){ c = fgetc (pFile); --> while ((c = fgetc (pFile)) != EOF){ Commented Sep 9, 2014 at 3:28

3 Answers

1

Your read loop is wrong. You read one character too many.

while (c != EOF)
{
    c = fgetc (pFile);
    if (!(c == ' ' || c == '\t')) {
        fputc (c, pFile2);
    }
} 

That says: "While c does not equal the end-of-file value, read one more character and write it." Well, that one more character is EOF. You need to read the character before the check:

while ((c = fgetc(pFile)) != EOF)
{
    if (c != ' ' && c != '\t') {
        fputc (c, pFile2);
    }
} 

On a side note, c is uninitialized before you first read it, which invokes undefined behavior because its value is indeterminate. The new version fixes that as well.
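To make the read-then-test idiom concrete, here is a minimal sketch that wraps the loop in a function. The names strip_spaces_and_tabs and strip_file are hypothetical (they are not in the question), and the second fopen check is an addition, since the original code only checks the input file.

```c
#include <stdio.h>

/* Copy `in` to `out`, dropping spaces and tabs.
   Returns the number of characters written. */
long strip_spaces_and_tabs(FILE *in, FILE *out)
{
    int c;              /* int, not char, so EOF stays distinguishable */
    long written = 0;

    /* Read first, then test: EOF never reaches fputc. */
    while ((c = fgetc(in)) != EOF) {
        if (c != ' ' && c != '\t') {
            fputc(c, out);
            written++;
        }
    }
    return written;
}

/* Hypothetical wrapper: opens both files and checks both fopen calls.
   Returns 0 on success, -1 if either file cannot be opened. */
int strip_file(const char *in_path, const char *out_path)
{
    FILE *in = fopen(in_path, "r");
    if (in == NULL) {
        perror("Error opening input file");
        return -1;
    }

    FILE *out = fopen(out_path, "w");
    if (out == NULL) {
        perror("Error opening output file");
        fclose(in);
        return -1;
    }

    strip_spaces_and_tabs(in, out);
    fclose(out);
    fclose(in);
    return 0;
}
```

From main, a call such as strip_file("spaces.txt", "nospaces.txt") would reproduce what the question's program is trying to do.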

4 Comments

Thank you, this worked. Is there a particular 'programming robustness' reason for using if(not this and not that) rather than if not (this or that) ?
@AAC: Well, most importantly, I think it's more clear. It also has the advantage of short circuit evaluation. If the first condition fails there is no need to check the second. Yours must check both equalities.
The original version also uses short-circuit evaluation. If the first one is false then the other side of the || is not evaluated. In this case we can expect that the majority of the characters are not space or tab therefore the == check will likely yield false (conversely the != check will likely yield true). Which means that the original version using || will short-circuit more often than the version using &&. So this version evaluates both expressions more than the original.
@slebetman: You're right, not sure what I was thinking there.
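
For what it's worth, the two ways of writing the test are logically equivalent (De Morgan's law), so the choice comes down to readability. A small sketch, using a hypothetical helper both_forms_agree, that checks this for every character value fgetc can return:

```c
#include <limits.h>

/* Returns 1 if both ways of writing the test give the same answer
   for every possible character value, 0 otherwise. */
int both_forms_agree(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++) {
        int original  = !(c == ' ' || c == '\t');   /* not (this or that) */
        int rewritten = (c != ' ' && c != '\t');    /* not this and not that */
        if (original != rewritten)
            return 0;
    }
    return 1;
}
```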
1

You go:

int c;
// ...
while ( c != EOF )

This uses the value of c uninitialized, which does not have well-defined behaviour.

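The reason you got a bogus character at the end is that when you call fgetc after reaching the end of the file, c is set to EOF, but then you go on to call fputc(c, pFile2) with the EOF. fputc converts its argument to unsigned char before writing, so EOF (a negative int, typically -1) ends up in the file as a single byte, typically 0xFF.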
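
To see where the stray byte comes from, here is a small sketch. The helper byte_stored_by_fputc is hypothetical; it writes one value to a temporary file with fputc and reads back the byte that was actually stored.

```c
#include <stdio.h>

/* Write `value` with fputc to a temporary file and return the byte
   that actually ends up in the file, or -1 if the temp file fails. */
int byte_stored_by_fputc(int value)
{
    FILE *tmp = tmpfile();
    if (tmp == NULL)
        return -1;

    fputc(value, tmp);      /* fputc converts value to unsigned char */
    rewind(tmp);            /* switch from writing to reading */
    int stored = fgetc(tmp);
    fclose(tmp);
    return stored;
}
```

On a platform where EOF is -1 and bytes are 8 bits, byte_stored_by_fputc(EOF) should give 255 (0xFF), which would match the \FF character gedit complains about.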

The usual way to fix both of these problems without requiring code duplication is to include the read operation in the test condition:

while ( (c = fgetc(pFile)) != EOF )
{
    if ( c != ' ' )
        fputc(c, pFile2);
}

Also, this line has a logic problem:

if (!(c == ' ' || c == '    ')) {  

The first test, c == ' ' is clear enough. But I don't know what you're trying to do with the second part of the test (and I'm sure that you don't achieve whatever it is!). Things enclosed in ' ' represent a single character.

It just occurred to me that maybe you meant a horizontal tab; if so then write '\t'. You may find the function isspace from <ctype.h> useful, which checks for whitespace (although it counts '\n' as whitespace too, so you'd need to make an exception for that).
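
A minimal sketch of the isspace approach with the newline exception. The names is_removable and strip_whitespace_keep_newlines are hypothetical, and the argument is assumed to be a value returned by fgetc:

```c
#include <ctype.h>
#include <stdio.h>

/* Returns 1 if c (a value returned by fgetc) should be dropped:
   any whitespace character except the newline. */
int is_removable(int c)
{
    return c != '\n' && isspace(c);
}

/* Copy `in` to `out`, dropping whitespace but keeping line breaks. */
void strip_whitespace_keep_newlines(FILE *in, FILE *out)
{
    int c;
    while ((c = fgetc(in)) != EOF) {
        if (!is_removable(c))
            fputc(c, out);
    }
}
```

Note that isspace also matches '\r', '\v' and '\f', so this removes more than just spaces and tabs. If only those two should go, isblank from the same header (C99) may be a closer fit.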

0

Actually, your c is uninitialized the first time it is tested. Read first and then check the condition in the while, as follows:

while ((c = fgetc(pFile)) != EOF)
{
    if (c != ' ' && c != '\t')
    {
        fputc (c, pFile2);
    }
}
