3

I'm a new learner of the C programming language, and I've been having trouble setting up some code I made for a small quiz in Brazilian Portuguese. Here it is:

#include <stdio.h>
#include <stdlib.h> // declares system()
#include <windows.h>
#include <locale.h>
#include <unistd.h>

int main() {
    int menu_principal;

    // Set the terminal to use UTF-8
    // Lets the system read accented letters and others such as the cedilla
    
    // SetConsoleOutputCP(CP_UTF8); // Not needed

    while(1) {
        

        // Reminder: system("cls") clears the screen
        system("cls");
        system("chcp 1252");

        // Main menu
        printf("Digite o número correspondente a sua escolha:\n");
        printf("1. Iniciar o quiz\n");
        printf("2. Mostrar o resultado\n");
        printf("3. Limpar respostas\n");
        printf("4. Sair\n");
        printf("\n");
        printf("Escolha uma das opções: ");
        scanf("%d", &menu_principal);

        // Switch statement for the main menu
        switch(menu_principal) {
            case 1:
                // function
                break;
            case 2:
                // function
                break;
            case 3:
                // function
                break;
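            case 4:
                // function
                return 0; // option 4 is "Sair" (exit), so leave the loop
            default:
                // function
                break; // a label must be followed by a statement
        }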
    }
}

I've been trying for a while now to change the code page to 1252 or 65001, and it simply doesn't work. Is it just me?


EDIT:

Here's the image as asked:

image

That was in CMD; it also didn't work in PowerShell. The same code I used to accept special characters worked on a friend's computer.


EDIT:

Editor's note: Something's fishy. This code works on some computers but not others.

14
  • 2
    The documentation doesn't mention 1252 as a supported value. Commented Mar 6 at 23:52
  • 2
    Are you running in cmd.exe or another environment? What encoding is the source saved in? Using chcp 65001 with source saved in UTF-8 encoding works. So does using chcp 1252 and using Windows-1252 encoding. They need to agree. "It doesn't work" doesn't give us a clue what you are actually seeing. Commented Mar 6 at 23:58
  • 1
    @Joshua Not in my experience. But anyway, the image shows using chcp 65001 and the output is UTF-8 displayed as 1252, which doesn't make sense. Using matching encodings (whether chcp or setconsolecp) works. Something's fishy. Commented Mar 7 at 0:15
  • 1
    Sorry to bother again, it seems to work on the msvc terminal now, but not in cmd or powershell imgur.com/l1GIdNW Commented Mar 7 at 0:22
  • 2
    Using system("chcp N"); is a very bad way to call SetConsoleOutputCP(N);. In any case, your problem (I don't understand exactly what it is) lies in printf, not in SetConsoleOutputCP. Commented Mar 7 at 7:18

2 Answers

2

First, about the call to

system("chcp N");

This creates a new process from chcp.com, which calls SetConsoleOutputCP(N) (and, of course, makes many other API calls). So if you need to set the output code page, call SetConsoleOutputCP directly and never use system("chcp N");
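As a rough sketch of the direct call (the helper name set_output_code_page is made up for illustration, and the non-Windows branch exists only so the snippet compiles elsewhere), something like this could replace system("chcp N"):

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: change the console output code page in-process,
   without spawning chcp.com. Returns 1 on success, 0 on failure. */
int set_output_code_page(unsigned int code_page)
{
#ifdef _WIN32
    /* SetConsoleOutputCP returns nonzero on success. It can fail when
       the process has no console attached. */
    return SetConsoleOutputCP(code_page) ? 1 : 0;
#else
    (void)code_page; /* no Windows console here, nothing to change */
    return 1;
#endif
}
```

Calling set_output_code_page(65001) once at the top of main (65001 is the value of CP_UTF8) would stand in for the system("chcp ...") line inside the loop, assuming the source file really is saved as UTF-8.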

Now for the main problem. You use printf, so your text is multibyte. The Windows UI uses Unicode (wide characters), so your text must first be converted to Unicode. This is done with the MultiByteToWideChar API. Which CodePage (the first parameter of MultiByteToWideChar) must be used? It must match the encoding of your source file ("quiz.c"). The string is stored in your source file in that encoding, it goes into the binary exe in that same encoding, and it is passed to printf that way. I don't know which encoding quiz.c is saved in on your system, but you must use that encoding, and only that encoding, in your code. Your file's encoding may also have lost some information to begin with, in which case the text cannot be converted to Unicode correctly anyway. In that case, though, you will probably get the C4566 warning:

character represented by universal-character-name 'char' cannot be represented in the current code page (page). Not every Unicode character can be represented in your current ANSI code page.

The same code I used to accept special characters worked on a friend's computer.

Does the binary built on your computer (where the encoding is already embedded) work there? I doubt it (it should not). Or was the exe built from source on the other computer, with a different encoding used for the source file and its strings? My guess is the latter.

In any case, you cannot pass just any encoding to SetConsoleOutputCP(N), only the encoding of your source file. You can use SetConsoleOutputCP(CP_UTF8); only if your source file (that is, the strings passed to printf) is in UTF-8.
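If you are not sure which encoding ended up in the exe, one way to check is to look at the bytes of a string literal at run time. The function below is only a heuristic sketch (looks_like_utf8 is a hypothetical name, and it does not reject overlong forms or surrogates): it reports whether a byte string is structurally valid UTF-8. Accented text saved as Windows-1252 will almost always fail the check.

```c
/* Returns 1 if s is structurally well-formed UTF-8, otherwise 0.
   Pure ASCII also returns 1, since ASCII is valid in both encodings. */
int looks_like_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    while (*p) {
        int extra; /* number of continuation bytes expected */
        if (*p < 0x80)                extra = 0; /* ASCII */
        else if ((*p & 0xE0) == 0xC0) extra = 1; /* 2-byte sequence */
        else if ((*p & 0xF0) == 0xE0) extra = 2; /* 3-byte sequence */
        else if ((*p & 0xF8) == 0xF0) extra = 3; /* 4-byte sequence */
        else return 0; /* stray continuation byte or invalid lead byte */
        p++;
        while (extra-- > 0) {
            /* Continuation bytes look like 10xxxxxx. A terminating
               NUL fails this test too, so we never read past the end. */
            if ((*p & 0xC0) != 0x80) return 0;
            p++;
        }
    }
    return 1;
}
```

For example, "opções" is the bytes 6F 70 C3 A7 C3 B5 65 73 in UTF-8 (passes) but 6F 70 E7 F5 65 73 in Windows-1252 (fails). Passing one of the menu strings to this function would suggest whether CP_UTF8 or 1252 is the matching argument for SetConsoleOutputCP.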

It is always better to keep source strings in wide characters (Unicode) from the start. Use WriteConsoleW, but do not use wprintf, which is the worst possible choice: wprintf first converts the Unicode input to multibyte with wctomb_s. So you in fact get two conversions: first from Unicode to multibyte, according to the locale of your current process, and then from multibyte to Unicode, according to the code page in conhost.exe (not your exe) that you set with SetConsoleOutputCP.
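A minimal sketch of the WriteConsoleW route might look like this. The helper name write_console_wide is invented for the example, and the non-Windows branch is there only so the snippet compiles on other systems; it writes nothing there.

```c
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: send a wide string straight to the console.
   Returns the number of wide characters written, or -1 on failure. */
long write_console_wide(const wchar_t *text)
{
    size_t len = wcslen(text);
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD written = 0;
    if (out == NULL || out == INVALID_HANDLE_VALUE)
        return -1;
    /* WriteConsoleW needs a real console handle, so it fails when
       stdout is redirected to a file or a pipe. */
    if (!WriteConsoleW(out, text, (DWORD)len, &written, NULL))
        return -1;
    return (long)written;
#else
    /* No console API on this platform: only report the length. */
    return (long)len;
#endif
}
```

A call such as write_console_wide(L"Escolha uma das op\u00E7\u00F5es: ") spells the accented letters as universal character names, so the result does not depend on how the source file is saved. No code page is involved on this path, which is the point of the advice above.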


Effect of the setlocale call on printf:

enter image description here

The right image shows what happens inside _write_nolock (called inside printf) when setlocale has not been called beforehand. The left image shows _write_nolock after a call to setlocale.

The difference is huge.

If setlocale is not called, WriteFile(s, strlen(s)) is simply called (where s is the string passed to printf(s)), and conhost.exe does the following:

enter image description here

It calls MultiByteToWideChar with CodePage equal to the value set by SetConsoleOutputCP().

So here we have a single WriteFile call, handled inside conhost.exe, and a single MultiByteToWideChar call whose code page is set by SetConsoleOutputCP().

But what happens after setlocale? Now the processing becomes completely different. In the current process, every character in the string is first converted to the corresponding wide character via mbtowc, which internally calls MultiByteToWideChar with a CodePage based on the setlocale argument. The wide character is then converted back to a multibyte character with WideCharToMultiByte and CodePage = GetConsoleCP() (retrieves the input code page used by the console associated with the calling process). Yes, not GetConsoleOutputCP() (retrieves the output code page used by the console associated with the calling process). WriteFile is then called for that single character, and conhost again calls MultiByteToWideChar(GetConsoleOutputCP()).

So here we have three transforms in a loop:

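// Simplified sketch, not the actual CRT source.
// hFile is the console output handle; str is the string passed to printf;
// localeCodePage stands for the code page derived from the setlocale() argument.
char c;
while ((c = *str++) != '\0')
{
  wchar_t w;
  DWORD written;
  MultiByteToWideChar(localeCodePage, 0, &c, 1, &w, 1);             // 1. multibyte -> wide
  WideCharToMultiByte(GetConsoleCP(), 0, &w, 1, &c, 1, NULL, NULL); // 2. wide -> multibyte (input code page!)
  WriteFile(hFile, &c, 1, &written, NULL);                          // one call per character
  //++conhost (runs in conhost.exe, not in your process)
  // MultiByteToWideChar(GetConsoleOutputCP(), 0, &c, 1, ...)       // 3. multibyte -> wide
  //--conhost
}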

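And if the string has N characters, there will be N calls to WriteFile, each going into the kernel and then to the remote process conhost.exe.

Whereas without setlocale we get the following:

  // hFile is the console output handle; written is a DWORD that receives the byte count
  WriteFile(hFile, str, (DWORD)strlen(str), &written, NULL);
  //++conhost (runs in conhost.exe, not in your process)
  // MultiByteToWideChar(GetConsoleOutputCP(), 0, str, strlen(str), ...)
  //--conhost

(no loops; a single call to WriteFile and a single call to MultiByteToWideChar)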


3 Comments

It is advisable for the program to call GetConsoleOutputCP first and save the user's preferred code page in memory before calling SetConsoleOutputCP. Before exiting main, the program should call SetConsoleOutputCP again with the value obtained from GetConsoleOutputCP. A user could open a command prompt window, run your program, and then do something else in the still-open command prompt window.
The code page preferred by the user according to the region settings of the user account should always be saved in application memory and restored before the application exits. The default font on Windows versions prior to Windows 8 is Terminal, which supports only a very limited subset of the characters specified by the Unicode Consortium. For that reason it may be necessary to call GetCurrentConsoleFontEx first (requires Windows Vista or later, or Windows Server 2008 or later) and then SetCurrentConsoleFontEx to change the font to Consolas.
Finally, SetCurrentConsoleFontEx is called again with the font data obtained initially, to restore the original console font and size so that the user can keep using the still-open command prompt window with their preferred font.
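The code page half of this advice could be sketched roughly as follows. The type saved_console and both function names are made up for illustration, the font handling is left out, and the non-Windows branches only keep the snippet compilable elsewhere.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical type: remembers the code page active at startup. */
typedef struct {
    unsigned int output_cp; /* from GetConsoleOutputCP(), 0 if unknown */
} saved_console;

/* Save the current output code page, then switch to new_cp. */
saved_console console_switch(unsigned int new_cp)
{
    saved_console saved = { 0 };
#ifdef _WIN32
    saved.output_cp = GetConsoleOutputCP(); /* returns 0 on failure */
    if (saved.output_cp != 0)
        SetConsoleOutputCP(new_cp);
#else
    (void)new_cp; /* no Windows console on this platform */
#endif
    return saved;
}

/* Put back the code page the user had before the program ran. */
void console_restore(saved_console saved)
{
#ifdef _WIN32
    if (saved.output_cp != 0)
        SetConsoleOutputCP(saved.output_cp);
#else
    (void)saved;
#endif
}
```

In main this would be used as saved_console s = console_switch(65001); at the start and console_restore(s); on every exit path, so the command prompt window is left the way the user had it.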
0

It's been a few days. I tried the suggestions from every comment in this thread with no success. Earlier today, though, my teacher told me a way to fix it with the command system("chcp 65001 > nul"). It didn't work, so I took out the setlocale call and its library, and voilà, special characters work.

I'm sorry to say I don't know how it works; however, instead of leaving this unanswered, I chose to share it here for future fixes. I'd appreciate any explanations as to why it works, though...

1 Comment

The code you pasted here contains no call to setlocale; it appears only in the image (so the code in the image, your real code, is not the same as the code you pasted here). setlocale has a huge negative effect. And advice to use system("chcp 65001 > nul") only speaks to the illiteracy of the person who gave it.
