
I have the following simple script, running under PHP 8.3.6:

<?php
$original = '"=?utf-8?Q?part1=40part2.com?=" <[email protected]>' ;
$converted = imap_utf8($original) ;
printf("Original: %s\nConverted: %s\n", $original, $converted) ;

When this is executed, $converted is exactly equal to $original: nothing is decoded.

I get values like this (especially in the TO field) when using imap_search() and other functions that return headers. I expect this to be widespread; I am just getting into initial testing with PHP IMAP. Note in particular the embedded double quotes, which may be (part of) the problem.

What is the appropriate way to decode a value like the above?

    Yes, I can reproduce your problem. Removing the double-quotes resolves the problem, but that's clearly not what you want. Double-quotes have a special meaning in MIME. I did find that imap_mime_header_decode() can decode your string correctly, but it requires a bit more code to parse the resulting array. Commented Dec 16 at 22:18

2 Answers


The imap_utf8 function is designed to convert MIME-encoded text (like =?charset?encoding?encoded-text?=) to UTF-8.
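To make the =?charset?encoding?encoded-text?= format concrete, here is a minimal sketch that decodes one encoded word by hand. The helper name decode_encoded_word is hypothetical; it ignores edge cases such as RFC 2231 language tags and is only meant to show the structure, not to replace the IMAP functions. It returns the charset and the raw decoded bytes, similar in shape to one element of the array returned by imap_mime_header_decode().

```php
<?php
/**
 * Hypothetical helper (illustration only): decode ONE RFC 2047 encoded word
 * of the form =?charset?encoding?encoded-text?= and return its charset and
 * raw decoded bytes. Returns null if the input is not a single encoded word.
 */
function decode_encoded_word(string $word): ?array
{
    if (!preg_match('/^=\?([^?]+)\?([QqBb])\?([^?]*)\?=$/', $word, $m)) {
        return null;
    }
    [, $charset, $encoding, $text] = $m;

    if (strtoupper($encoding) === 'Q') {
        // Q encoding: "_" stands for a space, "=XX" is a hex-encoded byte
        $bytes = quoted_printable_decode(str_replace('_', ' ', $text));
    } else {
        // B encoding: plain base64; reject invalid input
        $bytes = base64_decode($text, true);
        if ($bytes === false) {
            return null;
        }
    }
    // The bytes are still in $charset; no conversion to UTF-8 happens here
    return ['charset' => $charset, 'text' => $bytes];
}

$part = decode_encoded_word('=?utf-8?Q?part1=40part2.com?=');
echo $part['charset'], ': ', $part['text'], "\n"; // utf-8: part1@part2.com

// A quoted string is not a single encoded word, so it is rejected
var_dump(decode_encoded_word('"=?utf-8?Q?part1=40part2.com?="')); // NULL
```

The quoted form is rejected because, taken as a whole, it is not an encoded word; only the text between the quotes is.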

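The issue you have encountered is likely that the input string is not a correctly formatted MIME header according to RFC 2047: the encoded word is wrapped in double quotes (U+0022), and encoded words are not allowed inside quoted strings. (If the quotes were "fancy quotes" such as U+201C instead, the string would be malformed in a different way.)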

One possible cause is improper encoding by the sending client.

One workaround is a function that decodes the header with imap_mime_header_decode() and then converts each decoded part to UTF-8 with mb_convert_encoding().

The mb_convert_encoding() call is needed only for parts whose original charset is not UTF-8.
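To see why the conversion step matters, here is a small sketch using a hypothetical ISO-8859-1 encoded word, =?ISO-8859-1?Q?Andr=E9?= (not taken from the question). After Q-decoding, the é is the single byte 0xE9, which is not valid UTF-8 until it is converted from its declared charset.

```php
<?php
// Hypothetical input: the text part of =?ISO-8859-1?Q?Andr=E9?=
// decodes to raw ISO-8859-1 bytes, not UTF-8.
$raw = quoted_printable_decode('Andr=E9');       // "Andr" followed by byte 0xE9

// Appending these bytes as-is to a UTF-8 string would produce invalid UTF-8
var_dump(mb_check_encoding($raw, 'UTF-8'));      // bool(false)

// Converting from the declared charset fixes that
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');
var_dump(mb_check_encoding($utf8, 'UTF-8'));     // bool(true)
echo $utf8, "\n";                                // André
```

For parts whose charset is already UTF-8 (as in the question), the conversion would be a no-op, which is why the function below skips it.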

The following is a complete working example:

<?php
$original = '"=?utf-8?Q?part1=40part2.com?=" <[email protected]>';

function custom_imap_utf8_decode($mime_encoded_text) {
    // Split the header into parts: encoded words are decoded, plain text
    // is returned with the charset "default"
    $decoded_elements = imap_mime_header_decode($mime_encoded_text);
    if ($decoded_elements === false) {
        return $mime_encoded_text; // could not parse; return input unchanged
    }
    $decoded_string = '';

    foreach ($decoded_elements as $element) {
        // Convert the text to UTF-8 from its original charset, if necessary.
        // Charset names are case-insensitive ("utf-8" vs "UTF-8").
        if (strcasecmp($element->charset, 'utf-8') !== 0 && $element->charset !== 'default') {
            $decoded_string .= mb_convert_encoding($element->text, 'UTF-8', $element->charset);
        } else {
            $decoded_string .= $element->text;
        }
    }
    return $decoded_string;
}

$converted = custom_imap_utf8_decode($original);

printf("Original: %s\nConverted: %s\n", $original, $converted);

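The output (originally posted as a screenshot) shows the encoded word decoded to part1@part2.com, while the surrounding double quotes and the angle-bracket address are left unchanged.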


3 Comments

Excellent, thank you. This addresses my needs and provides an explanation of why this might be occurring. Double whammy! I appreciate the answer and the code.
Thanks a lot. Have a nice day.
Please note that Generative AI (e.g., ChatGPT) is banned and read Help Center: AI policy. It is not permitted to use AI tools to generate or reword content that you post on Stack Overflow.

imap_utf8() is not a general RFC 2047 header decoder; it only decodes valid RFC 2047 encoded words. Your input is not a single encoded word but a header-style string: a quoted display name followed by an address.

RFC 2047 specifies:

  • Encoded words must not be inside quoted strings
  • Encoded words are decoded only when parsing a full header
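The first rule is what trips up the question's input. One alternative, sketched minimally here, is to unwrap double quotes that surround exactly one encoded word so that it stands on its own; the comment under the question reports that imap_utf8() decodes the string once the double quotes are removed. The helper name unquote_encoded_words is hypothetical, and it only handles the simple "=?...?=" case.

```php
<?php
/**
 * Hypothetical helper: remove double quotes that wrap exactly one RFC 2047
 * encoded word, e.g. "=?utf-8?Q?...?=" becomes =?utf-8?Q?...?=.
 * Quotes around anything else are left untouched.
 */
function unquote_encoded_words(string $header): string
{
    // preg_replace() returns null on error; fall back to the input then
    return preg_replace('/"(=\?[^?\s]+\?[QqBb]\?[^?\s]*\?=)"/', '$1', $header) ?? $header;
}

$original = '"=?utf-8?Q?part1=40part2.com?=" <[email protected]>';
echo unquote_encoded_words($original), "\n";
// =?utf-8?Q?part1=40part2.com?= <[email protected]>
```

Treat the decoded result as display text only: a display name such as part1@part2.com contains "@", which would have to be quoted again if the header were written back out.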

Use imap_mime_header_decode() instead:

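<?php
$original = '"=?utf-8?Q?part1=40part2.com?=" <[email protected]>';

// Split the header into decoded parts (each has ->charset and ->text)
$decodedParts = imap_mime_header_decode($original);

// Concatenate the text of every part; ->charset is ignored, so this assumes
// every encoded word is already UTF-8 (or plain ASCII)
$decoded = '';
foreach ($decodedParts as $part) {
    $decoded .= $part->text;
}

printf("Original: %s\nDecoded: %s\n", $original, $decoded);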

1 Comment

Thank you, Abdulla. While your answer will probably serve my needs, I have accepted an alternate answer because it also handles the possibility of multibyte data. I appreciate your efforts on this, though. Don't want you to think otherwise.
