regex search vs. UTF-8 encoding #250

@szepeviktor

Bug Report

Describe the current, buggy behavior

wp db search '\p{Cf}' --regex

A regex search for a character class matches individual bytes inside a UTF-8 encoded character instead of matching whole characters.
For example, it matches inside the í in "hírlevél",
and the result is displayed as "blog h▒▒rlevél feliratkozás".
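
A minimal sketch of what may be going on, assuming the pattern is applied to raw bytes rather than decoded characters (this has not been checked against the WP-CLI source). í is U+00ED, encoded in UTF-8 as the bytes 0xC3 0xAD, and 0xAD read as a code point on its own is U+00AD SOFT HYPHEN, which is in category Cf. Python's `re` module has no `\p{...}` support, so `unicodedata.category` stands in for the class test; the helper names are hypothetical.

```python
import unicodedata

text = "blog hírlevél feliratkozás"
raw = text.encode("utf-8")


def cf_positions_bytewise(data):
    """Offsets of bytes whose value, read as a code point, is category Cf."""
    return [i for i, b in enumerate(data)
            if unicodedata.category(chr(b)) == "Cf"]


def cf_positions_charwise(s):
    """Offsets of real characters in category Cf."""
    return [i for i, ch in enumerate(s)
            if unicodedata.category(ch) == "Cf"]


def mark(data, offset):
    """Wrap one byte in brackets, then decode the way a terminal would."""
    marked = (data[:offset] + b"[" + data[offset:offset + 1] + b"]"
              + data[offset + 1:])
    return marked.decode("utf-8", errors="replace")


byte_hits = cf_positions_bytewise(raw)   # second byte of í
char_hits = cf_positions_charwise(text)  # no Cf character in the text
print(byte_hits, char_hits)
print(mark(raw, byte_hits[0]))
```

Wrapping only the matched byte splits the two-byte sequence, so both halves decode as replacement characters. That would be consistent with the two ▒ glyphs in the output above.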

How can I search UTF-8 encoded text?

BTW, wp db search "$(printf '\xc3')" --regex also finds the first byte of í (and of every other character whose two-byte UTF-8 encoding starts with 0xC3, i.e. U+00C0 to U+00FF).
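
The lone-byte case can be reproduced outside WP-CLI. The sketch below uses Python's `re` as a stand-in for the regex engine, which is an assumption for illustration only. It compares a bytes-level search for 0xC3 with a character-level search for U+00C3 (Ã) on the same sample text.

```python
import re

text = "blog hírlevél feliratkozás"
raw = text.encode("utf-8")

# Byte-oriented search: 0xC3 is the lead byte of í, é and á alike.
lead_byte_hits = [m.start() for m in re.finditer(b"\xc3", raw)]

# Character-oriented search: the text contains no U+00C3 (Ã) at all.
char_hits = [m.start() for m in re.finditer("\u00c3", text)]

print(lead_byte_hits)
print(char_hits)
```

The bytes-level search reports one hit per accented letter, while the character-level search reports none.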
