Python 3 Data Processing - Data Conversion bytes/string/ASCII/GBK

When parsing data with Python 3, the most commonly used data types are bytes, strings, lists, and dictionaries (the latter typically converted to JSON).
The same text is stored as different byte sequences depending on the encoding used.

Common encodings are ASCII, GBK, and UTF-8.
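A minimal sketch of that point, encoding one ASCII letter and one Chinese character with each of these codecs (the variable names are illustrative; `bytes.hex(' ')` needs Python 3.8 or newer):

```python
# Encode one ASCII letter and one Chinese character with each codec
letter = 'A'
hanzi = '中'  # Chinese character meaning "middle"

for codec in ('ascii', 'gbk', 'utf-8'):
    print(codec, letter.encode(codec))  # b'A' for all three codecs

print('gbk  ', hanzi.encode('gbk').hex(' '))    # 2 bytes
print('utf-8', hanzi.encode('utf-8').hex(' '))  # 3 bytes: e4 b8 ad

try:
    hanzi.encode('ascii')
except UnicodeEncodeError:
    print('ascii: cannot encode', hanzi)
```

ASCII characters are encoded identically in all three, which is why mixed strings often look fine until a non-ASCII character appears.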

Data Conversion Reference Example

The following examples are for reference only; performance depends on your actual workload and should be measured. Here we only consider whether each conversion can be achieved.

1. Convert bytes to a hexadecimal string (str)
rec_msg = b'\x12\x55\xaa\xFF\x55\x34'
out_s = ''
for i in range(0, len(rec_msg)):    # iterate over each byte; note the ' ' separator between bytes
    out_s = out_s + ' ' + (hex(int(rec_msg[i]))).upper()[2:].zfill(2)
out_s = out_s.strip()               # remove the leading space added in the first iteration

print(out_s)    # >>> 12 55 AA FF 55 34

How it works: iterate over the bytes (indexing a bytes object already yields an int), convert each value with hex(), strip the 0x prefix, pad to two digits with zfill(2), and concatenate the results into one string.
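For comparison, the standard library offers shorter ways to build the same string. A sketch, assuming Python 3.8 or newer for the separator argument of `bytes.hex()` (the f-string variant works on 3.6+):

```python
rec_msg = b'\x12\x55\xaa\xFF\x55\x34'

# bytes.hex() with a separator argument (Python 3.8+)
out_hex = rec_msg.hex(' ').upper()
print(out_hex)    # >>> 12 55 AA FF 55 34

# Iterating bytes yields ints; format each as 2 uppercase hex digits
out_fmt = ' '.join(f'{b:02X}' for b in rec_msg)
print(out_fmt)    # >>> 12 55 AA FF 55 34
```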

2. Convert a hexadecimal string (str) to a byte array
hex_string = '01 02 27 11 00 08'            # simulated request to read discrete inputs (CRC not yet appended)
heartbeat = bytearray.fromhex(hex_string)
print(heartbeat)    # >>> bytearray(b"\x01\x02'\x11\x00\x08")
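A short sketch of the reverse direction and of how `bytearray` differs from `bytes`. The two zero bytes appended at the end are a hypothetical placeholder for a CRC, which is not computed here:

```python
hex_string = '01 02 27 11 00 08'

heartbeat = bytearray.fromhex(hex_string)   # mutable
frame = bytes.fromhex(hex_string)           # immutable, same content

print(list(frame))              # >>> [1, 2, 39, 17, 0, 8]
print(frame.hex(' ').upper())   # >>> 01 02 27 11 00 08  (round trip, Python 3.8+)

# Only the bytearray can be extended in place; these two zero bytes are a
# hypothetical placeholder for a CRC, not a computed checksum
heartbeat.extend(b'\x00\x00')
print(len(heartbeat))           # >>> 8
```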
3. Convert an int to a hexadecimal string
hex(10)  # >>> '0xa' (type str)
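`hex()` always adds the `0x` prefix and never pads. When a fixed width or uppercase digits are needed, format specifiers are a common alternative; a sketch:

```python
value = 10

print(hex(value))            # >>> 0xa
print(format(value, '02X'))  # >>> 0A      (no prefix, 2 digits, uppercase)
print(f'{value:#06x}')       # >>> 0x000a  (the 0x prefix counts toward the width of 6)
print(f'{0x1234:04X}')       # >>> 1234
```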
4. Convert hexadecimal strings to int
int(hex(12), 16)    # 12
int('FF', 16)       # 255
int('FFEF', 16)     # 65519
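When the value arrives as raw bytes rather than as a hex string (for example a 2-byte register), `int.from_bytes()` avoids the detour through a string. A sketch; which byte order applies depends on the protocol, so both are shown:

```python
raw = b'\xff\xef'   # e.g. a 2-byte register value

big = int.from_bytes(raw, byteorder='big')        # 0xFFEF = 65519
little = int.from_bytes(raw, byteorder='little')  # 0xEFFF = 61439
back = big.to_bytes(2, byteorder='big')           # b'\xff\xef'
prefixed = int('0xFFEF', 16)                      # a 0x prefix is accepted when base=16

print(big, little, back, prefixed)
```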
5. Convert hexadecimal string/int to binary string
bin(int('FF', 16))     # >>> '0b11111111'
bin(0)[2:].zfill(8)    # >>> '00000000'
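`format()` with the `b` type gives the zero-padded binary string directly, and individual bits can be read with shifts and masks, which is handy when each bit is a separate status flag. A minimal sketch with an arbitrary example value:

```python
status = int('A5', 16)           # 0xA5 = 165

bits = format(status, '08b')     # '10100101', same as bin(status)[2:].zfill(8)
bit0 = (status >> 0) & 1         # least significant bit -> 1
bit1 = (status >> 1) & 1         # -> 0
flags = [(status >> i) & 1 for i in range(8)]  # LSB first: [1, 0, 1, 0, 0, 1, 0, 1]
print(bits, flags)
```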
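6. Convert a list to a string
register_list = [1, 2, 3, 4]
str_set = ''
for i in range(len(register_list)):
    str_set = str_set + str(register_list[i]) + ' '    # str() is required: str + int raises TypeError
# >>> str_set == '1 2 3 4 ' (note the trailing space)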
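`str.join()` is the usual idiom for this and avoids the trailing space. A sketch, which also shows formatting each element as two hex digits (an assumption about how such a list might be used, e.g. when it holds byte values):

```python
register_list = [1, 2, 3, 4]

# str.join() only accepts strings, so convert each element first
str_set = ' '.join(map(str, register_list))
print(str_set)    # >>> 1 2 3 4

# Two uppercase hex digits per element
hex_set = ' '.join(f'{n:02X}' for n in register_list)
print(hex_set)    # >>> 01 02 03 04
```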
7. Split a string on spaces into a list
out_s = '12 22 34 45 56'
out_s.split()    # with no argument, split() splits on any run of whitespace
# >>> ['12', '22', '34', '45', '56']
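After splitting, the tokens are still strings. A sketch of turning them into numbers and bytes, assuming each token is a hexadecimal byte value as in the output of section 1 (if they are decimal, use `int(tok)` instead):

```python
out_s = '12 22 34 45 56'

# Assumption: each token is a hex byte, as produced in section 1
values = [int(tok, 16) for tok in out_s.split()]
print(values)   # >>> [18, 34, 52, 69, 86]

raw = bytes(values)                      # back to a bytes object
print(raw == bytes.fromhex(out_s))       # >>> True: fromhex() does this in one step
```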
8. Convert a GBK-encoded string (Chinese) to bytes and a hexadecimal string

In GBK, each Chinese character takes 2 bytes.

a = '与'                # a Chinese character meaning "and/with"
a.encode('GBK')         # b'\xd3\xeb'
type(a.encode('GBK'))   # <class 'bytes'>
# strip the b'...' wrapper and the \x escapes from the printed bytes
print('{}'.format(a.encode('GBK')).replace("b'\\x", '').replace('\\x', '').replace("'", '').strip().upper())
# >>> D3EB
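Parsing the printed `bytes` representation works here but is fragile. A more direct sketch uses `bytes.hex()`, which returns the hex digits without any `b'...'` or `\x` decoration:

```python
a = '与'   # Chinese character meaning "and/with"

gbk_bytes = a.encode('GBK')
gbk_hex = gbk_bytes.hex().upper()   # no repr parsing needed
print(gbk_hex)                      # >>> D3EB
print(gbk_bytes.hex(' ').upper())   # >>> D3 EB  (separator needs Python 3.8+)
```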
9. Convert bytes/bytearray/hexadecimal string to a GBK-encoded string (Chinese)
a = b'\xbf\xc6'             # bytes
print(a.decode('GBK'))      # >>> 科  ("science" / "section")
b = 'bfc6 bfc6'             # hexadecimal string (str)
b = bytearray.fromhex(b)    # bytearray
c = b.decode('GBK')         # str (Chinese text)
print('c:', c)              # >>> c: 科科
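When the hex string comes from an external source, it may be truncated or not GBK at all. A sketch of a round trip plus the error behaviour, using the standard `errors` argument of `bytes.decode()`:

```python
# Round trip: Chinese text -> GBK hex string -> Chinese text
text = '科技'   # "science and technology"
hex_text = text.encode('GBK').hex(' ')        # space-separated hex, Python 3.8+
restored = bytes.fromhex(hex_text).decode('GBK')
print(hex_text, restored)

# A truncated byte sequence cannot be decoded strictly
broken = bytes.fromhex('bf c6 bf')            # second character cut in half
try:
    broken.decode('GBK')
except UnicodeDecodeError as exc:
    print('strict decode failed:', exc.reason)
print(broken.decode('GBK', errors='replace'))  # 科 followed by U+FFFD
```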
10. UTF-8 strings are converted the same way
a = '中国'                  # "China"
b = a.encode('utf-8')      # b'\xe4\xb8\xad\xe5\x9b\xbd'
d = b'\xe4\xb8\xad\xe5\x9b\xbd'
d.decode('utf-8')          # '中国'
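Because GBK and UTF-8 produce different bytes for the same Chinese text, both sides of an exchange must agree on the codec. A short sketch comparing the two:

```python
text = '中国'   # "China"

utf8_bytes = text.encode('utf-8')   # 3 bytes per character
gbk_bytes = text.encode('gbk')      # 2 bytes per character
print(len(utf8_bytes), len(gbk_bytes))   # >>> 6 4

# Decoding with the wrong codec does not restore the text
wrong = utf8_bytes.decode('gbk', errors='replace')
print(wrong == text)   # >>> False
```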
11. ASCII code conversion
a = 't'
ord(a)         # 116
hex(ord(a))    # '0x74'
chr() converts an integer, written in hexadecimal or decimal, to the corresponding ASCII character:
a = 0x30
chr(a)         # '0'
chr(48)        # '0'
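The same functions extend from single characters to whole strings. A sketch converting a short ASCII string to a hex string and back (the message `'OK'` is just an example):

```python
msg = 'OK'

codes = [ord(ch) for ch in msg]                       # [79, 75]
hex_msg = msg.encode('ascii').hex().upper()           # '4F4B'
text = bytes.fromhex(hex_msg).decode('ascii')         # 'OK'
digits = ''.join(chr(c) for c in (0x31, 0x32, 0x33))  # '123'
print(codes, hex_msg, text, digits)
```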
12. Octal string/bytes conversion
oct(10)           # '0o12'
int(b'0o12', 8)   # 10  (int() also accepts bytes, and the 0o prefix is optional)
int('12', 8)      # 10
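As with hexadecimal, `format()` controls whether the prefix appears, and `int()` with base 0 infers the base from the prefix. A sketch:

```python
value = 10

print(format(value, 'o'))               # >>> 12    (no prefix)
print(f'{value:#o}')                    # >>> 0o12
print(int('0o12', 0))                   # >>> 10    base 0: the prefix decides the base
print(int('0x1F', 0), int('0b101', 0))  # >>> 31 5
```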
13. Cross conversion, such as converting existing GBK and ASCII encoded strings to hexadecimal strings

❌ Error Example.

a = 'Python 3 data conversion test'
a.encode('GBK')    # b'Python3xcaxfdxbexddxd7xaaxbbxbbtest'

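As shown above, GBK encoding does produce the correct bytes, but the printed bytes object shows printable ASCII characters as themselves rather than as \x escapes, so the string-replace trick from section 8 cannot turn it into a clean hexadecimal string. One workaround is to split the string and convert each character individually: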

lst = []
for i in range(len(a)):
    lst.append(hex(ord(a[i]))[2:])
lst    # ['50', '79', '74', '68', '6f', '6e', '33', '6570', '636e', '8f6c', '6362', '74', '65', '73', '74']

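Note that ord() returns the Unicode code point, not the GBK bytes: 数 gives 6570 (U+6570), whereas its GBK encoding is CA FD. In this output the Chinese characters take 4 hex digits and the ASCII characters 2, but the Chinese values are not GBK bytes, so this approach only matches GBK for the ASCII part of the string.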
