Python cut text in pieces to upload to Wordpress Blog

Question

I have a piece of text that already has some formatting. Now I need to convert it into a format that I can send to WordPress through the WordPress API.

This is an example of my text:

'**H1: Some text**\n\nSome text as paragraph.\n\n**H2: A subheader**\n\nText from the subheader.\n\nA line break with some more text.\n\n**H2: Another sub hearder**\n\n**H3: A sub sub header

I tried this:

test = myFullText
header1 = re.findall('H1.*?ph.', test)

And

 test = myFullText
 header1 = re.findall('H1.*?\n\n.', test)

Both give me an empty "header1".

A more general question: I assume the findall function is the best approach for my use case, or is there another way to achieve this? As mentioned, my ultimate goal is to create a WordPress blog post from this text.

Yes, that is fine. Use a regular expression that matches the headers: header_pattern = r"\*\*(H\d): (.*?)\*\*" and then headers = re.findall(header_pattern, test)
– Arpan Gautam, Commented Jul 7 at 9:21

Friedrich · Accepted Answer · 2025-07-07 10:58:29Z

3

Yes, findall is fine. Use a regular expression that matches the headers and captures each header's level and title:

import re

header_pattern = r"\*\*(H\d): (.*?)\*\*"
headers = re.findall(header_pattern, test)
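As a minimal sketch of how this pattern behaves on the sample text from the question: the sample is cut off in the post, so the closing `**` on the last header below is an assumption added for illustration.

```python
import re

# Sample text from the question. The final header is truncated in the post,
# so its closing "**" here is an assumption made for illustration.
test = (
    "**H1: Some text**\n\nSome text as paragraph.\n\n"
    "**H2: A subheader**\n\nText from the subheader.\n\n"
    "A line break with some more text.\n\n"
    "**H2: Another sub hearder**\n\n**H3: A sub sub header**"
)

# Group 1 captures the level (H1, H2, ...), group 2 the header text.
header_pattern = r"\*\*(H\d): (.*?)\*\*"
headers = re.findall(header_pattern, test)

for level, title in headers:
    print(level, "->", title)
```

Because the pattern has two capture groups, `re.findall` returns a list of `(level, title)` tuples rather than the full matched strings, so the surrounding `**` markers are not part of the result.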

edited Jul 7 at 10:58

Friedrich


answered Jul 7 at 9:22

Arpan Gautam



4 Comments

hacking_mike Jul 7 at 9:35

I use ChatGPT to generate my initial text. It now returns # for H1, ## for H2, etc. Using your example I am able to get the individual texts, but they include the # character. How can I exclude it?

Arpan Gautam Jul 7 at 9:43

pattern = r"^(#{1,6})\s+(.*)$"
matches = re.findall(pattern, text, re.MULTILINE)
for level, content in matches:
    print(f"H{len(level)}: {content}")

Markdown defines up to 6 levels of headers, so #{1,6} means "match between 1 and 6 # characters".
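A small sketch of the same idea, assuming Markdown-style input like the text described in the comment above (the sample string is made up for illustration). The `#` markers land in the first capture group and the title in the second, so using only the second group leaves the `#` out.

```python
import re

# Hypothetical sample in the "#"-style format described in the comment.
text = (
    "# Some text\n\nSome text as paragraph.\n\n"
    "## A subheader\n\nText from the subheader.\n"
)

pattern = r"^(#{1,6})\s+(.*)$"
matches = re.findall(pattern, text, re.MULTILINE)

# Keep the numeric level and the title; the "#" marker itself is dropped.
headers = [(len(marks), title) for marks, title in matches]
print(headers)
```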

hacking_mike Jul 7 at 10:05

This is great, I do get all the headers now. One last ;-) question: how can I also include the paragraphs? Right now I get '#' (so this is an H1) and 'Some text' (the H1 content). I would also like to get 'Some text as paragraph' (the paragraph/content belonging to the H1).

Arpan Gautam Jul 7 at 10:56

import re

text = """
# Header 1

This is the first paragraph under header 1.

## Header 2

This is some text under header 2.

Another paragraph under the same header.

### Header 3

More content here.
"""

# Pattern: Match headers and their following content
pattern = r"^(#{1,6})\s+(.*?)\n(.*?)(?=\n#{1,6}\s+|\Z)"  # \Z = end of text
matches = re.findall(pattern, text, re.DOTALL | re.MULTILINE)

for level, header, content in matches:
    header_level = len(level)
    clean_content = content.strip()
    print(f"H{header_level}: {header}")
    print (f"Paragraph:\n{clean_co
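Since the comment above is cut off, here is a self-contained sketch of the same approach carried one step further toward the stated goal of a WordPress post. It is an illustration under assumptions, not the commenter's code: the `to_html` helper, the choice of the first header as the post title, and the `"draft"` status are all hypothetical, and the input is assumed to have a blank line between each header and the next block. The payload uses the `title`, `content` and `status` fields of the WordPress REST API posts endpoint (`/wp-json/wp/v2/posts`); actually sending it needs an authenticated POST request, which is left out here.

```python
import re
from html import escape

text = """# Header 1

This is the first paragraph under header 1.

## Header 2

This is some text under header 2.

Another paragraph under the same header.

### Header 3

More content here.
"""

# Same pattern as in the comment: marker, header text, then everything up to
# the next header line (or the end of the text).
pattern = r"^(#{1,6})\s+(.*?)\n(.*?)(?=\n#{1,6}\s+|\Z)"
sections = re.findall(pattern, text, re.DOTALL | re.MULTILINE)


def to_html(sections):
    """Hypothetical helper: turn (marker, header, content) tuples into HTML."""
    parts = []
    for marks, header, content in sections:
        level = len(marks)
        parts.append(f"<h{level}>{escape(header)}</h{level}>")
        # Blank lines separate paragraphs inside one section.
        for paragraph in re.split(r"\n\s*\n", content.strip()):
            if paragraph:
                parts.append(f"<p>{escape(paragraph)}</p>")
    return "\n".join(parts)


html_body = to_html(sections)

# Assumed mapping to a WordPress post: first header as title, saved as draft.
payload = {
    "title": sections[0][1],
    "content": html_body,
    "status": "draft",
}

print(html_body)
```

Building the HTML and the payload separately from the upload keeps the parsing step easy to check by printing `html_body` before anything is sent to the site.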
