I am trying to replace multiple instances of a given Unicode string in a group of LaTeX files. For example, the original line reads:

\greek{῾Ρώμην}\nft{2}{ \greek{Ῥώμη, ῥώμη}: \textit{bodily strength, might, Roma, Rome}}

I would like to replace it with the following line:

\greek{῾Ρώμην}\nft{2}{ \greek{Ῥώμη, ῥώμη}: \textit{Roma, Rome, bodily strength, might}}

I have created a substitution file (subs.tsv) to hold all my replacements for these files. It has two tab-separated columns: the target, then its substitution. For the replacement above, it reads:

"\greek{῾Ρώμην}\nft{2}{ \greek{Ῥώμη, ῥώμη}: \textit{bodily strength, might, Roma, Rome}}" "\greek{῾Ρώμην}\nft{2}{ \greek{Ῥώμη, ῥώμη}: \textit{Roma, Rome, bodily strength, might}}"

When I run this through Python, however, I get a mass of gobbledygook that looks like this:

Øgreekæ῾ΡώμηνåØnftæ2åæ ØgreekæῬώμη, ῥώμηå: Øtextitæbodily strength, might, Roma, Romeåå" "Øgreekæ῾ΡώμηνåØnftæ2åæ ØgreekæῬώμη, ῥώμηå: ØtextitæRoma, Rome, bodily strength, mightåå

My current code is:

#!/usr/bin/env python3

from os import listdir
from glob import glob

fileset = '.'

with open('subs.tsv', encoding='utf-8') as subsfile:
    subs = subsfile.readlines()



for filename in glob('./*tex'): 
    with open(filename, encoding='utf-8', errors='ignore') as f:
        filecontents = f.read()
        for subline in subs:
            subitem = subline.split('\t')
            target = subitem[0]
            sub = subitem[1]
            filesubbed = filecontents.replace(target, sub)

Thanks in advance for your consideration and any help you can offer.
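
A minimal sketch of one way the loop above might be restructured. It is not a confirmed fix for the garbled display. It assumes the double quotes in subs.tsv only delimit the two fields and are not part of the text to match, and that the intent is to overwrite each .tex file in place. The helper names load_subs, apply_subs, and substitute_in_files are hypothetical. It targets four things visible in the posted code: split('\t') leaves the quotes and the trailing newline inside the fields, each replace call starts again from the original filecontents so only the last pair takes effect, the result is never written back, and errors='ignore' can silently drop bytes that fail to decode.

```python
import csv
from pathlib import Path


def load_subs(tsv_path):
    """Read (target, replacement) pairs from a tab-separated file."""
    pairs = []
    with open(tsv_path, encoding='utf-8', newline='') as subsfile:
        # Assumption: fields are wrapped in double quotes as delimiters.
        # csv.reader removes those quotes and the line ending.
        for row in csv.reader(subsfile, delimiter='\t'):
            if len(row) >= 2 and row[0]:
                pairs.append((row[0], row[1]))
    return pairs


def apply_subs(text, pairs):
    """Apply every replacement in order, carrying the result forward."""
    for target, sub in pairs:
        text = text.replace(target, sub)
    return text


def substitute_in_files(directory, tsv_path):
    """Rewrite each .tex file in `directory` that a substitution changes."""
    pairs = load_subs(tsv_path)
    for path in Path(directory).glob('*.tex'):
        # No errors='ignore' here: a decoding failure raises instead of
        # quietly discarding characters.
        original = path.read_text(encoding='utf-8')
        updated = apply_subs(original, pairs)
        if updated != original:
            path.write_text(updated, encoding='utf-8')
```

Calling substitute_in_files('.', 'subs.tsv') would mirror the posted script. Because this overwrites files, it is worth trying on a copy of the directory first.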
