I am trying to replace multiple instances of a given Unicode string in a group of LaTeX files. For example, the original line reads:
\greek{῾Ρώμην}\nft{2}{ \greek{Ῥώμη, ῥώμη}: \textit{bodily strength, might, Roma, Rome}}
I would like to replace it with the following line:
\greek{῾Ρώμην}\nft{2}{ \greek{Ῥώμη, ῥώμη}: \textit{Roma, Rome, bodily strength, might}}
I have created a substitution file (subs.tsv) to hold all my replacements for these files. It has two tab-separated columns: the target string, then its substitution. For the above replacement, it would read thus:
"\greek{῾Ρώμην}\nft{2}{ \greek{Ῥώμη, ῥώμη}: \textit{bodily strength, might, Roma, Rome}}" "\greek{῾Ρώμην}\nft{2}{ \greek{Ῥώμη, ῥώμη}: \textit{Roma, Rome, bodily strength, might}}"
When I run this through Python, however, I get a mass of gobbledygook that looks like this:
Øgreekæ῾ΡώμηνåØnftæ2åæ ØgreekæῬώμη, ῥώμηå: Øtextitæbodily strength, might, Roma, Romeåå" "Øgreekæ῾ΡώμηνåØnftæ2åæ ØgreekæῬώμη, ῥώμηå: ØtextitæRoma, Rome, bodily strength, mightåå
My current code looks like this:
#!/usr/bin/env python3
from os import listdir
from glob import glob
fileset = '.'
with open('subs.tsv', encoding='utf-8') as subsfile:
    subs = subsfile.readlines()
for filename in glob('./*tex'):
    with open(filename, encoding='utf-8', errors='ignore') as f:
        filecontents = f.read()
    for subline in subs:
        subitem = subline.split('\t')
        target = subitem[0]
        sub = subitem[1]
        filesubbed = filecontents.replace(target, sub)
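To make the goal concrete, the behaviour I am after can be sketched roughly as below. This is only a sketch under stated assumptions, not working code from my project: `load_subs` and `apply_subs` are hypothetical helper names, the sample data is a shortened in-memory stand-in for subs.tsv, the columns are assumed to be tab-separated with each field wrapped in double quotes, and nothing is written back to disk.

```python
import csv
import io


def load_subs(stream):
    """Return (target, replacement) pairs from a tab-separated stream."""
    # csv.reader removes the surrounding double quotes and the line ending,
    # which a plain str.split('\t') would leave attached to the fields.
    reader = csv.reader(stream, delimiter='\t', quotechar='"')
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def apply_subs(text, pairs):
    """Apply every pair in order, carrying the result forward each time."""
    for target, replacement in pairs:
        text = text.replace(target, replacement)
    return text


# Shortened stand-in for one row of subs.tsv (hypothetical sample data).
sample_subs = io.StringIO(
    '"\\textit{bodily strength, might, Roma, Rome}"\t'
    '"\\textit{Roma, Rome, bodily strength, might}"\n'
)
pairs = load_subs(sample_subs)

line = '\\greek{Ῥώμη, ῥώμη}: \\textit{bodily strength, might, Roma, Rome}'
result = apply_subs(line, pairs)
print(result)
```

The point of the sketch is that each replacement operates on the running text rather than on the original file contents, and that the quotes and newline from the TSV never end up inside the target or the substitution.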
Thanks in advance for your consideration and any help you can offer.