Linux - encoding: unrtf SYMBOL.charmap needs to be altered
I'm trying to convert files from RTF to text. The originals were created by a Windows application (probably Word), but the conversion is taking place on a Linux server. The tool I wish to use is unrtf, since it comes with the Linux distro (SLES !!.x) pre-installed... or at least I didn't have to install it.
There isn't a lot of documentation on unrtf. It works, and there is a man page, but with limited info. The problem is that the encoding is coming out as ISO-8859-1, and I need ISO-8859-15 in order to get the euro symbol (€). Instead I'm getting the not symbol (¬). Viewing the document in hex mode, I see there is a value of xac00 at the point where the € symbol should be.
Searching the web, I found out that € has the Unicode value of x20ac and ¬ has the Unicode value of x00ac. A bit more searching suggested that in the encoding ISO-8859-15 the correct value is x00a4. A lot of the information I found was contradictory and confusing (not to mention way off topic for unrtf, after all).
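The byte values mentioned here can be checked directly with iconv and od. This is only a minimal sketch: it assumes an iconv that knows the charset names UTF-8, ISO-8859-1 and ISO-8859-15, and the helper name hex_of is made up for illustration.

```shell
#!/bin/sh
# hex_of CHARSET: read UTF-8 text on stdin, convert it to CHARSET,
# and print the resulting bytes as lowercase hex without spaces.
hex_of() {
    iconv -f UTF-8 -t "$1" | od -An -tx1 | tr -d ' \n'
}

# U+20AC EURO SIGN is the UTF-8 byte sequence E2 82 AC (octal 342 202 254).
# U+00AC NOT SIGN  is the UTF-8 byte sequence C2 AC    (octal 302 254).
printf 'euro in ISO-8859-15: %s\n' "$(printf '\342\202\254' | hex_of ISO-8859-15)"
printf 'not  in ISO-8859-15: %s\n' "$(printf '\302\254' | hex_of ISO-8859-15)"
printf 'not  in ISO-8859-1:  %s\n' "$(printf '\302\254' | hex_of ISO-8859-1)"
```

On such a system this should report a4 for the euro and ac for the not symbol in both encodings, which matches the values found on the web: the low byte of x20ac is the same as the single-byte code for ¬.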
Amongst the commands I've tried are:
unrtf --text $rtf > $xrtf
unrtf --text $rtf | iconv -c -f utf-8 -t iso-8859-15 > $xrtf
where $rtf and $xrtf are the input and output files respectively. I then checked the supposed encoding of the RTF file with
file -bi $rtf
and it returned an answer of iso-8859-1. So I tried the following:
unrtf --text $rtf | iconv -c -f iso-8859-1 -t iso-8859-15 > $xrtf
In one final grasp at straws, I tried creating my own SYMBOL.charmap file and changed the value of the not symbol to "u<20ac>", following the syntax of the file. Then I tried the command:
unrtf --text -P $HOME/usr/local/share/unrtf $rtf > $xrtf
All these attempts achieved absolutely nothing... except that the second one removed the not symbol altogether, by virtue of the -c option (I think).
Does anybody have any ideas on how to achieve the desired conversion?
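One plausible reading of these results, assuming the unrtf output really contains a lone xac byte where the € should be: ISO-8859-1 and ISO-8859-15 both define xac as ¬, so converting between them leaves the byte untouched, whereas a lone xac is not valid UTF-8, so iconv -c drops it. The sketch below reproduces both effects on a made-up sample; the helper names sample and hex are hypothetical.

```shell
#!/bin/sh
# sample: the text "5 " followed by the single byte 0xAC (octal 254) and a newline.
sample() { printf '5 \254\n'; }

# hex: print stdin as lowercase hex bytes without spaces.
hex() { od -An -tx1 | tr -d ' \n'; }

# ISO-8859-1 -> ISO-8859-15: 0xAC means the same character in both,
# so the byte is expected to pass through unchanged.
latin=$(sample | iconv -c -f ISO-8859-1 -t ISO-8859-15 | hex)
printf 'latin1 -> latin9: %s\n' "$latin"

# UTF-8 -> ISO-8859-15: a lone 0xAC is an invalid UTF-8 sequence,
# so with -c it is expected to be discarded rather than converted.
utf=$(sample | iconv -c -f UTF-8 -t ISO-8859-15 2>/dev/null | hex)
printf 'utf-8  -> latin9: %s\n' "$utf"
```

In neither case does iconv have any way of knowing that the byte was meant to be a euro sign, which would explain why no choice of source encoding helped.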
I haven't got a complete solution, but I do have an effective work-around. The first thing to note is that the encodings ISO-8859-1 and ISO-8859-15 are almost the same (see this link). There are 8 differences. Secondly, how the chars are displayed depends on the software reading the file, and not on the conversion software (in this case unrtf).
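The 8 differences can be listed by decoding every byte with both encodings and printing the ones that disagree. This is a rough sketch, assuming an iconv that supports both charset names; only the range xa0 to xff is scanned, since the two encodings agree below that. The function name latin_diff is invented for this example.

```shell
#!/bin/sh
# latin_diff: print each byte value in 0xA0..0xFF that ISO-8859-1 and
# ISO-8859-15 decode to different characters, followed by both characters
# (shown as UTF-8, so a UTF-8 terminal is assumed for display).
latin_diff() {
    i=160
    while [ "$i" -le 255 ]; do
        oct=$(printf '%03o' "$i")    # octal form, usable as a printf escape
        a=$(printf "\\$oct" | iconv -f ISO-8859-1 -t UTF-8)
        b=$(printf "\\$oct" | iconv -f ISO-8859-15 -t UTF-8)
        [ "$a" = "$b" ] || printf '0x%02X  %s  %s\n' "$i" "$a" "$b"
        i=$((i + 1))
    done
}

latin_diff
```

The first line of output should be the one that matters here: 0xA4, which is the generic currency sign ¤ in ISO-8859-1 and € in ISO-8859-15.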
Thus the task is reduced to one symbol (€ in place of ¬), since the others are not in use in the relevant files. It comes down to changing "xac" to "xa4" in each file after conversion. This can be done with a simple sed command:
sed 's/\xac/\xa4/g' temp1.txt > temp2.txt
That's it. As I say: it's a work-around.
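The work-around can also be folded into a single pipeline so that no temporary files are needed. The following is a sketch under a few assumptions: tr with octal escapes stands in for the sed command so that it does not depend on GNU sed's \x escapes, LC_ALL=C is set so that tr works on raw bytes, and the input contains no genuine ¬, since that would be turned into € as well. The function name fix_euro is invented here.

```shell
#!/bin/sh
# fix_euro: replace every 0xAC byte (octal 254) on stdin with 0xA4 (octal 244),
# which is the euro sign when the result is read as ISO-8859-15.
fix_euro() {
    LC_ALL=C tr '\254' '\244'
}

# Intended use, with unrtf and the $rtf / $xrtf variables from the question:
#   unrtf --text "$rtf" | fix_euro > "$xrtf"

# Self-check on a made-up sample: byte 0xAC followed by " 100".
printf '\254 100\n' | fix_euro | od -An -tx1
```

To look at the result on a UTF-8 terminal, the output can be piped through iconv -f ISO-8859-15 -t UTF-8.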
Changing the SYMBOL.charmap file should have worked, but I'm no expert on unrtf, so maybe I did it incorrectly.