linux - encoding: unrtf SYMBOL.charmap needs to be altered


I'm trying to convert files from RTF to text. The originals were created by a Windows application (probably Word), but the conversion is taking place on a Linux server. The tool I wish to use is unrtf, since it comes pre-installed with the Linux distro (SLES !!.x)... or at least I didn't have to install it.

There isn't a lot of documentation on unrtf. It works, and there is a man page with limited info. The problem is that the encoding is coming out as ISO-8859-1 and I need ISO-8859-15 in order to get the euro symbol (€). Instead I'm getting the not symbol (¬). Viewing the document in hex mode, I see a value of xac00 at the point where the € symbol should be.

Searching the web, I found out that € has a Unicode value of x20ac and ¬ has a Unicode value of x00ac. A bit more searching suggested that in ISO-8859-15 the correct value for € is x00a4. A lot of the information I found was contradictory and confusing (not to mention way off topic; this is about unrtf, after all).
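The code points quoted above can be checked on the command line. This is a minimal sketch, assuming `iconv` and `od` are available; the variable names are only illustrative. The characters are fed in as UTF-8 bytes written as octal escapes, so the snippet does not depend on the terminal's own encoding.

```shell
# Byte that the euro sign (U+20AC, UTF-8: E2 82 AC) occupies in ISO-8859-15
euro_latin9=$(printf '\342\202\254' | iconv -f UTF-8 -t ISO-8859-15 | od -An -tx1 | tr -d ' \n')

# Byte that the not sign (U+00AC, UTF-8: C2 AC) occupies in ISO-8859-1
not_latin1=$(printf '\302\254' | iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1 | tr -d ' \n')

echo "euro in ISO-8859-15:    0x$euro_latin9"   # 0xa4
echo "not sign in ISO-8859-1: 0x$not_latin1"    # 0xac
```

So a file holding byte xac shows ¬ under either encoding, and the € only appears when byte xa4 is read as ISO-8859-15.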

Amongst the commands I've tried are:

unrtf --text $rtf > $xrtf
unrtf --text $rtf | iconv -c -f utf-8 -t iso-8859-15 > $xrtf

where $rtf and $xrtf are the input and output files respectively. I checked the supposed encoding of the RTF file with

file -bi $rtf 

and it returned an answer of iso-8859-1. So I tried the following:

unrtf --text $rtf | iconv -c -f iso-8859-1 -t iso-8859-15  > $xrtf 

In one final grasp at straws, I tried creating my own SYMBOL.charmap file and changed the value for the not symbol to "u<20ac>", following the syntax of the file. Then I tried the command:

unrtf --text -P $HOME/usr/local/share/unrtf $rtf > $xrtf

All these attempts achieved absolutely nothing... except that the second one removed the not symbol altogether, by virtue of the -c option (I think).

Does anybody have any ideas on how to achieve the desired conversion?
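One way to see why the iconv attempts behave this way is to push the single byte xac through the same conversions. This is a sketch assuming GNU `iconv`; the variable names are made up for illustration. The not sign sits at xac in both ISO-8859-1 and ISO-8859-15, so converting between them leaves the byte alone, while a lone xac is not valid UTF-8, which is presumably why `-c` drops it.

```shell
# ISO-8859-1 -> ISO-8859-15: the not sign exists at 0xAC in both, so nothing changes
unchanged=$(printf '\254' | iconv -f ISO-8859-1 -t ISO-8859-15 | od -An -tx1 | tr -d ' \n')
echo "after latin1 -> latin9: 0x$unchanged"

# Treating the same byte as UTF-8 with -c: the invalid byte is expected to be discarded
printf 'a\254b' | iconv -c -f UTF-8 -t ISO-8859-15 2>/dev/null | od -An -c
```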

I haven't got a complete solution, but I do have an effective work-around. The first thing to note is that the encodings ISO-8859-1 and ISO-8859-15 are almost the same (see this link). There are only 8 differences. Secondly, how the characters are displayed depends on the software reading the file, not on the conversion software (in this case unrtf).
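The 8 differences can be listed by decoding each byte under both encodings and comparing the results. This is a rough sketch, assuming `iconv` knows both charsets; it scans only xa0 to xff, because the two encodings are identical below xa0.

```shell
# Decode each byte 0xA0..0xFF as ISO-8859-1 and as ISO-8859-15, print the ones that differ
count=0
i=160
while [ "$i" -le 255 ]; do
    oct=$(printf '%03o' "$i")                              # byte value as a 3-digit octal escape
    a=$(printf "\\$oct" | iconv -f ISO-8859-1 -t UTF-8)
    b=$(printf "\\$oct" | iconv -f ISO-8859-15 -t UTF-8)
    if [ "$a" != "$b" ]; then
        printf '0x%02x: %s -> %s\n' "$i" "$a" "$b"
        count=$((count + 1))
    fi
    i=$((i + 1))
done
echo "$count differing bytes"
```

The first line printed is for xa4, where ISO-8859-1 has the currency sign (¤) and ISO-8859-15 has the €.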

Thus the task is reduced to one symbol (€ in place of ¬), since the others are not in use in the relevant files. It comes down to changing "xac" to "xa4" in each file after conversion. This can be done with a simple sed command:

sed 's/\xac/\xa4/g' temp1.txt > temp2.txt 

That's it. As I say: it's a work-around.

Changing the SYMBOL.charmap file should have worked, but I'm no expert on unrtf, so maybe I did it incorrectly.
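If the `\x` escapes are a concern (they are a GNU sed extension), the same byte swap can be sketched with `tr` and octal escapes. The function name `fix_euro` and the sample text are made up for illustration. `LC_ALL=C` is set so the data is handled strictly byte by byte, and the final `iconv` is only there to confirm that the result reads as € under ISO-8859-15.

```shell
# Replace every 0xAC byte (octal 254) with 0xA4 (octal 244), reading stdin and writing stdout
fix_euro() {
    LC_ALL=C tr '\254' '\244'
}

# Check on a sample line that contains the raw 0xAC byte
sample=$(printf 'Price: 10 \254\n' | fix_euro | iconv -f ISO-8859-15 -t UTF-8)
echo "$sample"
```

On real files it would be used as `fix_euro < temp1.txt > temp2.txt`. Like the sed version, it also rewrites any genuine ¬ characters, which is acceptable here only because they are not in use in the relevant files.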

