representation of unicode characters

Started by Thomas Bock, August 19, 2020, 10:06:43 AM

Thomas Bock

A customer expects "B%C3BCchse" for the German word "Büchse", for example. This looks like a URL-encoded UTF-8 character, so I gave it a try, but have had no luck so far. This is what I tried:

thing$="Büchse"
thingUTF8$=cvs(thing$,"ASCII:UTF8")
print hta(thingUTF8$), " OK"
thingURL$=cvs(thingUTF8$,"UTF8:URL")
print hta(thingURL$)," nOK"
print thingURL$

Is there a way to do it with CVS()?

Devon Austen

"B%C3BCchse" is not the correct URL encoding for "Büchse". The correct encoding would be either "B%FCchse", keeping it ASCII (single-byte ISO 8859-1), or "B%C3%BCchse", encoding UTF-8.

With CVS you can get "B%FCchse" by just doing CVS(thing$,"ASCII:URL"). See Mike's post for how to get the other format.
Principal Software Engineer for PVX Plus Technologies LTD.
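
For comparison outside PxPlus, the two standard percent-encodings can be reproduced with Python's standard library. This is only an illustrative sketch, not a statement about how CVS() works internally: `urllib.parse.quote` encodes the string with the given character set and then writes every non-safe byte as `%XX`.

```python
from urllib.parse import quote

word = "Büchse"

# ISO 8859-1 stores "ü" as the single byte 0xFC.
latin1_url = quote(word, encoding="latin-1")

# UTF-8 stores "ü" as the two bytes 0xC3 0xBC; each byte gets its own "%".
utf8_url = quote(word, encoding="utf-8")

print(latin1_url)  # B%FCchse
print(utf8_url)    # B%C3%BCchse
```

Either way, every encoded byte carries its own "%", which is why "B%C3BCchse" does not match standard URL encoding.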

Mike King

Thomas

Are you sure about what the customer expects?
If I convert the value you have first from ANSI (ISO 8859-1) to UTF8 then to URL encoding I get the following:

->thing$="Büchse"
->thingUTF8$=cvs(thing$,"ASCII:UTF8")
->print thingUTF8$
Büchse
->print cvs(thingUTF8$,"ASCII:URL")
B%C3%BCchse


That's awfully close to what you posted, so is it possible that your example is missing the second %?
Mike King
President - BBSysco Consulting - http://www.bbsysco.com
eMail: mike.king@bbsysco.com

Thomas Bock

According to his specification, all Unicode characters must be written using the pattern %NNNN. There are several examples showing this.
That kind of encoding is new to me, too. Perhaps I can convince him to use Mike's approach.

Thomas Bock

The URL encoding was just my guess because of the leading "%".
I think I must encode/decode this myself, as CVS has no option for that kind of notation.
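
Decoding is the less obvious half of a hand-written converter, because the letter after the hex digits can itself be a valid hex digit (the "c" of "chse" in "B%C3BCchse"). The Python sketch below is one possible approach and rests on an assumption the thread does not confirm: that the digits after "%" are the character's UTF-8 bytes, so the lead byte tells how many bytes belong to the group. The function name `decode_percent_grouped` is hypothetical.

```python
def decode_percent_grouped(encoded: str) -> str:
    """Decode '%' + hex-of-UTF-8-bytes groups back into text (assumed format)."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] == "%":
            lead = int(encoded[i + 1:i + 3], 16)
            # The UTF-8 lead byte determines the length of the byte sequence.
            if lead >= 0xF0:
                nbytes = 4
            elif lead >= 0xE0:
                nbytes = 3
            elif lead >= 0xC0:
                nbytes = 2
            else:
                nbytes = 1
            out += bytes.fromhex(encoded[i + 1:i + 1 + 2 * nbytes])
            i += 1 + 2 * nbytes
        else:
            # Plain characters are copied through unchanged.
            out += encoded[i].encode("utf-8")
            i += 1
    return out.decode("utf-8")


print(decode_percent_grouped("B%C3BCchse"))  # Büchse
```

Malformed input raises `ValueError` here. If the customer's format really is limited to exactly four hex digits, characters that need three or four UTF-8 bytes would have to be handled according to his specification.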

Mike King

Generally you don't use URL encoding on Unicode data but rather on UTF-8 data. Here is a bit of discussion on the subject, which generally recommends using UTF-8.

https://stackoverflow.com/questions/912811/what-is-the-proper-way-to-url-encode-unicode-characters

Devon Austen

If they are not using this for a URL and need the non-standard %NNNN encoding, then yes, you would have to do it yourself. One possible way is to go through the string character by character and run CVS(chrstr$,"ASCII:UTF8") on each one. If the result differs from the original character, convert it to hex (for example with HTA()), add the % at the beginning, and append that to the output string. If the result of CVS is the same, append the character as is.
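
The character-by-character loop described above can be sketched as follows. This is written in Python rather than PxPlus purely as an illustration of the logic, and the function name `encode_percent_grouped` is hypothetical. It assumes the customer's notation is "%" followed by the hex of the character's UTF-8 bytes, which is what the single example "B%C3BCchse" suggests.

```python
def encode_percent_grouped(text: str) -> str:
    """Write each non-ASCII character as '%' plus the hex of its UTF-8 bytes."""
    out = []
    for ch in text:
        utf8 = ch.encode("utf-8")
        if len(utf8) == 1:
            # ASCII character: the UTF-8 form is identical, so copy it as is.
            out.append(ch)
        else:
            # Conversion changed the character: one '%' for the whole group.
            out.append("%" + utf8.hex().upper())
    return "".join(out)


print(encode_percent_grouped("Büchse"))  # B%C3BCchse
```

Two caveats with this sketch: a literal "%" in the input is passed through unescaped, and characters above U+07FF produce six or eight hex digits rather than four, so both cases would need checking against the customer's specification.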