PxPlus User Forum

Main Board => Discussions => Programming => Topic started by: Thomas Bock on August 19, 2020, 10:06:43 AM

Title: representation of unicode characters
Post by: Thomas Bock on August 19, 2020, 10:06:43 AM
A customer expects, for example, "B%C3BCchse" for the German word "Büchse". This looks like a URL-encoded UTF-8 character, so I gave it a try, but have had no luck so far. This is what I tried:
Code:
thing$="Büchse"
thingUTF8$=cvs(thing$,"ASCII:UTF8")
print hta(thingUTF8$), " OK"
thingURL$=cvs(thingUTF8$,"UTF8:URL")
print hta(thingURL$)," nOK"
print thingURL$
Is there a way to do it with CVS()?
Title: Re: representation of unicode characters
Post by: Devon Austen on August 19, 2020, 10:34:56 AM
"B%C3BCchse" is not the correct URL encoding for "Büchse". The correct encoding would be either "B%FCchse", keeping it single-byte (ISO 8859-1), or "B%C3%BCchse", encoding the UTF-8 bytes.

With CVS you can get "B%FCchse" by just doing CVS(thing$,"ASCII:URL"). See Mike's post for how to get the other format.
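As a cross-check outside PxPlus, the two URL encodings of "Büchse" can be reproduced with a minimal Python sketch. This is not PxPlus code; it assumes the source string is ISO 8859-1 and uses only the standard library's `urllib.parse.quote`.

```python
from urllib.parse import quote

word = "Büchse"

# Percent-encode the ISO 8859-1 (Latin-1) bytes: ü is the single byte 0xFC.
latin1_url = quote(word, encoding="latin-1")

# Percent-encode the UTF-8 bytes: ü becomes the two bytes 0xC3 0xBC,
# and each byte gets its own % prefix.
utf8_url = quote(word, encoding="utf-8")

print(latin1_url)  # B%FCchse
print(utf8_url)    # B%C3%BCchse
```

Standard URL encoding escapes every byte separately, which is why the UTF-8 form carries two % signs.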
Title: Re: representation of unicode characters
Post by: Mike King on August 19, 2020, 10:53:13 AM
Thomas

Are you sure about what the customer expects?
If I convert the value you have first from ANSI (ISO 8859-1) to UTF8 then to URL encoding I get the following:

->thing$="Büchse"
->thingUTF8$=cvs(thing$,"ASCII:UTF8")
->print thingUTF8$
Büchse
->print cvs(thingUTF8$,"ASCII:URL")
B%C3%BCchse


That's awfully close to what you posted, so is it possible that your example is missing the second %?
Title: Re: representation of unicode characters
Post by: Thomas Bock on August 20, 2020, 01:53:51 AM
According to his specification, all Unicode characters must be written using the pattern %NNNN. There are several examples showing this.
That kind of encoding is new to me, too. Perhaps I can convince him to use Mike's approach.
Title: Re: representation of unicode characters
Post by: Thomas Bock on August 20, 2020, 06:55:34 AM
The URL encoding was just my guess because of the leading "%".
I think I must encode/decode this myself, as CVS has no option for that kind of notation.
Title: Re: representation of unicode characters
Post by: Mike King on August 20, 2020, 06:59:17 AM
Generally you don't use URL encoding on Unicode data but rather on UTF-8 data. Here is a bit of discussion on the subject, which generally recommends using UTF-8.

https://stackoverflow.com/questions/912811/what-is-the-proper-way-to-url-encode-unicode-characters

Title: Re: representation of unicode characters
Post by: Devon Austen on August 20, 2020, 08:15:07 AM
If they are not using this for a URL and need the non-standard %NNNN encoding, then yes, you would have to do it yourself. One possible way would be to go through the string character by character and do a CVS(chrstr$,"ASCII:UTF8") on each one. If the result is different from the input, add the % at the beginning and append it to the output string (converted to hex, e.g. with HTA(), if the output should look like the customer's "%C3BC" example). If the result of CVS is the same, just append the character as is.
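
The character-by-character approach can be sketched in Python to show the intended logic (this is an illustration, not PxPlus code). The function name `encode_percent_utf8` is hypothetical, and the sketch assumes that the customer's %NNNN pattern means a single % followed by the uppercase hex of the character's UTF-8 bytes, as in the "B%C3BCchse" example.

```python
def encode_percent_utf8(text: str) -> str:
    """Hypothetical helper: write each non-ASCII character as '%' plus
    the uppercase hex of its UTF-8 bytes; pass ASCII through unchanged."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            # Plain ASCII: the UTF-8 form is identical, so copy it as is.
            out.append(ch)
        else:
            # Non-ASCII: one '%' followed by all UTF-8 bytes in hex.
            out.append("%" + ch.encode("utf-8").hex().upper())
    return "".join(out)

print(encode_percent_utf8("Büchse"))  # B%C3BCchse
```

Under this assumption, only characters whose UTF-8 form is two bytes long produce exactly four hex digits; a character such as "€" (three UTF-8 bytes) would come out as "%E282AC". The sketch also leaves a literal "%" in the input unescaped, so whether and how the customer's format handles those cases would need to be checked against the specification.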