PxPlus User Forum

Author Topic: representation of unicode characters  (Read 1356 times)

Thomas Bock

representation of unicode characters
« on: August 19, 2020, 10:06:43 AM »
A customer expects "B%C3BCchse" for the German word "Büchse", for example. This looks like a URL-encoded UTF-8 character, so I gave it a try, but have had no luck so far. This is what I tried:
Code:
thing$="Büchse"
thingUTF8$=cvs(thing$,"ASCII:UTF8")
print hta(thingUTF8$), " OK"
thingURL$=cvs(thingUTF8$,"UTF8:URL")
print hta(thingURL$)," nOK"
print thingURL$
Is there a way to do it with CVS()?

Devon Austen

Re: representation of unicode characters
« Reply #1 on: August 19, 2020, 10:34:56 AM »
"B%C3BCchse" is not the correct URL encoding for "Büchse". The correct encoding would be either "B%FCchse", keeping the single-byte (ISO 8859-1) character, or "B%C3%BCchse", encoding the UTF-8 bytes.

With CVS you can get "B%FCchse" by just doing CVS(thing$,"ASCII:URL"). See Mike's post for how to get the other format.
« Last Edit: August 19, 2020, 11:09:55 AM by Devon Austen »
Principal Software Engineer for PVX Plus Technologies LTD.

Mike King

Re: representation of unicode characters
« Reply #2 on: August 19, 2020, 10:53:13 AM »
Thomas

Are you sure about what the customer expects?
If I first convert your value from ANSI (ISO 8859-1) to UTF-8 and then URL-encode it, I get the following:

->thing$="Büchse"
->thingUTF8$=cvs(thing$,"ASCII:UTF8")
->print thingUTF8$
Büchse
->print cvs(thingUTF8$,"ASCII:URL")
B%C3%BCchse


That's awfully close to what you posted, so is it possible that your example is missing the second %?
Mike King
President - BBSysco Consulting
eMail: mike.king@bbsysco.com

Thomas Bock

Re: representation of unicode characters
« Reply #3 on: August 20, 2020, 01:53:51 AM »
According to his specification, all Unicode characters must be written using the pattern %NNNN. There are several examples showing this.
That kind of encoding is new to me, too. Perhaps I can convince him to use Mike's approach.

Thomas Bock

Re: representation of unicode characters
« Reply #4 on: August 20, 2020, 06:55:34 AM »
The URL encoding was just my guess because of the leading "%".
I think I must encode/decode this myself, as CVS has no option for that kind of notation.

Mike King

Re: representation of unicode characters
« Reply #5 on: August 20, 2020, 06:59:17 AM »
Generally you don't apply URL encoding to raw Unicode data but rather to UTF-8 data. Here is a bit of discussion on the subject, which generally recommends using UTF-8.

https://stackoverflow.com/questions/912811/what-is-the-proper-way-to-url-encode-unicode-characters

Devon Austen

Re: representation of unicode characters
« Reply #6 on: August 20, 2020, 08:15:07 AM »
If they are not using this for a URL and need the non-standard %NNNN encoding, then yes, you would have to do it yourself. One possible way is to go through the string character by character and call CVS(chrstr$,"ASCII:UTF8") on each one. If the result differs from the original character, add a % followed by the result in hex (for example via HTA()) to the output string; if the result is the same, add the character to the output string as is.
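
The character-by-character approach could be sketched roughly as follows. This is a minimal, untested sketch rather than a finished routine: it assumes thing$ holds text in the native 8-bit character set, that HTA() (used earlier in this thread) is an acceptable way to render the UTF-8 bytes as hex, and the variable names out$ and utf$ are purely illustrative. Since the UTF-8 bytes for "ü" are C3 BC (as in Mike's output), the intent is that "ü" comes out as %C3BC.

```
! Sketch: write each non-ASCII character as % followed by the hex of its UTF-8 bytes
thing$="Büchse"
out$=""
for i=1 to len(thing$)
 chrstr$=thing$(i,1) ! current character
 utf$=cvs(chrstr$,"ASCII:UTF8") ! UTF-8 form of this one character
 ! Unchanged by the conversion: copy as is. Changed: emit % plus hex bytes.
 if utf$=chrstr$ then out$=out$+chrstr$ else out$=out$+"%"+hta(utf$)
next i
print out$
```

Decoding is not covered here. Because the number of hex digits after each % depends on the character, the customer's specification would have to state how a reader finds the end of each encoded sequence.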