Sharing some discussions

To give you a feel for the friendliness of the Pharo community, here is part of a discussion:


Hi Offray,

> On 27 Jul 2018, at 12:39, Offray Vladimir Luna Cárdenas <offray.luna at mutabit.com> wrote:
> 
> Hi,
> 
> I was ready to show a friend the Pharo web capabilities with the
> classical "myString asUrl retrieveContents", but the friend gave me a
> url that contains non-Latin characters[1] and then I got a
> ZnInvalidUTF8 error.
> 
> [1]
> http://www.bidchance.com/freesearch.do?&filetype=&channel=&currentpage=1&searchtype=zb&queryword=%BF%A6%CA%B2&displayStyle=&pstate=&field=&leftday=&province=&bidfile=&project=&heshi=&recommend=&field=&jing=&starttime=&endtime=&attachment=
> 
> How can I process web addresses in Pharo that contain non-Latin
> characters like the one in [1]?

I am on holiday, so I cannot go too deep into this, but AFAIU the URL is wrong (or it assumes a specific context with a non-standard encoding).

In a URL's query part, non-ASCII data is first UTF-8 encoded, then percent encoded (this is the modern way).

I don't read Chinese, so it is hard to infer much from the original site, but I am assuming the search is for '喀什', a city called Kashgar, https://en.wikipedia.org/wiki/Kashgar_(disambiguation).

The string in question can be written as (to avoid copy/paste problems):

  String with: 21888 asCharacter with: 20160 asCharacter.

The encoding in a URL has to be:

  ZnPercentEncoder new encode: (String with: 21888 asCharacter with: 20160 asCharacter).

This gives us for example the following URL:

  'https://www.google.com/search?q=%E5%96%80%E4%BB%80' asUrl.

Which parses OK and contains the correct encoded string (decoded in the URL object):

  'https://www.google.com/search?q=%E5%96%80%E4%BB%80' asUrl queryAt: #q.

If you copy/paste that URL in your browser it should resolve to stuff about Kashgar.

Obviously the website www.bidchance.com does something else (non-standard?).

HTH,

Sven

> Thanks,
> 
> Offray
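
The steps in Sven's reply can be tried together in a playground. The snippet below is only a sketch, assuming a Pharo image with Zinc HTTP Components loaded; the variable names are made up for illustration, and the exact error text may differ between Pharo versions.

```smalltalk
"A playground sketch, not a definitive recipe. Variable names are illustrative."
| kashgar encoded url |

"The search term from the reply, built from its two Unicode code points."
kashgar := String with: 21888 asCharacter with: 20160 asCharacter.

"Step 1: UTF-8 encoding turns the two characters into six bytes."
kashgar utf8Encoded.   "#[229 150 128 228 187 128]"

"Step 2: percent encoding writes each byte as % followed by two hex digits."
encoded := ZnPercentEncoder new encode: kashgar.   "'%E5%96%80%E4%BB%80'"

"Parsing the URL reverses both steps for the query value."
url := ('https://www.google.com/search?q=' , encoded) asUrl.
(url queryAt: #q) = kashgar.   "true"

"The percent-encoded bytes from the original URL are not valid UTF-8,
 so decoding them with the default settings signals an error
 (ZnInvalidUTF8, according to the original report)."
[ ZnPercentEncoder new decode: '%BF%A6%CA%B2' ]
	on: Error
	do: [ :error | error description ]
```

The last expression fails because the byte sequence starts with BF, which can never begin a UTF-8 character. The four bytes BF A6 CA B2 do appear to match the GBK (GB2312) encoding of '喀什', which would fit Sven's guess that the site assumes a specific, non-UTF-8 encoding.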