Sven’s Magic Touch

I wanted to get all the lines of http://www.pallier.org/ressources/dicofr/liste.de.mots.francais.frgut.txt

But

ZnEasy get: 'http://www.pallier.org/ressources/dicofr/liste.de.mots.francais.frgut.txt'

refused to proceed…

Here are Sven’s tips and tricks at work!

This page returns text that is Latin-1 (ISO-8859-1) encoded, but describes it as ‘text/plain’ without further qualification. Zn then assumes the encoding is UTF-8 (the most reasonable default today). Mime-types can specify the encoding as follows: ‘text/plain;charset=utf-8’ or ‘text/plain;charset=iso-8859-1’.
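To see the mismatch in isolation, here is a small playground sketch. It assumes the convenience messages encodeString: and decodeBytes: on ZnCharacterEncoder, and the sample word 'élève' is just an illustration, not something taken from the word list.

```smalltalk
"Sketch only: 'élève' is an arbitrary sample word."
| bytes |
bytes := ZnCharacterEncoder latin1 encodeString: 'élève'.

"Decoding with the matching encoder round-trips the text."
ZnCharacterEncoder latin1 decodeBytes: bytes.

"Decoding the same bytes as UTF-8 is expected to fail: byte 233 (é in Latin-1)
 opens a multi-byte UTF-8 sequence that the next byte does not continue."
[ ZnCharacterEncoder utf8 decodeBytes: bytes ]
	on: Error
	do: [ :error | error messageText ].
```

Evaluating the expressions one by one should show that the bytes are fine and that only the choice of decoder is wrong, which is what happens with the word list.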

Here is how to override the default in Zn:

(ZnDefaultCharacterEncoder
    value: ZnCharacterEncoder latin1
    during: [
        ZnClient new
            get: 'http://www.pallier.org/ressources/dicofr/liste.de.mots.francais.frgut.txt' ]) lines.

The above will give you an array of 336531 words (it is a bit slow because it is a lot of data).

This is a common problem 😉
