Better management of encoding of environment variables

Hi all,

Thanks Ben for reading.

For those wanting a follow up, I’ve proposed this pull request:
https://github.com/pharo-project/pharo/pull/1980.
I’m still working on avoiding a dependency on UFFI and on fixing one
remaining test.
It is almost finished, however, and since I had to adapt the original
*abstract proposal* to fit the real system, here is an updated version:

API Proposal for OSEnvironment and friends
=========================

OSEnvironment is the common denominator for all platforms. Every platform
should implement at least the following messages with the following semantics:

– *at: aVariableName [ifAbsent:/ifAbsentPut:/ifPresent:ifAbsent:]*

Gets the String value of an environment variable called `aVariableName`.
It is the system's responsibility to manage the encoding of *both arguments
and return values*.

– *at: aVariableName put: aValue*

Sets the environment variable called `aVariableName` to value `aValue`.
It is the system's responsibility to manage the encoding of *both arguments
and return values*.

– *removeKey: aVariableName*

Removes the environment variable called `aVariableName`.
It is the system's responsibility to manage the encoding of *both arguments
and return values*.

API Extensions for *Nix Systems (OSX & Linux)
=========================

Since environment variables on *Nix systems are binary data that may be in
any encoding, the following methods give more flexibility: they access such
data in an encoding of the user's choice, or even in binary form.

– *at: aVariableName encoding: anEncoding
[ifAbsent:/ifAbsentPut:/ifPresent:ifAbsent:/put:] / removeKey: aVariableName
encoding: anEncoding*

Variants of the common API from OSEnvironment.
The encoding used as argument will be used to encode/decode *both arguments
and return values*.

– *rawAt: anEncodedVariableName encoding: anEncoding
[ifAbsent:/ifAbsentPut:/ifPresent:ifAbsent:/put:] / removeRawKey:
anEncodedVariableName*

Variants of the common API from OSEnvironment.
These methods assume arguments and return values are encoded/decoded by the
user, so they perform no marshalling or decoding themselves.
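
To make the three access levels concrete, here is a minimal sketch written in Python rather than Pharo. It only models the semantics described above and is not the code from the pull request: the class `ToyEnvironment`, its method names, the variable `VILLE`, and the UTF-8 default are all assumptions made for illustration.

```python
class ToyEnvironment:
    """Hypothetical model of the proposed API; all names are illustrative."""

    def __init__(self, default_encoding="utf-8"):
        # On *Nix the OS stores names and values as plain bytes.
        self._raw = {}
        self.default_encoding = default_encoding

    # Raw level: the caller encodes and decodes, nothing is converted here.
    def raw_at(self, encoded_name, default=None):
        return self._raw.get(encoded_name, default)

    def raw_at_put(self, encoded_name, encoded_value):
        self._raw[encoded_name] = encoded_value

    def remove_raw_key(self, encoded_name):
        self._raw.pop(encoded_name, None)

    # Explicit-encoding level: one encoding is applied to both name and value.
    def at_encoding(self, name, encoding, default=None):
        raw = self.raw_at(name.encode(encoding))
        return default if raw is None else raw.decode(encoding)

    def at_put_encoding(self, name, value, encoding):
        self.raw_at_put(name.encode(encoding), value.encode(encoding))

    # Common level: the system picks the encoding on the caller's behalf.
    def at(self, name, default=None):
        return self.at_encoding(name, self.default_encoding, default)

    def at_put(self, name, value):
        self.at_put_encoding(name, value, self.default_encoding)


env = ToyEnvironment()
env.at_put("VILLE", "Orléans")
raw_value = env.raw_at(b"VILLE")                   # the bytes as stored
latin1_view = env.at_encoding("VILLE", "latin-1")  # wrong encoding on purpose
```

Reading the value back with `at` returns the original text, `raw_at` exposes the stored bytes untouched, and decoding with the wrong encoding produces garbled text, which is why the caller needs to be able to choose.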

Rationale
=========================

– Encoding/Decoding should be applied not only to values but to
variable names too. In most cases ASCII overlaps with utf* and Latin*
encodings, but this cannot simply be assumed.
– Windows requires calling the right *Wide version of the functions from
C, plus the correct encoding routine. This could be implemented as an FFI
call or by modifying the VM to do it properly instead of calling the ASCII
version.
– Unix file systems and environment variables can mix strings in
different encodings, hence the flexibility added by the low-level *Nix
extensions.
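
The first point can be checked with a few lines of Python (used here only for illustration, it is not Pharo code). The sketch shows that ASCII-only text gives the same bytes in ASCII, UTF-8 and Latin-1, while UTF-16 and any accented character break that assumption. The sample strings are arbitrary.

```python
name = "PATH"

# Pure ASCII text yields identical bytes in ASCII, UTF-8 and Latin-1.
ascii_same = name.encode("ascii") == name.encode("utf-8") == name.encode("latin-1")

# UTF-16 does not share that property, even for ASCII-only text.
utf16_differs = name.encode("utf-16-le") != name.encode("ascii")

# Outside ASCII, UTF-8 and Latin-1 disagree on the bytes for the same character.
accented = "é"
utf8_bytes = accented.encode("utf-8")      # two bytes
latin1_bytes = accented.encode("latin-1")  # one byte
```

The same reasoning applies to variable names and to values, since both are just byte sequences as far as the operating system is concerned.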

Other Implementation Details
=========================

– Path Strings returned by VM primitives should be carefully decoded,
since they are actually C strings (that is, byte arrays) disguised
as ByteStrings.
– Similar changes had to be applied to correctly obtain the current
working directory when it is a wide string.
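
The "disguised byte array" problem can be illustrated with a short Python sketch (again, not Pharo code). It assumes the path is UTF-8 encoded and uses Latin-1 to model a string that holds one character per byte; the path itself is made up.

```python
path = "/home/guille/café"       # hypothetical path
c_string = path.encode("utf-8")  # what the OS hands back: plain bytes

# "Disguised" string: every byte becomes one character, which is what happens
# when the bytes are stored in a byte-per-character string without decoding.
disguised = c_string.decode("latin-1")

# Proper handling: recover the bytes, then decode with the real encoding.
decoded = disguised.encode("latin-1").decode("utf-8")
```

The disguised string is one character longer than the real path and shows two garbled characters in place of the accented one; only the explicit decoding step recovers the original text.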

 

Guille
