New Files in Pharo – Migration Guide, How To’s and examples

Hi all,
I’ve spent a few minutes summarizing the new APIs provided by the combination of the new File implementation and the Zn encoders. They all basically follow the decorator pattern to stack different responsibilities such as buffering, encoding, and line ending conversion.
Please, do not hesitate to give your feedback.
Guille
1. Basic Files
By default, files are binary and not buffered.
(File named: 'name') readStream.
(File named: 'name') readStreamDo: [ :stream | … ].
(File named: 'name') writeStream.
(File named: 'name') writeStreamDo: [ :stream | … ].
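As a minimal sketch of the binary, unbuffered default, the snippet below writes three bytes and reads them back as a ByteArray. The file name 'demo.bin' is a placeholder, and the sketch assumes that file does not exist beforehand, since opening a write stream on an existing file may not truncate it.

```smalltalk
| file bytes |
"'demo.bin' is a hypothetical file name used only for illustration."
file := File named: 'demo.bin'.

"Write raw bytes; no encoding and no buffering are involved."
file writeStreamDo: [ :stream |
	stream nextPutAll: #[ 72 105 33 ] ].

"Read everything back; the result is a ByteArray, not a String."
file readStreamDo: [ :stream |
	bytes := stream upToEnd ].
```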
2. Encoding
To add encoding, wrap a stream with a corresponding ZnCharacterRead/WriteStream.
"Reading"
utf8Encoded := ZnCharacterReadStream on: aBinaryStream encoding: 'utf8'.
utf16Encoded := ZnCharacterReadStream on: aBinaryStream encoding: 'utf16'.
"Writing"
utf8Encoded := ZnCharacterWriteStream on: aBinaryStream encoding: 'utf8'.
utf16Encoded := ZnCharacterWriteStream on: aBinaryStream encoding: 'utf16'.
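Since the encoders only need some binary stream underneath, the decorators can be tried without touching the disk. The following sketch does a round trip through an in-memory ByteArray; the sample string is arbitrary.

```smalltalk
| bytes text |
"Encode: characters go in, UTF-8 bytes come out of the wrapped binary stream."
bytes := ByteArray streamContents: [ :binaryStream |
	(ZnCharacterWriteStream on: binaryStream encoding: 'utf8')
		nextPutAll: 'café' ].

"Decode: wrap a binary read stream to get the characters back."
text := (ZnCharacterReadStream on: bytes readStream encoding: 'utf8') upToEnd.
```

Here bytes holds five bytes for the four characters, because é takes two bytes in UTF-8.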
3. Buffering
To add buffering, wrap a stream with a corresponding ZnBufferedRead/WriteStream.
bufferedReadStream := ZnBufferedReadStream on: aStream.
bufferedWriteStream := ZnBufferedWriteStream on: aStream.
In general, it is better to buffer the reads on the binary file and apply the encoding on the in-memory buffer than the other way around. Compare:
"Buffer the binary file first, then decode from the buffer"
[file := Smalltalk sourcesFile fullName.
(File named: file) readStreamDo: [ :binaryFile |
    (ZnCharacterReadStream on: (ZnBufferedReadStream on: binaryFile) encoding: 'utf8') upToEnd
]] timeToRun. "0:00:00:09.288"

"Decode directly from the unbuffered file, then buffer the decoded characters"
[file := Smalltalk sourcesFile fullName.
(File named: file) readStreamDo: [ :binaryFile |
    (ZnBufferedReadStream on: (ZnCharacterReadStream on: binaryFile encoding: 'utf8')) upToEnd
]] timeToRun. "0:00:00:14.189"
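The same decorators can be stacked for writing. The sketch below mirrors the ordering of the faster read example (buffer on the binary file, encoder on top); the timings above only cover reading, so treat the ordering as an assumption on the write side. The file name 'demo.txt' is a placeholder. A buffered write stream holds data in memory, so it needs a flush before the file is closed.

```smalltalk
"'demo.txt' is a hypothetical file name used only for illustration."
(File named: 'demo.txt') writeStreamDo: [ :binaryFile |
	| buffered encoded |
	"Buffer the binary file, then encode on top of the buffer."
	buffered := ZnBufferedWriteStream on: binaryFile.
	encoded := ZnCharacterWriteStream on: buffered encoding: 'utf8'.
	encoded nextPutAll: 'Hello from a buffered, encoded stream'.
	"Push any bytes still held in the buffer to the file before it is closed."
	buffered flush ].
```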
4. File System
By default, file system files are buffered and utf8 encoded to keep backwards compatibility.
'name' asFileReference readStreamDo: [ :bufferedUtf8Stream | … ].
'name' asFileReference writeStreamDo: [ :bufferedUtf8Stream | … ].
FileSystem also provides access to plain binary files through the #binaryRead/WriteStream messages. Binary streams are also buffered by default.
'name' asFileReference binaryReadStreamDo: [ :bufferedBinaryStream | … ].
'name' asFileReference binaryWriteStreamDo: [ :bufferedBinaryStream | … ].
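To make the difference between the two kinds of stream concrete, here is a small sketch that writes text through the default character stream and then reads the same file back both ways. The file name 'demo.txt' is a placeholder and is assumed not to exist beforehand.

```smalltalk
| reference text bytes |
"'demo.txt' is a hypothetical file name used only for illustration."
reference := 'demo.txt' asFileReference.

"Characters in, UTF-8 bytes on disk; buffering is handled by the stream."
reference writeStreamDo: [ :stream |
	stream nextPutAll: 'Hello' ].

"Read the file back as characters (a String)."
reference readStreamDo: [ :stream |
	text := stream upToEnd ].

"Read the same file back as raw bytes (a ByteArray)."
reference binaryReadStreamDo: [ :stream |
	bytes := stream upToEnd ].
```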
If you want a file with another encoding (to come in the PR https://github.com/pharo-project/pharo/pull/1134), you can specify it while obtaining the stream:
'name' asFileReference
    readStreamEncoded: 'utf16'
    do: [ :bufferedUtf16Stream | … ].
'name' asFileReference
    writeStreamEncoded: 'utf8'
    do: [ :bufferedUtf8Stream | … ].
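In an image where these messages are not available yet, roughly the same effect can be sketched by combining the buffered binary stream with the encoding decorator from section 2. Whether this matches what the PR does internally is an assumption, and 'demo-utf16.txt' is a placeholder for an existing UTF-16 file.

```smalltalk
| text |
"'demo-utf16.txt' is a hypothetical file name used only for illustration."
'demo-utf16.txt' asFileReference binaryReadStreamDo: [ :bufferedBinaryStream |
	"Decode on top of the already buffered binary stream."
	text := (ZnCharacterReadStream
		on: bufferedBinaryStream
		encoding: 'utf16') upToEnd ].
```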
5. Line Ending Conventions
If you want to write files following a specific line ending convention, use the ZnNewLineWriterStream.
This stream decorator will transform any line ending (cr, lf, crlf) into a defined line ending.
By default it chooses the platform line ending convention.
lineWriter := ZnNewLineWriterStream on: aStream.
If you want to choose another line ending convention you can do:
lineWriter forCr.
lineWriter forLf.
lineWriter forCrLf.
lineWriter forPlatformLineEnding.
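As a small illustration, the sketch below pushes a string that mixes all three conventions through a line writer configured for lf, using an in-memory String stream as the target. Characters are fed one at a time with nextPut:, and the sample text is arbitrary.

```smalltalk
| converted |
converted := String streamContents: [ :out |
	| lineWriter |
	lineWriter := ZnNewLineWriterStream on: out.
	lineWriter forLf.
	"The input mixes cr, lf and crlf line endings."
	('one', String cr, 'two', String lf, 'three', String crlf)
		do: [ :each | lineWriter nextPut: each ] ].
```

Going by the description above, converted should end up with every line terminated by a single lf.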
6. About performance questions
Well, I’d say we did it in the name of modularity. And yes, I believe that having separate responsibilities helps in designing, testing, and more easily ensuring the correctness of each part in isolation.

I’ve also done some profiling, and it does not look like we’ve lost performance either (reading and decoding a 35MB file):
[file := Smalltalk sourcesFile fullName.
(File named: file) readStreamDo: [ :binaryFile |
    (ZnCharacterReadStream on: (ZnBufferedReadStream on: binaryFile) encoding: 'utf8') next: binaryFile size.
]] timeToRun. "0:00:00:01.976"
[file := Smalltalk sourcesFile fullName.
(MultiByteFileStream fileNamed: file)
    converter: (TextConverter newForEncoding: 'utf8');
    upToEnd
] timeToRun. "0:00:00:02.147"