How to parse ndjson in Pharo with NeoJSON

Reading the ‘format’ is easy: just keep sending #next to the reader, once for each JSON expression (whitespace between expressions is ignored).
| data reader |
data := '{"smalltalk": "cool"}
{"pharo": "cooler"}'.
reader := NeoJSONReader on: data readStream.
Array streamContents: [ :out |
  [ reader atEnd ] whileFalse: [ out nextPut: reader next ] ].

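Avoiding intermediary data structures is easy too: use streaming, so that the response body is parsed straight off the network instead of being loaded into one big string first.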
| client reader data networkStream |
(client := ZnClient new)
  streaming: true;
  url: 'https://github.com/NYPL-publicdomain/data-and-utilities/blob/master/items/pd_items_1.ndjson?raw=true';
  get.
networkStream := ZnCharacterReadStream on: client contents.
reader := NeoJSONReader on: networkStream.
data := Array streamContents: [ :out |
  [ reader atEnd ] whileFalse: [ out nextPut: reader next ] ].
client close.
data.

It took a couple of seconds; after all, that is 80MB+ over the network for 50K items.
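
Note that both snippets above still collect every parsed item into an Array. If the items can be handled one at a time, that collection can be dropped as well. Here is a minimal sketch, reusing the in-memory sample from the first snippet; counting keys is just an assumed stand-in for whatever per-item processing you need, and the count variable exists only for illustration.
| data reader count |
data := '{"smalltalk": "cool"}
{"pharo": "cooler"}'.
reader := NeoJSONReader on: data readStream.
count := 0.
[ reader atEnd ] whileFalse: [
  | item |
  "Each #next parses exactly one JSON expression; by default an object becomes a Dictionary"
  item := reader next.
  "Hypothetical per-item work: add up the number of keys, then let the item go"
  count := count + item size ].
count.
The same loop should work unchanged on the networkStream variant, as long as client close is sent only after the loop has finished.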