ARM Rasp first positive feedback

Since doing is better than a lot of hand-waving, we want to share the following feedback with you.

Hi,

This is fantastic news on several levels – the new JIT itself, but even more the new approach, process, infrastructure and tests.

(Sorry, but this is a long/technical mail).

A week ago I was one of the first people outside the development team to be able to test the new ARM64 JIT VM on hardware they did not even test on.

In particular, I used an Amazon AWS EC2 t4g.micro instance (1 GB of RAM) with Ubuntu Server 20.04.1 LTS.

These machines use an ARM64 CPU (AWS Graviton2, Neoverse N1, Cortex-A76, ARM v8).

ubuntu@ip-172-30-0-23:~/test$ uname -a
Linux ip-172-30-0-23 5.4.0-1030-aws #31-Ubuntu SMP Fri Nov 13 11:42:04 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux

ubuntu@ip-172-30-0-23:~/test$ lscpu 
Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          2
On-line CPU(s) list:             0,1
Thread(s) per core:              1
Core(s) per socket:              2
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       ARM
Model:                           1
Model name:                      Neoverse-N1
Stepping:                        r3p1
BogoMIPS:                        243.75
L1d cache:                       128 KiB
L1i cache:                       128 KiB
L2 cache:                        2 MiB
L3 cache:                        32 MiB
NUMA node0 CPU(s):               0,1
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs

My reaction after one hour?

Wow, wow, wow, this is incredible. It all just works and it seems pretty fast as well. I played with the vm/image for a couple of minutes and so far everything worked as expected and I had no crashes at all.

The result of 1 tinyBenchmarks is of the same order of magnitude as on other (server) machines:

"'1894542090 bytecodes/sec; 146296146 sends/sec'" "arm64"
"'2767567567 bytecodes/sec; 258718969 sends/sec'" "macOS"
"'1227082085 bytecodes/sec; 109422120 sends/sec'" "aws"
"'2101590559 bytecodes/sec; 166532391 sends/sec'" "t3 lxd"

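To make the "same order of magnitude" claim concrete, here is a minimal Python sketch that parses the four result strings quoted above and expresses each machine relative to the arm64 run. The helper names (`parse`, `relative_to`) are made up for this illustration and are not part of Pharo or of the benchmark itself.

```python
import re

# Figures copied verbatim from the tinyBenchmarks results quoted above.
RESULTS = {
    "arm64": "1894542090 bytecodes/sec; 146296146 sends/sec",
    "macOS": "2767567567 bytecodes/sec; 258718969 sends/sec",
    "aws": "1227082085 bytecodes/sec; 109422120 sends/sec",
    "t3 lxd": "2101590559 bytecodes/sec; 166532391 sends/sec",
}

PATTERN = re.compile(r"(\d+) bytecodes/sec; (\d+) sends/sec")


def parse(report):
    """Return (bytecodes_per_sec, sends_per_sec) from one result string."""
    match = PATTERN.fullmatch(report)
    if match is None:
        raise ValueError(f"unexpected tinyBenchmarks output: {report!r}")
    return int(match.group(1)), int(match.group(2))


def relative_to(baseline, results):
    """Express every result as a ratio of the baseline machine's figures."""
    base_bytecodes, base_sends = parse(results[baseline])
    ratios = {}
    for label, report in results.items():
        bytecodes, sends = parse(report)
        ratios[label] = (bytecodes / base_bytecodes, sends / base_sends)
    return ratios


for label, (bytecodes, sends) in relative_to("arm64", RESULTS).items():
    print(f"{label:8s} bytecodes x{bytecodes:.2f}  sends x{sends:.2f}")
```

Every ratio stays well within a factor of two of the arm64 figures, which is what "same order of magnitude" means here.
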
Here is a benchmark in the HTTP space: how fast can ZnServer respond to multiple concurrent requests over the local network?

$ ./pharo Pharo.image eval --no-quit 'ZnServer startDefaultOn: 1701' &
$ ab -k -n 1024 -c 8 http://localhost:1701/small
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 102 requests
Completed 204 requests
Completed 306 requests
Completed 408 requests
Completed 510 requests
Completed 612 requests
Completed 714 requests
Completed 816 requests
Completed 918 requests
Completed 1020 requests
Finished 1024 requests
Server Software:        Zinc
Server Hostname:        localhost
Server Port:            1701
Document Path:          /small
Document Length:        124 bytes
Concurrency Level:      8
Time taken for tests:   0.268 seconds
Complete requests:      1024
Failed requests:        0
Keep-Alive requests:    1024
Total transferred:      317440 bytes
HTML transferred:       126976 bytes
Requests per second:    3814.45 [#/sec] (mean)
Time per request:       2.097 [ms] (mean)
Time per request:       0.262 [ms] (mean, across all concurrent requests)
Transfer rate:          1154.76 [Kbytes/sec] received
Connection Times (ms)
             min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0    2  19.0      0     267
Waiting:        0    2  19.0      0     267
Total:          0    2  19.0      0     267
Percentage of the requests served within a certain time (ms)
 50%      0
 66%      0
 75%      0
 80%      0
 90%      0
 95%      0
 98%      0
 99%     42
100%    267 (longest request)

That is about 3800 req/s with 8 concurrent connections, each response carrying 124 bytes. And the output document is dynamically generated each time!
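
For readers who want to see how the headline figures in the ab report hang together, the following Python sketch recomputes them from the request count, the concurrency level and the elapsed time. The function name `ab_summary` is invented for this note. Because ab prints the elapsed time rounded to milliseconds, the recomputed values differ slightly from the ones ab reports.

```python
def ab_summary(requests, concurrency, seconds):
    """Recompute ab's three headline metrics from its raw counters.

    Returns (requests per second,
             mean time per request in ms as seen by one client,
             mean time per request in ms across all concurrent requests).
    """
    requests_per_second = requests / seconds
    per_request_ms = concurrency * seconds * 1000 / requests
    per_request_all_ms = seconds * 1000 / requests
    return requests_per_second, per_request_ms, per_request_all_ms


# Values taken from the /small run above: 1024 requests, -c 8, 0.268 s.
rps, per_request, per_request_all = ab_summary(1024, 8, 0.268)
print(f"{rps:.0f} req/s, {per_request:.3f} ms, {per_request_all:.3f} ms")
```

The two "Time per request" lines differ by exactly the concurrency level: one is the latency a single client observes, the other is the average interval between completed responses.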

Now a cached static binary document, first a small one (64 bytes):

$ ab -k -n 1024 -c 8 http://localhost:1701/bytes
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 102 requests
Completed 204 requests
Completed 306 requests
Completed 408 requests
Completed 510 requests
Completed 612 requests
Completed 714 requests
Completed 816 requests
Completed 918 requests
Completed 1020 requests
Finished 1024 requests
Server Software:        Zinc
Server Hostname:        localhost
Server Port:            1701
Document Path:          /bytes
Document Length:        64 bytes
Concurrency Level:      8
Time taken for tests:   0.214 seconds
Complete requests:      1024
Failed requests:        0
Keep-Alive requests:    1024
Total transferred:      256000 bytes
HTML transferred:       65536 bytes
Requests per second:    4778.62 [#/sec] (mean)
Time per request:       1.674 [ms] (mean)
Time per request:       0.209 [ms] (mean, across all concurrent requests)
Transfer rate:          1166.65 [Kbytes/sec] received
Connection Times (ms)
             min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0    2  12.3      0     207
Waiting:        0    2  12.3      0     207
Total:          0    2  12.3      0     207
Percentage of the requests served within a certain time (ms)
 50%      0
 66%      0
 75%      0
 80%      0
 90%      0
 95%      0
 98%      5
 99%     64
100%    207 (longest request)

That is close to 4800 req/s.

Now a larger one, 1024 bytes:

$ ab -k -n 1024 -c 8 http://localhost:1701/bytes/1024
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 102 requests
Completed 204 requests
Completed 306 requests
Completed 408 requests
Completed 510 requests
Completed 612 requests
Completed 714 requests
Completed 816 requests
Completed 918 requests
Completed 1020 requests
Finished 1024 requests
Server Software:        Zinc
Server Hostname:        localhost
Server Port:            1701
Document Path:          /bytes/1024
Document Length:        1024 bytes
Concurrency Level:      8
Time taken for tests:   0.228 seconds
Complete requests:      1024
Failed requests:        0
Keep-Alive requests:    1024
Total transferred:      1241088 bytes
HTML transferred:       1048576 bytes
Requests per second:    4484.93 [#/sec] (mean)
Time per request:       1.784 [ms] (mean)
Time per request:       0.223 [ms] (mean, across all concurrent requests)
Transfer rate:          5308.34 [Kbytes/sec] received
Connection Times (ms)
             min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0    2  16.2      0     227
Waiting:        0    2  16.2      0     227
Total:          0    2  16.2      0     227
Percentage of the requests served within a certain time (ms)
 50%      0
 66%      0
 75%      0
 80%      0
 90%      0
 95%      0
 98%      0
 99%     41
100%    227 (longest request)

Still about 4500 req/s: 1024 requests finished in about 0.23 seconds, transferring 1 MiB of document data.
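
The byte counters in the three reports can also be cross-checked. The Python sketch below assumes that ab's "Total transferred" minus "HTML transferred" is the response header overhead; the dictionary layout and the function name `per_response` are only for illustration.

```python
# Counters copied from the three ab reports above.
RUNS = {
    "/small": {"requests": 1024, "total": 317440, "body": 126976},
    "/bytes": {"requests": 1024, "total": 256000, "body": 65536},
    "/bytes/1024": {"requests": 1024, "total": 1241088, "body": 1048576},
}


def per_response(run):
    """Return (body bytes, header bytes) per response for one ab run.

    Assumes every response in the run had the same size, which holds
    here because the division leaves no remainder.
    """
    body, body_rest = divmod(run["body"], run["requests"])
    headers, header_rest = divmod(run["total"] - run["body"], run["requests"])
    if body_rest or header_rest:
        raise ValueError("responses in this run were not all the same size")
    return body, headers


for path, run in RUNS.items():
    body, headers = per_response(run)
    print(f"{path:12s} body {body:5d} bytes, headers {headers} bytes")

# The last run moved exactly 1 MiB of document data.
print(RUNS["/bytes/1024"]["body"] / 2**20, "MiB")
```

Under that assumption the header overhead is nearly constant across the runs, so for the small documents most of the bytes on the wire are headers rather than content.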

These are really good numbers!

And under this load, the image+vm remained totally stable.

Great, great work and thanks again everyone for the effort. You can be very proud of this achievement.

Sven
