[jsc-dev] Proposal: Using LLInt Asm in major architectures even if JIT is disabled

# Saam Barati (4 days ago)

To elaborate: I ran this same experiment before. And I forgot to turn off the RegExp JIT and got results similar to what you got. Once I turned off the RegExp JIT, I saw no perf difference.

# Yusuke Suzuki (3 days ago)

On Thu, Sep 20, 2018 at 12:54 AM Saam Barati <sbarati at apple.com> wrote:

To elaborate: I ran this same experiment before. And I forgot to turn off the RegExp JIT and got results similar to what you got. Once I turned off the RegExp JIT, I saw no perf difference.

Yeah, I disabled JIT and RegExpJIT explicitly by using

export JSC_useJIT=false
export JSC_useRegExpJIT=false

and I checked no JIT code is generated by running dumpDisassembly. And I also put CRASH() in ExecutableAllocator::singleton() to ensure no executable memory is allocated. The result is the same. I think useJIT=false disables RegExp JIT too.
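For anyone who wants to script the same setup, here is a minimal Python sketch. Only the two option names (`JSC_useJIT`, `JSC_useRegExpJIT`) come from the message above; the helper names and the `jsc` path are hypothetical, and the sketch does not cover the dumpDisassembly or CRASH() checks.

```python
import os
import subprocess

# Option names as given in the message; the values switch both JITs off.
INTERPRETER_ONLY_OPTIONS = {
    "JSC_useJIT": "false",
    "JSC_useRegExpJIT": "false",
}


def interpreter_only_env(base_env=None):
    """Return a copy of base_env (default: os.environ) with both JITs disabled."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(INTERPRETER_ONLY_OPTIONS)
    return env


def run_jsc(jsc_path, script_path):
    """Run one script under jsc with the JITs disabled (hypothetical helper)."""
    return subprocess.run(
        [jsc_path, script_path],
        env=interpreter_only_env(),
        check=True,
    )
```

Building a fresh environment per invocation, rather than exporting the variables in the shell, keeps the setting from leaking into other runs.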

                                  baseline                patched

ai-astar                          3499.046+-14.772     ^  1897.624+-234.517    ^  definitely 1.8439x faster
audio-beat-detection              1803.466+-491.965       970.636+-428.051        might be 1.8580x faster
audio-dft                         1756.985+-68.710     ^  954.312+-528.406     ^  definitely 1.8411x faster
audio-fft                         1637.969+-458.129       850.083+-449.228        might be 1.9268x faster
audio-oscillator                  1866.006+-569.581    ^  967.194+-82.521      ^  definitely 1.9293x faster
imaging-darkroom                  2156.526+-591.042    ^  1231.318+-187.297    ^  definitely 1.7514x faster
imaging-desaturate                3059.335+-284.740    ^  1754.128+-339.941    ^  definitely 1.7441x faster
imaging-gaussian-blur             16034.828+-1930.938  ^  7389.919+-2228.020   ^  definitely 2.1698x faster
json-parse-financial              60.273+-4.143           53.935+-28.957          might be 1.1175x faster
json-stringify-tinderbox          39.497+-3.915           38.146+-9.652           might be 1.0354x faster
stanford-crypto-aes               873.623+-208.225     ^  486.350+-132.379     ^  definitely 1.7963x faster
stanford-crypto-ccm               538.707+-33.979      ^  285.944+-41.570      ^  definitely 1.8840x faster
stanford-crypto-pbkdf2            1929.960+-649.861    ^  1044.320+-1.182      ^  definitely 1.8481x faster
stanford-crypto-sha256-iterative  614.344+-200.228        342.574+-123.524        might be 1.7933x faster

<arithmetic>                      2562.183+-207.456    ^  1304.749+-312.963    ^  definitely 1.9637x faster

I think this result is not related to RegExp JIT since ai-astar is not using RegExp.
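The speedup column can be rechecked from the two means. The Python sketch below divides the baseline mean by the patched mean and labels a row "definitely" when the two confidence intervals do not overlap, "might be" otherwise. That overlap rule is an inference from the rows in this thread, not taken from the benchmark harness source, and the function names are hypothetical. It only handles rows where the patched build is faster.

```python
def speedup(baseline_ms, patched_ms):
    """Ratio of mean execution times; > 1 means the patched build is faster."""
    return baseline_ms / patched_ms


def intervals_overlap(mean_a, half_a, mean_b, half_b):
    """True if [mean_a +- half_a] and [mean_b +- half_b] share any point."""
    return (mean_a - half_a) <= (mean_b + half_b) and \
           (mean_b - half_b) <= (mean_a + half_a)


def verdict(baseline, patched):
    """baseline and patched are (mean, half_width) pairs in milliseconds.

    Assumed rule: non-overlapping intervals -> "definitely",
    overlapping intervals -> "might be".
    """
    (b_mean, b_half), (p_mean, p_half) = baseline, patched
    certainty = ("might be"
                 if intervals_overlap(b_mean, b_half, p_mean, p_half)
                 else "definitely")
    return f"{certainty} {speedup(b_mean, p_mean):.4f}x faster"


# Rows copied from the table above.
print(verdict((3499.046, 14.772), (1897.624, 234.517)))   # ai-astar
print(verdict((1803.466, 491.965), (970.636, 428.051)))   # audio-beat-detection
```

Note how wide some of the intervals are: with only a few samples per benchmark, several large ratios still fall in the "might be" bucket.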

Best

# Saam Barati (3 days ago)

Interesting! I must not have run this experiment correctly when I did it.

# Yusuke Suzuki (3 days ago)

I've just set up MacBook Pro to measure the effect on macOS.

The results are as follows.

VMs tested:
"baseline" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/nojit/Release/jsc
"patched" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/nojit-llint/Release/jsc

Collected 2 samples per benchmark/VM, with 2 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.

                                  baseline                patched

ai-astar                          1738.056+-49.666     ^  1568.904+-44.535     ^  definitely 1.1078x faster
audio-beat-detection              1127.677+-15.749     ^  972.323+-23.908      ^  definitely 1.1598x faster
audio-dft                         942.952+-107.209        919.933+-310.247        might be 1.0250x faster
audio-fft                         985.489+-47.414      ^  796.955+-25.476      ^  definitely 1.2366x faster
audio-oscillator                  967.891+-34.854      ^  801.778+-18.226      ^  definitely 1.2072x faster
imaging-darkroom                  1265.340+-114.464    ^  1099.233+-2.372      ^  definitely 1.1511x faster
imaging-desaturate                1737.826+-40.791     ?  1749.010+-167.969    ?
imaging-gaussian-blur             7846.369+-52.165     ^  6392.379+-1025.168   ^  definitely 1.2275x faster
json-parse-financial              33.141+-0.473           33.054+-1.058
json-stringify-tinderbox          20.803+-0.901           20.664+-0.717
stanford-crypto-aes               401.589+-39.750         376.622+-12.111         might be 1.0663x faster
stanford-crypto-ccm               245.629+-45.322         228.013+-8.976          might be 1.0773x faster
stanford-crypto-pbkdf2            941.178+-28.744         864.462+-60.083         might be 1.0887x faster
stanford-crypto-sha256-iterative  299.988+-47.729         270.849+-32.356         might be 1.1076x faster

<arithmetic>                      1325.281+-2.613      ^  1149.584+-75.875     ^  definitely 1.1528x faster

Interestingly, the improvement is not so large. On the Linux box it was 2x, but on macOS it is 15%. Still, I think it would be very nice to get a 15% boost without any drawbacks.
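As a side note on reading these numbers: "2x" and "15%" are speedup factors (baseline time divided by patched time), which is not the same as the fraction of run time saved. A small Python sketch of the conversion, using the two <arithmetic> factors reported in this thread (the function name is hypothetical):

```python
def time_saved_fraction(speedup_factor):
    """Fraction of the baseline run time removed by a given speedup factor."""
    return 1.0 - 1.0 / speedup_factor


linux = time_saved_fraction(1.9637)   # Linux <arithmetic> speedup
macos = time_saved_fraction(1.1528)   # macOS <arithmetic> speedup
print(f"Linux: {linux:.1%} less time, macOS: {macos:.1%} less time")
```

So the macOS result corresponds to roughly 13% less execution time, and the Linux result to roughly half.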

# Filip Pizlo (3 days ago)

I think that we should move to removing JSVALUE32_64, since it doesn’t get significant testing or maintenance anymore. I’d love it if 32-bit targets used the cloop with JSVALUE64, so that we can rip out the 32-bit jit and offlineasm backends, and remove the 32-bit representation code from the runtime.

I’m fine with using asm llint

# Michael Catanzaro (2 days ago)

I believe Guillaume has previously established that results in a substantial performance regression for WPE. It is currently running in production on tens of millions of consumer set top boxes. I think that's substantial testing. :)

Michael

# Geoffrey Garen (2 days ago)

Interestingly, the improvement is not so large. On the Linux box it was 2x, but on macOS it is 15%.

Is there something Linux-specific about CLoop or LLInt, or is this a compiler difference?

Thanks, Geoff

# Michael Catanzaro (2 days ago)

On Thu, Sep 20, 2018 at 11:49 AM, Geoffrey Garen <ggaren at apple.com> wrote:

Is there something Linux-specific about CLoop or LLInt, or is this a compiler difference?

No clue. I'll refer you to the results of Guillaume's investigation:

lists.webkit.org/pipermail/webkit-dev/2018-February/029877.html

Michael

# Filip Pizlo (2 days ago)

Most JSC development focuses on JSVALUE64. JSVALUE32_64 is currently years behind JSVALUE64 - it has no concurrent JIT, no concurrent GC, no FTL. We regularly do tuning that ends up affecting both JSVALUE32_64 and JSVALUE64 without even testing its impact on JSVALUE32_64. JSVALUE32_64 is a second-class citizen in JSC.

I propose this:

  • Enable cloop/JSVALUE64 to work on 32-bit. I don’t think it does right now, but that’s probably trivial to fix.
  • Switch Darwin ports to that configuration for 32-bit.
  • When changes land to support new features, make it mandatory to support JSVALUE64 and optional to support JSVALUE32_64. Such changes should include whoever volunteers to maintain JSVALUE32_64 in CC.

If you guys consider JSVALUE32_64 to be critical, then you can go ahead and maintain it. We’ll let JSVALUE32_64 stay in the tree so long as someone is maintaining it.

# Michael Catanzaro (2 days ago)

On Thu, Sep 20, 2018 at 12:02 PM, Filip Pizlo <fpizlo at apple.com> wrote:

  • Enable cloop/JSVALUE64 to work on 32-bit. I don’t think it does right now, but that’s probably trivial to fix.
  • Switch Darwin ports to that configuration for 32-bit.
  • When changes land to support new features, make it mandatory to support JSVALUE64 and optional to support JSVALUE32_64. Such changes should include whoever volunteers to maintain JSVALUE32_64 in CC.

If you guys consider JSVALUE32_64 to be critical, then you can go ahead and maintain it. We’ll let JSVALUE32_64 stay in the tree so long as someone is maintaining it.

Yes that's fine with us. I think that's the previous agreement, anyway. :)

Michael

# Yusuke Suzuki (2 days ago)

Yeah, I'm not planning to enable LLInt ASM interpreter on 32bit architectures since no buildbot exists for this configuration. And we should make 32bit architectures JSVALUE64, so LLInt JSVALUE32_64 should be removed in the future.

On Fri, Sep 21, 2018 at 2:33 AM Michael Catanzaro <mcatanzaro at igalia.com> wrote:

# Guillaume Emont (a day ago)

Quoting Yusuke Suzuki (2018-09-21 10:10:59)

Yeah, I'm not planning to enable LLInt ASM interpreter on 32bit architectures since no buildbot exists for this configuration.

I'm confused. Do you mean you don't want to enable LLInt instead of CLoop for the case when JIT is disabled on 32-bit architectures? FTR, the configuration LLInt (with offlineasm) + JIT + DFG is tested on 32-bit testbots for at least mips, armv7 and x86.

And we should make 32bit architectures JSVALUE64, so LLInt JSVALUE32_64 should be removed in the future.

See what Filip and Michael were saying. We believe that we need JSVALUE32_64, and we are willing to maintain it, as the performance gap between LLInt or CLoop and JIT+DFG on 32-bit architectures is significant.

Guillaume

# Filip Pizlo (a day ago)

On Sep 21, 2018, at 9:44 AM, Guillaume Emont <guijemont at igalia.com> wrote:

Quoting Yusuke Suzuki (2018-09-21 10:10:59)

Yeah, I'm not planning to enable LLInt ASM interpreter on 32bit architectures since no buildbot exists for this configuration.

I'm confused. Do you mean you don't want to enable LLInt instead of CLoop for the case when JIT is disabled on 32-bit architectures?

If you guys want to take responsibility for 32-bit then you can enable whatever LLInt config you want on 32-bit.

FTR, the configuration LLInt(with offlineasm)+jit+dfg is tested in 32-bit testbots for at least mips, armv7 and x86.

And we should make 32bit architectures JSVALUE64, so LLInt JSVALUE32_64 should be removed in the future.

See what Filip and Michael were saying. We believe that we need JSVALUE32_64, and we are willing to maintain it, as the performance gap between LLInt or CLoop and JIT+DFG on 32-bit architectures is significant.

I’m saying we should remove JSVALUE32_64. That is my preference. I’m letting it stay in tree so long as someone maintains it, but honestly I’d prefer it if it wasn’t maintained and if we could let it die.

I’d like to see the majority of JSC development move to 64-bit. I’d prefer if new features or enhancements were 64-bit only, since that means that it will take less time to develop and test them. I think that folks doing JSC development should be encouraged to land changes only for 64-bit since that’s our focus as a project.
