Accidental binary bloating via C/C++ class/struct + Objective-C

# Yusuke Suzuki (2 days ago)

Hello WebKittens,

I recently stripped 830KB of binary size from WebKit just by using a work-around. This email describes what happened, so that it does not happen again.

Problem

When a C/C++ struct/class is used in the field types or method types of Objective-C code, the Objective-C compiler emits a type-encoding-string that gathers the type information of that C/C++ struct/class one level deep if

  1. The type is a pointer to C/C++ struct/class
  2. The type is a value of C/C++ struct/class
  3. The type is a reference to C/C++ struct/class

However, WebKit's C/C++ structs/classes are typically very complex types that use a lot of templates. Unfortunately, the Objective-C compiler includes the expanded template definitions as a string and adds them as a type-encoding-string to the release binary!
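To make the "one level deep" rule concrete, here is a toy model in C++ of how such an encoding string could be built. It is only a sketch under stated assumptions: the `Type` description, `encode`, `encodeOutermost` and the `VM` / `Heap` example types are hypothetical, the output merely imitates Objective-C type encodings (`^` for a pointer, `{Name=fields}` for a struct, `i` / `d` for int / double), and clang's real implementation handles many more cases. The assumed rule: a struct reached by value, or through the outermost pointer/reference, has its fields written out (by-value members keep expanding), while a pointer found inside those fields writes only the pointee's name.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified type description used only for this model.
struct Type {
    enum class Kind { Int, Double, Struct, Pointer };
    Kind kind;
    std::string name;                          // struct name, template arguments included
    std::vector<std::shared_ptr<Type>> fields; // struct fields, stored by value
    std::shared_ptr<Type> pointee;             // target of a pointer or reference
};
using TypeRef = std::shared_ptr<Type>;

TypeRef makeInt() { return std::make_shared<Type>(Type{Type::Kind::Int, "", {}, nullptr}); }
TypeRef makeDouble() { return std::make_shared<Type>(Type{Type::Kind::Double, "", {}, nullptr}); }
TypeRef makeStruct(std::string name, std::vector<TypeRef> fields)
{
    return std::make_shared<Type>(Type{Type::Kind::Struct, std::move(name), std::move(fields), nullptr});
}
TypeRef makePointer(TypeRef pointee)
{
    return std::make_shared<Type>(Type{Type::Kind::Pointer, "", {}, std::move(pointee)});
}

// expandPointee: write out the fields of a struct reached through this pointer
//                (true only for the outermost type of a field or parameter).
// expandStruct:  write out the fields of a struct reached at this position.
std::string encode(const TypeRef& type, bool expandPointee, bool expandStruct)
{
    assert(type && "type must not be null");
    switch (type->kind) {
    case Type::Kind::Int:
        return "i";
    case Type::Kind::Double:
        return "d";
    case Type::Kind::Pointer:
        // The pointee struct is expanded only if this pointer is outermost.
        return "^" + encode(type->pointee, false, expandPointee);
    case Type::Kind::Struct: {
        std::string result = "{" + type->name;
        if (expandStruct) {
            result += "=";
            // By-value fields keep expanding; pointers inside them do not.
            for (const TypeRef& field : type->fields)
                result += encode(field, false, true);
        }
        return result + "}";
    }
    }
    return "?";
}

// Encoding of the outermost type of an ivar or method parameter in this model.
std::string encodeOutermost(const TypeRef& type)
{
    return encode(type, true, true);
}
```

With `vm = makeStruct("VM", {makeInt(), makeDouble(), makePointer(makeStruct("Heap", {makeInt()}))})`, `encodeOutermost(makePointer(vm))` gives `^{VM=id^{Heap}}`: every field of `VM` is written out, while `Heap` behind the nested pointer shrinks to its name. Wrapping the pointer in a struct, `encodeOutermost(makeStruct("NakedPtr<VM>", {makePointer(vm)}))` gives `{NakedPtr<VM>=^{VM}}`, which is the effect the NakedRef<T> / NakedPtr<T> work-around described later in this email relies on.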

For example, trac.webkit.org/changeset/254152/webkit removes JSC::VM& from Objective-C signatures, and it reduces binary size by 200KB!

Another example is trac.webkit.org/changeset/254241/webkit, which removes a lot of WebCore::WebView* etc. from Objective-C method signatures and reduces binary size by 630KB.

Solution for now

We can purge the type-encoding-string if we use the Objective-C NS_DIRECT feature (which makes an Objective-C method use the C function calling convention, removing its metadata). However, this does not work universally: a method marked NS_DIRECT cannot be overridden. This means we need to be extra careful when using it.

So, as a simple but effective work-around, in the above patch we introduced NakedRef<T> / NakedPtr<T>. These are basically a raw reference / raw pointer to T, wrapped in a class.

This leverages the Objective-C compiler’s “one-level deep type information collection”. Since NakedRef<T> / NakedPtr<T> adds one level of nesting, the Objective-C compiler does not collect the type information of T when NakedPtr<T> appears in fields / signatures, whereas it does collect that information when T* is used.
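A minimal sketch of what such wrappers might look like, assuming only what is described above: a raw pointer wrapped in a class. The `sketch` namespace and every member function are hypothetical, and WebKit's actual NakedPtr / NakedRef may have a different interface. What matters is the layout: the only data member is a `T*`, so the wrapper is pointer-sized and adds the extra level of nesting that keeps T's fields out of the encoding.

```cpp
#include <cassert>
#include <memory>

namespace sketch {

// Hypothetical wrapper around a possibly-null T*. It owns nothing.
template<typename T>
class NakedPtr {
public:
    NakedPtr() = default;
    NakedPtr(T* pointer) : m_pointer(pointer) { } // implicit, so existing T* call sites still compile

    T* get() const { return m_pointer; }
    T& operator*() const { assert(m_pointer); return *m_pointer; }
    T* operator->() const { assert(m_pointer); return m_pointer; }
    explicit operator bool() const { return m_pointer != nullptr; }

private:
    T* m_pointer { nullptr };
};

// Hypothetical wrapper around a T&. A pointer is stored instead of a
// reference so that the wrapper stays copy-assignable.
template<typename T>
class NakedRef {
public:
    NakedRef(T& reference) : m_pointer(std::addressof(reference)) { } // implicit, so T& call sites still compile

    T& get() const { return *m_pointer; }
    T* operator->() const { return m_pointer; }
    operator T&() const { return *m_pointer; }

private:
    T* m_pointer;
};

} // namespace sketch

// The wrappers add no storage over a raw pointer.
static_assert(sizeof(sketch::NakedPtr<int>) == sizeof(int*), "NakedPtr should be pointer-sized");
static_assert(sizeof(sketch::NakedRef<int>) == sizeof(int*), "NakedRef should be pointer-sized");
```

In this sketch, a parameter declared as `T&` can become `sketch::NakedRef<T>` without touching callers, thanks to the implicit constructor; the body then uses `ref.get()` or `ref->`. Keeping the constructors implicit is a design choice made here for migration convenience, not something the thread specifies.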

So, if you are using T& / T* for a C/C++ struct/class in Objective-C, let’s convert it to NakedRef<T> / NakedPtr<T>. This can save a lot of binary size immediately, without causing any performance problems.

Future work

We would like to avoid accidentally including such types in Objective-C. We should introduce a build-time hook script that detects them. I uploaded a PoC script in bugs.webkit.org/show_bug.cgi?id=205968, and I’m personally planning to add such a hook to the build process.
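A build-time check could be as simple as scanning the linked binary for long printable strings that look like struct type encodings. This is a hypothetical C++ sketch, not the PoC script attached to bug 205968: the function names, the heuristic (a printable run containing `{` followed later by `=`), and the length threshold are all assumptions.

```cpp
#include <cctype>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

struct EncodingFinding {
    std::size_t offset; // byte offset of the string in the binary
    std::size_t length; // string length in bytes
};

// Hypothetical heuristic: a struct type encoding contains "{Name=..." somewhere.
bool looksLikeStructEncoding(const std::string& text)
{
    std::size_t brace = text.find('{');
    return brace != std::string::npos && text.find('=', brace) != std::string::npos;
}

// Report printable runs longer than `threshold` bytes that look like encodings.
std::vector<EncodingFinding> findLongTypeEncodings(const std::string& bytes, std::size_t threshold)
{
    std::vector<EncodingFinding> findings;
    std::size_t index = 0;
    while (index < bytes.size()) {
        std::size_t start = index;
        while (index < bytes.size() && std::isprint(static_cast<unsigned char>(bytes[index])))
            ++index;
        std::size_t length = index - start;
        if (length > threshold && looksLikeStructEncoding(bytes.substr(start, length)))
            findings.push_back({start, length});
        ++index; // skip the terminating non-printable byte (usually NUL)
    }
    return findings;
}

// Read a whole file (e.g. a built framework binary) into memory.
std::string readFile(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(file), std::istreambuf_iterator<char>());
}
```

A build phase could call `findLongTypeEncodings(readFile(path), threshold)` on each built binary, with a threshold of a few kilobytes, and fail when the result is non-empty. The heuristic can report false positives, so a real hook would likely want an allow-list.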

# Filip Pizlo (2 days ago)

Wow, that sounds like an awesome find!

# Joseph Pecoraro (2 days ago)

This is a great idea!

# Darin Adler (2 days ago)

On Jan 13, 2020, at 5:52 PM, Yusuke Suzuki <ysuzuki at apple.com> wrote:

> We can purge type-encoding-string if we use Objective-C NS_DIRECT feature (which makes Objective-C function as C function calling convention, removing metadata). However, this does not work universally: with NS_DIRECT, Objective-C override does not work. This means we need to be extra-careful when using it.

Yes, we need to be careful, but NS_DIRECT is likely to have more benefit than just binary shrinking, and it should do even more binary shrinking than hiding types from Objective-C. Use of NS_DIRECT will likely help performance, sometimes in a measurable way.

However, besides the risk of making something “non-overridable”, I think it’s only available in new versions of the clang compiler.

> So, as a simple, but effective work-around, in the above patch, we introduced NakedRef<T> / NakedPtr<T>. This is basically raw pointer / raw reference to T, with a wrapper class. This leverages the behavior of Objective-C compiler’s mechanism “one-level deep type information collection”. Since NakedRef<T> / NakedPtr<T> introduces one-level deep field, Objective-C compiler does not collect the type information of T if NakedPtr<T> is included in the fields / signatures, while the compiler collects information when T* is used.

Very exciting. Does this cover all the cases we care about? Does this come up for types that are not references or pointers? Maybe we can pass arguments by const reference? What about return values?

> Future work

> We would like to avoid including such types accidentally in Objective-C. We should introduce build-time hook script which detects such a thing. I uploaded the PoC script in bugs.webkit.org/show_bug.cgi?id=205968, and I’m personally planning to introduce such a hook into a part of build process.

Beautiful. Well worth doing.

Thanks!

— Darin

# Yusuke Suzuki (2 days ago)

On Jan 13, 2020, at 19:28, Darin Adler <darin at apple.com> wrote:

> On Jan 13, 2020, at 5:52 PM, Yusuke Suzuki <ysuzuki at apple.com> wrote:

>> We can purge type-encoding-string if we use Objective-C NS_DIRECT feature (which makes Objective-C function as C function calling convention, removing metadata). However, this does not work universally: with NS_DIRECT, Objective-C override does not work. This means we need to be extra-careful when using it.

> Yes, we need to be careful, but NS_DIRECT is likely to have more benefit than just binary shrinking, and it should do even more binary shrinking than hiding types from Objective-C. Use of NS_DIRECT will likely help performance, sometimes in a measurable way.

Right. I guess that many parts of the code that require high performance are already in C++ in the WebKit project. But still, we should explore the places where we can use NS_DIRECT, e.g. internal accessors of Objective-C classes. We could get a benefit there. One thing I would like to check is the JSC APIs, since the typical operations inside JSC APIs are very small. If one API calls other internal Objective-C methods that are never overridden, we could get a benefit by annotating these internal methods with NS_DIRECT.

> However, besides the risk of making something “non-overridable”, I think it’s only available in new versions of the clang compiler.

>> So, as a simple, but effective work-around, in the above patch, we introduced NakedRef<T> / NakedPtr<T>. This is basically raw pointer / raw reference to T, with a wrapper class. This leverages the behavior of Objective-C compiler’s mechanism “one-level deep type information collection”. Since NakedRef<T> / NakedPtr<T> introduces one-level deep field, Objective-C compiler does not collect the type information of T if NakedPtr<T> is included in the fields / signatures, while the compiler collects information when T* is used.

> Very exciting. Does this cover all the cases we care about? Does this come up for types that are not references or pointers? Maybe we can pass arguments by const reference? What about return values?

Many cases are covered. The part not covered is code that passes C/C++ structs/classes by value. We could shrink that too if we allocate those objects with std::unique_ptr / RefPtr / Ref and pass std::unique_ptr&& / RefPtr&& / Ref&& (since these also introduce a one-level nested structure). For the places using raw pointers / raw references, we can try NakedPtr<> / NakedRef<>.
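For the by-value case, a sketch of the suggested change in plain C++. `Settings` and `Controller` are hypothetical names, and in WebKit the receiving method would be the Objective-C one; the sketch only shows the C++ side of the signature change described above: the object is heap-allocated with `std::unique_ptr` and handed over as `std::unique_ptr<Settings>&&`, so the parameter type is the smart pointer (one level of nesting) rather than the struct itself.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical large C++ struct that used to be passed by value.
struct Settings {
    std::string name;
    std::vector<int> limits;
};

class Controller {
public:
    // Before: void setSettings(Settings settings);
    // After: ownership of a heap-allocated Settings is transferred by rvalue reference.
    void setSettings(std::unique_ptr<Settings>&& settings) { m_settings = std::move(settings); }

    const Settings* settings() const { return m_settings.get(); }

private:
    std::unique_ptr<Settings> m_settings;
};
```

The caller writes `controller.setSettings(std::move(settings))`; afterwards its own `settings` pointer is null. The same shape applies to RefPtr&& / Ref&& for reference-counted WebKit types.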

>> Future work

>> We would like to avoid including such types accidentally in Objective-C. We should introduce build-time hook script which detects such a thing. I uploaded the PoC script in bugs.webkit.org/show_bug.cgi?id=205968, and I’m personally planning to introduce such a hook into a part of build process.

> Beautiful. Well worth doing.

Yeah, we should do it. While the above two patches removed toooooo large type-encoding-strings (like, one type-encoding-string took more than 100KB…), we still have many very long type-encoding-strings (around 6KB). Removing them can reduce binary size. It could also be nice for performance if these strings are used in some part of the Objective-C runtime, such as a hash table (I’m not sure whether that happens), since we would avoid putting such large strings into a hash table.
