Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HHH-18976 Avoid usage of Array.newInstance #9589

Merged
merged 6 commits into from
Jan 14, 2025
Merged

Conversation

yrodiere
Copy link
Member

@yrodiere yrodiere commented Jan 8, 2025

https://hibernate.atlassian.net/browse/HHH-18976

A simpler solution than #9569, without any attempt to solve HHH-16809 (on contrary to #9576), because that makes things complicated.

Two things in here:

  1. Sets up forbiddenapis to forbid usage of Array.newInstance (and util methods known to use it) by default. This is mostly as a heads-up to not use it "just because", since the check can be bypassed if necessary by annotating the method with @AllowReflection.
  2. Reworks a few implementations to avoid usage of Array.newInstance. E.g.:
    • use array.clone() where relevant -- i.e. when we want the same array type with the same length.
    • don't use Array.newInstance just to retrieve the Class of an array type -- since we ultimately just look use that Class to look up a descriptor in a registry.
    • use new Object[...] instead where possible -- e.g. when the array ends up being used internally only, and we don't leverage array type support so we only need Object[].
  3. Reworks a few implementations to avoid Arrays.copyOf as well (because it relies on Array.newInstance). => Not necessary, see conversation below.

This should get rid of most problematic uses of reflection when compiling to native binaries (see Jira issue).
There are still a few uses of Array.newInstance but as far as I can see they are completely justified and cannot easily be worked around: array mapping, JSON/XML serialization, instantiation of query result types, ...

@yrodiere yrodiere force-pushed the HHH-18976 branch 3 times, most recently from b581178 to 4340eab Compare January 9, 2025 08:42
Copy link
Member Author

@yrodiere yrodiere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added comments to the main areas of changes -- the rest is mostly dumb find and replace.

@sebersole if you can (please!) confirm those commented sections look fine by you, we can probably merge and solve the upgrade issue in Quarkus?

Comment on lines 9 to 21
# Reflection-related
@defaultMessage Use 'new Object[]' instead if possible. This forbidden method requires reflection and may not work in natively compiled applications. If you really must use this forbidden method, annotate the calling method with @AllowReflection.

java.lang.reflect.Array#newInstance(java.lang.Class, int)
java.lang.reflect.Array#newInstance(java.lang.Class, int[])
org.hibernate.internal.util.collections.ArrayHelper#newInstance(java.lang.Class, int)
org.hibernate.internal.util.collections.ArrayHelper#filledArray(java.lang.Object, java.lang.Class, int)
org.hibernate.internal.util.collections.ArrayHelper#join(java.lang.Object[], java.lang.Object[])

@defaultMessage Use '.clone()' or a type-specific 'ArrayHelper.copyOf' instead if possible. This forbidden method requires reflection and may not work in natively compiled applications. If you really must use this forbidden method, annotate the calling method with @AllowReflection.

java.util.Arrays#copyOf(java.lang.Object[], int)
java.util.Arrays#copyOf(java.lang.Object[], int, java.lang.Class)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe these constraints are reasonable -- especially since we can bypass them where necessary. Essentially this gives a heads-up to not introduce new uses of these methods by mistake, but doesn't prevent it if we need to.

private final ConcurrentHashMap<Type, JavaType<?>> descriptorsByType = new ConcurrentHashMap<>();
private final ConcurrentHashMap<String, JavaType<?>> descriptorsByTypeName = new ConcurrentHashMap<>();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and other changes in the same file) avoids the need to use reflection to create an array Type, just for the purpose of retrieving the corresponding JavaType.

Regardless of the original intent, I believe using type names (strings) as keys would be more consistent with BasicTypeRegistry? (see for example org.hibernate.type.BasicTypeRegistry#getRegisteredType(java.lang.reflect.Type))

Comment on lines -194 to +193
ids.toArray( LoaderHelper.createTypedArray( ids.get( 0 ).getClass(), ids.size() ) ),
ids.toArray( new Object[0] ),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this is still done, but deeper within the multiLoad call, and only when necessary (see MultiIdEntityLoaderArrayParam).


public AbstractMultiIdEntityLoader(EntityMappingType entityDescriptor, SessionFactoryImplementor sessionFactory) {
this.entityDescriptor = entityDescriptor;
this.sessionFactory = sessionFactory;
identifierMapping = getLoadable().getIdentifierMapping();
idArray = (Object[]) Array.newInstance( identifierMapping.getJavaType().getJavaTypeClass(), 0 );
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the main problem affecting Quarkus, FWIW.
This being done in the abstract class meant it was done for MultiIdEntityLoaderArrayParam -- where it's needed, and generally innocuous since it handles things like Integer[] -- but also MultiIdEntityLoaderStandardParam -- where it's not needed, and possibly problematic as it could be handling things like MyIdClass[], requiring MyIdClass[] (the array type) to be registered for reflection in native images.
It's not trivial to detect all ID types in Quarkus (think of orm.xml using property access, or IDs defined in mappedsuperclasses with generics), so this need for registering MyIdClass[] for reflection was a problem.

@franz1981
Copy link
Contributor

@yrodiere I would ask the JDK folks, because Arrays.copyOf is not actually executing that Java code (in Hotspot), since it's an intrinsics candidate (which means Hotspot will replace the Java code with a completely different optimized ad-hoc code): see https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/Arrays.java#L3505C7-L3505C24

@yrodiere
Copy link
Member Author

yrodiere commented Jan 9, 2025

@yrodiere I would ask the JDK folks, because Arrays.copyOf is not actually executing that Java code (in Hotspot), since it's an intrinsics candidate (which means Hotspot will replace the Java code with a completely different optimized ad-hoc code): see https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/Arrays.java#L3505C7-L3505C24

@franz1981 Okay, I must admit the last part about copyOf is from memory -- I remember problems with it with native compilation as well, but I may be wrong.
That being said... the problem is precisely with native image compilation with GraalVM, so does Hotspot even matter?

@franz1981
Copy link
Contributor

the problem is precisely with native image compilation with GraalVM, so does Hotspot even matter?

I can tell you when it could matter: the reason why is an intrinsics is because the zeroing of the array "could" be avoided between the creation of the array and the actual copy - while we loose this advantage by performing ourselves the array creation + copy (in theory).
In short; I would make sure with a benchmark that it does things without impacting JVM mode since is what the most of users are using

@yrodiere
Copy link
Member Author

yrodiere commented Jan 9, 2025

the problem is precisely with native image compilation with GraalVM, so does Hotspot even matter?

I can tell you when it could matter: the reason why is an intrinsics is because the zeroing of the array "could" be avoided between the creation of the array and the actual copy - while we loose this advantage by performing ourselves the array creation + copy (in theory). In short; I would make sure with a benchmark that it does things without impacting JVM mode since is what the most of users are using

I can have the ArrayHelper.copyOf implementations just delegate to Arrays.copyOf? Then they would only serve as a way to clarify "we don't do reflection here, we only do String[] or Object[]". And no need for a benchmark.

EDIT: well we would potentially do reflection with GraalVM, but just for String[]/Object[], which is reasonable

@yrodiere
Copy link
Member Author

yrodiere commented Jan 9, 2025

I'll also double check in Quarkus that Arrays.copyOf actually requires reflection like Array.newInstance does...

…lans

It's more consistent, and happens to get rid of ArrayMutabilityPlan,
which involved an unnecessary use of Array.newInstance.

I've also seen claims that clone() performs better than
Array.newInstance() due to not having to zero-out the allocated memory,
but I doubt that's relevant here.
No functional impact, it's just less redundant.
…Standard

MultiEntityLoaderStandard is used for arbitrary ID types, including IdClass,
making it very problematic to instantiate T[] where T is the ID type:
in native images, it requires registering T[] for reflection for every T
that can possibly be used as an ID type.

Fortunately, MultiEntityLoaderStandard does not, in fact, need
concrete-type arrays: Object[] works perfectly well with this
implementation, and only the other implementation,
MultiIdEntityLoaderArrayParam, actually needs concrete-type arrays.
We're truly in a lucky streak, because MultiIdEntityLoaderArrayParam is
only used for well-known, basic types such as Integer, which can easily
be registered for reflection in native images, and likely will be for
other reasons anyway.

Some of the dynamic instantiations were originally introduced to fix
the following issue:

* HHH-17201 -- tested in MultiIdEntityLoadTests

The corresponding tests still pass after removing these dynamic array
instantiations.
@yrodiere
Copy link
Member Author

I'll also double check in Quarkus that Arrays.copyOf actually requires reflection like Array.newInstance does...

It does not, for some reason.

I removed the commits about Arrays.copyOf. Thanks for the heads-up @franz1981!

I also checked (locally) that this changeset solves the specific reflection-related problem we had in my PR to upgrade Quarkus to ORM 7.0 (quarkusio/quarkus#41310).

@yrodiere
Copy link
Member Author

Steve is likely busy, but others have reviewed and I addressed objections.

Let's merge :)

@yrodiere yrodiere merged commit 5ffff1b into hibernate:main Jan 14, 2025
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants