Build process custom Maven plugin
Long as this page looks, it is still intended for discussion; there is plenty of room left for design decisions, better ideas, and so on.
Since PL/Java was Mavenized in 2013, the pljava-so subproject has been built using the nar-maven-plugin, assisted by the maven-antrun-plugin running some actions in an Ant build.xml that were not available as a Maven plugin.
Since Kenneth Olson added MSVC support in 2014, there has also been some JavaScript in that process. With Ant already used in the build, an Ant script target offers a useful way for small bits of straight code to express clearly what was unwieldy or impossible to express in strict Maven declarative style using only existing plugins.
Experience with this combination has shown that the nar-maven-plugin is not a perfect fit for the needs of this build. Some of the reasons were covered in the GSoC 2020 project idea.
It appears that there are enough details specific to PostgreSQL and PL/Java to warrant developing a custom Maven plugin to perform this build. It's been confirmed that Maven can build a plugin subproject and use it in another subproject in a single build, so this should not complicate the PL/Java build process by adding another step.
PGXS is the native extension building system supplied in PostgreSQL for building extensions in C or C++, implemented as a system of makefiles for GNU make. It encapsulates knowledge of the compiler and linker invocations needed on different supported platforms, and standard actions such as running pg_config to obtain needed options and flags for the specific PostgreSQL installation being built against.
The current build system has been essentially an effort to cobble nar-maven-plugin, Ant, and JavaScript into something that does what PGXS does, but in cross-platform Java and from Maven. The issues filed against the current system tend to reflect where it departs from what PGXS would do.
The proposed new plugin has pgxs in the name to suggest the intended similarity in behavior, but not necessarily to require that it be implemented in a particular way, such as directly over the real PGXS. (While the PGXS makefiles are supplied with PostgreSQL, typically in a development package, and PL/Java already requires C compilers and linkers and a JDK to build, if there are platforms where GNU make would be a separate install, relying directly on PGXS would lengthen the list of prerequisites for building PL/Java from source.)
It is not a goal to eliminate JavaScript from the build process. It has proven too useful to have a clear and compact scripting language available to express some parts of the build process directly. Makefiles, too, as used in PGXS, get their flexibility from allowing bits of script wherever needed, with the make machinery supplying the dependency resolution and ordering.
It is not a goal to produce something as complete and flexible and automated as make. Really, PL/Java needs to compile some C files and link them. The work is in selecting the right command sequences to do that on several different platforms, relying on correct values from pg_config.
The current build system could be said to have the role of scripting turned inside out. It uses JavaScript where necessary to do something the available Maven plugins would not do, instead of where it would be the clearest expression of a task.
For example, details of what options might be passed to a platform's compiler or linker can be buried inside nar-maven-plugin and not available for easy inspection by reading PL/Java's POM. At the same time, a reader of the POM sees 72 lines of JavaScript to quote a string for C, an algorithm that hasn't changed in twenty years and is not an interesting part of PL/Java's build.
The new plugin's philosophy should be to turn that around: implement the boring details and building blocks inside the plugin, and make some of them available as public methods that bits of script might call.
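As an illustration of a boring building block that could live inside the plugin, here is a minimal sketch of quoting a string as a C string literal. The class and method names are hypothetical, and this is not a port of the JavaScript currently in the POM; it simply escapes quotes, backslashes, and common control characters, and writes any other byte outside printable ASCII as a three-digit octal escape of its UTF-8 encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the name and exact escaping policy are assumptions.
final class CQuote {
    /** Returns s as a double-quoted C string literal (UTF-8 bytes). */
    static String quoteStringForC(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xff;
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c >= 0x20 && c < 0x7f) {
                        sb.append((char) c);          // printable ASCII as is
                    } else {
                        // Always three octal digits, so a following digit
                        // cannot be absorbed into the escape.
                        sb.append(String.format("\\%03o", c));
                    }
            }
        }
        return sb.append('"').toString();
    }
}
```

A script fragment in the POM could then call such a method in one line instead of carrying the algorithm itself.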
Ideally, it will have a configuration syntax that allows some JavaScript inlined into the XML, the same as is now being done with the maven-antrun-plugin. Done right, we could drop the reliance on both nar-maven-plugin and maven-antrun-plugin and move all of the build logic into pom.xml rather than having some of it split off in build.xml.
Both our current build process and PGXS start by running pg_config to obtain settings for the PostgreSQL installation being built against. For PGXS it happens in Makefile.global when it is included by PGXS.mk.
For the current PL/Java build it happens in build.xml; Ant ends up writing values into a pgsql.properties file just so a Maven plugin can read the file and set the properties in Maven. We should be able to simplify that and just set Maven's properties.
(But beware! They can't just be set as pljava-so's properties; pljava-packaging uses them too, currently by reading that file again.)
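A minimal sketch of gathering those values directly, assuming the plugin runs pg_config with no arguments and parses its NAME = value output lines. The class name and the pgsql. property prefix are hypothetical, and how the resulting map gets into Maven's properties (and becomes visible to pljava-packaging) is deliberately left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; names and the property prefix are assumptions.
final class PgConfigProperties {
    /** Parses "NAME = value" lines as printed by pg_config without arguments. */
    static Map<String, String> parse(String output) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {
            int eq = line.indexOf(" = ");
            if (eq < 0) {
                continue; // not a NAME = value line
            }
            String name = line.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            props.put("pgsql." + name, line.substring(eq + 3));
        }
        return props;
    }

    /** Runs the given pg_config executable and parses what it prints. */
    static Map<String, String> fromPgConfig(String pgConfigPath)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(pgConfigPath)
            .redirectError(ProcessBuilder.Redirect.INHERIT)
            .start();
        String out = new String(
            p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        if (p.waitFor() != 0) {
            throw new IOException("pg_config exited with an error");
        }
        return parse(out);
    }
}
```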
Both PGXS and nar-maven-plugin contain embedded information on the needed compiling and linking commands for different supported platforms. The build-specific information from pg_config gets merged into those generic recipes.
For PGXS, the embedded rules for linking a shared object on the different platforms are found in Makefile.shlib.
For the nar-maven-plugin, they are found in a file inside the plugin's jar file in the Maven repository, so the easiest way to inspect them is on GitHub. It is inconvenient to add support to PL/Java for additional platforms, because the nar-maven-plugin does not offer a way to supply some additional platform definitions that add to its built-in set. There is only the option of using a system property to specify another file to be used instead of the built-in one, so a new platform can only be supported by dropping a new file into the source directory and editing the docs to tell a person building for that platform to use an extra command-line option. (This commit is an example.)
The super-ambitious idea would be a Java parser for GNU makefiles that would gather the information directly from PostgreSQL's files. But that would be a major undertaking and we could easily get away with less.
If the plugin had a configuration section in the POM that would accept some simple JavaScript (maybe looking like JSON, or like a map from platform names to functions, etc.), that would be an easily readable form for the recipes. They would be right there in the POM for inspection, not buried somewhere inside a jar file, and would be easy to edit to add new platforms in the future. The initial set could just be human-populated by looking at Makefile.shlib, which wouldn't be much work, and changes are infrequent.
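To make the "map from platform names to functions" shape concrete, here is a rough sketch written in Java rather than in POM-embedded JavaScript. Everything in it is illustrative: the class name, the platform keys, and the two recipes are assumptions, and the flags shown are not transcribed from Makefile.shlib.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical recipe table; keys and flags are illustrative only.
final class LinkRecipes {
    /** platform name -> (library name, object files) -> linker command line */
    static final Map<String, BiFunction<String, List<String>, List<String>>>
        RECIPES = new LinkedHashMap<>();

    static {
        RECIPES.put("linux", (lib, objs) ->
            command(objs, "cc", "-shared", "-o", "lib" + lib + ".so"));
        RECIPES.put("mingw", (lib, objs) ->
            command(objs, "gcc", "-shared", "-o", lib + ".dll"));
    }

    /** Builds a command from fixed leading words followed by the object files. */
    private static List<String> command(List<String> objs, String... words) {
        List<String> cmd = new ArrayList<>(Arrays.asList(words));
        cmd.addAll(objs);
        return cmd;
    }

    /** Looks up the recipe for a platform and applies it. */
    static List<String> link(String platform, String lib, List<String> objs) {
        BiFunction<String, List<String>, List<String>> recipe =
            RECIPES.get(platform);
        if (recipe == null) {
            throw new IllegalArgumentException("no recipe for " + platform);
        }
        return recipe.apply(lib, objs);
    }
}
```

Adding a platform would then amount to adding one more entry to the table, and the flags reported by pg_config could be merged in as further parameters of each function.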
(Just for the record, the Java makefile parser idea would not be completely bonkers; somebody has started one, but it has had no commits in several years. It has a TODO saying there is no support for many things I'm sure are in PostgreSQL makefiles, but it also has commits more recent than the TODO that mention some of those things, so maybe the TODO is out of date. I have not tried to run it. It is not BSD licensed, so it would not belong in a PL/Java distribution, but if somebody forked it, got it complete enough to read PostgreSQL's makefiles, and deployed it to Maven Central, it would be usable as a build-time dependency. It might also get used more widely.)
The PGXS makefiles appear to support filtering of which global symbols to export for most or all supported platforms, something PL/Java does not currently do. That is worth supporting if we can: PL/Java only has a couple of entry points that need to be exported for PostgreSQL to call, and keeping the rest of its global symbols non-exported would reduce the pressure to give others long unwieldy names hoping they won't collide with other extensions.
When invoking other programs on Windows, various characters in command-line arguments can cause the receiving program to parse the command line incorrectly (background in issue 190). The root cause is inherent in the Java Process API itself, whose design doesn't take the weirdness of Windows command-line parsing into account.
What makes Windows command-line parsing weird is that Windows doesn't do it: it is up to each program to parse the command line it receives, and different ones can use different rules.
In practice, most programs use the C library they are linked to, so the number of different rule sets out there is more like the number of C libraries/versions in use, but that number is greater than 1 and with rules that differ. (I am relying on this source for these details.)
That means that on Windows, when invoking a program with arguments, the lexical rules to apply must also be specified somehow, so that any interesting argument values can be escaped correctly and recovered intact when the invoked program parses them.
Windows might have something comparable to ldd that could look at a target executable and report what runtime library it links to, so the right rules could be selected. But autodetection might be more ambitious than we need (and could still have exceptions anyway, programs that link to an unknown library, or do their own custom argument parsing). So a way to specify the right set of rules to use will be necessary anyway, and probably sufficient.
There's no need to implement every different set of lexical rules any Windows program has ever used, as long as we figure out what rules are used by each of the compiling/linking tools our build recipes will use, and make sure to implement those.
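As one example of such a rule set, here is a sketch of the quoting convention commonly described for programs that parse their command line with the Microsoft C runtime: an argument is wrapped in double quotes if it is empty or contains whitespace or a quote, a run of backslashes is doubled when it precedes a quote or the closing quote, and an embedded quote gets one extra backslash. The class name is hypothetical, and whether a given compiler or linker actually follows these rules would still need to be confirmed.

```java
// Hypothetical helper; assumes the target parses arguments by the
// Microsoft C runtime conventions.
final class MsvcrtQuoting {
    /** Quotes one argument so an MSVCRT-style parser recovers it unchanged. */
    static String quote(String arg) {
        boolean plain = !arg.isEmpty() && arg.chars().noneMatch(c ->
            c == ' ' || c == '\t' || c == '\n' || c == 0x0B || c == '"');
        if (plain) {
            return arg; // nothing that needs protecting
        }
        StringBuilder sb = new StringBuilder("\"");
        int backslashes = 0; // length of the current run of backslashes
        for (int i = 0; i < arg.length(); i++) {
            char c = arg.charAt(i);
            if (c == '\\') {
                backslashes++;
                continue;
            }
            // Before a quote: 2n+1 backslashes. Before anything else: n.
            repeat(sb, '\\', c == '"' ? 2 * backslashes + 1 : backslashes);
            sb.append(c);
            backslashes = 0;
        }
        // A trailing run is doubled so the closing quote stays a delimiter.
        repeat(sb, '\\', 2 * backslashes);
        return sb.append('"').toString();
    }

    private static void repeat(StringBuilder sb, char c, int n) {
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
    }
}
```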
An obvious idea would look like subclasses of ProcessBuilder that apply different escaping rules. Code would pass individual arguments to the builder in the usual way, but they would have the right rules applied when the process is started.
However, ProcessBuilder is final, so the API can't look exactly like that.
It could look like some other class that has a start method taking a ProcessBuilder argument and returning a Process, much like the start method of ProcessBuilder itself. The argument list of a ProcessBuilder can be retrieved and modified. A POSIX-flavored subclass would have a start method that does nothing special, and directly calls start on the ProcessBuilder; a Windows-flavored one could modify the arguments first.
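The shape of that API might be sketched as follows. All of the type names are hypothetical, and the rewriting rule is passed in as a plain function so the sketch does not commit to any particular escaping scheme.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical API shape; none of these names exist in PL/Java.
interface ProcessStarter {
    Process start(ProcessBuilder pb) throws IOException;
}

/** POSIX flavor: nothing special, just start the process. */
final class PosixStarter implements ProcessStarter {
    @Override
    public Process start(ProcessBuilder pb) throws IOException {
        return pb.start();
    }
}

/** Flavor that rewrites each argument with a given rule before starting. */
final class RewritingStarter implements ProcessStarter {
    private final UnaryOperator<String> rule;

    RewritingStarter(UnaryOperator<String> rule) {
        this.rule = rule;
    }

    /** Applies the rule to every word except the program name itself. */
    static List<String> rewrite(List<String> command, UnaryOperator<String> rule) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < command.size(); i++) {
            String word = command.get(i);
            result.add(i == 0 ? word : rule.apply(word));
        }
        return result;
    }

    @Override
    public Process start(ProcessBuilder pb) throws IOException {
        // Replace the builder's argument list with the rewritten one.
        // On Windows, Java's own start() still processes these values.
        pb.command(rewrite(pb.command(), rule));
        return pb.start();
    }
}
```

Calling code would build a ProcessBuilder in the usual way and hand it to whichever starter matches the target program.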
It won't be enough to simply transform the argument list, before calling ProcessBuilder.start, so that the values have the right escaping for the target program. That's because this argument list still has to go through Java's start implementation, which will try to add Windows escaping itself (using its built-in rules that were the problem in the first place).
It can be absorbing to try to think of ways to transform the arguments so that the end result of (our transformation) followed by (Java's transformation) produces the right values for the target program. But depending on the rules in play, there is no guarantee it is even possible.
When I faced a similar problem another time, I ended up transforming the command and arguments into an invocation of python, with the original arguments passed through base64 (no spaces, no punctuation, nothing to go wrong) and a snippet of Python code that would un-base64 the arguments and invoke the intended target.
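A rough reconstruction of that trick, as a sketch rather than the code actually used back then: the class name and the Python one-liner are assumptions. The snippet deliberately contains no double quotes or backslashes, and the standard Base64 alphabet adds only +, / and = to letters and digits. Note that Python's subprocess module applies its own quoting on Windows when given a list, so this moves the choice of rules into Python rather than removing it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// Hypothetical helper; the snippet and names are assumptions.
final class Base64Wrapper {
    /** Python code that decodes its arguments and runs them as a command. */
    static final String SNIPPET =
        "import sys, base64, subprocess; "
        + "sys.exit(subprocess.call("
        + "[base64.b64decode(a).decode('utf-8') for a in sys.argv[1:]]))";

    /** Turns a command into: python -c SNIPPET base64(word)... */
    static List<String> wrap(String python, List<String> command) {
        List<String> wrapped = new ArrayList<>();
        wrapped.add(python);
        wrapped.add("-c");
        wrapped.add(SNIPPET);
        Base64.Encoder encoder = Base64.getEncoder();
        for (String word : command) {
            wrapped.add(encoder.encodeToString(
                word.getBytes(StandardCharsets.UTF_8)));
        }
        return wrapped;
    }
}
```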
On Windows, a similar idea using PowerShell might be most natural.
I don't know enough about PowerShell to say whether there's an advantage either way between just passing encoded args as I was doing with Python (assuming PowerShell has functions to decode base64) and constructing a serialized object to pass to PowerShell, as in the PowerShell Remoting Protocol.
This is not directly related to building the native code, but with adding a custom plugin anyway, it would be helpful to have one simple Mojo that implements MavenReport and allows a bit of JavaScript to be given and run in the site life cycle.
The reason is that I am very unhappy with the maven-javadoc-plugin, as this commit and this one will show.
By the end of those two commits, it seems PL/Java now gets its Javadoc built by threading one tricky path, maybe even the only path, between the existing bugs in the plugin and javadoc. The arrival any day of one more bug in the wrong place might leave zero paths.
That is way too much work and too much risk, considering that the docs could be built simply by launching javadoc with nearly all default options. The extra work here has only been to get maven-javadoc-plugin not to add extra options that cause failure.
In the build life cycle, using maven-antrun-plugin again with four lines of JavaScript to launch javadoc with the right options would solve the problem once and for all.
But the maven-antrun-plugin's Mojos do not implement MavenReport, so that isn't an available option in the site life cycle.
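For a sense of how little is involved in launching javadoc with nearly all default options, here is a sketch in Java using the JDK's javax.tools API instead of an external process. The class name and the choice of options (-d and -sourcepath plus package names) are assumptions, and the MavenReport wiring that would let this run in the site life cycle is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.tools.DocumentationTool;
import javax.tools.ToolProvider;

// Hypothetical helper; option choices are assumptions.
final class JavadocLauncher {
    /** Builds a minimal javadoc argument list, leaving all else at defaults. */
    static List<String> arguments(
            String sourcePath, String outputDir, String... packages) {
        List<String> args = new ArrayList<>(
            Arrays.asList("-d", outputDir, "-sourcepath", sourcePath));
        args.addAll(Arrays.asList(packages));
        return args;
    }

    /** Runs the JDK's own javadoc tool in-process; returns its exit status. */
    static int run(List<String> args) {
        DocumentationTool javadoc = ToolProvider.getSystemDocumentationTool();
        if (javadoc == null) {
            throw new IllegalStateException("no javadoc tool; is this a JDK?");
        }
        // null streams mean System.in, System.out, and System.err.
        return javadoc.run(null, null, null, args.toArray(new String[0]));
    }
}
```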