22

I have an interesting chicken-and-egg problem and a potential solution to it (see my posted answer), but that solution uses CMake in an unusual way. Better alternatives or comments would be welcome.

THE PROBLEM:

The simple version of the problem can be described as a single CMake project with the following characteristics:

  1. One of the build targets is a command-line executable which I'll call mycomp, the source of which is in a mycompdir and making any modifications to the contents of that directory is not possible.
  2. The project contains text files (I'll call them foo.my and bar.my) which need mycomp run on them to produce a set of C++ sources and headers and some CMakeLists.txt files defining libraries built from those sources.
  3. Other build targets in the same project need to link against the libraries defined by those generated CMakeLists.txt files. These other targets also have sources which #include some of the generated headers.

You can think of mycomp as being something like a compiler and the text files in step 2 as some sort of source files. This presents a problem, because CMake needs the CMakeLists.txt files at configure time, but mycomp is not available until build time and therefore isn't available on the first run to create the CMakeLists.txt files early enough.

NON-ANSWER:

Normally, an ExternalProject-based superbuild arrangement would be a potential solution to this, but the above is a considerable simplification of the actual project I am dealing with and I don't have the freedom to split the build into different parts or perform other large scale restructuring work.

1 Answer 1

26

The crux of the problem is needing mycomp to be available when CMake is run so that the generated CMakeLists.txt files can be created and then pulled in with add_subdirectory(). A possible way to achieve this is to use execute_process() to run a nested cmake-and-build from the main build. That nested cmake-and-build would use the exact same source and binary directories as the top level CMake run (unless cross compiling). The general structure of the main top level CMakeLists.txt would be something like this:

# Usual CMakeLists.txt setup stuff goes here...

if(EARLY_BUILD)
    # This is the nested build and we will only be asked to
    # build the mycomp target (see (c) below)
    add_subdirectory(mycompdir)

    # End immediately, we don't want anything else in the nested build
    return()
endif()

# This is the main build, setup and execute the nested build
# to ensure the mycomp executable exists before continuing

# (a) When cross compiling, we cannot re-use the same binary dir
#     because the host and target are different architectures
if(CMAKE_CROSSCOMPILING)
    set(workdir "${CMAKE_BINARY_DIR}/host")
    execute_process(COMMAND ${CMAKE_COMMAND} -E make_directory "${workdir}")
else()
    set(workdir "${CMAKE_BINARY_DIR}")
endif()

# (b) Nested CMake run. May need more -D... options than shown here.
execute_process(COMMAND ${CMAKE_COMMAND} -G "${CMAKE_GENERATOR}"
                        -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
                        -DCMAKE_MAKE_PROGRAM=${CMAKE_MAKE_PROGRAM}
                        -DEARLY_BUILD=ON
                        ${CMAKE_SOURCE_DIR}
               WORKING_DIRECTORY "${workdir}")

# (c) Build just mycomp in the nested build. Don't specify a --config
#     because we cannot know what config the developer will be using
#     at this point. For non-multi-config generators, we've already
#     specified CMAKE_BUILD_TYPE above in (b).
execute_process(COMMAND ${CMAKE_COMMAND} --build . --target mycomp
                WORKING_DIRECTORY "${workdir}")

# (d) We want everything from mycompdir in our main build,
#     not just the mycomp target
add_subdirectory(mycompdir)

# (e) Run mycomp on the sources to generate a CMakeLists.txt in the
#     ${CMAKE_BINARY_DIR}/foobar directory. Note that because we want
#     to support cross compiling, working out the location of the
#     executable is a bit more tricky. We cannot know whether the user
#     wants debug or release build types for multi-config generators
#     so we have to choose one. We cannot query the target properties
#     because they are only known at generate time, which is after here.
#     Best we can do is hardcode some basic logic.
if(MSVC)
    set(mycompsuffix "Debug/mycomp.exe")
elseif(CMAKE_GENERATOR STREQUAL "Xcode")
    set(mycompsuffix "Debug/mycomp")
else()
    set(mycompsuffix "mycomp")
endif()
set(mycomp_EXECUTABLE "${workdir}/mycompdir/${mycompsuffix}")
execute_process(COMMAND "${mycomp_EXECUTABLE}" -outdir foobar ${CMAKE_SOURCE_DIR}/foo.my ${CMAKE_SOURCE_DIR}/bar.my)

# (f) Now pull that generated CMakeLists.txt into the main build.
#     It will create a CMake library target called foobar.
add_subdirectory(${CMAKE_BINARY_DIR}/foobar ${CMAKE_BINARY_DIR}/foobar-build)

# (g) Another target which links to the foobar library
#     and includes headers from there
add_executable(gumby gumby.cpp)
target_link_libraries(gumby PUBLIC foobar)
target_include_directories(gumby PUBLIC foobar)

If we don't re-use the same binary directory at (b) and (c) as we use for the main build, we end up building mycomp twice, which we obviously want to avoid. For cross compiling, we cannot avoid that, so in such cases we build the mycomp tool off to the side in a separate binary directory.

I've experimented with the above approach and indeed it appears to work in the real world project that prompted the original question, at least for the Unix Makefiles, Ninja, Xcode (OS X and iOS) and Visual Studio generators. Part of the attractiveness of this approach is that it only requires a modest amount of code to be added just to the top level CMakeLists.txt file. Nevertheless, there are some observations that should be made:

  • If the compiler or linker commands for mycomp and its sources are different in any way between the nested build and the main build, the mycomp target ends up getting rebuilt a second time at (d). If there are no differences, mycomp only gets built once when not cross compiling, which is exactly what we want.
  • I see no easy way to pass exactly the same arguments to the nested invocation of CMake at (b) as was passed to the top level CMake run (basically the problem described here). Reading CMakeCache.txt isn't an option since it won't exist on the first invocation and it would not give you any new or changed arguments from the current run anyway. The best I can do is to set those CMake variables I think are potentially going to be used and which may influence the compiler and linker commands of mycomp. This can be worked around by adding more and more variables as I encounter ones I discover I need, but that's not ideal.
  • When re-using the same binary directory, we are relying on CMake not starting to write any of its files to the binary directory until the generate stage (well, at least until after the build at (c) completes). For the generators tested, it appears we are okay, but I don't know if all generators on all platforms follow this behaviour too (and I can't test every single combination to find out!). This is the part that gives me the greatest concern. If anyone can confirm with reasoning and/or evidence that this is safe for all generators and platforms, that would be valuable (and worth an upvote if you want to address this as a separate answer).

UPDATE: After using the above strategy on a number of real world projects with staff of varying levels of familiarity with CMake, some observations can be made.

  • Having the nested build re-use the same build directory as the main build can occasionally lead to problems. Specifically, if a user kills the CMake run after the nested build completes but before the main build does, the CMakeCache.txt file is left with EARLY_BUILD set to ON. This then makes all subsequent CMake runs act like a nested build, so the main build is essentially lost until the CMakeCache.txt file is manually removed. It is possible that an error somewhere in one of the project's CMakeLists.txt file may also lead to a similar situation (unconfirmed). Performing the nested build off to the side in its own separate build directory has worked very well though with no such problems.

  • The nested build should probably be Release rather than Debug. If not re-using the same build directory as the main build (now what I'd recommend), we no longer care about trying to avoid compiling the same file twice, so may as well make mycomp as fast as possible.

  • Use ccache so that any costs due to rebuilding some files twice with different settings are minimised. Actually, we found using ccache typically makes the nested build very quick since it rarely changed compared to the main build.

  • The nested build probably needs to have CMAKE_BUILD_WITH_INSTALL_RPATH set to FALSE on some platforms so that any libraries mycomp needs can be found without having to set environment variables, etc.

2
  • The problem I am running into here, using a very simplified project just to learn this strategy, is that the early build's configuration step seems to be latching on to the cross compiler and won't use the host-native compiler (thus failing to build "mycomp"). Jan 15, 2020 at 15:02
  • 2
    ... and after restarting this several times I think I found the culprit. The top-level CMakeLists.txt needs to be "administrative" and have no languages set. (Remember, I was using a really simple example.) Making that change seems to segregate the build into "two halves" properly (for anyone else coming to this problem). Jan 15, 2020 at 16:18

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.