Add experimental Windows wheel build support #1359

Open
zym1998year wants to merge 15 commits into GalSim-developers:releases/2.8 from zym1998year:windows-msvc-port

@zym1998year


Draft PR against releases/2.8 (baseline ee177aa50, v2.8.4).
Branch on file: windows-msvc-port.

Status

  • Verified locally: setup.py build_clib + build_ext, pip wheel,
    delvewheel repair, fresh-venv install (no conda PATH), import galsim, Gaussian.drawImage, FFT-backed convolve,
    BaseDeviate(42) determinism, hsm.EstimateShear.
  • Not addressed in this PR: Windows-aware multiprocessing config
    flow, download_cosmos symlink fallback, full Windows pytest pass,
    GitHub Actions CI. See Follow-ups.
  • Linux / macOS code paths are textually unchanged outside
    IS_WINDOWS branches and _WIN32 C++ guards. I have not re-run the
    existing CI from this branch; the diff stays under
    if IS_WINDOWS: so no GCC/Clang behavioural change is intended.

Summary

Three small commits make GalSim build, install, and import on native
Windows x64 with the MSVC toolchain. The result is a self-contained
win_amd64 wheel that installs into a fresh venv with no conda
dependency at runtime, and that produces correct numbers on a small
smoke surface (drawImage, FFT convolve, random determinism, HSM
EstimateShear).

The PR stays on the minimal-diff setuptools path. A
scikit-build-core rewrite is a larger discussion this PR does not
attempt.

What this PR changes

| # | Commit | Files | Diff |
|---|--------|-------|------|
| 1 | Make setup metadata and path handling Windows-safe | setup.py | +14 / -6 |
| 2 | Add MSVC and Windows FFTW/Eigen build support | setup.py | +190 / -84 |
| 3 | Port GalSim C++ sources for MSVC | 8 files in include/, src/ | +93 / -65 |

Branch tip: 9f4db829c.

Per-commit detail

1. setup metadata and paths

setup.py only. Surgical:

  • Guard the from setuptools.command.test import test import.
    setuptools 72.0+ removed it; the cmdclass below never registered a
    test command.
  • Add IS_WINDOWS = sys.platform == 'win32'.
  • cpp_sources.remove('src/mmgr.cpp') -> compare via
    os.path.normpath, so the glob-returned src\mmgr.cpp on Windows
    still matches.
  • ':' -> os.pathsep when splitting LIBRARY_PATH,
    LD_LIBRARY_PATH, DYLD_LIBRARY_PATH, C_INCLUDE_PATH, and the
    installed-script PATH check. Avoids treating drive-letter colons as
    separators.
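
The two path fixes in commit 1 can be illustrated with a minimal sketch. The helper names split_search_path and drop_source are hypothetical (they are not taken from setup.py), and ntpath is used so the Windows behaviour can be reproduced on any platform:

```python
import ntpath


def split_search_path(value, sep):
    """Split a PATH-style variable on `sep`, dropping empty entries."""
    return [entry for entry in value.split(sep) if entry]


def drop_source(sources, name, pathmod=ntpath):
    """Remove `name` from `sources`, comparing normalized paths."""
    target = pathmod.normpath(name)
    return [src for src in sources if pathmod.normpath(src) != target]


windows_value = r"C:\fftw\lib;D:\deps\lib"
# Splitting on ':' cuts each entry at its drive-letter colon.
broken = split_search_path(windows_value, ":")
# Splitting on ';' (the value of os.pathsep on Windows) keeps entries whole.
fixed = split_search_path(windows_value, ";")

# glob returns backslash paths on Windows, so a literal string compare
# against 'src/mmgr.cpp' would miss; normalizing both sides matches.
globbed = ["src\\Image.cpp", "src\\mmgr.cpp"]
remaining = drop_source(globbed, "src/mmgr.cpp")
```

In setup.py itself the separator would come from os.pathsep and the normalization from os.path.normpath, so POSIX behaviour stays as it was.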

2. MSVC and Windows FFTW / Eigen

setup.py only. The Windows build path:

  • copt / lopt gain an 'msvc' entry: /O2 /std:c++14 /EHsc /openmp /Zc:__cplusplus /utf-8 /DNOMINMAX.
  • get_compiler_type short-circuits via getattr(compiler, 'compiler_type', None) == 'msvc' before touching compiler_so
    (a Unix-only attribute that MSVCCompiler lacks).
  • try_compile adds an MSVC branch using compiler.compile() /
    compiler.link_executable() instead of constructing cc -c -o
    command lines.
  • fix_compiler skips ccache, -msse2, -stdlib=libc++ removal, and
    the -L linker_so rewrite on MSVC; forces single-process compile on
    Windows (parallel_compile is shaped around the Unix invocation
    pattern).
  • find_fftw_lib searches %CONDA_PREFIX%\Library\lib, vcpkg's
    installed/x64-windows/lib, and accepts fftw3.lib /
    libfftw3.lib / libfftw3-3.lib. The ctypes.cdll.LoadLibrary
    probe is skipped on Windows because the resolved path is the import
    library, not a runnable DLL.
  • find_eigen_dir gets the same conda / vcpkg additions for include
    directories.
  • The Extension uses libraries=['fftw3'] on Windows, keeps
    extra_link_args=['-lfftw3'] on the GCC/Clang path.
  • my_build_ext.run skips GALSIM_BUILD_SHARED on Windows: that
    codepath uses os.symlink and .so / .dylib naming.
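
As a rough sketch of the FFTW search described above, the candidate import-library paths could be assembled like this. The function name fftw_candidates is hypothetical, and locating vcpkg through a VCPKG_ROOT environment variable is an assumption (the PR text only names the installed/x64-windows/lib layout):

```python
import ntpath

# Import-library names the PR says find_fftw_lib accepts on Windows.
FFTW_LIB_NAMES = ("fftw3.lib", "libfftw3.lib", "libfftw3-3.lib")


def fftw_candidates(env):
    """Return candidate FFTW import-library paths for a Windows build.

    `env` is a mapping such as os.environ. VCPKG_ROOT is an assumed
    way of finding the vcpkg tree, used here for illustration only.
    """
    lib_dirs = []
    conda_prefix = env.get("CONDA_PREFIX")
    if conda_prefix:
        lib_dirs.append(ntpath.join(conda_prefix, "Library", "lib"))
    vcpkg_root = env.get("VCPKG_ROOT")
    if vcpkg_root:
        lib_dirs.append(
            ntpath.join(vcpkg_root, "installed", "x64-windows", "lib"))
    return [ntpath.join(d, name) for d in lib_dirs for name in FFTW_LIB_NAMES]


candidates = fftw_candidates({"CONDA_PREFIX": r"C:\envs\galsim"})
```

A real search would then pick the first candidate that exists on disk; it would not pass that file to ctypes, because an import library is not loadable.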

3. C++ sources for MSVC

8 files:

  • include/galsim/Std.h: #define NOMINMAX before <Windows.h> so
    std::min / std::max (and Eigen members) survive.
  • include/galsim/Stopwatch.h: switch from gettimeofday to
    std::chrono::steady_clock. No GalSim TU includes this header
    today; the rewrite is hygiene.
  • src/Image.cpp, src/Polygon.cpp: replace alternative token or
    with ||. MSVC parses or as an identifier unless /permissive- or
    /Za is set, or <ciso646> is included.
  • src/Random.cpp: split the POSIX <sys/time.h> / <unistd.h> /
    <fcntl.h> includes behind #ifndef _WIN32; add _WIN32 branches
    for seedurandom() (std::random_device, which delegates to
    BCryptGenRandom on the Windows CRT) and seedtime()
    (std::chrono::system_clock with the same microsecond-modulo seed).
  • src/SBInterpolatedImage.cpp, src/WCS.cpp, src/SBTransform.cpp:
    replace GCC variable-length arrays with std::vector<...>; pass
    .data() to consumers expecting raw pointers (Horner2D,
    memset/std::fill, reinterpret_cast onto std::complex<T>,
    KValueInnerLoop).

Verification

Built and tested on Windows 11 x64, Python 3.11.15 (conda-forge), VS
2026 Community (MSVC 14.50.35717), conda-forge fftw 3.3.10 /
eigen 3.4.0 / pybind11 3.0.3.

Build / wheel

python setup.py build_clib -j1 build_ext -j1                # 0 errors
python -m pip wheel . -w dist -v --no-deps                  # 0 errors
python -m delvewheel show dist\GalSim-2.8.4-cp311-...whl    # see table
python -m delvewheel repair --add-path %CONDA_PREFIX%\Library\bin ^
                            -w wheelhouse dist\GalSim-2.8.4-cp311-...whl

| Bundled into wheel | Assumed present on host |
|--------------------|-------------------------|
| fftw3.dll | vcruntime140.dll, vcruntime140_1.dll |
| msvcp140.dll | python311.dll, kernel32.dll, api-ms-win-crt-* |
| vcomp140.dll | |

Fresh-venv smoke

python -m venv .venv-clean
:: PATH set to .venv-clean\Scripts;System32;... (no conda Library\bin)
.venv-clean\Scripts\pip install <repaired wheel> LSSTDESC.Coord astropy
.venv-clean\Scripts\python phase3_smoke.py

| Test | Result |
|------|--------|
| import galsim | 2.8.4, _galsim.cp311-win_amd64.pyd resolved from venv site-packages |
| Gaussian(sigma=1.2).drawImage(nx=16, ny=16, scale=0.2) | shape (16, 16), sum 0.6684 |
| FFT draw Convolve([Sersic(n=2.5), Gaussian]) | sum 0.9737 (flux=1, ~97% retained) |
| BaseDeviate(42) | first GaussianDeviate sample -0.23898784173300305, reproducible across instances |
| hsm.EstimateShear on g1=0.2, g2=0.1 | status=0, e1=0.381, e2=0.190 |

The cleaned PATH does not include %CONDA_PREFIX%\Library\bin, so
the bundled fftw3.dll is the one being loaded.

Reproduction transcript (full local commands)
:: From a Developer Command Prompt (vcvars64) with the conda env active
set CMAKE_BUILD_PARALLEL_LEVEL=
set FFTW_DIR=%CONDA_PREFIX%\Library
python setup.py build_clib -j1 build_ext -j1
python -m pip install -U pip build wheel delvewheel
python -m pip wheel . -w dist -v --no-deps
python -m delvewheel show dist\GalSim-2.8.4-cp311-cp311-win_amd64.whl
python -m delvewheel repair --add-path %CONDA_PREFIX%\Library\bin ^
                            -w wheelhouse ^
                            dist\GalSim-2.8.4-cp311-cp311-win_amd64.whl

python -m venv .venv-clean
set "PATH=%cd%\.venv-clean\Scripts;%SystemRoot%\System32;%SystemRoot%;%SystemRoot%\System32\Wbem"
set "CONDA_PREFIX="
set "FFTW_DIR="
.venv-clean\Scripts\python -m pip install -U pip wheel
.venv-clean\Scripts\python -m pip install ^
    wheelhouse\GalSim-2.8.4-cp311-cp311-win_amd64.whl LSSTDESC.Coord astropy
.venv-clean\Scripts\python phase3_smoke.py

phase3_smoke.py runs the five tests in the table above and exits
non-zero on any failure.

Wheel notes

  • Bundled DLLs come from conda-forge: fftw3 from
    conda-forge::fftw, msvcp140 / vcomp140 from the env's MSVC
    redistributable mirror.
  • FFTW is GPL-licensed; redistributing fftw3.dll inside a wheel is
    permitted under GPL-2.0-or-later but pushes the binary distribution
    under GPL constraints. If MIT-only binary distribution is preferred,
    an alternative is to ship two flavours (FFTW vs. KissFFT or
    pocketfft); out of scope for this PR. Reviewer input requested.
  • vcomp140.dll is the older Microsoft OpenMP runtime;
    /openmp:llvm would pull libomp.dll instead. Reviewer input
    requested.
  • Wheel size in this branch: ~5 MB, dominated by the bundled FFTW.
  • The wheel is cp311-cp311-win_amd64. cibuildwheel will produce one
    wheel per Python minor version.

Follow-ups

Deliberately not in this PR. Each is a candidate for a separate PR
once this one lands.

  • galsim/config/util.py uses multiprocessing.get_context('fork'),
    which Windows lacks. Needs a spawn fallback plus an audit of
    config processing / phase-screen code for picklability under
    spawn. Not exercised by the smoke tests above; surfaces as soon as
    a config-driven run hits the multiprocessing branch.
  • galsim/download_cosmos.py uses os.symlink, which non-elevated
    Windows users cannot create unless Developer Mode is on. Needs
    copy / directory-junction / --nolink fallback.
  • Test markers: sweep tests/ for fork-only or symlink-only
    assumptions, mark with pytest.mark.skipif(sys.platform == 'win32')
    where appropriate, and run the full pytest on windows-latest.
  • GitHub Actions Windows CI + cibuildwheel matrix
    (cp39 .. cp313 -win_amd64). I have a working local recipe but want
    to align with project preferences (FFTW source, OpenMP runtime,
    delvewheel command) before sending it; see
    Reviewer questions below.

Risk and compatibility

  • Linux / macOS: every Windows-specific change is gated by
    IS_WINDOWS (in setup.py) or _WIN32 (in C++). I expect no
    behavioural regression on those platforms but did not re-run their
    CI from this branch.
  • VLA -> std::vector in SBInterpolatedImage.cpp, WCS.cpp,
    SBTransform.cpp: adds one heap allocation per call. I expect the
    per-element array operations in the inner loops to dominate that
    cost, but I have not benchmarked. If micro-benchmarks regress
    noticeably, an alternative is _alloca / alloca on the Windows
    path with VLAs preserved on GCC. I went with std::vector for
    cross-platform clarity.
  • Random.cpp Windows seeding uses std::random_device. On
    MSVC's CRT this delegates to the OS CSPRNG, matching the
    /dev/urandom contract. POSIX builds are unchanged.
  • MSVC toolset vs runtime mismatch: delvewheel warns when the
    bundled msvcp140.dll is older than the toolset that built the
    .pyd. I built locally with VS 2026 (14.50) and bundled
    conda-forge's 14.44 redistributable. The wheel still loaded and
    passed the smoke tests; CI on windows-latest (VS 2022 14.4x)
    would not see this warning. No code change required.

Reviewer questions

  1. FFTW source for the binary wheel: conda-forge, vcpkg-built,
    or fftw.org pre-built. Affects the GPL-redistribution note.
  2. /openmp (MS) vs /openmp:llvm. The latter aligns better
    with modern OpenMP but adds a libomp.dll redistribution step.
  3. VLA -> std::vector change: keep it global (current PR), or
    keep VLAs on GCC and switch only on MSVC.
  4. PR split: land this PR first (build / wheel / import) and
    submit multiprocessing / symlink / Windows test markers
    separately, or fold in.
  5. CI: I can send a windows-latest + cibuildwheel workflow as
    a follow-up PR once questions 1-2 are resolved. A draft strategy
    doc is available locally if helpful.

Happy to split, rebase, or extend as the project prefers.

setuptools 72.0+ removed setuptools.command.test; guard the import.
Add IS_WINDOWS sentinel for subsequent platform branches. Normalize
the mmgr.cpp source-list match via os.path.normpath so glob's
backslash paths still resolve. Use os.pathsep when splitting
LIBRARY_PATH / C_INCLUDE_PATH / PATH-style env vars so the colon in
Windows drive letters isn't treated as a separator.

- copt/lopt: MSVC entry with /O2 /std:c++14 /EHsc /openmp
  /Zc:__cplusplus /utf-8 /DNOMINMAX (/openmp selects MSVC's older
  OpenMP runtime but suffices for the loops GalSim parallelizes today).
- get_compiler_type: short-circuit MSVC via compiler_type before
  touching compiler_so (Unix-only attribute).
- try_compile: route MSVC probes through compiler.compile() and
  compiler.link_executable() instead of hand-built cc -c / -o lines.
- fix_compiler: skip ccache, -msse2, -stdlib=libc++ removal and
  linker_so editing on MSVC; force single-process compile on Windows
  (parallel_compile pool path is Unix-shaped; MSVC can later regain
  per-extension parallelism via /MP).
- find_fftw_lib: search %CONDA_PREFIX%\Library\lib and the vcpkg
  layout, accept fftw3.lib / libfftw3.lib / libfftw3-3.lib, and skip
  ctypes.LoadLibrary on Windows (the located file is the import
  library, not the runtime DLL).
- find_eigen_dir: same conda/vcpkg additions for include dirs.
- Extension: use libraries=['fftw3'] on Windows; -lfftw3 stays for
  GCC/Clang.
- my_build_ext.run: skip GALSIM_BUILD_SHARED on Windows; that path
  uses os.symlink and bakes in .so/.dylib naming.

- include/galsim/Std.h: define NOMINMAX before <Windows.h> so std::min
  / std::max (and Eigen members) survive unshadowed.  The MSVC build
  also passes /DNOMINMAX, but defining it here protects direct header
  consumers.
- include/galsim/Stopwatch.h: switch to std::chrono::steady_clock from
  gettimeofday.  No GalSim TU includes this header today, but keeping
  the public surface portable avoids surprises.
- src/Image.cpp, src/Polygon.cpp: replace alternative tokens (`or`)
  with `||`.  MSVC parses these as identifiers unless /permissive- or
  /Za is set, or <ciso646> is included.
- src/Random.cpp: split the POSIX <sys/time.h> / <unistd.h> / <fcntl.h>
  includes behind #ifndef _WIN32 and add _WIN32 branches for
  seedurandom() (std::random_device, which delegates to
  BCryptGenRandom/CryptGenRandom on Windows CRTs) and seedtime()
  (std::chrono::system_clock with the same microsecond-modulo seed).
- src/SBInterpolatedImage.cpp, src/WCS.cpp, src/SBTransform.cpp:
  replace GCC variable-length arrays with std::vector<...>.  Pass
  .data() to consumers expecting raw pointers (Horner2D, memset,
  reinterpret_cast onto std::complex<T>, KValueInnerLoop).  The
  std::vector path keeps the original observable behaviour.  It costs
  one heap allocation per call instead of a stack allocation, which
  the inner loops are expected to dominate (not benchmarked).

The repo materialises ``galsim/share`` as a git symlink (mode 120000
-> ../share) so ``package_data={'galsim': shared_data + headers}``
ships the Roman / SED / bandpass / sensor data tree.  On Windows
checkouts with the default ``core.symlinks=false``, git writes the
symlink target into a regular 8-byte text file ("../share") and
setuptools' package_data scan can't reach the data, so the resulting
wheel imports cleanly but raises FileNotFoundError on every code
path that touches ``meta_data.share_dir``.

Add a small ``build_py`` cmdclass extension that detects the
symlink-stub case (regular file under ``galsim/share`` whose body is
``../share`` or ``..\share``) and copies the repo-level ``share/``
tree directly into ``<build_lib>/galsim/share/`` after the standard
build_py runs.  The source tree is untouched -- only the wheel build
output picks up the data -- and the detection is the stub itself
rather than ``IS_WINDOWS``, so Linux/macOS (with a working symlink)
is a no-op.
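
The stub detection and copy step described above might look roughly like the following. This is a sketch only, not the PR's actual build_py subclass: the function names is_symlink_stub and copy_share_if_stub are hypothetical, and the snippet works on plain directories instead of hooking into setuptools.

```python
import os
import shutil

# Text git writes in place of the symlink when core.symlinks=false.
STUB_BODIES = {"../share", "..\\share"}


def is_symlink_stub(path):
    """True if `path` is a regular file whose body is the symlink target."""
    if os.path.islink(path) or not os.path.isfile(path):
        return False
    with open(path, "r", errors="replace") as handle:
        return handle.read(64).strip() in STUB_BODIES


def copy_share_if_stub(repo_root, build_lib):
    """Copy repo-level share/ into the build output when galsim/share is a stub."""
    if not is_symlink_stub(os.path.join(repo_root, "galsim", "share")):
        return False  # working symlink (or no stub): nothing to do
    dest = os.path.join(build_lib, "galsim", "share")
    if os.path.isfile(dest):
        os.remove(dest)  # drop a stub file if one reached the build tree
    shutil.copytree(os.path.join(repo_root, "share"), dest, dirs_exist_ok=True)
    return True
```

Keying the behaviour on the stub file itself, as the commit message describes, means a checkout with a real symlink takes the early return regardless of platform.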

Verified locally: rebuilt wheel ships 112 share/ entries
(roman 61, bandpasses 18, sensors 18, SEDs 11, top-level 4); the
``galsim.meta_data.share_dir`` runtime probe flips from
``isdir=False`` to ``isdir=True``; ``tests/test_roman.py`` and
``tests/test_chromatic.py`` change from collection-time
ImportError/FileNotFoundError to 49/49 passing; the Layer-D full
pytest pass rate moves from 601/705 (85.2 %) to 710/754 (94.2 %).
The other ten runtime probes (drawimage, FFT-backed convolve, HSM,
random determinism, fork/spawn context, symlink, path-with-spaces,
download_cosmos inspection) return byte-identical results to the
pre-fix run, and the Windows-vs-Linux WFS pipeline cross-platform
diff stays bit-exact for the GalSim FFT image -- the change is
build-time packaging only and does not perturb any numerical path.
@rmjarvis

rmjarvis commented May 28, 2026

Member

Thanks for doing this. I got stymied by the FFTW installation when I tried to get Windows working a while back.

Two high level things first:

  1. This needs to be rebased onto main and the PR targeted to main, not releases/2.8.
  2. Please add a Windows build in .github/workflows/ci.yml so CI can build on windows and run tests. I don't have a local Windows machine to test on, so that's the only way I can see what the runtime issues are.

rmjarvis added the "build" label (Related to compiling, building, installing) on May 29, 2026
@zym1998year

Author

> Thanks for doing this. I got stymied by the FFTW installation when I tried to get Windows working a while back.
>
> Two high level things first:
>
>   1. This needs to be rebased onto main and the PR targeted to main, not releases/2.8.
>   2. Please add a Windows build in .github/workflows/ci.yml so CI can build on windows and run tests. I don't have a local Windows machine to test on, so that's the only way I can see what the runtime issues are.

Thanks for the clear guidance. I’m a bit tied up right now, but once I have some bandwidth I’ll rebase this onto main, retarget the PR to main, and add a Windows build/test job in .github/workflows/ci.yml as you suggested. I appreciate the direction, especially around using CI to surface the Windows runtime issues.

On MSVC/LLP64 'long' is 32-bit, so pybind rejected RNG seeds > 2^31-1 with a TypeError, whereas Linux/GCC (LP64, 64-bit long) accepted them. Widen the seed input path (BaseDeviate ctor/seed/reset) and the pybind bindings from long to int64_t. No-op on Linux (int64_t == long there). Fixes test_Zernike_rotate/basis, test_structure_function, test_lsst_y_focus.
galsim/config used get_context('fork'), which raises ValueError on Windows. Add _get_mp_context() that falls back to 'spawn' only when 'fork' is unavailable; Linux/mac still use 'fork'.
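
A minimal sketch of the fork-with-spawn-fallback idea from the commit above. The name pick_mp_context is hypothetical; the PR's own helper is _get_mp_context(), whose exact body is not shown here.

```python
import multiprocessing


def pick_mp_context(preferred="fork", fallback="spawn"):
    """Return a multiprocessing context, preferring `preferred` when available."""
    if preferred in multiprocessing.get_all_start_methods():
        return multiprocessing.get_context(preferred)
    # Windows offers only 'spawn', so get_context('fork') would raise there.
    return multiprocessing.get_context(fallback)


ctx = pick_mp_context()
method = ctx.get_start_method()  # 'fork' on Linux, 'spawn' on Windows
```

Checking get_all_start_methods() up front avoids relying on the ValueError that get_context raises for an unsupported method.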
make_link used os.symlink, which fails on Windows without the symlink privilege (WinError 1314). Fall back to a directory junction (_winapi.CreateJunction), then shutil.copytree. POSIX path unchanged.
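
The symlink, junction, copy fallback chain can be sketched as follows. link_or_copy_dir is a hypothetical name and not the PR's make_link; _winapi is a private CPython module that exists only on Windows, so it is imported lazily inside the Windows branch.

```python
import os
import shutil
import sys


def link_or_copy_dir(target, link):
    """Expose directory `target` at `link`; return which strategy was used."""
    try:
        os.symlink(target, link, target_is_directory=True)
        return "symlink"
    except OSError:
        pass  # e.g. WinError 1314 when the symlink privilege is missing
    if sys.platform == "win32":
        try:
            import _winapi  # private module, Windows only
            _winapi.CreateJunction(os.path.abspath(target),
                                   os.path.abspath(link))
            return "junction"
        except OSError:
            pass
    shutil.copytree(target, link)  # last resort: a real copy
    return "copy"
```

On POSIX the first branch succeeds, so the later branches are exercised only where symlink creation is refused.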
raw() returned a 32-bit 'long' on MSVC (LLP64), so raw values >= 2^31 came back negative on Windows but positive on Linux (LP64), diverging any config/serialized value that stores raw(). Widen raw() to int64_t (no-op on Linux). Also widen the derived-deviate integer-seed constructors (Uniform/Gaussian/Binomial/Poisson/Weibull/Gamma/Chi2) from long to int64_t to match BaseDeviate. Verified: raw() is now the positive uint32; the test_random reference-value tests and the four large-seed tests pass.
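
The sign flip described above can be reproduced from Python with fixed-width ctypes integers. This is only an illustration of the LLP64 versus LP64 difference; the helper names are hypothetical and no GalSim code is involved.

```python
import ctypes


def as_32bit_long(value):
    """Value as a 32-bit signed long would hold it (MSVC, LLP64)."""
    return ctypes.c_int32(value).value


def as_int64(value):
    """Value as a 64-bit signed integer would hold it."""
    return ctypes.c_int64(value).value


raw = 2**31  # smallest value that no longer fits in a signed 32-bit long

wrapped = as_32bit_long(raw)   # wraps to a negative number
widened = as_int64(raw)        # stays positive
```

Values below 2**31 are identical in both representations, which is why the divergence showed up only for large raw values and seeds.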
The fork->spawn fallback alone could not run because two definitions were nested (unpicklable under 'spawn'): MultiProcess's inner worker() and GetLoggerProxy's local LoggerManager class. Move both to module scope (_mp_worker, with item/job_func passed explicitly; _LoggerManager) so the spawn server process can pickle them by reference. Also correct the SafeManager docstring. Linux/fork behaviour is unchanged. Verified on Windows: config nproc>1 now runs (test_multirng and test_fits pass; 28/37 test_config_image+output pass, up from 0 before). Remaining failures are a deeper limitation: config $-eval lambdas and some catalog objects are not picklable under spawn.
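
Why hoisting to module scope matters can be seen with a small pickling probe. This is a sketch with hypothetical names, unrelated to GalSim's actual worker: a function defined inside another function cannot be pickled by reference, whereas an importable module-level callable can.

```python
import math
import pickle


def is_picklable(obj):
    """Return True if pickle can serialize `obj`."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def make_nested_worker():
    """Build a worker the way the old code did: as a local function."""
    def worker(item):
        return item * 2
    return worker


nested_ok = is_picklable(make_nested_worker())  # local object
module_level_ok = is_picklable(math.sqrt)       # importable by name
```

Under 'fork' the child inherits the parent's memory, so nothing needs pickling; under 'spawn' the target and its arguments are pickled, which is where the nested definitions failed.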
tests/config_input/dict.p is a protocol-0 ASCII pickle; git autocrlf converted its LF to CRLF on Windows checkouts, corrupting the opcode stream (UnpicklingError: the STRING opcode argument must be quoted). Restore the 69-byte LF bytes and add *.p/*.pkl binary to .gitattributes so future checkouts don't re-corrupt it. Fixes test_basic_dict, test_scattered, test_multifits, test_datacube on Windows.
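
The corruption can be reproduced with a tiny hand-written protocol-0 stream. The bytes below are an illustrative example, not the contents of dict.p:

```python
import pickle

# A protocol-0 pickle of one string: STRING opcode, quoted argument, LF, STOP.
LF_STREAM = b"S'abc'\n."
# What a CRLF-converting checkout would leave on disk.
CRLF_STREAM = LF_STREAM.replace(b"\n", b"\r\n")


def try_load(data):
    """Return the unpickled object, or the UnpicklingError that was raised."""
    try:
        return pickle.loads(data)
    except pickle.UnpicklingError as err:
        return err


good = try_load(LF_STREAM)
bad = try_load(CRLF_STREAM)  # the stray '\r' lands after the closing quote
```

Marking *.p and *.pkl as binary in .gitattributes, as the commit does, stops git from rewriting line endings in such files.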
bandpass.py and sed.py checked isinstance(..., PosixPath), so a pathlib.WindowsPath throughput/spec raised GalSimIncompatibleValuesError on Windows. Use PurePath (base class of both) instead; strictly wider, no behaviour change on POSIX. Fixes test_SED_withFlux on Windows.
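
The effect of switching the isinstance check can be shown with the pure path classes, which can be instantiated on any platform. is_path_object is a hypothetical helper, not code from bandpass.py or sed.py:

```python
from pathlib import PosixPath, PurePath, PurePosixPath, PureWindowsPath


def is_path_object(obj):
    """Accept any pathlib path flavour, POSIX or Windows."""
    return isinstance(obj, PurePath)


win = PureWindowsPath(r"C:\data\throughput.dat")
posix = PurePosixPath("/data/throughput.dat")

old_check_on_windows_path = isinstance(win, PosixPath)  # the check that failed
new_check_on_windows_path = is_path_object(win)
new_check_on_posix_path = is_path_object(posix)
```

Concrete WindowsPath and PosixPath objects both derive from PurePath, so the wider check accepts everything the old one did.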
Second layer of spawn fixes after the _mp_worker/_LoggerManager hoist: (1) hoist the remaining nested manager classes to module scope (_InputManager in input.py, _OutputManager in extra.py); (2) publish + memoize the dynamically generated input-proxy classes as module attributes and add a PEP 562 module __getattr__ so spawned children can unpickle them by reference; (3) in MultiProcess, pass spawn workers a CopyConfig copy scrubbed of unpicklable caches (_fn/_gen_fn via CopyConfig, plus _eval_gdict and the started output_manager) while fork keeps the original config object; (4) guard test_timeout's image-level tiny-timeout assertions on fork availability (under spawn, Process.start() blocks while workers boot, so the 0.001s timeout can never trigger). Windows result: test_config_image+test_config_output go from 9 failures to 2 (both residuals are test-registered custom types, unreachable in spawned children by design). Linux/fork paths byte-identical.
Under spawn, worker processes start with fresh registries, so custom types registered via the yaml 'modules' mechanism were missing. Call ImportModules(config) at the top of _mp_worker (no-op without 'modules'; free under fork). The two tests that register custom types INSIDE the test function (HighN, FlakyFits) cannot work under spawn by construction (local objects are unpicklable), so guard only their nproc>1 sections on fork availability, mirroring the existing test_timeout guard. Windows: test_reject and test_retry_io now pass; config_image+output suite is 37 passed / 0 failed.
test_real.py compared os.path.join-built file names against '/'-hardcoded strings; build the expected values with os.path.join. test_sensor.py used vertex_file.split('/'); use os.path.basename. test_image.py held astropy memmap references past the pyfits.open context, keeping a Windows file lock that made a later clobbering writeFile fail with WinError 32; open those files with memmap=False. Fixes test_real_galaxy_catalog, test_silicon_area, test_Image_MultiFITS_IO, test_Image_CubeFITS_IO on Windows; no behaviour change on POSIX.
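
The separator assumptions fixed in these tests can be demonstrated with ntpath, which applies Windows path rules on any platform. The file name below is made up for illustration:

```python
import ntpath

# A path as os.path.join would build it on Windows.
vertex_file = ntpath.join("share", "sensors", "example_vertices.dat")

# Splitting on '/' finds no separator, so the "file name" is the whole path.
via_split = vertex_file.split("/")[-1]
# basename understands the platform's separators.
via_basename = ntpath.basename(vertex_file)

# Expected values built with join match; '/'-hardcoded literals do not.
expected_joined = ntpath.join("share", "sensors", "example_vertices.dat")
expected_hardcoded = "share/sensors/example_vertices.dat"
```

In the tests themselves the calls would be os.path.basename and os.path.join, which resolve to the POSIX versions on Linux and macOS and so leave behaviour there unchanged.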
galsim/include is a git symlink to ../include, so Windows checkouts with core.symlinks=false get a text stub and the wheel shipped no headers (test_hsm::test_headers failed; galsim.include_dir was unusable). Generalize my_build_py's share stub fallback into a helper covering both share/ and include/. Rebuilt wheel ships 107 share + 113 include entries.