Add experimental Windows wheel build support#1359
Open
zym1998year wants to merge 15 commits into
Open
Conversation
setuptools 72.0+ removed setuptools.command.test; guard the import. Add IS_WINDOWS sentinel for subsequent platform branches. Normalize the mmgr.cpp source-list match via os.path.normpath so glob's backslash paths still resolve. Use os.pathsep when splitting LIBRARY_PATH / C_INCLUDE_PATH / PATH-style env vars so the colon in Windows drive letters isn't treated as a separator.
- copt/lopt: MSVC entry with /O2 /std:c++14 /EHsc /openmp /Zc:__cplusplus /utf-8 /DNOMINMAX (/openmp wins MSVC's old runtime but suffices for the loops GalSim parallelizes today). - get_compiler_type: short-circuit MSVC via compiler_type before touching compiler_so (Unix-only attribute). - try_compile: route MSVC probes through compiler.compile() and compiler.link_executable() instead of hand-built cc -c / -o lines. - fix_compiler: skip ccache, -msse2, -stdlib=libc++ removal and linker_so editing on MSVC; force single-process compile on Windows (parallel_compile pool path is Unix-shaped; MSVC can later regain per-extension parallelism via /MP). - find_fftw_lib: search %CONDA_PREFIX%\Library\lib and the vcpkg layout, accept fftw3.lib / libfftw3.lib / libfftw3-3.lib, and skip ctypes.LoadLibrary on Windows (the located file is the import library, not the runtime DLL). - find_eigen_dir: same conda/vcpkg additions for include dirs. - Extension: use libraries=['fftw3'] on Windows; -lfftw3 stays for GCC/Clang. - my_build_ext.run: skip GALSIM_BUILD_SHARED on Windows; that path uses os.symlink and bakes in .so/.dylib naming.
- include/galsim/Std.h: define NOMINMAX before <Windows.h> so std::min / std::max (and Eigen members) survive unshadowed. The MSVC build also passes /DNOMINMAX, but defining it here protects direct header consumers. - include/galsim/Stopwatch.h: switch to std::chrono::steady_clock from gettimeofday. No GalSim TU includes this header today, but keeping the public surface portable avoids surprises. - src/Image.cpp, src/Polygon.cpp: replace alternative tokens (`or`) with `||`. MSVC parses these as identifiers without /Za + ciso646. - src/Random.cpp: split the POSIX <sys/time.h> / <unistd.h> / <fcntl.h> includes behind #ifndef _WIN32 and add _WIN32 branches for seedurandom() (std::random_device, which delegates to BCryptGenRandom/CryptGenRandom on Windows CRTs) and seedtime() (std::chrono::system_clock with the same microsecond-modulo seed). - src/SBInterpolatedImage.cpp, src/WCS.cpp, src/SBTransform.cpp: replace GCC variable-length arrays with std::vector<...>. Pass .data() to consumers expecting raw pointers (Horner2D, memset, reinterpret_cast onto std::complex<T>, KValueInnerLoop). The std::vector path keeps the original observable behaviour: heap allocation per call instead of stack, but the inner loops dominate any allocation cost.
The repo materialises ``galsim/share`` as a git symlink (mode 120000
-> ../share) so ``package_data={'galsim': shared_data + headers}``
ships the Roman / SED / bandpass / sensor data tree. On Windows
checkouts with the default ``core.symlinks=false``, git writes the
symlink target into a regular 8-byte text file ("../share") and
setuptools' package_data scan can't reach the data, so the resulting
wheel imports cleanly but raises FileNotFoundError on every code
path that touches ``meta_data.share_dir``.
Add a small ``build_py`` cmdclass extension that detects the
symlink-stub case (regular file under ``galsim/share`` whose body is
``../share`` or ``..\share``) and copies the repo-level ``share/``
tree directly into ``<build_lib>/galsim/share/`` after the standard
build_py runs. The source tree is untouched -- only the wheel build
output picks up the data -- and the detection is the stub itself
rather than ``IS_WINDOWS``, so Linux/macOS (with a working symlink)
is a no-op.
Verified locally: rebuilt wheel ships 112 share/ entries
(roman 61, bandpasses 18, sensors 18, SEDs 11, top-level 4); the
``galsim.meta_data.share_dir`` runtime probe flips from
``isdir=False`` to ``isdir=True``; ``tests/test_roman.py`` and
``tests/test_chromatic.py`` change from collection-time
ImportError/FileNotFoundError to 49/49 passing; the Layer-D full
pytest pass rate moves from 601/705 (85.3 %) to 710/754 (94.2 %).
The other ten runtime probes (drawimage, FFT-backed convolve, HSM,
random determinism, fork/spawn context, symlink, path-with-spaces,
download_cosmos inspection) return byte-identical results to the
pre-fix run, and the Windows-vs-Linux WFS pipeline cross-platform
diff stays bit-exact for the GalSim FFT image -- the change is
build-time packaging only and does not perturb any numerical path.
Member
|
Thanks for doing this. I got stymied by the FFTW installation when I tried to get Windows working a while back. Two high level things first:
|
Author
Thanks for the clear guidance. I’m a bit tied up right now, but once I have some bandwidth I’ll rebase this onto main, retarget the PR to main, and add a Windows build/test job in .github/workflows/ci.yml as you suggested. I appreciate the direction, especially around using CI to surface the Windows runtime issues. |
On MSVC/LLP64 'long' is 32-bit, so pybind rejected RNG seeds > 2^31-1 with a TypeError, whereas Linux/GCC (LP64, 64-bit long) accepted them. Widen the seed input path (BaseDeviate ctor/seed/reset) and the pybind bindings from long to int64_t. No-op on Linux (int64_t == long there). Fixes test_Zernike_rotate/basis, test_structure_function, test_lsst_y_focus.
galsim/config used get_context('fork'), which raises ValueError on Windows. Add _get_mp_context() that falls back to 'spawn' only when 'fork' is unavailable; Linux/mac still use 'fork'.
make_link used os.symlink, which fails on Windows without the symlink privilege (WinError 1314). Fall back to a directory junction (_winapi.CreateJunction), then shutil.copytree. POSIX path unchanged.
raw() returned a 32-bit 'long' on MSVC (LLP64), so raw values >= 2^31 came back negative on Windows but positive on Linux (LP64), diverging any config/serialized value that stores raw(). Widen raw() to int64_t (no-op on Linux). Also widen the derived-deviate integer-seed constructors (Uniform/Gaussian/Binomial/Poisson/Weibull/Gamma/Chi2) from long to int64_t to match BaseDeviate. Verified: raw() is now the positive uint32; the test_random reference-value tests and the four large-seed tests pass.
The fork->spawn fallback alone could not run because two definitions were nested (unpicklable under 'spawn'): MultiProcess's inner worker() and GetLoggerProxy's local LoggerManager class. Move both to module scope (_mp_worker, with item/job_func passed explicitly; _LoggerManager) so the spawn server process can pickle them by reference. Also correct the SafeManager docstring. Linux/fork behaviour is unchanged. Verified on Windows: config nproc>1 now runs (test_multirng and test_fits pass; 28/37 test_config_image+output pass, up from 0 before). Remaining failures are a deeper limitation: config $-eval lambdas and some catalog objects are not picklable under spawn.
tests/config_input/dict.p is a protocol-0 ASCII pickle; git autocrlf converted its LF to CRLF on Windows checkouts, corrupting the opcode stream (UnpicklingError: the STRING opcode argument must be quoted). Restore the 69-byte LF bytes and add *.p/*.pkl binary to .gitattributes so future checkouts don't re-corrupt it. Fixes test_basic_dict, test_scattered, test_multifits, test_datacube on Windows.
bandpass.py and sed.py checked isinstance(..., PosixPath), so a pathlib.WindowsPath throughput/spec raised GalSimIncompatibleValuesError on Windows. Use PurePath (base class of both) instead; strictly wider, no behaviour change on POSIX. Fixes test_SED_withFlux on Windows.
Second layer of spawn fixes after the _mp_worker/_LoggerManager hoist: (1) hoist the remaining nested manager classes to module scope (_InputManager in input.py, _OutputManager in extra.py); (2) publish + memoize the dynamically generated input-proxy classes as module attributes and add a PEP 562 module __getattr__ so spawned children can unpickle them by reference; (3) in MultiProcess, pass spawn workers a CopyConfig copy scrubbed of unpicklable caches (_fn/_gen_fn via CopyConfig, plus _eval_gdict and the started output_manager) while fork keeps the original config object; (4) guard test_timeout's image-level tiny-timeout assertions on fork availability (under spawn, Process.start() blocks while workers boot, so the 0.001s timeout can never trigger). Windows result: test_config_image+test_config_output go from 9 failures to 2 (both residuals are test-registered custom types, unreachable in spawned children by design). Linux/fork paths byte-identical.
Under spawn, worker processes start with fresh registries, so custom types registered via the yaml 'modules' mechanism were missing. Call ImportModules(config) at the top of _mp_worker (no-op without 'modules'; free under fork). The two tests that register custom types INSIDE the test function (HighN, FlakyFits) cannot work under spawn by construction (local objects are unpicklable), so guard only their nproc>1 sections on fork availability, mirroring the existing test_timeout guard. Windows: test_reject and test_retry_io now pass; config_image+output suite is 37 passed / 0 failed.
test_real.py compared os.path.join-built file names against '/'-hardcoded strings; build the expected values with os.path.join. test_sensor.py used vertex_file.split('/'); use os.path.basename. test_image.py held astropy memmap references past the pyfits.open context, keeping a Windows file lock that made a later clobbering writeFile fail with WinError 32; open those files with memmap=False. Fixes test_real_galaxy_catalog, test_silicon_area, test_Image_MultiFITS_IO, test_Image_CubeFITS_IO on Windows; no behaviour change on POSIX.
galsim/include is a git symlink to ../include, so Windows checkouts with core.symlinks=false get a text stub and the wheel shipped no headers (test_hsm::test_headers failed; galsim.include_dir was unusable). Generalize my_build_py's share stub fallback into a helper covering both share/ and include/. Rebuilt wheel ships 107 share + 113 include entries.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Draft PR against
releases/2.8(baselineee177aa50, v2.8.4).Branch on file:
windows-msvc-port.Status
setup.py build_clib + build_ext,pip wheel,delvewheel repair, fresh-venv install (no conda PATH),import galsim,Gaussian.drawImage, FFT-backed convolve,BaseDeviate(42)determinism,hsm.EstimateShear.multiprocessingconfigflow,
download_cosmossymlink fallback, full Windows pytest pass,GitHub Actions CI. See Follow-ups.
IS_WINDOWSbranches and_WIN32C++ guards. I have not re-run theexisting CI from this branch; the diff stays under
if IS_WINDOWS:so no GCC/Clang behavioural change is intended.Summary
Three small commits make GalSim build, install, and import on native
Windows x64 with the MSVC toolchain. The result is a self-contained
win_amd64wheel that installs into a fresh venv with no condadependency at runtime, and that produces correct numbers on a small
smoke surface (
drawImage, FFT convolve, random determinism, HSMEstimateShear).
The PR stays on the minimal-diff setuptools path. A
scikit-build-corerewrite is a larger discussion this PR does notattempt.
What this PR changes
setup.pysetup.pyinclude/,src/Branch tip:
9f4db829c.Per-commit detail (click to expand)
1. setup metadata and paths
setup.pyonly. Surgical:from setuptools.command.test import testimport.setuptools 72.0+ removed it; the cmdclass below never registered a
testcommand.IS_WINDOWS = sys.platform == 'win32'.cpp_sources.remove('src/mmgr.cpp')-> compare viaos.path.normpath, so theglob-returnedsrc\mmgr.cppon Windowsstill matches.
':'->os.pathsepwhen splittingLIBRARY_PATH,LD_LIBRARY_PATH,DYLD_LIBRARY_PATH,C_INCLUDE_PATH, and theinstalled-script PATH check. Avoids treating drive-letter colons as
separators.
2. MSVC and Windows FFTW / Eigen
setup.pyonly. The Windows build path:copt/loptgain an'msvc'entry:/O2 /std:c++14 /EHsc /openmp /Zc:__cplusplus /utf-8 /DNOMINMAX.get_compiler_typeshort-circuits viagetattr(compiler, 'compiler_type', None) == 'msvc'before touchingcompiler_so(Unix-only on
MSVCCompiler).try_compileadds an MSVC branch usingcompiler.compile()/compiler.link_executable()instead of constructingcc -c -ocommand lines.
fix_compilerskips ccache,-msse2,-stdlib=libc++removal, andthe
-Llinker_so rewrite on MSVC; forces single-process compile onWindows (
parallel_compileis shaped around the Unix invocationpattern).
find_fftw_libsearches%CONDA_PREFIX%\Library\lib, vcpkg'sinstalled/x64-windows/lib, and acceptsfftw3.lib/libfftw3.lib/libfftw3-3.lib. Thectypes.cdll.LoadLibraryprobe is skipped on Windows because the resolved path is the import
library, not a runnable DLL.
find_eigen_dirgets the same conda / vcpkg additions for includedirectories.
libraries=['fftw3']on Windows, keepsextra_link_args=['-lfftw3']on the GCC/Clang path.my_build_ext.runskipsGALSIM_BUILD_SHAREDon Windows: thatcodepath uses
os.symlinkand.so/.dylibnaming.3. C++ sources for MSVC
8 files:
include/galsim/Std.h:#define NOMINMAXbefore<Windows.h>sostd::min/std::max(and Eigen members) survive.include/galsim/Stopwatch.h: switch fromgettimeofdaytostd::chrono::steady_clock. No GalSim TU includes this headertoday; the rewrite is hygiene.
src/Image.cpp,src/Polygon.cpp: replace alternative tokenorwith
||. MSVC parsesoras an identifier without/Zaplus<ciso646>.src/Random.cpp: split the POSIX<sys/time.h>/<unistd.h>/<fcntl.h>includes behind#ifndef _WIN32; add_WIN32branchesfor
seedurandom()(std::random_device, which delegates toBCryptGenRandom on the Windows CRT) and
seedtime()(
std::chrono::system_clockwith the same microsecond-modulo seed).src/SBInterpolatedImage.cpp,src/WCS.cpp,src/SBTransform.cpp:replace GCC variable-length arrays with
std::vector<...>; pass.data()to consumers expecting raw pointers (Horner2D,memset/std::fill,reinterpret_castontostd::complex<T>,KValueInnerLoop).Verification
Built and tested on Windows 11 x64, Python 3.11.15 (conda-forge), VS
2026 Community (MSVC 14.50.35717), conda-forge
fftw 3.3.10/eigen 3.4.0/pybind11 3.0.3.Build / wheel
fftw3.dllvcruntime140.dll,vcruntime140_1.dllmsvcp140.dllpython311.dll,kernel32.dll,api-ms-win-crt-*vcomp140.dllFresh-venv smoke
import galsim2.8.4,_galsim.cp311-win_amd64.pydresolved from venv site-packagesGaussian(sigma=1.2).drawImage(nx=16, ny=16, scale=0.2)(16, 16), sum0.6684Convolve([Sersic(n=2.5), Gaussian])0.9737(flux=1 retained ~3%)BaseDeviate(42)firstGaussianDeviatesample-0.23898784173300305reproducible across instanceshsm.EstimateShearong1=0.2, g2=0.1status=0,e1=0.381,e2=0.190The cleaned PATH does not include
%CONDA_PREFIX%\Library\bin, sothe bundled
fftw3.dllis the one being loaded.Reproduction transcript (full local commands)
phase3_smoke.pyruns the five tests in the table above and exitsnon-zero on any failure.
Wheel notes
fftw3fromconda-forge::fftw,msvcp140/vcomp140from the env's MSVCredistributable mirror.
fftw3.dllinside a wheel ispermitted under GPL-2.0-or-later but pushes the binary distribution
under GPL constraints. If MIT-only binary distribution is preferred,
an alternative is to ship two flavours (FFTW vs. KissFFT or
pocketfft); out of scope for this PR. Reviewer input requested.
vcomp140.dllis the older Microsoft OpenMP runtime;/openmp:llvmwould pulllibomp.dllinstead. Reviewer inputrequested.
cp311-cp311-win_amd64. cibuildwheel will produce onewheel per Python minor version.
Follow-ups
Deliberately not in this PR. Each is a candidate for a separate PR
once this one lands.
galsim/config/util.pyusesmultiprocessing.get_context('fork'),which Windows lacks. Needs a
spawnfallback plus an audit ofconfig processing / phase-screen code for picklability under
spawn. Not exercised by the smoke tests above; surfaces as soon asa config-driven run hits the multiprocessing branch.
galsim/download_cosmos.pyusesos.symlink, which non-elevatedWindows users cannot create unless Developer Mode is on. Needs
copy/ directory-junction /--nolinkfallback.tests/for fork-only or symlink-onlyassumptions, mark with
pytest.mark.skipif(sys.platform == 'win32')where appropriate, and run the full pytest on
windows-latest.(
cp39 .. cp313 -win_amd64). I have a working local recipe but wantto align with project preferences (FFTW source, OpenMP runtime,
delvewheel command) before sending it; see
Reviewer questions below.
Risk and compatibility
IS_WINDOWS(insetup.py) or_WIN32(in C++). I expect nobehavioural regression on those platforms but did not re-run their
CI from this branch.
std::vectorinSBInterpolatedImage.cpp,WCS.cpp,SBTransform.cpp: per-call adds heap allocation; the loops affectedare inner enough that I expect dominance from per-element ray
operations, but I have not benchmarked. If micro-benchmarks regress
noticeably, an alternative is
_alloca/allocaon the Windowspath with VLAs preserved on GCC. I went with
std::vectorforcross-platform clarity.
Random.cppWindows seeding usesstd::random_device. OnMSVC's CRT this delegates to the OS CSPRNG, matching the
/dev/urandomcontract. POSIX builds are unchanged.delvewheelwarns when thebundled
msvcp140.dllis older than the toolset that built the.pyd. I built locally with VS 2026 (14.50) and bundledconda-forge's 14.44 redistributable. The wheel still loaded and
passed the smoke tests; CI on
windows-latest(VS 2022 14.4x)would not see this warning. No code change required.
Reviewer questions
or fftw.org pre-built. Affects the GPL-redistribution note.
/openmp(MS) vs/openmp:llvm. The latter aligns betterwith modern OpenMP but adds a
libomp.dllredistribution step.std::vectorchange: keep it global (current PR), orkeep VLAs on GCC and switch only on MSVC.
submit
multiprocessing/symlink/ Windows test markersseparately, or fold in.
windows-latest+ cibuildwheel workflow asa follow-up PR once questions 1-2 are resolved. A draft strategy
doc is available locally if helpful.
Happy to split, rebase, or extend as the project prefers.