I noticed a test failure downstream where a machine learning estimator was performing a partial_fit that leverages the Moore-Penrose pseudoinverse for a closed form solution (lanl/GFDL#117). Although only 1/100 elements mismatched, the relative tolerance diff was quite substantial on that element (1.46476215) vs. previous NumPy/OpenBLAS combinations. This failure showed up starting with NumPy 2.5.0 from 3 days ago, which shipped with a newer version of OpenBLAS.
I then dissected out a plain text, but rather lengthy, reproducer on the NumPy level. I've pushed it up to GitHub at https://github.com/tylerjereddy/pinv_issue, and it prints out a single element of np.pinv output from a fixed 2D input. The input is a (25, 25) array carefully dissected out of our in house testsuite--the size of that array in plain text is the main reason I've provided it in a separate repo--it is perhaps too large to dump here and so you can confirm it is plain text, etc.
Below are the results it prints in a few different scenarios -- the main problem with consistency appears to be with the OpenBLAS that ships with the latest x86_64 NumPy Linux wheels, since building NumPy 2.5.0 from source with an older OpenBLAS passes our testing just fine. I realize that these are just "proxy indicators" for the problem rather than the full picture, especially since different values on different platforms and with different linalg backends are acceptable, but nonetheless, the outcome on NumPy 2.5.0 with OpenBLAS from wheel is not acceptable in the broader testing, so I tried to cut this down to a proxy reproducer. Just to emphasize here--it is "ok" that some of the values in the table below are different, but the 0.3609... value with NumPy 2.5.0 + the OpenBLAS that ships with that wheel, is not ok (though this may only be part of the total numerical instability story). It is likely that the input array is pathological in a variety of ways, and this is not uncommon for batched input over a design matrix. Nonetheless, most other platforms and linalg backends and older OpenBLAS versions are doing just fine here in the broader numerical testing we do, and I wanted to keep the proxy test focused on a single float value for now.
| NumPy version |
Result |
NumPy 2.5.0 wheel on x86_64 Linux |
0.360954047299216 |
NumPy 2.5.0 built from source on x86_64 Linux with OpenBLAS 0.3.20 |
0.5567409444424644 (passes our broader numerical testing) |
NumPy 2.4.6 wheel on x86_64 Linux |
0.08970465208832239 |
NumPy 2.3.4 wheel on x86_64 Linux |
0.08970465208832239 |
NumPy 2.5.0 wheel on M series Mac |
0.1422214449163197 |
NumPy 2.4.6 wheel on M series Mac |
0.1422214449163197 |
I then talked to @seberg and @ngoldbaum and they suggested I bisect on OpenBLAS and open an issue here, so I bisected OpenBLAS alongside NumPy 2.5.0 to find the first point where x86_64 Linux + OpenBLAS produces the problem value above with NumPy 2.5.0 (the 0.3609... value) from wheel install (see below) when linked to NumPy. In fact, I ended up needing to use LD_PRELOAD to get the right behavior (force NumPy to link the correct source-built OpenBLAS, pkg config env vars/settings were not sufficient on their own...). I realize that this isn't a reproducer at the C level for OpenBLAS, and that these single proxy floats are just part of the story, but thought I'd get this started. I can work toward that if you really need it.
To further complicate matters, some of the commits in the bisection had to be skipped because they failed to build; presumably those commits really are broken for other reasons. So, the best I could do was this output that narrows it down to two possible problem commits
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
f5f789fc529ffcc4fb7dd5f525fee1e719339a3a
d9bb8f369f5d80cb0b074f01bde156a6dda1f193
We cannot bisect more!
Those both reference Reference-LAPACK PRs. I did confirm that the latest develop commit (a36e22c) is still broken (produces 0.360954047299216 using the reproducer above on x86_64 Linux).
Sorry if this is a little tricky to follow, but it is the result of several hours of careful dissection and I did not use AI--this is a real problem affecting our estimator research at DOE, and we're happy to help out if we can, but would probably need a bit more guidance. I wonder if an expert can look at the suspicious commits with a knowledge of the ops used by pinv and get a pretty good sense for where the issue is.
I noticed a test failure downstream where a machine learning estimator was performing a
partial_fitthat leverages the Moore-Penrose pseudoinverse for a closed form solution (lanl/GFDL#117). Although only 1/100 elements mismatched, the relative tolerance diff was quite substantial on that element (1.46476215) vs. previous NumPy/OpenBLAS combinations. This failure showed up starting with NumPy2.5.0from 3 days ago, which shipped with a newer version of OpenBLAS.I then dissected out a plain text, but rather lengthy, reproducer on the NumPy level. I've pushed it up to GitHub at https://github.com/tylerjereddy/pinv_issue, and it prints out a single element of
np.pinvoutput from a fixed 2D input. The input is a(25, 25)array carefully dissected out of our in house testsuite--the size of that array in plain text is the main reason I've provided it in a separate repo--it is perhaps too large to dump here and so you can confirm it is plain text, etc.Below are the results it prints in a few different scenarios -- the main problem with consistency appears to be with the OpenBLAS that ships with the latest x86_64 NumPy Linux wheels, since building NumPy
2.5.0from source with an older OpenBLAS passes our testing just fine. I realize that these are just "proxy indicators" for the problem rather than the full picture, especially since different values on different platforms and with different linalg backends are acceptable, but nonetheless, the outcome on NumPy2.5.0with OpenBLAS from wheel is not acceptable in the broader testing, so I tried to cut this down to a proxy reproducer. Just to emphasize here--it is "ok" that some of the values in the table below are different, but the0.3609...value with NumPy2.5.0+ the OpenBLAS that ships with that wheel, is not ok (though this may only be part of the total numerical instability story). It is likely that the input array is pathological in a variety of ways, and this is not uncommon for batched input over a design matrix. Nonetheless, most other platforms and linalg backends and older OpenBLAS versions are doing just fine here in the broader numerical testing we do, and I wanted to keep the proxy test focused on a single float value for now.2.5.0wheel on x86_64 Linux0.3609540472992162.5.0built from source on x86_64 Linux with OpenBLAS0.3.200.5567409444424644(passes our broader numerical testing)2.4.6wheel on x86_64 Linux0.089704652088322392.3.4wheel on x86_64 Linux0.089704652088322392.5.0wheel on M series Mac0.14222144491631972.4.6wheel on M series Mac0.1422214449163197I then talked to @seberg and @ngoldbaum and they suggested I bisect on OpenBLAS and open an issue here, so I bisected OpenBLAS alongside NumPy
2.5.0to find the first point where x86_64 Linux + OpenBLAS produces the problem value above with NumPy2.5.0(the0.3609...value) from wheel install (see below) when linked to NumPy. In fact, I ended up needing to useLD_PRELOADto get the right behavior (force NumPy to link the correct source-built OpenBLAS, pkg config env vars/settings were not sufficient on their own...). I realize that this isn't a reproducer at the C level for OpenBLAS, and that these single proxy floats are just part of the story, but thought I'd get this started. I can work toward that if you really need it.To further complicate matters, some of the commits in the bisection had to be skipped because they failed to build; presumably those commits really are broken for other reasons. So, the best I could do was this output that narrows it down to two possible problem commits
Those both reference Reference-LAPACK PRs. I did confirm that the latest
developcommit (a36e22c) is still broken (produces0.360954047299216using the reproducer above on x86_64 Linux).Sorry if this is a little tricky to follow, but it is the result of several hours of careful dissection and I did not use AI--this is a real problem affecting our estimator research at DOE, and we're happy to help out if we can, but would probably need a bit more guidance. I wonder if an expert can look at the suspicious commits with a knowledge of the ops used by
pinvand get a pretty good sense for where the issue is.