
matmul operator @ can freeze / hang when used with default python multiprocessing using fork context instead of spawn #15973

Closed
bicici opened this issue Apr 14, 2020 · 5 comments


bicici commented Apr 14, 2020

The freeze / hang can happen with large matrices in parallel settings. For instance, `sklearn/neural_network/_multilayer_perceptron.py` uses `safe_sparse_dot`, which calls the matmul operator `@` via `ret = a @ b`. Affected code:
`sklearn/neural_network/_multilayer_perceptron.py`
`from ..utils.extmath import safe_sparse_dot`
`ret = a @ b`
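
For context, here is a minimal sketch of what `safe_sparse_dot` does; this is not sklearn's exact code, and the dispatch condition is an assumption. The point is that the dense path falls through to the `@` operator, which is where the hang appears:

```python
# Rough sketch of sklearn's safe_sparse_dot dispatch (not the exact code):
# use sparse-aware multiplication when an operand is sparse, otherwise fall
# back to the dense @ operator, the call that hangs here under MKL + fork.
import scipy.sparse as sp

def safe_sparse_dot_sketch(a, b):
    if sp.issparse(a) or sp.issparse(b):
        return a * b  # scipy sparse matrices implement * as matrix product
    return a @ b      # dense path: the matmul operator discussed in this issue
```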

Python also freezes on expressions like exp(3e400) (which evaluates to float('inf')) when built with -Ofast, as seen in test_buffer.py, and such freezes may be related to these kinds of operations combined with -Ofast in CPython. Compiling with fewer optimization flags might therefore work around the issue and prevent @ from freezing the program. The freeze occurs when matrices are roughly 5000 x 100 or larger.

NumPy built against MKL (Intel oneAPI 2021.1-beta05) freezes; built against OpenBLAS it does not.
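
A quick way to confirm which BLAS a given NumPy build is linked against (`numpy.show_config` is part of NumPy's public API):

```python
# Print the BLAS/LAPACK build information for this NumPy installation;
# look for 'mkl' vs 'openblas' among the listed libraries.
import numpy as np

np.show_config()
```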

Test program:
```python
import concurrent.futures
from numpy import random, matmul

def mmtest(X, i):
    print('matmul @ call', i)
    y_hat = X @ X.T
    print('done', i)
    return y_hat

def mmtest_matmul(X, i):
    print('matmul func call', i)
    y_hat = matmul(X, X.T)
    print('done', i)
    return y_hat

def f_mpmm(X):
    executor = concurrent.futures.ProcessPoolExecutor(7)
    futures = []
    futures.append(executor.submit(mmtest, X, 0))
    futures.append(executor.submit(mmtest, X, 1))
    futures.append(executor.submit(mmtest, X, 2))
    futures.append(executor.submit(mmtest, X, 3))
    futures.append(executor.submit(mmtest_matmul, X, 4))
    futures.append(executor.submit(mmtest_matmul, X, 5))
    futures.append(executor.submit(mmtest_matmul, X, 6))
    concurrent.futures.wait(futures)
    executor.shutdown()

def f_mm(X):
    mmtest(X, 0)
    mmtest(X, 1)
    mmtest(X, 2)
    mmtest(X, 3)
    mmtest_matmul(X, 4)
    mmtest_matmul(X, 5)
    mmtest_matmul(X, 6)

def test():
    X = random.randn(5000, 100); y = random.randn(5000)
    print('testing serial')
    f_mm(X)
    print('testing multiprocessing')
    f_mpmm(X)

if __name__ == '__main__':
    test()
```

Test output with numpy built with Intel MKL:
```
testing serial
matmul @ call 0
done 0
matmul @ call 1
done 1
matmul @ call 2
done 2
matmul @ call 3
done 3
matmul func call 4
done 4
matmul func call 5
done 5
matmul func call 6
done 6
testing multiprocessing
matmul @ call 0
matmul @ call 1
matmul @ call 2
matmul @ call 3
matmul func call 4
matmul func call 5
matmul func call 6

[frozen]
```

Test output with numpy built with openblas:
```
testing serial
matmul @ call 0
done 0
matmul @ call 1
done 1
matmul @ call 2
done 2
matmul @ call 3
done 3
matmul func call 4
done 4
matmul func call 5
done 5
matmul func call 6
done 6
testing multiprocessing
matmul @ call 0
matmul @ call 1
matmul @ call 2
matmul @ call 3
matmul func call 4
matmul func call 5
matmul func call 6
done 0
done 1
done 2
done 3
done 6
done 4
done 5
```

Related files:
sklearn/neural_network/_multilayer_perceptron.py
sklearn/utils/extmath.py

Related issues:
"parallel processes freezing when matrices are too big"
joblib/joblib#138
"matmul operator freeze within safe_sparse_dot and bug fix"
scikit-learn/scikit-learn#16919

seberg (Member) commented Apr 14, 2020

@bicici can you give more details? I.e. I am not sure what you mean by "openblas ovecome this". Are you using an Intel-patched version of NumPy or plain NumPy?
Or is it that a @ b just happens to be very slow compared to a.dot(b)? The last point could be the case, since I think we sometimes do not call into BLAS for non-contiguous input with the @ operator. That may be a performance bug in some cases, and maybe we can find a solution for it.
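
To illustrate the non-contiguity point: in the test program above, X.T is a transposed view whose memory is Fortran-ordered rather than C-ordered, which is exactly the kind of input that may miss the direct BLAS path. A quick check:

```python
# X.T is a zero-copy view: a C-contiguous array becomes F-contiguous after
# transposition, so code requiring C-contiguous input sees it as non-contiguous.
import numpy as np

X = np.random.randn(5000, 100)
print(X.flags['C_CONTIGUOUS'])    # True
print(X.T.flags['C_CONTIGUOUS'])  # False
print(X.T.flags['F_CONTIGUOUS'])  # True
```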

bicici (Author) commented Apr 14, 2020

When NumPy uses OpenBLAS, as noted in joblib/joblib#138, it does not freeze. When the same NumPy code from GitHub (numpy 1.18.x), compiled from source without error under the same settings, picks up the Intel MKL libraries instead, it freezes.

In comparison, the Linux-installed numpy and scipy already come packaged with OpenBLAS:

```
lib/python3.7/site-packages/numpy/.libs/libopenblasp-r0-34a18dc3.3.7.so
lib/python3.7/site-packages/scipy/.libs/libopenblasp-r0-34a18dc3.3.7.so
Python 3.7.3 (default, Dec 20 2019, 18:57:59)
[GCC 8.3.0] on linux
>>> numpy.__version__
'1.18.2'
```

Different Python 3 versions in combination with different numpy / scipy versions (stable / testing, installed from the Debian repositories at deb.debian.org/debian) started freezing around 23 March 2020, forcing computation onto a single processor or making parallel computation risky. The freeze happened between matrix sizes of 1000 x 50 and 5000 x 100, around 5000 x 50. An initial experiment of mine on this parallel-processing freeze was timestamped by IPython at Tue Mar 24 02:51:02 2020. A task that takes about 0.05 seconds to finish in serial mode freezes in parallel mode. This appeared to be related to joblib and Python's multiprocessing. However, investigating further with scikit-learn's sklearn.neural_network.MLPRegressor, I found the freezing part in a @ b, which appears to be used unnecessarily there instead of a.dot(b):
https://github.com/scikit-learn/scikit-learn/blame/bd9fd0f1a9a222c58bbf8aba45025d42c598a31e/sklearn/utils/extmath.py#L151

Note: the `python test_buffer.py` freeze is mentioned here:
https://bugs.gentoo.org/599122

The concurrent findings of ...

  • freezes in joblib and scikit-learn code calling NumPy modules, hampering parallel processing,
  • freezes in Python itself when built with -Ofast, hampering faster compiled code,
  • and NumPy test errors with the testing versions of libc6 2.30-4 and gcc-9, hampering robust results (these might be -Ofast related as well, or related to multiple factors),

may require white papers on the topics.

matmul's a @ b operator and Python's float(3e400) expression both freeze in the contexts mentioned:
https://github.com/python/cpython/blob/8821200d85657ef3bbec78dcb43694449c05e896/Lib/test/test_buffer.py#L2854

Thank you for asking. I see new issues popping up as of today, which might be related:
https://bugs.python.org/issue36780
The freeze in parallel processing happened before the call to concurrent.futures.as_completed.

seberg (Member) commented Apr 14, 2020

@bicici the issue you link in joblib notes clearly that this is an MKL bug/issue; do you think this has anything to do with NumPy at all?
If conda were shipping compilation flags that run into this or make it worse, it may make sense to bring that up. If we have default compile flags we should change, that also may be possible. But it sounds like you are running into a bug in MKL while using non-standard compile flags.

Is your intention to ping MKL developers here, or is there something you expect from NumPy? I am seriously asking.

bicici (Author) commented Apr 15, 2020

My trail currently leads me to the a @ b operator, documented and used by NumPy after its introduction by Python. I blame the a @ b matmul operator for the freeze. Sparse multiplications can take significantly longer than dense ones; however, I find that a less likely explanation in this case.

joblib's MKL issue might be linked to the same @ operator. The a @ b operator might be critical, and NumPy's code paths using MKL can be compared with those using OpenBLAS. My findings point to an enlarging balloon, which you might have experience with, and it looks better to needle it earlier (touch it with a needle :) ). I only opened a ticket about the issue. The issue remains serious at large. I expect gains from informing both MKL and NumPy. Thank you for taking action.

Parallel matmul calls are failing in the test program. Maybe NumPy matrix multiplication is not supposed to be run in parallel; if so, this is deeper than Python's global interpreter lock (GIL). I checked again, and numpy.dot calls also froze in the test program when using X.dot(X.T) instead of X @ X.T.

bicici changed the title from "matmul operator @ can freeze / hang" to "matmul operator @ can freeze / hang when used with default python multiprocessing using fork context instead of spawn" on Apr 16, 2020
bicici (Author) commented Apr 16, 2020

The default Python on Linux also freezes on the following code:
dask/dask#3759
Python's multiprocessing uses the fork start method instead of spawn by default, and this appears to be a related issue. For some reason this information does not surface in Google / Bing searches and reveals itself only after digging down to numpy.matmul.

The stable Python 3.7 is currently freezing, and it should not be considered stable even though its tests pass during installation.

I use TBB threads and follow the approach in dask/dask#3759 of using the 'spawn' context in multiprocessing, which seems to fix the problem.
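
A minimal sketch of that workaround, assuming Python 3.7+ (where ProcessPoolExecutor accepts an mp_context argument): force the 'spawn' start method so workers start a fresh interpreter instead of inheriting the parent's MKL thread state over fork:

```python
# Force the 'spawn' start method so worker processes start with a fresh
# interpreter instead of inheriting the parent's (possibly locked) MKL
# thread state over fork. mp_context is available since Python 3.7.
import multiprocessing
import concurrent.futures
import numpy as np

def mm(X, i):
    print('matmul @ call', i)
    return X @ X.T

if __name__ == '__main__':
    ctx = multiprocessing.get_context('spawn')
    X = np.random.randn(5000, 100)
    with concurrent.futures.ProcessPoolExecutor(4, mp_context=ctx) as executor:
        futures = [executor.submit(mm, X, i) for i in range(4)]
        concurrent.futures.wait(futures)
    print('all done')
```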
