
matmul operator @ can freeze / hang when used with default python multiprocessing using fork context instead of spawn #15973

Closed
bicici opened this issue Apr 14, 2020 · 5 comments


bicici commented Apr 14, 2020

The freeze / hang can happen with large matrices in parallel settings. For instance, `sklearn/neural_network/_multilayer_perceptron.py` uses `safe_sparse_dot`, which calls the matmul operator `@` via `ret = a @ b`. Affected code:
`sklearn/neural_network/_multilayer_perceptron.py`
`from ..utils.extmath import safe_sparse_dot`
`ret = a @ b`
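
For context, here is a minimal sketch of what `safe_sparse_dot` does; this is not sklearn's exact code, and the dispatch condition is an assumption. The point is that the dense path falls through to the `@` operator, which is where the hang appears:

```python
# Rough sketch of sklearn's safe_sparse_dot dispatch (not the exact code):
# use sparse-aware multiplication when an operand is sparse, otherwise fall
# back to the dense @ operator, the call that hangs here under MKL + fork.
import scipy.sparse as sp

def safe_sparse_dot_sketch(a, b):
    if sp.issparse(a) or sp.issparse(b):
        return a * b  # scipy sparse matrices implement * as matrix product
    return a @ b      # dense path: the matmul operator discussed in this issue
```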

Python also freezes on expressions like exp(3e400) (which evaluates to float('inf')) when built with -Ofast, as seen in test_buffer.py, and such freezes may be related to these kinds of operations combined with -Ofast in CPython. Compiling with fewer optimization flags might therefore work around the issue and prevent @ from freezing the program. The freeze occurs when matrices are roughly 5000 x 100 or larger.

NumPy built against MKL (Intel oneAPI 2021.1-beta05) freezes; built against OpenBLAS it does not.
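
A quick way to confirm which BLAS a given NumPy build is linked against (`numpy.show_config` is part of NumPy's public API):

```python
# Print the BLAS/LAPACK build information for this NumPy installation;
# look for 'mkl' vs 'openblas' among the listed libraries.
import numpy as np

np.show_config()
```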

Test program:
```python
import concurrent.futures
from numpy import random, matmul

def mmtest(X, i):
    print('matmul @ call', i)
    y_hat = X @ X.T
    print('done', i)
    return y_hat

def mmtest_matmul(X, i):
    print('matmul func call', i)
    y_hat = matmul(X, X.T)
    print('done', i)
    return y_hat

def f_mpmm(X):
    executor = concurrent.futures.ProcessPoolExecutor(7)
    futures = []
    futures.append(executor.submit(mmtest, X, 0))
    futures.append(executor.submit(mmtest, X, 1))
    futures.append(executor.submit(mmtest, X, 2))
    futures.append(executor.submit(mmtest, X, 3))
    futures.append(executor.submit(mmtest_matmul, X, 4))
    futures.append(executor.submit(mmtest_matmul, X, 5))
    futures.append(executor.submit(mmtest_matmul, X, 6))
    concurrent.futures.wait(futures)
    executor.shutdown()

def f_mm(X):
    mmtest(X, 0)
    mmtest(X, 1)
    mmtest(X, 2)
    mmtest(X, 3)
    mmtest_matmul(X, 4)
    mmtest_matmul(X, 5)
    mmtest_matmul(X, 6)

def test():
    X = random.randn(5000, 100); y = random.randn(5000)
    print('testing serial')
    f_mm(X)
    print('testing multiprocessing')
    f_mpmm(X)

if __name__ == '__main__':
    test()
```

Test output with numpy built with Intel MKL:
```
testing serial
matmul @ call 0
done 0
matmul @ call 1
done 1
matmul @ call 2
done 2
matmul @ call 3
done 3
matmul func call 4
done 4
matmul func call 5
done 5
matmul func call 6
done 6
testing multiprocessing
matmul @ call 0
matmul @ call 1
matmul @ call 2
matmul @ call 3
matmul func call 4
matmul func call 5
matmul func call 6

[frozen]
```

Test output with numpy built with openblas:
```
testing serial
matmul @ call 0
done 0
matmul @ call 1
done 1
matmul @ call 2
done 2
matmul @ call 3
done 3
matmul func call 4
done 4
matmul func call 5
done 5
matmul func call 6
done 6
testing multiprocessing
matmul @ call 0
matmul @ call 1
matmul @ call 2
matmul @ call 3
matmul func call 4
matmul func call 5
matmul func call 6
done 0
done 1
done 2
done 3
done 6
done 4
done 5
```

Related files:
sklearn/neural_network/_multilayer_perceptron.py
sklearn/utils/extmath.py

Related issues:
"parallel processes freezing when matrices are too big"
joblib/joblib#138
"matmul operator freeze within safe_sparse_dot and bug fix"
scikit-learn/scikit-learn#16919

seberg (Member) commented Apr 14, 2020

@bicici can you give more details? I.e. I am not sure what you mean by "openblas ovecome this". Are you using an Intel-patched version of NumPy or plain NumPy?
Or is it that a @ b just happens to be very slow compared to a.dot(b)? The last point could be the case, since I think we sometimes do not call into BLAS for non-contiguous input with the @ operator. That may be a performance bug in some cases, and maybe we can find a solution for it.
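
To illustrate the non-contiguity point: in the test program above, X.T is a transposed view whose memory is Fortran-ordered rather than C-ordered, which is exactly the kind of input that may miss the direct BLAS path. A quick check:

```python
# X.T is a zero-copy view: a C-contiguous array becomes F-contiguous after
# transposition, so code requiring C-contiguous input sees it as non-contiguous.
import numpy as np

X = np.random.randn(5000, 100)
print(X.flags['C_CONTIGUOUS'])    # True
print(X.T.flags['C_CONTIGUOUS'])  # False
print(X.T.flags['F_CONTIGUOUS'])  # True
```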

bicici (Author) commented Apr 14, 2020

When NumPy uses OpenBLAS, as noted in joblib/joblib#138, it does not freeze. When the same NumPy code from GitHub (numpy 1.18.x), compiled from source without error under the same settings, picks up the Intel MKL libraries instead, it freezes.

In comparison, the Linux-installed numpy and scipy already come packaged with OpenBLAS:

```
lib/python3.7/site-packages/numpy/.libs/libopenblasp-r0-34a18dc3.3.7.so
lib/python3.7/site-packages/scipy/.libs/libopenblasp-r0-34a18dc3.3.7.so
Python 3.7.3 (default, Dec 20 2019, 18:57:59)
[GCC 8.3.0] on linux
>>> numpy.__version__
'1.18.2'
```

Different Python 3 versions in combination with different numpy / scipy versions (stable / testing, installed from the Debian repositories at deb.debian.org/debian) started freezing around 23 March 2020, forcing computation onto a single processor or making parallel computation risky. The freeze happened between matrix sizes of 1000 x 50 and 5000 x 100, around 5000 x 50. An initial experiment of mine on this parallel-processing freeze was timestamped by IPython at Tue Mar 24 02:51:02 2020. A task that takes about 0.05 seconds to finish in serial mode freezes in parallel mode. This appeared to be related to joblib and Python's multiprocessing. However, investigating further with scikit-learn's sklearn.neural_network.MLPRegressor, I found the freezing part in a @ b, which appears to be used unnecessarily there instead of a.dot(b):
https://github.com/scikit-learn/scikit-learn/blame/bd9fd0f1a9a222c58bbf8aba45025d42c598a31e/sklearn/utils/extmath.py#L151

Note: the `python test_buffer.py` freeze is mentioned here:
https://bugs.gentoo.org/599122

The concurrent findings of ...

  • freezes in joblib and scikit-learn code calling NumPy modules, hampering parallel processing,
  • freezes in Python itself when built with -Ofast, hampering faster compiled code,
  • and NumPy test errors with the testing versions of libc6 2.30-4 and gcc-9, hampering robust results (these might be -Ofast related as well, or related to multiple factors),

may require white papers on the topics.

matmul's a @ b operator and Python's float(3e400) expression both freeze in the contexts mentioned:
https://github.com/python/cpython/blob/8821200d85657ef3bbec78dcb43694449c05e896/Lib/test/test_buffer.py#L2854

Thank you for asking. I see new issues popping up as of today, which might be related:
https://bugs.python.org/issue36780
The freeze in parallel processing happened before the call to concurrent.futures.as_completed.

seberg (Member) commented Apr 14, 2020

@bicici the issue you link in joblib notes clearly that this is an MKL bug/issue; do you think this has anything to do with NumPy at all?
If conda were shipping compilation flags that run into this or make it worse, it may make sense to bring that up. If we have default compile flags we should change, that also may be possible. But it sounds like you are running into a bug in MKL while using non-standard compile flags.

Is your intention to ping MKL developers here, or is there something you expect from NumPy? I am seriously asking.

bicici (Author) commented Apr 15, 2020

My trail currently leads me to the a @ b operator, documented and used by NumPy after its introduction by Python. I blame the a @ b matmul operator for the freeze. Sparse multiplications can take significantly longer than dense ones; however, I find that a less likely explanation in this case.

joblib's MKL issue might be linked to the same @ operator. The a @ b operator might be critical, and NumPy's code paths using MKL can be compared with those using OpenBLAS. My findings point to an enlarging balloon, which you might have experience with, and it looks better to needle it earlier (touch it with a needle :) ). I only opened a ticket about the issue. The issue remains serious at large. I expect gains from informing both MKL and NumPy. Thank you for taking action.

Parallel matmul calls are failing in the test program. Maybe NumPy matrix multiplication is not supposed to be run in parallel; if so, this is deeper than Python's global interpreter lock (GIL). I checked again, and numpy.dot calls also froze in the test program when using X.dot(X.T) instead of X @ X.T.

bicici changed the title from "matmul operator @ can freeze / hang" to "matmul operator @ can freeze / hang when used with default python multiprocessing using fork context instead of spawn" on Apr 16, 2020
bicici (Author) commented Apr 16, 2020

The default Python on Linux also freezes on the following code:
dask/dask#3759
Python's multiprocessing uses the fork start method instead of spawn by default, and this appears to be a related issue. For some reason this information does not surface in Google / Bing searches and reveals itself only after digging down to numpy.matmul.

The stable Python 3.7 is currently freezing, and it should not be considered stable even though its tests pass during installation.

I use TBB threads and follow the approach in dask/dask#3759 of using the 'spawn' context in multiprocessing, which seems to fix the problem.
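
A minimal sketch of that workaround, assuming Python 3.7+ (where ProcessPoolExecutor accepts an mp_context argument): force the 'spawn' start method so workers start a fresh interpreter instead of inheriting the parent's MKL thread state over fork:

```python
# Force the 'spawn' start method so worker processes start with a fresh
# interpreter instead of inheriting the parent's (possibly locked) MKL
# thread state over fork. mp_context is available since Python 3.7.
import multiprocessing
import concurrent.futures
import numpy as np

def mm(X, i):
    print('matmul @ call', i)
    return X @ X.T

if __name__ == '__main__':
    ctx = multiprocessing.get_context('spawn')
    X = np.random.randn(5000, 100)
    with concurrent.futures.ProcessPoolExecutor(4, mp_context=ctx) as executor:
        futures = [executor.submit(mm, X, i) for i in range(4)]
        concurrent.futures.wait(futures)
    print('all done')
```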
