multiprocessing's default posix start method of `'fork'` is broken: change to ``'forkserver' || 'spawn'` #84559

itamarst · 2020-04-24T18:22:23Z

BPO	40379
Nosy	@pitrou, @mgorny, @Julian, @wimglenn, @applio, @itamarst

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2020-04-24.18:22:23.389>
labels = ['3.8', 'type-bug', '3.7', '3.9']
title = "multiprocessing's default start method of fork()-without-exec() is broken"
updated_at = <Date 2022-02-11.16:13:53.872>
user = 'https://bugs.python.org/itamarst'

bugs.python.org fields:

activity = <Date 2022-02-11.16:13:53.872>
actor = 'mgorny'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = []
creation = <Date 2020-04-24.18:22:23.389>
creator = 'itamarst'
dependencies = []
files = []
hgrepos = []
issue_num = 40379
keywords = []
message_count = 11.0
messages = ['367210', '367211', '368173', '380478', '392358', '392501', '392503', '392506', '392507', '392508', '413081']
nosy_count = 8.0
nosy_names = ['pitrou', 'mgorny', 'Julian', 'wim.glenn', 'itamarst', 'davin', 'itamarst2', 'aduncan']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue40379'
versions = ['Python 3.5', 'Python 3.6', 'Python 3.7', 'Python 3.8', 'Python 3.9']

Linked PRs

The text was updated successfully, but these errors were encountered:

itamarst · 2020-04-24T18:22:23Z

By default, multiprocessing uses fork() without exec() on POSIX. For a variety of reasons this can lead to inconsistent state in subprocesses: module-level globals are copied, which can mess up logging, threads don't survive fork(), etc..

The end results vary, but quite often are silent lockups.

In real world usage, this results in users getting mysterious hangs they do not have the knowledge to debug.

The fix for these people is to use "spawn" by default, which is the default on Windows.

Just a small sample:

Today I talked to a scientist who spent two weeks stuck, until she found my article on the subject (https://codewithoutrules.com/2018/09/04/python-multiprocessing/). Basically multiprocessing locked up, doing nothing forever. Switching to "spawn" fixed it.
Default multiprocessing context is broken and should never be used dask/dask#3759 (comment) is someone who had issues fixed by "spawn".
matmul operator @ can freeze / hang when used with default python multiprocessing using fork context instead of spawn numpy/numpy#15973 is a NumPy issue which apparently impacted scikit-learn.

I suggest changing the default on POSIX to match Windows.

itamarst · 2020-04-24T18:31:06Z

Looks like as of 3.8 this only impacts Linux/non-macOS-POSIX, so I'll amend the above to say this will also make it consistent with macOS.

itamarst · 2020-05-05T15:35:22Z

Just got an email from someone for whom switching to "spawn" fixed a problem. Earlier this week someone tweeted about this fixing things. This keeps hitting people in the real world.

itamarst · 2020-11-06T22:02:53Z

Another person with the same issue: https://twitter.com/volcan01010/status/1324764531139248128

aduncan · 2021-04-29T23:10:59Z

I just ran into and fixed (thanks to itamarst's blog post) a problem likely related to this.

Multiprocessing workers performing work and sending a logging message back with success/fail info. I had a few intermittent deadlocks that became a recurring problem when I sped up the process that skipped tasks which had previously completed (I think this shortened the time between forking and attempting to send messages causing the third process to deadlock). After changing that it deadlocked *every time*.

Switching to "spawn" at the top of the main function has fixed it.

pitrou · 2021-04-30T18:54:13Z

The problem with changing the default is that this will break any application that depends on passing non-picklable data to the child process (in addition to the potentially unexpected performance impact).

The docs already contain a significant elaboration on the matter, but feel free to submit a PR that would make the various caveats more explicit:
https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods

itamarst · 2021-04-30T19:00:00Z

This change was made on macOS at some point, so why not Linux? "spawn" is already the default on macOS and Windows.

pitrou · 2021-04-30T19:15:18Z

The macOS change was required before "fork" simply ceased to work.
Windows has always used "spawn", because no other method can be implemented on Windows.

itamarst · 2021-04-30T19:27:07Z

Given people's general experience, I would not say that "fork" works on Linux either. More like "99% of the time it works, 1% it randomly breaks in mysterious way".

pitrou · 2021-04-30T19:31:25Z

Agreed, but again, changing will break some applications.

We could switch to forkserver, but we should have a transition period where a FutureWarning will be displayed if people didn't explicitly set a start method.

mgorny · 2022-02-11T16:13:54Z

After updating PyPy3 to use Python 3.9's stdlib, we hit very bad hangs because of this — literally compiling a single file with "parallel" compileall could hang. In the end, we had to revert the change in how Python 3.9 starts workers because otherwise multiprocessing would be impossible to use:

https://foss.heptapod.net/pypy/pypy/-/commit/c594b6c48a48386e8ac1f3f52d4b82f9c3e34784

This is a very bad default and what's even worse is that it often causes deadlocks that are hard to reproduce or debug. Furthermore, since "fork" is the default, people are unintentionally relying on its support for passing non-pickleable projects and are creating non-portable code. The code often becomes complex and hard to change before they discover the problem.

Before we managed to figure out how to workaround the deadlocks in PyPy3, we were experimenting with switching the default to "spawn". Unfortunately, we've hit multiple projects that didn't work with this method, precisely because of pickling problems. Furthermore, they were surprised to learn that their code wouldn't work on macOS (in the end, many people perceive Python as a language for writing portable software).

Finally, back in 2018 I've made one of my projects do parallel work using multiprocessing. It gave its users great speedup but for some it caused deadlocks that I couldn't reproduce nor debug. In the end, I had to revert it. Now that I've learned about this problem, I'm wondering if this wasn't precisely because of "fork" method.

Provide a way for the calling code to specify which "multiprocessing context" to use to spawn subprocesses. See https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods I'm using this to allow us to mock out multiprocessing with multithreading in doctests. This will also let you more easily test differences between "spawn" and "fork" modes. I'm defaulting to using "spawn" because I think "fork" mode was the cause of some mysterious hanging in tests. General consensus seems to be "spawn" is less buggy: python/cpython#84559 I've felt like tests are consistently faster with it. Also uses the `multiprocessing.Manager` as a context manager so it gets cleaned up correctly. This might have been the cause of other hanging in local cluster execution.

itamarst · 2022-09-21T22:03:51Z

Another example: Nelson Elhage reports that "as of recently(?) pytorch silently deadlocks (even without GPUs involved at all) using method=fork so that's been fun to debug".

Examples he provided:

Tensor operation hangs when used with multiprocessing pytorch/pytorch#82843
Dataloader hangs. Potential deadlock with set_num_threads in worker processes? pytorch/pytorch#75147
Deadlock with multiprocessing (using fork) and OpenMP / PyTorch should warn after OMP and fork that multithreading may be broken pytorch/pytorch#17199

ravwojdyla · 2022-12-06T22:15:28Z

After updating a couple of libraries in a project we are working on, the code would hang without much explanation. After much debugging, I think one of the reasons for our issues is the forking default (this issue). Our business logic does not use multiprocessing, but the underlying execution engine does (in our case Luigi). Turns out that gRPC client (which was buried deep into one of our dependencies) can hang in some cases when forked grpc/grpc#18075. This was the case for us, and was very tricky to debug.

gpshead · 2022-12-13T21:21:32Z

general plan:

A DeprecationWarning in 3.12 and 3.13 when the default not-explicitly-specified start method of fork is used on platforms where that is the default.
3.14: flip the default for all platforms to spawn.

per https://discuss.python.org/t/switching-default-multiprocessing-context-to-spawn-on-posix-as-well/21868

In order to migrate away from unsafe os.fork() usage in threaded processes (python/cpython#84559), add a returnproc parameter that is similar to returnpid, which causes spawn to return Process objects instead of pids. The Process API is a subset of asyncio.subprocess.Process. In the future, spawn will return objects of a different type but with a compatible interface to Process, in order to encapsulate implementation-dependent objects like multiprocessing.Process which are designed to manage the process lifecycle and need to persist until it exits. Trigger a UserWarning when the returnpid parameter is used, in order to encourage migration to returnproc (do not use DeprecationWarning since it is hidden by default). This warning will be temporarily suppressed for portage internals, until they finish migrating to returnproc. There are probably very few if any external consumers of spawn with the returnpid parameter, so it seems safe to move quickly with this deprecation. Bug: https://bugs.gentoo.org/916566 Signed-off-by: Zac Medico <[email protected]>

In order to migrate away from unsafe os.fork() usage in threaded processes (python/cpython#84559), add a returnproc parameter that is similar to returnpid, which causes spawn to return a single Process object instead of a list of pids. The Process API is a subset of asyncio.subprocess.Process. The returnproc parameter conflicts with the logfile parameter, since the caller is expected to use the fd_pipes parameter to implement logging (this was also true for the returnpid parameter). In the future, spawn will return objects of a different type but with a compatible interface to Process, in order to encapsulate implementation-dependent objects like multiprocessing.Process which are designed to manage the process lifecycle and need to persist until it exits. Trigger a UserWarning when the returnpid parameter is used, in order to encourage migration to returnproc (do not use DeprecationWarning since it is hidden by default). This warning will be temporarily suppressed for portage internals, until they finish migrating to returnproc. There are probably very few if any external consumers of spawn with the returnpid parameter, so it seems safe to move quickly with this deprecation. Bug: https://bugs.gentoo.org/916566 Signed-off-by: Zac Medico <[email protected]>

mcepl · 2024-03-28T15:29:25Z

The article is now https://pythonspeed.com/articles/python-multiprocessing/

Use forkserver when available to avoid issues with forking multi-threaded processes. See python/cpython#84559 for more context. This also removes a warning when running on python 3.12: ``` /opt/hostedtoolcache/Python/3.12.2/x64/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=1991) is multi-threaded, use of fork() may lead to deadlocks in the child. self.pid = os.fork() ```

Use forkserver when available to avoid issues with forking multi-threaded processes. See [1] for more context. This also removes a warning when running on python 3.12 that can be seen in CI: > /opt/hostedtoolcache/Python/3.12.2/x64/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=1991) is multi-threaded, use of fork() may lead to deadlocks in the child. > self.pid = os.fork() [1] python/cpython#84559

Provide a way for the calling code to specify which "multiprocessing context" to use to spawn subprocesses. See https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods I'm using this to allow us to mock out multiprocessing with multithreading in doctests. This will also let you more easily test differences between "spawn" and "fork" modes. I'm defaulting to using "spawn" because I think "fork" mode was the cause of some mysterious hanging in tests. General consensus seems to be "spawn" is less buggy: python/cpython#84559 I've felt like tests are consistently faster with it. Also uses the `multiprocessing.Manager` as a context manager so it gets cleaned up correctly. This might have been the cause of other hanging in local cluster execution.

…in the child. See python/cpython#84559 and of course https://pythonspeed.com/articles/python-multiprocessing/

The default of `fork` is known to be problematic. Python itself is changing the default to `spawn`. The new default is expected to be in place for Python 3.14. Python references for the change to the default: * python/cpython#84559 * python/cpython#100618 We also have several places where this option had to be set to `spawn` to make tests work. The AMD code even checks and overrides the value if it's not set to `spawn`. Simplify things for everyone and just default to `spawn`, but leave the option in place just in case, at least for now. Signed-off-by: Russell Bryant <[email protected]>

…ver` (GH-101556) Change the default multiprocessing start method away from fork to forkserver or spawn on the remaining platforms where it was fork. See the issue for context. This makes the default far more thread safe (other than for people spawning threads at import time... - don't do that!). Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: Hugo van Kemenade <[email protected]>

itamarst mannequin added 3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes type-bug An unexpected behavior, bug, or error labels Apr 24, 2020

ezio-melotti transferred this issue from another repository Apr 10, 2022

vstinner added the topic-multiprocessing label Jun 6, 2022

ravwojdyla mentioned this issue Dec 6, 2022

Add support/option to pick TaskProcess worker process context spotify/luigi#3212

Open

gpshead changed the title ~~multiprocessing's default start method of fork()-without-exec() is broken~~ multiprocessing's default posix start method of fork()-without-exec() is broken: change the default so spawn Dec 13, 2022

gpshead added type-feature A feature request or enhancement and removed 3.9 only security fixes 3.8 (EOL) end of life 3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error labels Dec 13, 2022

gpshead self-assigned this Dec 13, 2022

erlend-aasland added the 3.13 bugs and security fixes label Jan 5, 2024

dentarg mentioned this issue Jan 8, 2024

[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called rails/rails#38560

Open

zmedico mentioned this issue Feb 1, 2024

process.spawn: Add returnproc parameter gentoo/portage#1247

Merged

mtelka mentioned this issue Feb 8, 2024

test_multiprocessing_fork raises a DeprecationWarning #114041

Closed

SyntaxColoring mentioned this issue Feb 10, 2024

perf(robot-server): Parallelize the migration to schema 3 Opentrons/opentrons#14465

Merged

gpadres mentioned this issue Mar 2, 2024

multiprocessing spawn and monkeypatch pytest-dev/pytest#12045

Closed

chenyuxyz mentioned this issue Mar 22, 2024

search: change to use "spawn" and limit the number of tasks per child tinygrad/tinygrad#3862

Merged

garrison mentioned this issue Apr 5, 2024

Timeout during generate_cutting_experiments in how_to_generate_exact_sampling_coefficients.ipynb Qiskit/qiskit-addon-cutting#515

Closed

eltoder mentioned this issue Apr 19, 2024

Use forkserver start method for multiprocessing CleanCut/green#296

Merged

rgommers mentioned this issue Jun 5, 2024

CI Add CPython 3.13 free-threaded test joblib/joblib#1589

Merged

kevint324 mentioned this issue Jul 12, 2024

dataloader hangs with fork and pin memory=True pytorch/pytorch#130610

Open

TheTechromancer mentioned this issue Jul 21, 2024

Tests Hang Arbitrarily blacklanternsecurity/bbot#1571

Closed

maurerle added a commit to maurerle/mango that referenced this issue Sep 2, 2024

forking for new processes is discouraged and might lead to deadlocks …

689530e

…in the child. See python/cpython#84559 and of course https://pythonspeed.com/articles/python-multiprocessing/

maurerle mentioned this issue Sep 2, 2024

forking for new processes is discouraged and might lead to deadlocks in the child. OFFIS-DAI/mango#94

Closed

russellb mentioned this issue Sep 18, 2024

[Core] Change multiproc method default to spawn vllm-project/vllm#8576

Closed

gpshead removed the 3.13 bugs and security fixes label Sep 19, 2024

gpshead changed the title ~~multiprocessing's default posix start method of 'fork' is broken: change to 'spawn'~~ multiprocessing's default posix start method of 'fork' is broken: change to `'forkserver' || 'spawn' Sep 19, 2024

gpshead added the 3.14 new features, bugs and security fixes label Sep 19, 2024

gtsiam mentioned this issue Sep 23, 2024

Move more things out of FrigateApp blakeblackshear/frigate#13897

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

multiprocessing's default posix start method of `'fork'` is broken: change to ``'forkserver' || 'spawn'` #84559

multiprocessing's default posix start method of `'fork'` is broken: change to ``'forkserver' || 'spawn'` #84559

itamarst mannequin commented Apr 24, 2020 •

edited by bedevere-bot

Loading

itamarst mannequin commented Apr 24, 2020

itamarst mannequin commented Apr 24, 2020

itamarst mannequin commented May 5, 2020

itamarst mannequin commented Nov 6, 2020

aduncan mannequin commented Apr 29, 2021

pitrou commented Apr 30, 2021

itamarst mannequin commented Apr 30, 2021

pitrou commented Apr 30, 2021

itamarst mannequin commented Apr 30, 2021

pitrou commented Apr 30, 2021

mgorny mannequin commented Feb 11, 2022

itamarst commented Sep 21, 2022

ravwojdyla commented Dec 6, 2022

gpshead commented Dec 13, 2022

mcepl commented Mar 28, 2024

multiprocessing's default posix start method of 'fork' is broken: change to `'forkserver' || 'spawn' #84559

multiprocessing's default posix start method of 'fork' is broken: change to `'forkserver' || 'spawn' #84559

Comments

itamarst mannequin commented Apr 24, 2020 • edited by bedevere-bot Loading

Linked PRs

itamarst mannequin commented Apr 24, 2020

itamarst mannequin commented Apr 24, 2020

itamarst mannequin commented May 5, 2020

itamarst mannequin commented Nov 6, 2020

aduncan mannequin commented Apr 29, 2021

pitrou commented Apr 30, 2021

itamarst mannequin commented Apr 30, 2021

pitrou commented Apr 30, 2021

itamarst mannequin commented Apr 30, 2021

pitrou commented Apr 30, 2021

mgorny mannequin commented Feb 11, 2022

itamarst commented Sep 21, 2022

ravwojdyla commented Dec 6, 2022

gpshead commented Dec 13, 2022

mcepl commented Mar 28, 2024

multiprocessing's default posix start method of `'fork'` is broken: change to ``'forkserver' || 'spawn'` #84559

multiprocessing's default posix start method of `'fork'` is broken: change to ``'forkserver' || 'spawn'` #84559

itamarst mannequin commented Apr 24, 2020 •

edited by bedevere-bot

Loading