gh-91539: improve performance of get_proxies_environment #91566

Merged: 26 commits merged into python:main from performance/getproxies_environment on Oct 5, 2022

eendebakpt
Contributor

@eendebakpt eendebakpt commented Apr 15, 2022

Improve the performance of getproxies_environment when there are many environment variables. The speedup depends on the number of environment variables, but in the benchmark below the function is several times faster with this PR.

Fixes #91539

Performance test details

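Results (seconds for 400 calls; the first line is the current urllib.request.getproxies_environment, the second line is the version in this PR):

0.1634683609008789
0.024187326431274414

produced with the following script: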

import os
import time
import urllib.request

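if 0:  # change to 1 to benchmark against a large synthetic environment
    # environment values are strings, so use '1' rather than 1
    os.environ = {f'{ii}': '1' for ii in range(8000)}
    os.environ.update({f'{ii}_proxy': '1' for ii in range(30)})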

def getproxies_environment():
    """Return a dictionary of scheme -> proxy server URL mappings.
    Scan the environment for variables named <scheme>_proxy;
    this seems to be the standard convention.  If you need a
    different way, you can pass a proxies dictionary to the
    [Fancy]URLopener constructor.
    """
    # in order to prefer lowercase variables, process environment in
    # two passes: first matches any, second pass matches lowercase only

    # select only environment variables whose name ends in _proxy (case-insensitive)
    # fast selection of candidates: the sixth character from the end must be '_'
    candidate_names = [name for name in os.environ.keys() if len(name) > 5 and name[-6] == '_']
    environment = [(name, os.environ[name], name.lower()) for name in candidate_names if name[-6:].lower() == '_proxy']

    proxies = {}
    for name, value, name_lower in environment:
        if value and name_lower[-6:] == '_proxy':
            proxies[name_lower[:-6]] = value
    # CVE-2016-1000110 - If we are running as CGI script, forget HTTP_PROXY
    # (non-all-lowercase) as it may be set from the web server by a "Proxy:"
    # header from the client
    # If "proxy" is lowercase, it will still be used thanks to the next block
    if 'REQUEST_METHOD' in os.environ:
        proxies.pop('http', None)
    for name, value, name_lower in environment:
        if name[-6:] == '_proxy':
            if value:
                proxies[name_lower[:-6]] = value
            else:
                proxies.pop(name_lower[:-6], None)
    return proxies

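nn = 400

# baseline: the current implementation in urllib.request
t0 = time.time()
for ii in range(nn):
    urllib.request.getproxies_environment()
dt = time.time() - t0
print(dt)

# candidate: the getproxies_environment defined above
t0 = time.time()
for ii in range(nn):
    getproxies_environment()
dt = time.time() - t0
print(dt)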
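
A variant of the benchmark above that does not touch the real process environment can be sketched with `timeit`. This is only an illustration: `scan_all` is a baseline written from the description in this PR (lowercase every name, then test the suffix) and is not quoted from CPython, `scan_screened` mirrors the candidate function above, and the function names, the `environ` parameter, and the synthetic variable names are all assumptions made for the example.

```python
import timeit


def scan_all(environ):
    """Baseline sketch: lowercase every variable name, then test the suffix."""
    proxies = {}
    for name, value in environ.items():
        name = name.lower()
        if value and name[-6:] == '_proxy':
            proxies[name[:-6]] = value
    if 'REQUEST_METHOD' in environ:
        proxies.pop('http', None)
    for name, value in environ.items():
        if name[-6:] == '_proxy':
            name = name.lower()
            if value:
                proxies[name[:-6]] = value
            else:
                proxies.pop(name[:-6], None)
    return proxies


def scan_screened(environ):
    """Screen on the underscore position before lowercasing anything."""
    candidates = [
        (name, value, name.lower())
        for name, value in environ.items()
        if len(name) > 5 and name[-6] == '_' and name[-6:].lower() == '_proxy'
    ]
    proxies = {}
    for name, value, name_lower in candidates:
        if value:
            proxies[name_lower[:-6]] = value
    if 'REQUEST_METHOD' in environ:
        proxies.pop('http', None)
    for name, value, name_lower in candidates:
        if name[-6:] == '_proxy':
            if value:
                proxies[name_lower[:-6]] = value
            else:
                proxies.pop(name_lower[:-6], None)
    return proxies


# Synthetic environment (hypothetical names; sizes mirror the script above).
fake_env = {f'VAR{ii}': '1' for ii in range(8000)}
fake_env.update({f'scheme{ii}_proxy': f'http://proxy{ii}:3128' for ii in range(30)})

# Both scans must agree before their timings are worth comparing.
assert scan_all(fake_env) == scan_screened(fake_env)

for func in (scan_all, scan_screened):
    # Best of 5 repeats of 100 calls each, to reduce timing noise.
    best = min(timeit.repeat(lambda: func(fake_env), number=100, repeat=5))
    print(f'{func.__name__}: {best:.4f} s per 100 calls')
```

Passing the mapping as a parameter keeps the comparison independent of whatever happens to be set in the shell, and taking the minimum over several repeats is less sensitive to background load than a single `time.time()` difference.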

@eendebakpt eendebakpt marked this pull request as draft April 15, 2022 11:31
@eendebakpt eendebakpt marked this pull request as ready for review April 15, 2022 12:18
@eendebakpt eendebakpt force-pushed the performance/getproxies_environment branch 2 times, most recently from 0f694c9 to aeb96ea Compare May 18, 2022 08:14
@eendebakpt eendebakpt force-pushed the performance/getproxies_environment branch from aeb96ea to f961505 Compare May 18, 2022 08:52
@eendebakpt
Contributor Author

@carljm Thanks for the suggestion. Benchmarks show it is just as fast, and much cleaner code.
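
The suggestion itself is not reproduced in this conversation. As a hedged illustration only, one way to make the scan cleaner is to fold the candidate selection and the first pass into a single loop. The function name `getproxies_single_scan` and its `environ` parameter are hypothetical, and this sketch is not claimed to be the code that was merged.

```python
import os


def getproxies_single_scan(environ=None):
    """Hypothetical single-loop variant; `environ` defaults to os.environ."""
    if environ is None:
        environ = os.environ
    proxies = {}
    matches = []
    for name in environ.keys():
        # Cheap screen on the underscore position before case-folding.
        if len(name) > 5 and name[-6] == '_' and name[-6:].lower() == '_proxy':
            value = environ[name]
            scheme = name[:-6].lower()
            matches.append((name, value, scheme))
            if value:
                proxies[scheme] = value
    # CVE-2016-1000110: under CGI, HTTP_PROXY may come from a "Proxy:" header.
    if 'REQUEST_METHOD' in environ:
        proxies.pop('http', None)
    # Second pass: names ending in lowercase '_proxy' win; an empty value unsets.
    for name, value, scheme in matches:
        if name[-6:] == '_proxy':
            if value:
                proxies[scheme] = value
            else:
                proxies.pop(scheme, None)
    return proxies


plain = {'HTTP_PROXY': 'http://upper:3128', 'http_proxy': 'http://lower:3128'}
cgi = {'REQUEST_METHOD': 'GET', 'HTTP_PROXY': 'http://injected:3128'}
unset = {'HTTPS_PROXY': 'http://upper:3128', 'https_proxy': ''}

print(getproxies_single_scan(plain))   # {'http': 'http://lower:3128'}
print(getproxies_single_scan(cgi))     # {}
print(getproxies_single_scan(unset))   # {}
```

The three sample mappings exercise the behaviours described in the function's comments: the lowercase variable takes precedence, an uppercase `HTTP_PROXY` is dropped when `REQUEST_METHOD` is set, and an empty lowercase variable removes the entry.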

Member

@carljm carljm left a comment

Sorry I missed this in the first review, but the name of the function has a typo (extra underscore) in the NEWS entry.

Also suggested an added comment. (Wouldn't have bothered with this if it was the only thing, but may be worth it if you are doing one more update to fix NEWS anyway.)

Thanks for the improvements to this function!

@eendebakpt
Contributor Author

@ambv As the latest core dev touching Lib/urllib/request.py, would you be able to review this PR?

@iritkatriel iritkatriel added performance Performance or resource usage 3.12 bugs and security fixes labels Aug 31, 2022
@orsenthil orsenthil self-assigned this Oct 5, 2022
@orsenthil orsenthil added needs backport to 3.10 only security fixes needs backport to 3.11 only security fixes labels Oct 5, 2022
Member

@orsenthil orsenthil left a comment

LGTM

@orsenthil orsenthil merged commit aeb28f5 into python:main Oct 5, 2022
@miss-islington
Contributor

Thanks @eendebakpt for the PR, and @orsenthil for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10, 3.11.
🐍🍒⛏🤖

@bedevere-bot

GH-97918 is a backport of this pull request to the 3.11 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.11 only security fixes label Oct 5, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 5, 2022
…nGH-91566)

* improve performance of get_proxies_environment when there are many environment variables

* 📜🤖 Added by blurb_it.

* fix case of short env name

* fix formatting

* fix whitespace

* whitespace

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <[email protected]>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <[email protected]>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <[email protected]>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <[email protected]>

* whitespace

* Update Misc/NEWS.d/next/Library/2022-04-15-11-29-38.gh-issue-91539.7WgVuA.rst

Co-authored-by: Carl Meyer <[email protected]>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <[email protected]>

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Carl Meyer <[email protected]>
(cherry picked from commit aeb28f5)

Co-authored-by: Pieter Eendebak <[email protected]>
@bedevere-bot bedevere-bot removed the needs backport to 3.10 only security fixes label Oct 5, 2022
@bedevere-bot

GH-97919 is a backport of this pull request to the 3.10 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 5, 2022
…nGH-91566)

miss-islington added a commit that referenced this pull request Oct 5, 2022
@eendebakpt eendebakpt deleted the performance/getproxies_environment branch October 5, 2022 19:23
@eendebakpt eendebakpt restored the performance/getproxies_environment branch October 5, 2022 19:23
@eendebakpt eendebakpt deleted the performance/getproxies_environment branch October 5, 2022 19:23
@eendebakpt eendebakpt restored the performance/getproxies_environment branch October 5, 2022 19:24
carljm added a commit to carljm/cpython that referenced this pull request Oct 6, 2022
* main: (66 commits)
  pythongh-65961: Raise `DeprecationWarning` when `__package__` differs from `__spec__.parent` (python#97879)
  docs(typing): add "see PEP 675" to LiteralString (python#97926)
  pythongh-97850: Remove all known instances of module_repr() (python#97876)
  I changed my surname early this year (python#96671)
  pythongh-93738: Documentation C syntax (:c:type:<C type> -> :c:expr:<C type>) (python#97768)
  pythongh-91539: improve performance of get_proxies_environment  (python#91566)
  build(deps): bump actions/stale from 5 to 6 (python#97701)
  pythonGH-95172 Make the same version `versionadded` oneline (python#95172)
  pythongh-88050: Fix asyncio subprocess to kill process cleanly when process is blocked (python#32073)
  pythongh-93738: Documentation C syntax (Function glob patterns -> literal markup) (python#97774)
  pythongh-93357: Port test cases to IsolatedAsyncioTestCase, part 2 (python#97896)
  pythongh-95196: Disable incorrect pickling of the C implemented classmethod descriptors (pythonGH-96383)
  pythongh-97758: Fix a crash in getpath_joinpath() called without arguments (pythonGH-97759)
  pythongh-74696: Pass root_dir to custom archivers which support it (pythonGH-94251)
  pythongh-97661: Improve accuracy of sqlite3.Cursor.fetchone docs (python#97662)
  pythongh-87092: bring compiler code closer to a preprocessing-opt-assembler organisation (pythonGH-97644)
  pythonGH-96704: Add {Task,Handle}.get_context(), use it in call_exception_handler() (python#96756)
  pythongh-93738: Documentation C syntax (:c:type:`PyTypeObject*` -> :c:expr:`PyTypeObject*`) (python#97778)
  pythongh-97825: fix AttributeError when calling subprocess.check_output(input=None) with encoding or errors args (python#97826)
  Add re.VERBOSE flag documentation example (python#97678)
  ...
carljm added a commit to carljm/cpython that referenced this pull request Oct 8, 2022
* main: (53 commits)
  pythongh-94808: Coverage: Test that maximum indentation level is handled (python#95926)
  pythonGH-88050: fix race in closing subprocess pipe in asyncio  (python#97951)
  pythongh-93738: Disallow pre-v3 syntax in the C domain (python#97962)
  pythongh-95986: Fix the example using match keyword (python#95989)
  pythongh-97897: Prevent os.mkfifo and os.mknod segfaults with macOS 13 SDK (pythonGH-97944)
  pythongh-94808: Cover `PyUnicode_Count` in CAPI (python#96929)
  pythongh-94808: Cover `PyObject_PyBytes` case with custom `__bytes__` method (python#96610)
  pythongh-95691: Doc BufferedWriter and BufferedReader (python#95703)
  pythonGH-88968: Add notes about socket ownership transfers (python#97936)
  pythongh-96865: [Enum] fix Flag to use CONFORM boundary (pythonGH-97528)
  pythongh-65961: Raise `DeprecationWarning` when `__package__` differs from `__spec__.parent` (python#97879)
  docs(typing): add "see PEP 675" to LiteralString (python#97926)
  pythongh-97850: Remove all known instances of module_repr() (python#97876)
  I changed my surname early this year (python#96671)
  pythongh-93738: Documentation C syntax (:c:type:<C type> -> :c:expr:<C type>) (python#97768)
  pythongh-91539: improve performance of get_proxies_environment  (python#91566)
  build(deps): bump actions/stale from 5 to 6 (python#97701)
  pythonGH-95172 Make the same version `versionadded` oneline (python#95172)
  pythongh-88050: Fix asyncio subprocess to kill process cleanly when process is blocked (python#32073)
  pythongh-93738: Documentation C syntax (Function glob patterns -> literal markup) (python#97774)
  ...
mpage pushed a commit to mpage/cpython that referenced this pull request Oct 11, 2022
…n#91566)

Labels
3.12 bugs and security fixes performance Performance or resource usage
Development

Successfully merging this pull request may close these issues.

speed up urllib.request.getproxies_environment
6 participants