Symbex

Find the Python code for specified symbols

Read Symbex: search Python code for functions and classes, then pipe them into a LLM for background on this project.

Installation

Install this tool using pip:

pip install symbex

Or using Homebrew:

brew install simonw/llm/symbex

Usage

symbex can search for names of functions and classes that occur at the top level of a Python file.

To search every .py file in your current directory and all subdirectories, run like this:

symbex my_function

You can search for more than one symbol at a time:

symbex my_function MyClass
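To make "top level" concrete, here is a minimal sketch using Python's standard ast module. It is an illustration only, not symbex's actual implementation; SOURCE and top_level_symbols are hypothetical names invented for this example.

```python
import ast

# Hypothetical sample module used only for this illustration
SOURCE = '''
def my_function():
    def nested():
        pass

class MyClass:
    def method(self):
        pass

async def fetch():
    pass
'''


def top_level_symbols(code):
    """Return (name, line) pairs for top-level functions and classes."""
    wanted = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    # Only direct children of the module are inspected
    return [
        (node.name, node.lineno)
        for node in ast.parse(code).body
        if isinstance(node, wanted)
    ]


print(top_level_symbols(SOURCE))
```

Nested functions and methods are skipped here because they are not direct children of the module.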

Wildcards are supported - to search for every test_ function run this (note the single quotes to avoid the shell interpreting the * as a wildcard):

symbex 'test_*'

To search for methods within classes, use class.method notation:

symbex Entry.get_absolute_url

Wildcards are supported here as well:

symbex 'Entry.*'
symbex '*.get_absolute_url'
symbex '*.get_*'

Or to view every method of every class:

symbex '*.*'
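The matching rules above can be approximated with the standard library's fnmatch module. This is a sketch under the assumption that plain patterns only apply to top-level names and dotted patterns only apply to methods; the matches function and its class_name parameter are hypothetical and may differ from how symbex does it internally.

```python
from fnmatch import fnmatchcase


def matches(name, patterns, class_name=None):
    """Return True if a symbol matches any pattern.

    Pass class_name for methods so they are compared as "Class.method".
    """
    qualified = f"{class_name}.{name}" if class_name else name
    for pattern in patterns:
        # Assumption: dotted patterns target methods, plain ones target
        # top-level functions and classes
        if ("." in pattern) != (class_name is not None):
            continue
        if fnmatchcase(qualified, pattern):
            return True
    return False


print(matches("test_homepage", ["test_*"]))
print(matches("get_absolute_url", ["*.get_*"], class_name="Entry"))
print(matches("get_absolute_url", ["*"], class_name="Entry"))
```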

To search within a specific file, pass that file using the -f option. You can pass this more than once to search multiple files.

symbex MyClass -f my_file.py

To search within a specific directory and all of its subdirectories, use the -d/--directory option:

symbex Database -d ~/projects/datasette

If you know that you want to inspect one or more modules that can be imported by Python, you can use the -m/--module name option. This example shows the signatures for every symbol available in the asyncio package:

symbex -m asyncio -s --imports

You can search the directory containing the Python standard library using --stdlib. This can be useful for quickly looking up the source code for specific Python library functions:

symbex --stdlib -in to_thread

-in is explained below. If you provide --stdlib without any -d or -f options then --silent will be turned on automatically, since the standard library otherwise produces a number of different warnings.

The output starts like this:

# from asyncio.threads import to_thread
async def to_thread(func, /, *args, **kwargs):
    """Asynchronously run function *func* in a separate thread.
    # ...

You can exclude files in specified directories using the -x/--exclude option:

symbex Database -d ~/projects/datasette -x ~/projects/datasette/tests

If symbex encounters any Python code that it cannot parse, it will print a warning message and continue searching:

# Syntax error in path/badcode.py: expected ':' (<unknown>, line 1)

Pass --silent to suppress these warnings:

symbex MyClass --silent
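The warn-and-continue behaviour can be pictured with a small sketch like this one. It assumes the file is parsed with Python's ast module; parse_or_warn is a hypothetical helper, and the exact wording of the error depends on your Python version.

```python
import ast


def parse_or_warn(code, path):
    """Return (tree, None) on success or (None, warning) on a syntax error."""
    try:
        return ast.parse(code), None
    except SyntaxError as ex:
        # Formatted like the warning line shown above
        return None, f"# Syntax error in {path}: {ex}"


tree, warning = parse_or_warn("def broken()\n    pass\n", "path/badcode.py")
print(warning)
```

A caller would print the warning (unless silenced) and move on to the next file.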

Filters

In addition to searching for symbols, you can apply filters to the results.

The following filters are available:

  • --function - only functions
  • --class - only classes
  • --async - only async def functions
  • --unasync - only non-async functions
  • --documented - functions/classes that have a docstring
  • --undocumented - functions/classes that do not have a docstring
  • --public - functions/classes that are public - don't have a _name prefix (or are __*__ methods)
  • --private - functions/classes that are private - have a _name prefix and are not __*__
  • --dunder - functions matching __*__ - this should usually be used with *.* to find all dunder methods
  • --typed - functions that have at least one type annotation
  • --untyped - functions that have no type annotations
  • --partially-typed - functions that have some type annotations but not all
  • --fully-typed - functions that have type annotations for every argument and the return value
  • --no-init - Exclude __init__(self) methods. This is useful when combined with --fully-typed '*.*' to avoid returning __init__(self) methods that would otherwise be classified as fully typed, since __init__ doesn't need argument or return type annotations.
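The public, private and dunder rules in the list above can be written down as three small predicates. This is a sketch of the rules as described here, with hypothetical function names, not code taken from symbex.

```python
from fnmatch import fnmatchcase


def is_dunder(name):
    """True for names matching __*__, such as __init__."""
    return fnmatchcase(name, "__*__")


def is_private(name):
    """True for names with a _ prefix that are not dunder names."""
    return name.startswith("_") and not is_dunder(name)


def is_public(name):
    """True for names without a _ prefix, plus dunder names."""
    return not is_private(name)


for name in ["render", "_helper", "__init__", "__secret"]:
    label = "public" if is_public(name) else "private"
    print(name, label, "dunder" if is_dunder(name) else "")
```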

For example, to see the signatures of every async def function in your project that doesn't have any type annotations:

symbex -s --async --untyped

For class methods instead of functions, you can combine filters with a symbol search argument of *.*.

This example shows the full source code of every class method in the Python standard library that has type annotations for all of the arguments and the return value:

symbex --fully-typed --no-init '*.*' --stdlib
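Here is one way to think about the typed filters, sketched with the ast module. The type_status and status_of helpers are hypothetical, and two choices are assumptions made for this example: parameters named self or cls are ignored, and __init__ is not required to have a return annotation.

```python
import ast


def type_status(func):
    """Classify a function node as 'fully', 'partially' or 'untyped'."""
    args = func.args
    params = args.posonlyargs + args.args + args.kwonlyargs
    if args.vararg:
        params.append(args.vararg)
    if args.kwarg:
        params.append(args.kwarg)
    # One True/False entry per thing that could carry an annotation
    slots = [p.annotation is not None for p in params if p.arg not in ("self", "cls")]
    if func.name != "__init__":
        slots.append(func.returns is not None)
    if all(slots):
        return "fully"
    if any(slots):
        return "partially"
    return "untyped"


def status_of(source):
    """Classify the first statement of source, assumed to be a function."""
    return type_status(ast.parse(source).body[0])


print(status_of("def f(a: int, b: int) -> int: pass"))
print(status_of("def f(a: int, b): pass"))
print(status_of("def __init__(self): pass"))
```

Under these assumptions __init__(self) has nothing left to annotate, so it counts as fully typed, which is the situation --no-init exists to filter out.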

To find all public functions and methods that lack documentation, just showing the signature of each one:

symbex '*' '*.*' --public --undocumented --signatures
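As a rough picture of what that combination selects, the sketch below walks a module with ast and keeps public functions and methods that have no docstring. CODE and undocumented_public are hypothetical names for this example, and it only looks one level deep into classes.

```python
import ast

# Hypothetical sample module used only for this illustration
CODE = '''
def documented():
    "Has a docstring"

def bare():
    pass

def _private():
    pass

class Thing:
    def method(self):
        pass

    def __repr__(self):
        return "Thing"
'''

FUNCS = (ast.FunctionDef, ast.AsyncFunctionDef)


def undocumented_public(code):
    """List public functions and methods that lack a docstring."""
    found = []
    for node in ast.parse(code).body:
        if isinstance(node, FUNCS):
            candidates = [(node.name, node)]
        elif isinstance(node, ast.ClassDef):
            candidates = [
                (f"{node.name}.{m.name}", m) for m in node.body if isinstance(m, FUNCS)
            ]
        else:
            candidates = []
        for label, fn in candidates:
            dunder = fn.name.startswith("__") and fn.name.endswith("__")
            private = fn.name.startswith("_") and not dunder
            if not private and ast.get_docstring(fn) is None:
                found.append(label)
    return found


print(undocumented_public(CODE))
```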

Example output

In a fresh checkout of Datasette I ran this command:

symbex PatternPortfolioView get_long_description

Here's the output of the command:

# File: setup.py Line: 5
def get_long_description():
    with open(
        os.path.join(os.path.dirname(os.path.abspath(__file__)), "README.md"),
        encoding="utf8",
    ) as fp:
        return fp.read()

# File: datasette/views/special.py Line: 60
class PatternPortfolioView(View):
    async def get(self, request, datasette):
        await datasette.ensure_permissions(request.actor, ["view-instance"])
        return Response.html(
            await datasette.render_template(
                "patterns.html",
                request=request,
                view_name="patterns",
            )
        )

Just the signatures

The -s/--signatures option will list just the signatures of the functions and classes, for example:

symbex -s -f symbex/lib.py

# File: symbex/lib.py Line: 107
def function_definition(function_node: AST):

# File: symbex/lib.py Line: 13
def find_symbol_nodes(code: str, filename: str, symbols: Iterable[str]) -> List[Tuple[(AST, Optional[str])]]:

# File: symbex/lib.py Line: 175
def class_definition(class_def):

# File: symbex/lib.py Line: 209
def annotation_definition(annotation: AST) -> str:

# File: symbex/lib.py Line: 227
def read_file(path):

# File: symbex/lib.py Line: 253
class TypeSummary:

# File: symbex/lib.py Line: 258
def type_summary(node: AST) -> Optional[TypeSummary]:

# File: symbex/lib.py Line: 304
def quoted_string(s):

# File: symbex/lib.py Line: 315
def import_line_for_function(function_name: str, filepath: str, possible_root_dirs: List[str]) -> str:

# File: symbex/lib.py Line: 37
def code_for_node(code: str, node: AST, class_name: str, signatures: bool, docstrings: bool) -> Tuple[(str, int)]:

# File: symbex/lib.py Line: 71
def add_docstring(definition: str, node: AST, docstrings: bool, is_method: bool) -> str:

# File: symbex/lib.py Line: 82
def match(name: str, symbols: Iterable[str]) -> bool:

This can be combined with other options, or you can run symbex -s to see every symbol in the current directory and its subdirectories.

To include estimated import paths, such as # from symbex.lib import match, use --imports. These will be calculated relative to the directory you specified, or you can pass one or more --sys-path options to request that imports are calculated relative to those directories as if they were on sys.path:

~/dev/symbex/symbex match --imports -s --sys-path ~/dev/symbex

Example output:

# File: symbex/lib.py Line: 82
# from symbex.lib import match
def match(name: str, symbols: Iterable[str]) -> bool:
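The idea of estimating an import line from a file path can be sketched as follows. This is a simplified illustration with a hypothetical estimate_import_line helper that takes a single root directory; it is not the import_line_for_function from symbex/lib.py, and the paths below are made up.

```python
from pathlib import PurePosixPath


def estimate_import_line(symbol, filepath, root):
    """Guess a 'from x import y' line for a file located under root."""
    relative = PurePosixPath(filepath).relative_to(root).with_suffix("")
    parts = list(relative.parts)
    # A package's __init__.py is imported using the package name itself
    if parts[-1] == "__init__":
        parts.pop()
    return f"from {'.'.join(parts)} import {symbol}"


print(estimate_import_line("match", "/home/me/dev/symbex/symbex/lib.py", "/home/me/dev/symbex"))
```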

To suppress the # File: ... comments, use --no-file or -n.

So to both show import paths and suppress File comments, use -in as a shortcut:

symbex -in match

Output:

# from symbex.lib import match
def match(name: str, symbols: Iterable[str]) -> bool:

To include docstrings in those signatures, use --docstrings:

symbex match --docstrings -f symbex/lib.py

Example output:

# File: symbex/lib.py Line: 82
def match(name: str, symbols: Iterable[str]) -> bool:
    "Returns True if name matches any of the symbols, resolving wildcards"

Counting symbols

If you just want to count the number of functions and classes that match your filters, use the --count option. Here's how to count your classes:

symbex --class --count

Or to count every async test function:

symbex --async 'test_*' --count

Structured output

symbex defaults to outputting plain text (actually valid Python code, thanks to the way it uses comments).

You can request output in CSV, TSV, JSON or newline-delimited JSON instead, using the following options:

  • --json: a JSON array, [{ "id": "...", "code": "..." }]
  • --nl: newline-delimited JSON, { "id": "...", "code": "..." } per line
  • --csv: CSV with id,code as the heading row
  • --tsv: TSV with id\tcode as the heading row

In each case the ID will be the path to the file containing the symbol, followed by a colon, followed by the line number of the symbol, for example:

{
  "id": "symbex/lib.py:82",
  "code": "def match(name: str, symbols: Iterable[str]) -> bool:"
}
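If you want to consume this output from Python, something like the sketch below should work for the default path:line style of ID. NL_OUTPUT stands in for output captured from --nl, and parse_symbols is a hypothetical helper name.

```python
import json

# Stand-in for newline-delimited JSON captured from the --nl option
NL_OUTPUT = '''{"id": "symbex/lib.py:82", "code": "def match(name: str, symbols: Iterable[str]) -> bool:"}
{"id": "symbex/lib.py:227", "code": "def read_file(path):"}
'''


def parse_symbols(text):
    """Turn newline-delimited JSON into (path, line_number, code) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # Split on the last colon so paths containing colons still work
        path, _, line_number = record["id"].rpartition(":")
        rows.append((path, int(line_number), record["code"]))
    return rows


for path, line_number, code in parse_symbols(NL_OUTPUT):
    print(path, line_number, code)
```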

If you pass -i/--imports the ID will be the import line instead:

{
  "id": "from symbex.lib import match",
  "code": "def match(name: str, symbols: Iterable[str]) -> bool:"
}

Pass --id-prefix 'something:' to add the specified prefix to the start of each ID.

This example will generate a CSV file of all of your test functions, using the import style of IDs and a prefix of test::

symbex 'test_*' \
  --function \
  --imports \
  --id-prefix test: \
  --csv > tests.csv

Using with LLM

This tool is primarily designed to be used with LLM, a CLI tool for working with Large Language Models.

symbex makes it easy to grab a specific class or function and pass it to the llm command.

For example, I ran this in the Datasette repository root:

symbex Response | llm --system 'Explain this code, succinctly'

And got back this:

This code defines a custom Response class with methods for returning HTTP responses. It includes methods for setting cookies, returning HTML, text, and JSON responses, and redirecting to a different URL. The asgi_send method sends the response to the client using the ASGI (Asynchronous Server Gateway Interface) protocol.

The structured output feature is designed to be used with LLM embeddings. You can generate embeddings for every symbol in your codebase using llm embed-multi like this:

symbex '*' '*.*' --nl | \
  llm embed-multi symbols - \
  --format nl --database embeddings.db --store

This creates a database in embeddings.db containing all of your symbols along with embedding vectors.

You can then search your code like this:

llm similar symbols -d embeddings.db -c 'test csv' | jq

Replacing a matched symbol

The --replace option can be used to replace a single matched symbol with content piped in to standard input.

Given a file called my_code.py with the following content:

def first_function():
    # This will be ignored
    pass

def second_function():
    # This will be replaced
    pass

Run the following:

echo "def second_function(a, b):
    # This is a replacement implementation
    return a + b + 3
" | symbex second_function --replace

The result will be an updated-in-place my_code.py containing the following:

def first_function():
    # This will be ignored
    pass

def second_function(a, b):
    # This is a replacement implementation
    return a + b + 3

This feature should be used with care! I recommend only using this feature against code that is already checked into Git, so you can review changes it makes using git diff and revert them using git checkout my_code.py.
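To show the general idea of swapping out one symbol while leaving the rest of a file alone, here is a minimal sketch that uses the line numbers recorded by the ast module. The replace_symbol helper is hypothetical and works on a string rather than a file; it only handles top-level symbols and does not account for decorators, so treat it as an illustration and not a description of how --replace is implemented.

```python
import ast

MY_CODE = (
    "def first_function():\n"
    "    # This will be ignored\n"
    "    pass\n"
    "\n"
    "def second_function():\n"
    "    # This will be replaced\n"
    "    pass\n"
)

REPLACEMENT = (
    "def second_function(a, b):\n"
    "    # This is a replacement implementation\n"
    "    return a + b + 3\n"
)


def replace_symbol(code, name, replacement):
    """Return code with the top-level symbol called name swapped out."""
    kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    for node in ast.parse(code).body:
        if isinstance(node, kinds) and node.name == name:
            lines = code.splitlines(keepends=True)
            if not replacement.endswith("\n"):
                replacement += "\n"
            # lineno and end_lineno are 1-indexed and inclusive
            before = "".join(lines[: node.lineno - 1])
            after = "".join(lines[node.end_lineno :])
            return before + replacement + after
    raise ValueError(f"symbol {name!r} not found")


print(replace_symbol(MY_CODE, "second_function", REPLACEMENT))
```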

Replacing a matched symbol by running a command

The --rexec COMMAND option can be used to replace a single matched symbol by running a command and using its output.

The command will be run with the matched symbol's definition piped to its standard input. The output of that command will be used as the replacement text.

Here's an example that uses sed to add a # to the beginning of each matching line, effectively commenting out the matched function:

symbex first_function --rexec "sed 's/^/# /'"

This modified the first function in place to look like this:

# def first_function():
#     # This will be ignored
#     pass
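The piping step behind the sed example can be sketched with subprocess. This assumes a Unix-like shell with sed available; pipe_through is a hypothetical helper showing the general pattern of sending text to a command's standard input and collecting its standard output, not the code symbex itself runs.

```python
import subprocess


def pipe_through(command, text):
    """Run a shell command with text on stdin and return its stdout."""
    result = subprocess.run(
        command,
        shell=True,  # the command is a shell string, as on the command line
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    return result.stdout


definition = "def first_function():\n    # This will be ignored\n    pass\n"
print(pipe_through("sed 's/^/# /'", definition))
```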

A much more exciting example uses LLM. This example will use the gpt-3.5-turbo model to add type hints and generate a docstring:

symbex second_function \
  --rexec "llm --system 'add type hints and a docstring'"

I ran this against this code:

def first_function():
    # This will be ignored
    pass

def second_function(a, b):
    return a + b + 3

And the second function was updated in place to look like this:

def second_function(a: int, b: int) -> int:
    """
    Returns the sum of two integers (a and b) plus 3.

    Parameters:
    a (int): The first integer.
    b (int): The second integer.

    Returns:
    int: The sum of a and b plus 3.
    """
    return a + b + 3

Using in CI

The --check option causes symbex to return a non-zero exit code if any matches are found for your query.

You can use this in CI to guard against things like public functions being added without documentation:

symbex --function --public --undocumented --check

This will fail silently but set a 1 exit code if there are any undocumented functions.

Using this as a step in a CI tool such as GitHub Actions should result in a test failure.

Run this to see the exit code from the last command:

echo $?

--check will not output anything by default. Add --count to output a count of matching symbols, or -s/--signatures to output the signatures of the matching symbols, for example:

symbex --function --public --undocumented --check --count

Similar tools

  • pyastgrep by Luke Plant offers advanced capabilities for viewing and searching through Python ASTs using XPath.
  • cq is a tool that lets you "extract code snippets using CSS-like selectors", built using Tree-sitter and primarily targeting JavaScript and TypeScript.

symbex --help

Usage: symbex [OPTIONS] [SYMBOLS]...

Find symbols in Python code and print the code for them.

Example usage:

# Search current directory and subdirectories
symbex my_function MyClass

# Search using a wildcard
symbex 'test_*'

# Find a specific class method
symbex 'MyClass.my_method'

# Find class methods using wildcards
symbex '*View.handle_*'

# Search a specific file
symbex MyClass -f my_file.py

# Search within a specific directory and its subdirectories
symbex Database -d ~/projects/datasette

# View signatures for all symbols in current directory and subdirectories
symbex -s

# View signatures for all test functions
symbex 'test_*' -s

# View signatures for all async functions with type definitions
symbex --async --typed -s

# Count the number of --async functions in the project
symbex --async --count

# Replace my_function with a new implementation:
echo "def my_function(a, b):
    # This is a replacement implementation
    return a + b + 3
" | symbex my_function --replace

# Replace my_function with the output of a command:
symbex first_function --rexec "sed 's/^/# /'"
# This uses sed to comment out the function body

Options:
--version                  Show the version and exit.
-f, --file FILE            Files to search
-d, --directory DIRECTORY  Directories to search
--stdlib                   Search the Python standard library
-x, --exclude DIRECTORY    Directories to exclude
-s, --signatures           Show just function and class signatures
-n, --no-file              Don't include the # File: comments in the output
-i, --imports              Show 'from x import y' lines for imported symbols
-m, --module TEXT          Modules to search within
--sys-path TEXT            Calculate imports relative to these on sys.path
--docs, --docstrings       Show function and class signatures plus docstrings
--count                    Show count of matching symbols
--silent                   Silently ignore Python files with parse errors
--function                 Filter functions
--async                    Filter async functions
--unasync                  Filter non-async functions
--class                    Filter classes
--documented               Filter functions with docstrings
--undocumented             Filter functions without docstrings
--public                   Filter for symbols without a _ prefix
--private                  Filter for symbols with a _ prefix
--dunder                   Filter for symbols matching __*__
--typed                    Filter functions with type annotations
--untyped                  Filter functions without type annotations
--partially-typed          Filter functions with partial type annotations
--fully-typed              Filter functions with full type annotations
--no-init                  Filter to exclude any __init__ methods
--check                    Exit with non-zero code if any matches found
--replace                  Replace matching symbol with text from stdin
--rexec TEXT               Replace with the result of piping to this tool
--csv                      Output as CSV
--tsv                      Output as TSV
--json                     Output as JSON
--nl                       Output as newline-delimited JSON
--id-prefix TEXT           Prefix to use for symbol IDs
--help                     Show this message and exit.

Development

To contribute to this tool, first check out the code. Then create a new virtual environment:

cd symbex
python -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

pip install -e '.[test]'

To run the tests:

pytest

just

You can also install just and use it to run the tests and linters like this:

just

Or to list commands:

just -l
Available recipes:
    black         # Apply Black
    cog           # Rebuild docs with cog
    default       # Run tests and linters
    lint          # Run linters
    test *options # Run pytest with supplied options