Parser Module¶

The parser module provides TreeSitter-based Python parsing for accurate code analysis.

Usage¶

from pygrad import RepoTreeSitter

# Parse a repository
parser = RepoTreeSitter("./my-repo")
parser.parse()

# Access parsed data
for file_info in parser.files:
    print(f"File: {file_info.path}")
    print(f"  Classes: {len(file_info.classes)}")
    print(f"  Functions: {len(file_info.functions)}")

Classes¶

RepoTreeSitter¶

class RepoTreeSitter:
    def __init__(self, repository_path: str) -> None

TreeSitter-based parser for Python repositories.

Uses the tree-sitter library with tree-sitter-python grammar for accurate, syntax-aware parsing of Python source code.

Parameters:

Name	Type	Description
`repository_path`	`str`	Path to the repository root

Features:

Accurate syntax parsing (not regex-based)
Handles complex Python syntax correctly
Extracts docstrings from AST
Preserves source location information

Methods¶

parse¶

def parse(self) -> None

Parse all Python files in the repository.

Example:

from pygrad import RepoTreeSitter

parser = RepoTreeSitter("./my-repo")
parser.parse()

parse_file¶

def parse_file(self, file_path: str) -> FileInfo

Parse a single Python file.

Parameters:

Name	Type	Description
`file_path`	`str`	Path to the Python file

Returns: FileInfo - Parsed file information

Example:

from pygrad import RepoTreeSitter

parser = RepoTreeSitter("./my-repo")
file_info = parser.parse_file("./my-repo/src/main.py")

print(f"Module: {file_info.module_name}")
for cls in file_info.classes:
    print(f"  Class: {cls.name}")

TreeSitter Integration¶

pygrad uses TreeSitter for parsing because it provides:

Accuracy: Parses the actual Python grammar, not approximations
Speed: Incremental parsing for large codebases
Robustness: Handles syntax errors gracefully
Completeness: Access to full AST information

How It Works¶

graph LR
    A[Python Source] --> B[TreeSitter Parser]
    B --> C[Syntax Tree]
    C --> D[AST Traversal]
    D --> E[ClassInfo/FunctionInfo]

Extracted Information¶

For each Python file, the parser extracts:

Element	Information
Classes	Name, docstring, bases, methods, decorators
Functions	Name, docstring, parameters, return type, decorators
Methods	Same as functions, plus class association
Imports	Module imports (for dependency analysis)

Advanced Usage¶

Direct TreeSitter Access¶

For advanced use cases, you can access the underlying TreeSitter tree:

from pygrad.parser.treesitter import RepoTreeSitter
import tree_sitter_python as tspython
from tree_sitter import Language, Parser

# Create parser with Python language
PY_LANGUAGE = Language(tspython.language())
parser = Parser(PY_LANGUAGE)

# Parse source code
source = b'''
def hello(name: str) -> str:
    """Say hello."""
    return f"Hello, {name}!"
'''

tree = parser.parse(source)
root = tree.root_node

# Traverse the tree
def traverse(node, depth=0):
    print("  " * depth + f"{node.type}: {node.text[:50] if node.text else ''}")
    for child in node.children:
        traverse(child, depth + 1)

traverse(root)

Custom Node Extraction¶

from pygrad.parser.treesitter import RepoTreeSitter

parser = RepoTreeSitter("./my-repo")
parser.parse()

# Find all type annotations
def find_type_annotations(node):
    annotations = []
    if node.type == "type":
        annotations.append(node.text.decode())
    for child in node.children:
        annotations.extend(find_type_annotations(child))
    return annotations

# Process parsed files
for file_info in parser.files:
    # Access the raw tree for custom analysis
    pass

Supported Python Features¶

The parser handles all modern Python syntax:

Feature	Support
Classes	Full support including dataclasses
Functions	Regular, async, generator
Type hints	Parameters and return types
Decorators	Single and stacked
Docstrings	Google, NumPy, Sphinx styles
f-strings	Fully supported
Match statements	Python 3.10+
Walrus operator	Python 3.8+

Dependencies¶

The parser module requires:

[project.dependencies]
tree-sitter = ">=0.21.0"
tree-sitter-python = ">=0.21.0"

These are automatically installed with pygrad.