Parser Module¶
The parser module provides TreeSitter-based Python parsing for accurate code analysis.
Usage¶
from pygrad import RepoTreeSitter
# Parse a repository
parser = RepoTreeSitter("./my-repo")
parser.parse()
# Access parsed data
for file_info in parser.files:
    print(f"File: {file_info.path}")
    print(f" Classes: {len(file_info.classes)}")
    print(f" Functions: {len(file_info.functions)}")
Classes¶
RepoTreeSitter¶
TreeSitter-based parser for Python repositories.
Uses the tree-sitter library with tree-sitter-python grammar for accurate, syntax-aware parsing of Python source code.
Parameters:
| Name | Type | Description |
|---|---|---|
| repository_path | str | Path to the repository root |
Features:
- Accurate syntax parsing (not regex-based)
- Handles complex Python syntax correctly
- Extracts docstrings from AST
- Preserves source location information
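The last two features can be illustrated without pygrad installed. The sketch below uses Python's standard-library `ast` module as a stand-in for TreeSitter, an assumption made so that the example runs with no extra dependencies. `definitions_with_docstrings` is a hypothetical helper, not part of pygrad's API.

```python
import ast
from typing import List, Optional, Tuple

SOURCE = '''
class Greeter:
    """Greets people."""

    def hello(self, name: str) -> str:
        """Say hello."""
        return f"Hello, {name}!"
'''


def definitions_with_docstrings(source: str) -> List[Tuple[str, int, int, Optional[str]]]:
    """Return (name, first line, last line, docstring) for each class or function."""
    results = []
    # ast.walk visits every node; keep only class and function definitions.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            results.append((node.name, node.lineno, node.end_lineno, ast.get_docstring(node)))
    return results


for name, start, end, doc in definitions_with_docstrings(SOURCE):
    print(f"{name}: lines {start}-{end}, docstring={doc!r}")
```

Each definition carries its docstring together with its line span, which is the kind of location information a downstream tool needs to link results back to the source.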
Methods¶
parse¶
Parse all Python files in the repository.
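Example:
from pygrad import RepoTreeSitter
parser = RepoTreeSitter("./my-repo")
parser.parse()
# Parsed results are then available on parser.files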
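Parsing every Python file implies a discovery step that walks the repository. How pygrad performs that walk is not documented here, so the following is only a minimal sketch of one plausible approach. The helper name `iter_python_files` and the `SKIP_DIRS` set are assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterator

# Assumption: directories a repository parser would usually want to skip.
SKIP_DIRS = {".git", ".venv", "venv", "__pycache__", "node_modules"}


def iter_python_files(repository_path: str) -> Iterator[Path]:
    """Yield .py files under repository_path in sorted order, skipping SKIP_DIRS."""
    root = Path(repository_path)
    for path in sorted(root.rglob("*.py")):
        # Compare against path components relative to the root so that
        # the location of the repository itself never triggers a skip.
        relative_parts = path.relative_to(root).parts
        if path.is_file() and not SKIP_DIRS.intersection(relative_parts):
            yield path
```

Sorting the paths keeps the output order deterministic across platforms, which makes results easier to compare between runs.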
parse_file¶
Parse a single Python file.
Parameters:
| Name | Type | Description |
|---|---|---|
| file_path | str | Path to the Python file |
Returns: FileInfo - Parsed file information
Example:
from pygrad import RepoTreeSitter
parser = RepoTreeSitter("./my-repo")
file_info = parser.parse_file("./my-repo/src/main.py")
print(f"Module: {file_info.module_name}")
for cls in file_info.classes:
    print(f" Class: {cls.name}")
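The example reads a `module_name` from the returned `FileInfo`. How pygrad derives that name is not specified on this page. One common convention, sketched here purely as an assumption, maps the file's path relative to the repository root onto a dotted module name. `module_name_from_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def module_name_from_path(file_path: str, repository_path: str) -> str:
    """Turn a file path under repository_path into a dotted module name.

    Assumption: the repository root is the import root. PurePosixPath is
    used so the result does not depend on the host operating system.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(repository_path))
    parts = list(relative.with_suffix("").parts)
    # A package's __init__.py represents the package itself.
    if parts and parts[-1] == "__init__":
        parts.pop()
    return ".".join(parts)


print(module_name_from_path("./my-repo/src/main.py", "./my-repo"))
```

Under this convention, `./my-repo/src/main.py` would map to `src.main`. A project that uses a `src/` layout may strip that prefix instead.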
TreeSitter Integration¶
pygrad uses TreeSitter for parsing because it provides:
- Accuracy: Parses the actual Python grammar, not approximations
- Speed: Incremental parsing for large codebases
- Robustness: Handles syntax errors gracefully
- Completeness: Access to full AST information
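The robustness point is easiest to appreciate by contrast. Python's standard-library `ast` module rejects an entire file when it contains a single syntax error, whereas tree-sitter still returns a tree and marks the damaged region with `ERROR` nodes. The sketch below shows only the standard-library side of that comparison. `try_ast_parse` is a hypothetical helper.

```python
import ast

# The first function is valid; the second has a malformed parameter list.
BROKEN = "def ok():\n    return 1\n\ndef broken(:\n    pass\n"


def try_ast_parse(source: str) -> str:
    """Report whether the stdlib parser accepts the source as a whole."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        # Nothing is recovered, not even the valid function above the error.
        return f"SyntaxError at line {exc.lineno}"
    return "parsed"


print(try_ast_parse(BROKEN))
```

With an error-tolerant parser, the valid `ok` function could still be extracted from the same input, which matters when analyzing repositories that contain work-in-progress or version-specific files.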
How It Works¶
graph LR
A[Python Source] --> B[TreeSitter Parser]
B --> C[Syntax Tree]
C --> D[AST Traversal]
D --> E[ClassInfo/FunctionInfo]
Extracted Information¶
For each Python file, the parser extracts:
| Element | Information |
|---|---|
| Classes | Name, docstring, bases, methods, decorators |
| Functions | Name, docstring, parameters, return type, decorators |
| Methods | Same as functions, plus class association |
| Imports | Module imports (for dependency analysis) |
Advanced Usage¶
Direct TreeSitter Access¶
For advanced use cases, you can access the underlying TreeSitter tree:
from pygrad.parser.treesitter import RepoTreeSitter
import tree_sitter_python as tspython
from tree_sitter import Language, Parser
# Create a parser for the Python grammar
# (assumes a recent py-tree-sitter release in which Parser accepts a Language directly)
PY_LANGUAGE = Language(tspython.language())
parser = Parser(PY_LANGUAGE)
# Parse source code (tree-sitter expects bytes)
source = b'''
def hello(name: str) -> str:
    """Say hello."""
    return f"Hello, {name}!"
'''
tree = parser.parse(source)
root = tree.root_node
# Traverse the tree, printing each node's type and the first 50 bytes of its text
def traverse(node, depth=0):
    text = node.text[:50].decode(errors="replace") if node.text else ""
    print("  " * depth + f"{node.type}: {text}")
    for child in node.children:
        traverse(child, depth + 1)
traverse(root)
Custom Node Extraction¶
from pygrad.parser.treesitter import RepoTreeSitter
parser = RepoTreeSitter("./my-repo")
parser.parse()
# Find all type annotations
def find_type_annotations(node):
    annotations = []
    if node.type == "type":
        annotations.append(node.text.decode())
    for child in node.children:
        annotations.extend(find_type_annotations(child))
    return annotations
# Process parsed files
for file_info in parser.files:
    # Access the raw tree for custom analysis
    pass
Supported Python Features¶
The parser handles modern Python syntax, including the following:
| Feature | Support |
|---|---|
| Classes | Full support including dataclasses |
| Functions | Regular, async, generator |
| Type hints | Parameters and return types |
| Decorators | Single and stacked |
| Docstrings | Google, NumPy, Sphinx styles |
| f-strings | Fully supported |
| Match statements | Python 3.10+ |
| Walrus operator | Python 3.8+ |
Dependencies¶
The parser module requires:
- tree-sitter
- tree-sitter-python
These are automatically installed with pygrad.