Core API¶

The core module provides the main API functions for building and querying knowledge graphs.

Usage¶

import asyncio

import pygrad as pg


async def main() -> None:
    # Add a repository
    await pg.add("https://github.com/owner/repo")

    # Search the knowledge graph
    result = await pg.search("https://github.com/owner/repo", "How do I authenticate?")

    # List all datasets
    datasets = await pg.list()

    # Delete a repository
    await pg.delete("https://github.com/owner/repo")

    # Visualize the graph
    await pg.visualize("./knowledge-graph.html")


# Every API function is a coroutine, so run the calls inside an event loop
asyncio.run(main())

Functions¶

add¶

async def add(url: str) -> None

Add a repository to the knowledge graph.

This function:

Clones the repository (if not already cached)
Parses all Python files using Tree-sitter
Extracts classes, functions, methods, and docstrings
Generates XML API documentation
Indexes the documentation into the knowledge graph
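The extraction and XML-generation steps above can be illustrated with a small standalone sketch. It is not pygrad's implementation: it swaps Tree-sitter for the standard library's `ast` module so it runs without extra dependencies, and the element names (`module`, `class`, `method`, `function`) and the helper `build_api_xml` are hypothetical.

```python
import ast
import xml.etree.ElementTree as ET


def build_api_xml(source: str, module_name: str) -> ET.Element:
    """Collect classes, functions, methods, and docstrings into an XML tree."""
    tree = ast.parse(source)
    root = ET.Element("module", name=module_name)
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)

    for node in tree.body:
        if isinstance(node, func_types):
            # Top-level function: store its docstring as the element text
            el = ET.SubElement(root, "function", name=node.name)
            el.text = ast.get_docstring(node) or ""
        elif isinstance(node, ast.ClassDef):
            cls = ET.SubElement(root, "class", name=node.name)
            cls.text = ast.get_docstring(node) or ""
            # Functions defined directly in the class body are its methods
            for item in node.body:
                if isinstance(item, func_types):
                    method = ET.SubElement(cls, "method", name=item.name)
                    method.text = ast.get_docstring(item) or ""
    return root


# Hypothetical source file used only for this illustration
SAMPLE = '''
class Session:
    """Holds connection state."""

    def get(self, url):
        """Send a GET request."""

async def fetch(url):
    """Fetch a URL."""
'''

api_xml = build_api_xml(SAMPLE, "sample")
print(ET.tostring(api_xml, encoding="unicode"))
```

A real pipeline would walk every Python file in the cloned repository and would likely record signatures as well. The sketch keeps only names and docstrings to show the shape of the data.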

Parameters:

Name	Type	Description
`url`	`str`	GitHub repository URL (e.g., `https://github.com/owner/repo`)

Returns: None

Raises:

ValueError: If the URL is invalid
subprocess.CalledProcessError: If git clone fails
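One way to handle the two documented exceptions when indexing several repositories is sketched below. The helper `add_all` is hypothetical and takes the `add` coroutine as a parameter. The `fake_add` stand-in is also an assumption, there only so the sketch runs without pygrad installed; in real use you would pass `pg.add`.

```python
import asyncio
import subprocess
from typing import Awaitable, Callable, Dict, Iterable, Optional


async def add_all(
    add: Callable[[str], Awaitable[None]],
    urls: Iterable[str],
) -> Dict[str, Optional[Exception]]:
    """Call `add` for each URL and record the documented failure, if any."""
    outcome: Dict[str, Optional[Exception]] = {}
    for url in urls:
        try:
            await add(url)
            outcome[url] = None
        except (ValueError, subprocess.CalledProcessError) as exc:
            # Invalid URL or failed git clone: keep going with the next repo
            outcome[url] = exc
    return outcome


async def fake_add(url: str) -> None:
    # Stand-in for pg.add (assumption): rejects anything that is not a GitHub URL
    if not url.startswith("https://github.com/"):
        raise ValueError(f"invalid URL: {url}")


results = asyncio.run(
    add_all(fake_add, ["https://github.com/psf/requests", "not-a-url"])
)
for url, err in results.items():
    print(url, "->", "ok" if err is None else f"failed: {err}")
```

Any exception other than the two documented ones still propagates, so unexpected failures are not silently swallowed.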

Example:

import pygrad as pg

# Add a single repository
await pg.add("https://github.com/psf/requests")

# Add multiple repositories
repos = [
    "https://github.com/psf/requests",
    "https://github.com/pallets/flask",
    "https://github.com/django/django",
]
for repo in repos:
    await pg.add(repo)
    print(f"Added: {repo}")

search¶

async def search(url: str, query: str) -> str

Query a repository's knowledge graph using natural language.

Uses Graph RAG (Retrieval-Augmented Generation over a graph) to retrieve relevant nodes from the knowledge graph and generate a contextual answer.
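To make the retrieval half of this idea concrete, here is a deliberately tiny sketch. It is a toy illustration, not how pygrad or its backend retrieves data: the graph contents, the keyword-overlap scoring, and the names `retrieve` and `build_prompt` are all assumptions. It picks the node that best matches the query, expands along edges to collect related nodes, and assembles the context that a language model would then answer from.

```python
import re

# Hypothetical toy graph: node name -> description, plus adjacency lists
NODES = {
    "Session": "Session class that keeps cookies and auth across requests",
    "Session.post": "post method sends a POST request, accepts a json argument",
    "HTTPBasicAuth": "HTTPBasicAuth attaches basic auth credentials",
}
EDGES = {
    "Session": ["Session.post", "HTTPBasicAuth"],
    "Session.post": ["Session"],
    "HTTPBasicAuth": ["Session"],
}


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, hops: int = 1) -> list:
    """Pick the best-matching node, then expand along edges to gather context."""
    query_tokens = tokenize(query)
    seed = max(NODES, key=lambda name: len(query_tokens & tokenize(NODES[name])))
    selected = [seed]
    frontier = [seed]
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for neighbor in EDGES.get(node, []):
                if neighbor not in selected:
                    selected.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return selected


def build_prompt(query: str, hops: int = 1) -> str:
    """Assemble retrieved node text into the context an LLM would answer from."""
    context = "\n".join(
        f"- {name}: {NODES[name]}" for name in retrieve(query, hops)
    )
    return f"Context:\n{context}\n\nQuestion: {query}"


print(build_prompt("How do I make a POST request with JSON data?"))
```

The generation step is left out on purpose. In a real system the assembled prompt would be sent to a language model, and node matching would typically use embeddings rather than keyword overlap.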

Parameters:

Name	Type	Description
`url`	`str`	GitHub repository URL
`query`	`str`	Natural language query

Returns: str - The generated answer

Example:

import pygrad as pg

# Search for usage patterns
result = await pg.search(
    "https://github.com/psf/requests",
    "How do I make a POST request with JSON data?"
)
print(result)

# Search for API information
result = await pg.search(
    "https://github.com/pallets/flask",
    "What decorators are available for routes?"
)
print(result)

Query Tips:

Be specific: "How do I authenticate with OAuth2?" vs "authentication"
Ask about patterns: "What's the recommended way to handle errors?"
Reference concepts: "How does the session management work?"

list¶

async def list() -> list[Any]

List all indexed datasets (repositories).

Returns: list[Any] - List of dataset objects with `name` and `id` attributes

Example:

import pygrad as pg

datasets = await pg.list()

if datasets:
    print("Indexed repositories:")
    for ds in datasets:
        print(f"  - {ds.name}")
else:
    print("No repositories indexed yet.")

delete¶

async def delete(url: str) -> None

Delete a repository from the knowledge graph.

This removes the indexed data but does not delete the cached repository files.

Parameters:

Name	Type	Description
`url`	`str`	GitHub repository URL

Returns: None

Example:

import pygrad as pg

# Delete a single repository
await pg.delete("https://github.com/owner/repo")

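# Delete all repositories
datasets = await pg.list()
for ds in datasets:
    # Reconstruct the URL, assuming dataset names look like "owner_repo".
    # Split on the first underscore only, so underscores in the repo name survive.
    owner, _, repo = ds.name.partition("_")
    await pg.delete(f"https://github.com/{owner}/{repo}")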

visualize¶

async def visualize(path: str = "./pygrad.html") -> str

Export the knowledge graph as an interactive HTML visualization.

Parameters:

Name	Type	Default	Description
`path`	`str`	`"./pygrad.html"`	Output file path

Returns: str - Path to the generated HTML file
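Because the function returns the output path, the result can be handed straight to a browser. The sketch below uses only the standard library. The helper names `html_file_uri` and `open_visualization` are hypothetical, and the path is the documented default rather than a file the sketch creates.

```python
import webbrowser
from pathlib import Path


def html_file_uri(path: str) -> str:
    """Turn a (possibly relative) file path into an absolute file:// URI."""
    return Path(path).resolve().as_uri()


def open_visualization(path: str) -> bool:
    """Open the generated HTML file in the default browser."""
    return webbrowser.open(html_file_uri(path))


uri = html_file_uri("./pygrad.html")
print(uri)
# open_visualization("./pygrad.html")  # uncomment once the file has been generated
```

In real use, `path` would be the value returned by `await pg.visualize(...)`.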

Example:

import pygrad as pg

# Generate default visualization
await pg.visualize()

# Generate to custom path
path = await pg.visualize("./docs/knowledge-graph.html")
print(f"Visualization saved to: {path}")

get_dataset¶

async def get_dataset(dataset_name: str, default: Any = None) -> Any

Get a dataset by name.

Parameters:

Name	Type	Default	Description
`dataset_name`	`str`	-	Name of the dataset (repository ID)
`default`	`Any`	`None`	Default value if not found

Returns: The dataset object, or `default` if no dataset with that name exists
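The lookup-with-default behaviour can be pictured as a search over the objects returned by `list()`. This is a sketch of the semantics only, not the library's implementation. The helper `find_dataset` and the `SimpleNamespace` stand-ins for dataset objects are assumptions.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def find_dataset(datasets: Iterable[Any], dataset_name: str, default: Any = None) -> Any:
    """Return the first dataset whose `name` matches, else `default`."""
    return next((ds for ds in datasets if ds.name == dataset_name), default)


# Stand-in dataset objects exposing the documented `name` and `id` attributes
datasets = [
    SimpleNamespace(name="psf_requests", id=1),
    SimpleNamespace(name="pallets_flask", id=2),
]

found = find_dataset(datasets, "pallets_flask")
missing = find_dataset(datasets, "django_django", default="not indexed")
print(found.id, missing)
```

Passing an explicit sentinel as `default` lets callers tell a missing dataset apart from a falsy one.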

Example:

import pygrad as pg
from pygrad import get_repository_id

url = "https://github.com/owner/repo"
repo_id = get_repository_id(url)

dataset = await pg.get_dataset(repo_id)
if dataset:
    print(f"Found dataset: {dataset.name}")
else:
    print("Dataset not found, indexing...")
    await pg.add(url)

Internal Functions¶

These functions are used internally, and the leading underscore marks them as private. They remain importable for advanced use cases.

_create_xml_api_doc¶

async def _create_xml_api_doc(url: str) -> Path

Create XML API documentation for a repository.

_split_xml_api¶

def _split_xml_api(xml_api_path: Path) -> list[str]

Split the XML API documentation into separate documents for indexing.
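As an illustration of what such a split could look like, the sketch below serializes each direct child of the root element as its own document. The splitting rule, the XML schema, and the helper name `split_xml_by_child` are assumptions. The actual function may chunk the file differently.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List


def split_xml_by_child(xml_path: Path) -> List[str]:
    """Serialize each direct child of the root element as its own document."""
    root = ET.parse(xml_path).getroot()
    return [ET.tostring(child, encoding="unicode").strip() for child in root]


# Hypothetical API document used only for this illustration
SAMPLE_XML = (
    "<module name='sample'>"
    "<class name='Session'>Holds connection state.</class>"
    "<function name='fetch'>Fetch a URL.</function>"
    "</module>"
)

with tempfile.TemporaryDirectory() as tmp:
    xml_file = Path(tmp) / "api.xml"
    xml_file.write_text(SAMPLE_XML, encoding="utf-8")
    docs = split_xml_by_child(xml_file)

for doc in docs:
    print(doc)
```

Splitting per class or function keeps each indexed document small and self-describing, which tends to help retrieval.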

_cognee_add_xml_api¶

async def _cognee_add_xml_api(xml_api_path: Path, dataset_name: str) -> None

Add the XML API documentation to the Cognee knowledge graph under the given dataset name.