Redaction API

The redaction module provides the core functionality for detecting and redacting secrets.

Overview

from bugsafe.redact import RedactionEngine, PatternConfig

# Create engine with default settings
engine = RedactionEngine()

# Redact text
text = "API_KEY=sk-abc123xyz"
redacted, report = engine.redact(text)
print(redacted)  # API_KEY=<API_KEY_1>
print(report.get_total())  # 1

RedactionEngine

Main orchestrator for secret redaction.

bugsafe.redact.engine.RedactionEngine dataclass

Main redaction engine.

Orchestrates pattern matching, tokenization, and path anonymization to redact sensitive information from text.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `tokenizer` | `Tokenizer` | Tokenizer for generating replacement tokens. |
| `path_anonymizer` | `PathAnonymizer` | Path anonymizer for file paths. |
| `config` | `PatternConfig` | Pattern configuration. |
| `patterns` | `list[Pattern]` | List of patterns to use. |
| `timeout_ms` | `int` | Timeout per pattern in milliseconds. |

redact(text)

Redact sensitive information from text.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `text` | `str` | Text to redact. | required |

Returns:

| Type | Description |
| --- | --- |
| `tuple[str, RedactionReport]` | Tuple of (redacted_text, report). |

verify_redaction(text)

Verify that no high-priority secrets remain in text.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `text` | `str` | Text to verify. | required |

Returns:

| Type | Description |
| --- | --- |
| `list[str]` | List of pattern names that still match (potential leaks). |
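The check described for `verify_redaction` can be pictured as a second pass that re-runs only the high-priority patterns over text that has already been redacted. The sketch below is a self-contained illustration and not bugsafe's implementation: `SketchPattern`, the `HIGH_PRIORITY` threshold, and both regexes are hypothetical stand-ins.

```python
import re
from dataclasses import dataclass

# Hypothetical threshold; bugsafe's actual priority scale is not documented here.
HIGH_PRIORITY = 80


@dataclass
class SketchPattern:
    """Stand-in for a detection pattern (illustrative only)."""
    name: str
    regex: re.Pattern
    priority: int


# Two made-up patterns: one high priority, one low priority.
PATTERNS = [
    SketchPattern("aws_access_key", re.compile(r"AKIA[0-9A-Z]{16}"), 100),
    SketchPattern("email", re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), 20),
]


def verify_redaction_sketch(text: str) -> list[str]:
    """Return the names of high-priority patterns that still match `text`."""
    return [
        p.name
        for p in PATTERNS
        if p.priority >= HIGH_PRIORITY and p.regex.search(text)
    ]


# Unredacted text: the high-priority key pattern is reported as a leak.
leaks = verify_redaction_sketch("key=AKIAABCDEFGHIJKLMNOP user=a@b.com")
# Redacted text: only the low-priority email remains, so nothing is reported.
clean = verify_redaction_sketch("key=<AWS_KEY_1> user=a@b.com")
```

An empty list means no high-priority pattern matched; under this sketch's assumptions, lower-priority matches such as the email address are deliberately ignored.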

get_salt_hash()

Get the salt hash for bundle metadata.

get_redaction_summary()

Get summary from last redaction operation.

Returns:

| Type | Description |
| --- | --- |
| `dict[str, int]` | Dictionary mapping category names to redaction counts. |

RedactionReport

Tracks all redactions performed in a session.

bugsafe.redact.engine.RedactionReport dataclass

Report of redactions performed.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `matches` | `list[RedactionMatch]` | List of all redaction matches. |
| `categories` | `dict[str, int]` | Count of redactions by category. |
| `patterns_used` | `set[str]` | Set of pattern names that matched. |
| `warnings` | `list[str]` | Any warnings during redaction. |

add(original, token, category, pattern_name, start=0, end=0)

Add a redaction match to the report.

merge(other)

Merge another report into this one.

get_summary()

Get summary of redactions by category.

get_total()

Get total number of redactions.
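To make the relationship between `add`, `merge`, and `get_total` concrete, here is a minimal sketch that tracks only per-category counts. `SketchReport` is a hypothetical stand-in: the real `RedactionReport` also records matches, pattern names, and warnings, and this sketch assumes that merging sums counts category by category and that the total is the sum over all categories.

```python
from dataclasses import dataclass, field


@dataclass
class SketchReport:
    """Illustrative stand-in that tracks only per-category counts."""
    categories: dict[str, int] = field(default_factory=dict)

    def add(self, category: str) -> None:
        # Record one redaction in the given category.
        self.categories[category] = self.categories.get(category, 0) + 1

    def merge(self, other: "SketchReport") -> None:
        # Assumed behavior: sum the counts category by category.
        for category, count in other.categories.items():
            self.categories[category] = self.categories.get(category, 0) + count

    def get_total(self) -> int:
        # Assumed behavior: total is the sum across all categories.
        return sum(self.categories.values())


stdout_report = SketchReport()
stdout_report.add("API_KEY")

stderr_report = SketchReport()
stderr_report.add("API_KEY")
stderr_report.add("EMAIL")

# Combine the reports from two separately redacted streams.
stdout_report.merge(stderr_report)
total = stdout_report.get_total()
```

A merge of this kind would be useful when several pieces of output are redacted separately but should be summarized as one session.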

Pattern

Defines a secret detection pattern.

bugsafe.redact.patterns.Pattern dataclass

A secret detection pattern.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `name` | `str` | Unique identifier for the pattern. |
| `regex` | `Pattern[str]` | Compiled regular expression. |
| `category` | `str` | Category for token naming (e.g., AWS_KEY, EMAIL). |
| `priority` | `int` | Pattern priority (higher = more important). |
| `capture_group` | `int` | Which regex group contains the secret (0 = full match). |
| `description` | `str` | Human-readable description. |
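The effect of `capture_group` is easiest to see with a plain `re` example. The sketch below uses only the standard library; the regex, the helper `redact_group`, and the token value are hypothetical and chosen for illustration. With group 0 the whole match is replaced, while group 1 replaces only the secret and leaves the surrounding context intact.

```python
import re

# Hypothetical pattern: "API_KEY=" is context, group 1 is the secret itself.
regex = re.compile(r"API_KEY=(\S+)")
text = "export API_KEY=sk-abc123xyz"
match = regex.search(text)


def redact_group(text: str, match: re.Match, group: int, token: str) -> str:
    """Replace only the span covered by the chosen regex group with `token`."""
    start, end = match.span(group)
    return text[:start] + token + text[end:]


# capture_group = 0: the full match, including the key name, is replaced.
full = redact_group(text, match, 0, "<API_KEY_1>")
# capture_group = 1: only the secret value is replaced.
inner = redact_group(text, match, 1, "<API_KEY_1>")
```

Choosing a non-zero group keeps more of the original text readable, which usually makes redacted logs easier to debug.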

PatternConfig

Configuration for pattern matching behavior.

bugsafe.redact.patterns.PatternConfig dataclass

Configuration for pattern matching.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `enabled_patterns` | `set[str] \| None` | Set of pattern names to enable (None = all). |
| `disabled_patterns` | `set[str]` | Set of pattern names to disable. |
| `custom_patterns` | `list[Pattern]` | Additional custom patterns. |
| `min_priority` | `int` | Minimum priority threshold. |
| `redact_emails` | `bool` | Whether to redact email addresses. |
| `redact_ips` | `bool` | Whether to redact IP addresses. |
| `redact_uuids` | `bool` | Whether to redact UUIDs. |
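How the name filters and the priority threshold interact is not spelled out in this reference. The sketch below shows one plausible reading, under the assumption that `disabled_patterns` takes precedence, then the `enabled_patterns` allow-list, then `min_priority`. `SketchConfig` and `is_active` are hypothetical names, and the `redact_*` flags are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchConfig:
    """Illustrative subset of the configuration fields (not bugsafe's class)."""
    enabled_patterns: Optional[set[str]] = None  # None means "all patterns"
    disabled_patterns: set[str] = field(default_factory=set)
    min_priority: int = 0


def is_active(name: str, priority: int, config: SketchConfig) -> bool:
    """Assumed precedence: disabled wins, then the allow-list, then priority."""
    if name in config.disabled_patterns:
        return False
    if config.enabled_patterns is not None and name not in config.enabled_patterns:
        return False
    return priority >= config.min_priority


config = SketchConfig(disabled_patterns={"email"}, min_priority=50)

# Made-up pattern names and priorities, used only to exercise the filter.
results = {
    name: is_active(name, priority, config)
    for name, priority in [("aws_access_key", 100), ("email", 100), ("uuid", 10)]
}
```

Here `email` is dropped because it is explicitly disabled, and `uuid` is dropped because its priority falls below the threshold.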

Tokenizer

Deterministic, correlation-preserving tokenizer.

bugsafe.redact.tokenizer.Tokenizer dataclass

Deterministic, correlation-preserving secret tokenizer.

Ensures that the same secret always maps to the same token within a single redaction session, preserving correlations across different parts of the output.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `salt` | `bytes` | Random bytes used for the session (the salt itself is not stored; only its hash is). |
| `_cache` | `dict[str, str]` | Internal cache mapping secrets to tokens. |
| `_counter` | `int` | Sequential counter for token numbering. |

tokenize(secret, category)

Replace a secret with a deterministic token.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `secret` | `str` | The secret value to tokenize. | required |
| `category` | `str` | Category for the token (e.g., AWS_KEY, EMAIL). | required |

Returns:

| Type | Description |
| --- | --- |
| `str` | Token string in the format `<CATEGORY_N>` (for example, `<API_KEY_1>`). |
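The correlation-preserving behavior can be sketched with a cache and a counter, mirroring the `_cache` and `_counter` attributes listed above. This is a simplified illustration, not bugsafe's code: `SketchTokenizer` is a hypothetical name, it ignores the salt entirely, and it assumes a single counter shared across categories and a token shape of `<CATEGORY_N>`, as in the overview output `<API_KEY_1>`.

```python
from dataclasses import dataclass, field


@dataclass
class SketchTokenizer:
    """Illustrative tokenizer: the same secret always yields the same token."""
    _cache: dict[str, str] = field(default_factory=dict)
    _counter: int = 0

    def tokenize(self, secret: str, category: str) -> str:
        # Reuse the existing token so repeated secrets stay correlated.
        if secret in self._cache:
            return self._cache[secret]
        # Assumption: one counter is shared by all categories.
        self._counter += 1
        token = f"<{category}_{self._counter}>"
        self._cache[secret] = token
        return token


tok = SketchTokenizer()
first = tok.tokenize("sk-abc123xyz", "API_KEY")
again = tok.tokenize("sk-abc123xyz", "API_KEY")
other = tok.tokenize("alice@example.com", "EMAIL")
```

Because the second call hits the cache, a reader of the redacted output can still tell that two occurrences refer to the same secret without learning its value.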

is_token(text)

Check if text is a redaction token.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `text` | `str` | Text to check. | required |

Returns:

| Type | Description |
| --- | --- |
| `bool` | True if text matches token format. |
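A token check of this kind usually comes down to a full-string regex match. The sketch below assumes tokens look like `<CATEGORY_N>` (an upper-case category, an underscore, and a number), based on the `<API_KEY_1>` example in the overview; the exact grammar bugsafe accepts may differ, and `TOKEN_RE` and `is_token_sketch` are hypothetical names.

```python
import re

# Assumed token shape: <CATEGORY_N>, for example <API_KEY_1>.
TOKEN_RE = re.compile(r"<[A-Z][A-Z0-9_]*_\d+>")


def is_token_sketch(text: str) -> bool:
    """Return True only if the whole string has the assumed token shape."""
    return TOKEN_RE.fullmatch(text) is not None


checks = [
    is_token_sketch(s)
    for s in ("<API_KEY_1>", "<EMAIL_12>", "sk-abc123xyz", "<API_KEY>")
]
```

Using `fullmatch` rather than `search` means a string that merely contains a token, or a token-like string without a trailing number, is rejected.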

get_salt_hash()

Get SHA256 hash of the salt for bundle metadata.

Returns:

| Type | Description |
| --- | --- |
| `str` | Hexadecimal hash string. |
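Publishing a hash of the salt rather than the salt itself can be reproduced with the standard library. The sketch below shows the general technique with `hashlib`; the helper name `salt_hash_sketch` and the 32-byte salt length are assumptions, not details taken from bugsafe.

```python
import hashlib
import os


def salt_hash_sketch(salt: bytes) -> str:
    """Return the SHA-256 digest of `salt` as a hexadecimal string."""
    return hashlib.sha256(salt).hexdigest()


# Assumption: a 32-byte random salt generated once per session.
salt = os.urandom(32)
salt_hash = salt_hash_sketch(salt)
```

The digest is always 64 hexadecimal characters and identifies the session's salt without exposing it.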

get_report()

Get count of tokens by category.

Returns:

| Type | Description |
| --- | --- |
| `dict[str, int]` | Dictionary mapping category names to token counts. |

reset()

Reset the tokenizer state for a new session.

Factory Functions

bugsafe.redact.engine.create_redaction_engine(project_root=None, config=None)

Create a configured redaction engine.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `project_root` | `Path \| None` | Project root for path anonymization. | `None` |
| `config` | `PatternConfig \| None` | Optional pattern configuration. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `RedactionEngine` | Configured RedactionEngine. |

bugsafe.redact.patterns.create_custom_pattern(name, pattern, category, priority=Priority.MEDIUM, capture_group=0, flags=0)

Create a custom pattern.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | Unique pattern identifier. | required |
| `pattern` | `str` | Regular expression string. | required |
| `category` | `str` | Category for token naming. | required |
| `priority` | `int` | Pattern priority. | `MEDIUM` |
| `capture_group` | `int` | Which group contains the secret. | `0` |
| `flags` | `int` | Regex flags. | `0` |

Returns:

| Type | Description |
| --- | --- |
| `Pattern` | New Pattern instance. |