Redaction API
The redaction module provides the core functionality for detecting and redacting secrets.
Overview
from bugsafe.redact import RedactionEngine, PatternConfig
# Create engine with default settings
engine = RedactionEngine()
# Redact text
text = "API_KEY=sk-abc123xyz"
redacted, report = engine.redact(text)
print(redacted) # API_KEY=<API_KEY_1>
print(report.get_total()) # 1
RedactionEngine
Main orchestrator for secret redaction.
bugsafe.redact.engine.RedactionEngine dataclass
Main redaction engine.
Orchestrates pattern matching, tokenization, and path anonymization to redact sensitive information from text.
Attributes:
| Name | Type | Description |
|---|---|---|
| tokenizer | Tokenizer | Tokenizer for generating replacement tokens. |
| path_anonymizer | PathAnonymizer | Path anonymizer for file paths. |
| config | PatternConfig | Pattern configuration. |
| patterns | list[Pattern] | List of patterns to use. |
| timeout_ms | int | Timeout per pattern in milliseconds. |
redact(text)
Redact sensitive information from text.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text to redact. | required |
Returns:
| Type | Description |
|---|---|
| tuple[str, RedactionReport] | Tuple of (redacted_text, report). |
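To make the return shape concrete, here is a minimal, self-contained sketch of a redact-style function built on the standard `re` module. It is not bugsafe's implementation: `sketch_redact`, the single hard-coded pattern, and the plain-dict report are hypothetical stand-ins. Only the `(redacted_text, report)` tuple shape and the `<API_KEY_1>` token style are taken from this page.

```python
import re

# Hypothetical single pattern; group 1 holds the secret value.
API_KEY_RE = re.compile(r"API_KEY=(\S+)")


def sketch_redact(text: str) -> tuple[str, dict[str, int]]:
    """Return (redacted_text, report) for one illustrative pattern."""
    tokens: dict[str, str] = {}  # secret -> token, so repeats reuse a token
    counts: dict[str, int] = {}  # category -> number of redactions

    def replace(match: re.Match[str]) -> str:
        secret = match.group(1)
        if secret not in tokens:
            tokens[secret] = f"<API_KEY_{len(tokens) + 1}>"
        counts["API_KEY"] = counts.get("API_KEY", 0) + 1
        # Keep the text around the secret and swap only the captured group.
        start, end = match.span(1)
        offset = match.start()
        whole = match.group(0)
        return whole[: start - offset] + tokens[secret] + whole[end - offset:]

    return API_KEY_RE.sub(replace, text), counts


redacted, report = sketch_redact("API_KEY=sk-abc123xyz")
print(redacted)
print(report)
```

The real engine returns a `RedactionReport` rather than a dict; the dict here only stands in for the per-category counts.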
verify_redaction(text)
Verify that no high-priority secrets remain in text.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text to verify. | required |
Returns:
| Type | Description |
|---|---|
| list[str] | List of pattern names that still match (potential leaks). |
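The idea of a post-redaction check can be sketched as a second scan that reports which patterns still fire. The pattern names and regexes below are hypothetical examples chosen for illustration, and `sketch_verify` is a stand-in, not the library's method.

```python
import re

# Hypothetical high-priority patterns, keyed by pattern name.
HIGH_PRIORITY = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def sketch_verify(text: str) -> list[str]:
    """Return the names of patterns that still match the given text."""
    return [name for name, regex in HIGH_PRIORITY.items() if regex.search(text)]


clean = sketch_verify("key=<AWS_KEY_1>")
leaky = sketch_verify("key=AKIAABCDEFGHIJKLMNOP")
print(clean, leaky)
```

An empty list means no listed pattern matched; any returned name points at a potential leak worth inspecting before the output is shared.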
get_salt_hash()
Get the salt hash for bundle metadata.
get_redaction_summary()
Get summary from last redaction operation.
Returns:
| Type | Description |
|---|---|
| dict[str, int] | Dictionary mapping category names to redaction counts. |
RedactionReport
Tracks all redactions performed in a session.
bugsafe.redact.engine.RedactionReport dataclass
Report of redactions performed.
Attributes:
| Name | Type | Description |
|---|---|---|
| matches | list[RedactionMatch] | List of all redaction matches. |
| categories | dict[str, int] | Count of redactions by category. |
| patterns_used | set[str] | Set of pattern names that matched. |
| warnings | list[str] | Any warnings during redaction. |
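The relationship between these attributes can be illustrated with a small stand-in dataclass. Everything named `Sketch...` is hypothetical, including the fields of the match type, since `RedactionMatch` is not documented here. The Overview calls `report.get_total()`; this sketch assumes that it sums the per-category counts, which this page does not confirm.

```python
from dataclasses import dataclass, field


@dataclass
class SketchMatch:
    """Hypothetical stand-in for RedactionMatch (real fields are not shown here)."""
    pattern_name: str
    category: str


@dataclass
class SketchReport:
    matches: list[SketchMatch] = field(default_factory=list)
    categories: dict[str, int] = field(default_factory=dict)
    patterns_used: set[str] = field(default_factory=set)
    warnings: list[str] = field(default_factory=list)

    def add(self, match: SketchMatch) -> None:
        """Record one match and keep the derived attributes in sync."""
        self.matches.append(match)
        self.categories[match.category] = self.categories.get(match.category, 0) + 1
        self.patterns_used.add(match.pattern_name)

    def get_total(self) -> int:
        """Assumed behaviour: total redactions across all categories."""
        return sum(self.categories.values())


sketch_report = SketchReport()
sketch_report.add(SketchMatch("aws_access_key", "AWS_KEY"))
sketch_report.add(SketchMatch("email_address", "EMAIL"))
sketch_report.add(SketchMatch("email_address", "EMAIL"))
print(sketch_report.categories, sketch_report.get_total())
```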
Pattern
Defines a secret detection pattern.
bugsafe.redact.patterns.Pattern dataclass
A secret detection pattern.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Unique identifier for the pattern. |
| regex | Pattern[str] | Compiled regular expression (`re.Pattern`). |
| category | str | Category for token naming (e.g., AWS_KEY, EMAIL). |
| priority | int | Pattern priority (higher = more important). |
| capture_group | int | Which regex group contains the secret (0 = full match). |
| description | str | Human-readable description. |
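The effect of `capture_group` is easiest to see with the standard `re` module directly. The regex and the `<PASSWORD_1>` token below are hypothetical examples; the snippet only demonstrates the difference between group 0 (the full match) and a numbered group.

```python
import re

# Hypothetical pattern: group 1 isolates the value after "password=".
regex = re.compile(r"password=(\S+)")
text = "login password=hunter2 ok"

match = regex.search(text)
assert match is not None

full_match = match.group(0)   # what capture_group = 0 would select
secret_only = match.group(1)  # what capture_group = 1 would select

# Replace only the captured group so the "password=" key stays readable.
redacted_text = text[: match.start(1)] + "<PASSWORD_1>" + text[match.end(1):]
print(full_match, secret_only, redacted_text, sep="\n")
```

Pointing `capture_group` at the value rather than the whole match keeps surrounding context such as the key name in the redacted output.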
PatternConfig
Configuration for pattern matching behavior.
bugsafe.redact.patterns.PatternConfig dataclass
Configuration for pattern matching.
Attributes:
| Name | Type | Description |
|---|---|---|
| enabled_patterns | set[str] \| None | Set of pattern names to enable (None = all). |
| disabled_patterns | set[str] | Set of pattern names to disable. |
| custom_patterns | list[Pattern] | Additional custom patterns. |
| min_priority | int | Minimum priority threshold. |
| redact_emails | bool | Whether to redact email addresses. |
| redact_ips | bool | Whether to redact IP addresses. |
| redact_uuids | bool | Whether to redact UUIDs. |
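One plausible way these filters could combine is sketched below. The order of the checks, and the assumption that a disabled name wins over an enabled one, are guesses for illustration; `SketchPattern`, `SketchConfig`, and `select_patterns` are hypothetical names, and the library may resolve conflicts differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SketchPattern:
    name: str
    priority: int


@dataclass
class SketchConfig:
    enabled_patterns: set[str] | None = None  # None = all patterns allowed
    disabled_patterns: set[str] = field(default_factory=set)
    min_priority: int = 0


def select_patterns(patterns: list[SketchPattern], config: SketchConfig) -> list[SketchPattern]:
    """Keep patterns that pass the allow-list, deny-list, and priority floor."""
    selected = []
    for pattern in patterns:
        if config.enabled_patterns is not None and pattern.name not in config.enabled_patterns:
            continue
        if pattern.name in config.disabled_patterns:
            continue
        if pattern.priority < config.min_priority:
            continue
        selected.append(pattern)
    return selected


available = [SketchPattern("aws_key", 90), SketchPattern("email", 20), SketchPattern("uuid", 10)]
strict = select_patterns(available, SketchConfig(disabled_patterns={"email"}, min_priority=15))
only_uuid = select_patterns(available, SketchConfig(enabled_patterns={"uuid"}))
print([p.name for p in strict], [p.name for p in only_uuid])
```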
Tokenizer
Deterministic, correlation-preserving tokenizer.
bugsafe.redact.tokenizer.Tokenizer dataclass
Deterministic, correlation-preserving secret tokenizer.
Ensures that the same secret always maps to the same token within a single redaction session, preserving correlations across different parts of the output.
Attributes:
| Name | Type | Description |
|---|---|---|
| salt | bytes | Random bytes used for the session (the salt itself is not stored, only its hash). |
| _cache | dict[str, str] | Internal cache mapping secrets to tokens. |
| _counter | int | Sequential counter for token numbering. |
tokenize(secret, category)
Replace a secret with a deterministic token.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| secret | str | The secret value to tokenize. | required |
| category | str | Category for the token (e.g., AWS_KEY, EMAIL). | required |
Returns:
| Type | Description |
|---|---|
| str | Token string in the format shown in the Overview example (e.g., `<API_KEY_1>`). |
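Correlation-preserving tokenization can be sketched with a cache and a counter, mirroring the `_cache` and `_counter` attributes listed above. This is an illustrative stand-in rather than the library's code: `SketchTokenizer` is a hypothetical name, the salt is left out, and a single counter shared across categories is assumed from `_counter` being one integer.

```python
from dataclasses import dataclass, field


@dataclass
class SketchTokenizer:
    _cache: dict[str, str] = field(default_factory=dict)  # secret -> token
    _counter: int = 0  # assumed to be shared across all categories

    def tokenize(self, secret: str, category: str) -> str:
        """Return the cached token for a known secret, else mint a new one."""
        if secret in self._cache:
            return self._cache[secret]
        self._counter += 1
        token = f"<{category}_{self._counter}>"
        self._cache[secret] = token
        return token


tokenizer = SketchTokenizer()
first = tokenizer.tokenize("sk-abc123xyz", "API_KEY")
other = tokenizer.tokenize("bob@example.com", "EMAIL")
repeat = tokenizer.tokenize("sk-abc123xyz", "API_KEY")
print(first, other, repeat)
```

Because the same secret always yields the same token within a session, a reader of the redacted output can still tell that two lines referred to the same credential without seeing it.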
is_token(text)
Check if text is a redaction token.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text to check. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if text matches token format. |
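A token-format check can be sketched with a full-string regex match. The shape assumed here (angle brackets, an upper-case category, an underscore, and a number) is inferred from the `<API_KEY_1>` output in the Overview; the exact format the library accepts is not stated on this page, and `sketch_is_token` is a hypothetical name.

```python
import re

# Assumed token shape: "<", an upper-case category, "_", a number, ">".
TOKEN_RE = re.compile(r"<[A-Z][A-Z0-9_]*_\d+>")


def sketch_is_token(text: str) -> bool:
    """Return True only if the whole string looks like a redaction token."""
    return TOKEN_RE.fullmatch(text) is not None


print(sketch_is_token("<API_KEY_1>"), sketch_is_token("sk-abc123xyz"))
```

A check like this is useful for skipping text that was already redacted, so tokens are not tokenized a second time.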
get_salt_hash()
Get SHA256 hash of the salt for bundle metadata.
Returns:
| Type | Description |
|---|---|
| str | Hexadecimal hash string. |
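Hashing a salt with SHA-256 and returning the hex digest takes one `hashlib` call. The sketch below uses `secrets.token_bytes` to produce a random salt; the 32-byte length and the function name `sketch_salt_hash` are assumptions for illustration, not values taken from the library.

```python
import hashlib
import secrets


def sketch_salt_hash(salt: bytes) -> str:
    """Return the SHA-256 digest of the salt as a hexadecimal string."""
    return hashlib.sha256(salt).hexdigest()


session_salt = secrets.token_bytes(32)  # assumed length, for illustration only
salt_hash = sketch_salt_hash(session_salt)
print(len(salt_hash))
```

Publishing only the hash lets two bundles be compared for a shared session without exposing the salt itself.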
get_report()
Get count of tokens by category.
Returns:
| Type | Description |
|---|---|
| dict[str, int] | Dictionary mapping category names to token counts. |
reset()
Reset the tokenizer state for a new session.
Factory Functions
bugsafe.redact.engine.create_redaction_engine(project_root=None, config=None)
Create a configured redaction engine.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| project_root | Path \| None | Project root for path anonymization. | None |
| config | PatternConfig \| None | Optional pattern configuration. | None |
Returns:
| Type | Description |
|---|---|
| RedactionEngine | Configured RedactionEngine. |
bugsafe.redact.patterns.create_custom_pattern(name, pattern, category, priority=Priority.MEDIUM, capture_group=0, flags=0)
Create a custom pattern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Unique pattern identifier. | required |
| pattern | str | Regular expression string. | required |
| category | str | Category for token naming. | required |
| priority | int | Pattern priority. | Priority.MEDIUM |
| capture_group | int | Which group contains the secret. | 0 |
| flags | int | Regex flags. | 0 |
Returns:
| Type | Description |
|---|---|
| Pattern | New Pattern instance. |