Custom

Schema-driven detector documentation.

CUSTOM | active | P0 | 61 params | 19 examples

Detector Metadata

Capability catalog entry from all_detectors.json.

Categories

CLASSIFICATION, COMPLIANCE

Supported Asset Types

TXT, TABLE, URL, IMAGE

Recommended Model

mDeBERTa-v3 + SetFit + GLiNER + HuggingFace transformers

Notes

User-defined rules and pipelines tailored to specific business needs. Supports regex, GLiNER2, AI/LLM (prompt-driven classification + extraction via a configured provider), text classification, image classification, feature extraction, and object detection pipelines.

Parameters

Configuration parameters for the Custom detector. Shared from `CustomDetectorConfig`.

Parameter	Type	Required	Description	Default	Constraints
custom_detector_key	string	Yes	Stable key used to identify one custom detector instance	—	—
name	string	Yes	User-facing name of custom detector	—	—
description	string	No	—	—	—
method	enum	No	Execution method for custom detector logic. Allowed values: RULESET, CLASSIFIER, ENTITY, PIPELINE	—	—
languages	array	No	—	["de","en"]	—
languages[]	string	No	—	—	—
ruleset	object	No	—	—	no extra properties
ruleset.regex_rules	array	No	—	[]	—
ruleset.regex_rules[]	object	No	—	—	no extra properties
ruleset.regex_rules[].id	string	Yes	Stable ID for this regex rule	—	—
ruleset.regex_rules[].name	string	Yes	Display name for this regex rule	—	—
ruleset.regex_rules[].pattern	string	Yes	Regular expression pattern	—	—
ruleset.regex_rules[].flags	string	No	Regex flags (for example i, m, s)		—
ruleset.regex_rules[].severity	enum	No	Severity level of the finding. Allowed values: critical, high, medium, low, info	—	—
ruleset.keyword_rules	array	No	—	[]	—
ruleset.keyword_rules[]	object	No	—	—	no extra properties
ruleset.keyword_rules[].id	string	Yes	Stable ID for this keyword rule	—	—
ruleset.keyword_rules[].name	string	Yes	Display name for this keyword rule	—	—
ruleset.keyword_rules[].keywords	array	Yes	Keyword set to match	—	min items 1
ruleset.keyword_rules[].keywords[]	string	Yes	—	—	—
ruleset.keyword_rules[].case_sensitive	boolean	No	Whether keyword matching is case-sensitive	false	—
ruleset.keyword_rules[].severity	enum	No	Severity level of the finding. Allowed values: critical, high, medium, low, info	—	—
classifier	object	No	—	—	no extra properties
classifier.labels	array	No	—	[]	—
classifier.labels[]	object	No	—	—	no extra properties
classifier.labels[].id	string	Yes	—	—	—
classifier.labels[].name	string	Yes	—	—	—
classifier.labels[].description	string	No	—	—	—
classifier.zero_shot_model	string	No	—	MoritzLaurer/mDeBERTa-v3-base-mnli-xnli	—
classifier.hypothesis_template	string	No	—	This text contains {}.	—
classifier.training_examples	array	No	—	[]	—
classifier.training_examples[]	object	No	—	—	no extra properties
classifier.training_examples[].text	string	Yes	—	—	—
classifier.training_examples[].label	string	Yes	—	—	—
classifier.training_examples[].accepted	boolean	No	—	true	—
classifier.training_examples[].source	string	No	Origin of this example (editor/feedback/import)	editor	—
classifier.min_examples_per_label	integer	No	—	8	min 1
classifier.setfit_model	string	No	—	sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2	—
entity	object	No	—	—	no extra properties
entity.entity_labels	array	No	—	[]	—
entity.entity_labels[]	string	No	—	—	—
entity.entity_descriptions	object	No	Optional GLiNER2 schema descriptions keyed by entity label	{}	—
entity.model	string	No	—	fastino/gliner2-base-v1	—
extractor	object	No	Optional structured extraction — runs when detector fires	—	no extra properties
extractor.enabled	boolean	No	—	true	—
extractor.fields	array	Yes	—	—	min items 1
extractor.fields[]	object	Yes	One output field in the extraction schema	—	no extra properties
extractor.fields[].name	string	Yes	Output field name — becomes a key in extracted_data JSON	—	—
extractor.fields[].description	string	No	Human-readable hint for what this field captures	—	—
extractor.fields[].type	enum	No	Allowed values: string, number, boolean, list[string], list[number]	string	—
extractor.fields[].entity_label	string	No	GLiNER2 schema label used for extraction (ENTITY and CLASSIFIER methods)	—	—
extractor.fields[].regex_pattern	string	No	Regex with one named capture group (?P<value>...) for RULESET method	—	—
extractor.fields[].regex_flags	string	No	Regex flags: i=case-insensitive, m=multiline, s=dotall	i	—
extractor.fields[].aggregate	enum	No	How to aggregate multiple matches. Allowed values: first, last, list, join, count	list	—
extractor.fields[].join_separator	string	No	—	,	—
extractor.fields[].min_confidence	number	No	Minimum GLiNER confidence for this field	0.4	min 0, max 1
extractor.fields[].required	boolean	No	If true, skip saving extraction when this field is empty	false	—
extractor.gliner_model	string	No	—	fastino/gliner2-base-v1	—
extractor.content_limit	integer	No	Number of characters of content to pass to the extractor (classifier matched_content is only 320 characters)	4000	min 320, max 8192
pipeline_schema	object	No	—	—	—
max_findings	integer | null	No	Maximum number of findings to return per asset	null	—
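
The RULESET-related fields in the table above can be illustrated with a minimal Python sketch. This is not the detector's implementation: the sample configuration values and the helper names `compile_flags` and `extract_field` are hypothetical, and the aggregation behavior is only assumed from the parameter descriptions (`first`, `last`, `list`, `join`, `count`). The fallback values for `regex_flags`, `aggregate`, `join_separator`, and `content_limit` mirror the defaults listed in the table.

```python
import re

# Hypothetical configuration; only the field names come from the table above.
config = {
    "custom_detector_key": "invoice-number",
    "name": "Invoice number",
    "method": "RULESET",
    "ruleset": {
        "regex_rules": [
            {
                "id": "inv-1",
                "name": "Invoice ID",
                "pattern": r"\binv-\d{4}\b",
                "flags": "i",
                "severity": "low",
            }
        ]
    },
    "extractor": {
        "enabled": True,
        "content_limit": 4000,
        "fields": [
            {
                "name": "invoice_ids",
                "type": "list[string]",
                # One named capture group called "value", as the table requires.
                "regex_pattern": r"\b(?P<value>inv-\d{4})\b",
                "regex_flags": "i",
                "aggregate": "list",
            }
        ],
    },
}

# Mapping from the documented flag letters to Python's re flags.
_FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def compile_flags(flags: str) -> int:
    """Translate a flag string such as "im" into combined re flags."""
    result = 0
    for ch in flags:
        if ch not in _FLAG_MAP:
            raise ValueError(f"unsupported regex flag: {ch!r}")
        result |= _FLAG_MAP[ch]
    return result


def extract_field(field: dict, text: str, content_limit: int = 4000):
    """Apply one extractor field to text and aggregate the captured values.

    The aggregation semantics here are an assumption, not documented behavior.
    """
    pattern = re.compile(
        field["regex_pattern"], compile_flags(field.get("regex_flags", "i"))
    )
    # Only the first content_limit characters are searched.
    values = [m.group("value") for m in pattern.finditer(text[:content_limit])]
    mode = field.get("aggregate", "list")
    if mode == "list":
        return values
    if mode == "count":
        return len(values)
    if mode == "join":
        return field.get("join_separator", ",").join(values)
    if mode in ("first", "last"):
        if not values:
            return None
        return values[0] if mode == "first" else values[-1]
    raise ValueError(f"unsupported aggregate mode: {mode!r}")


extractor = config["extractor"]
sample = "Paid INV-0001 and inv-0002."
extracted_data = {
    f["name"]: extract_field(f, sample, extractor["content_limit"])
    for f in extractor["fields"]
}
print(extracted_data)
```

In this sketch each field name becomes a key in the resulting dictionary, matching the table's description of `extractor.fields[].name` as a key in the `extracted_data` JSON. How the real detector treats empty matches or `required` fields is not covered here.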