This PR adds a new Wazuh integration for Wazuh decoder rule generation tool by Hasitha9796 · Pull Request #79 · wazuh/integrations

Hasitha9796 · 2026-05-01T06:33:57Z

Summary

This PR adds a new integration named wazuh_decoder_rule_tool — a FastAPI-based tool for analyzing logs, checking existing Wazuh decoder/rule matches through wazuh-logtest, and generating custom decoder and rule XML.

New Features

AI-Powered Generation (Hybrid Approach)

Hybrid architecture: programmatically generates correct Wazuh decoder XML, then uses an LLM to review and improve the OS_Regex patterns
Multiple AI providers: Ollama (local, no rate limits), DashScope (Qwen 3.6 Plus), and OpenRouter
wazuh-logtest integration: every AI generation first checks wazuh-logtest to determine:
- Whether a custom decoder is needed at all
- The correct parent strategy (<program_name> when available, <prematch> otherwise)
- Which fields are already decoded by built-in decoders (skipped automatically)
Priority fallback: Ollama > DashScope > OpenRouter
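The priority fallback above could be sketched roughly as follows. This is only an illustration: `OLLAMA_BASE_URL` appears in this PR, but `DASHSCOPE_API_KEY`, `OPENROUTER_API_KEY`, and the `pick_provider` helper are assumed names, not taken from the tool's code.

```python
import os
from typing import Mapping, Optional

# Priority order from the PR description: Ollama > DashScope > OpenRouter.
# Only OLLAMA_BASE_URL is named in the PR; the two key variables are assumptions.
PROVIDER_ENV_VARS = (
    ("ollama", "OLLAMA_BASE_URL"),
    ("dashscope", "DASHSCOPE_API_KEY"),    # hypothetical variable name
    ("openrouter", "OPENROUTER_API_KEY"),  # hypothetical variable name
)


def pick_provider(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first configured provider in priority order, or None."""
    env = os.environ if env is None else env
    for name, var in PROVIDER_ENV_VARS:
        if env.get(var, "").strip():
            return name
    return None
```

Passing the environment as a mapping keeps the selection logic testable without touching the real process environment.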

Enhanced ML Decoder Similarity

Ensemble model combining TF-IDF (exact token matching) + SBERT (semantic similarity)
Configurable weighting (default: 40% TF-IDF, 60% SBERT)
Enhanced tokenization preserving regex patterns
Backward compatible with existing TF-IDF fallback
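As a rough illustration of the weighting and the TF-IDF fallback described above, a combined score might be computed like this. The function name and signature are hypothetical; only the 40/60 default split comes from the PR description.

```python
from typing import Optional


def ensemble_score(
    tfidf: float,
    sbert: Optional[float],
    tfidf_weight: float = 0.4,
    sbert_weight: float = 0.6,
) -> float:
    """Blend two similarity scores; fall back to TF-IDF when SBERT is absent."""
    if sbert is None:
        # SBERT is optional, so the TF-IDF score is used unchanged.
        return tfidf
    total = tfidf_weight + sbert_weight
    # Normalising by the weight sum keeps the result in the input range
    # even if the configured weights do not add up to 1.
    return (tfidf_weight * tfidf + sbert_weight * sbert) / total
```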

Improved Decoder Generation

Split decoders: one child decoder per field for better accuracy
Robust prefix generalization (timestamps, IPs, MAC addresses, PIDs)
CEF (Common Event Format) log support with field mapping
Per-field validation explaining which fields will/won't be decoded
Multiple log type handlers: syslog, JSON, key=value, bracketed, Java dash, Android, Palo Alto CSV
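The split-decoder idea (one child decoder per field) can be sketched as below. This is an assumption-laden illustration, not the tool's generator: `build_split_decoders` is a hypothetical helper, and it does no XML escaping of the patterns. The element names (`<decoder>`, `<parent>`, `<program_name>`, `<regex>`, `<order>`) are standard Wazuh decoder syntax.

```python
from typing import Dict


def build_split_decoders(name: str, program_name: str, field_regexes: Dict[str, str]) -> str:
    """Emit a parent decoder plus one child decoder per extracted field.

    field_regexes maps a field name to an OS_Regex pattern with one capture group.
    """
    blocks = [
        f'<decoder name="{name}">\n'
        f"  <program_name>{program_name}</program_name>\n"
        f"</decoder>"
    ]
    for field, regex in field_regexes.items():
        # All children share the same decoder name so they act as siblings.
        blocks.append(
            f'<decoder name="{name}-event">\n'
            f"  <parent>{name}</parent>\n"
            f"  <regex>{regex}</regex>\n"
            f"  <order>{field}</order>\n"
            f"</decoder>"
        )
    return "\n\n".join(blocks)


xml = build_split_decoders(
    "myapp",
    "myapp",
    {"user": r"User '(\S+)' failed", "srcip": r"from IP (\d+.\d+.\d+.\d+)"},
)
```

Because each child captures a single field, a pattern that fails to match only loses that one field instead of the whole extraction.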

Robustness & Reliability

Timeouts on all git subprocess calls (clone, pull, sparse-checkout) to prevent startup hangs
Proper Wazuh OS_Regex validation (no PCRE patterns, correct \. vs . semantics)
Non-blocking SSH with configurable timeouts
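A minimal sketch of the timeout pattern mentioned above, using only the standard library. The helper name and the 30-second default are assumptions; the PR applies the same idea to its git clone, pull, and sparse-checkout calls.

```python
import subprocess
import sys
from typing import List, Tuple


def run_with_timeout(cmd: List[str], timeout_s: float = 30.0) -> Tuple[bool, str]:
    """Run cmd and return (ok, output); never block longer than timeout_s."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return False, f"timed out after {timeout_s}s"
    except OSError as exc:
        # For example, the executable is not installed.
        return False, str(exc)
    return proc.returncode == 0, (proc.stdout or proc.stderr).strip()


# Usage: a quick command succeeds, a slow one is cut off.
ok, out = run_with_timeout([sys.executable, "-c", "print('ok')"])
slow_ok, slow_msg = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.5
)
```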

Included

FastAPI backend with streaming AI responses
Single-page HTML/JS UI with decoder analysis, rule generation, AI generation, and testing
Log analysis using heuristics with regex generation engine for Wazuh OS_Regex compatibility
wazuh-logtest validation (local or remote via SSH)
ML-based decoder similarity (TF-IDF + optional SBERT ensemble)
Rule ML model trained from wazuh-ruleset
Per-field feedback collection for continuous improvement
README with comprehensive setup instructions including AI provider configuration

Testing

The app can be tested locally:

Set up the virtual environment:

cd integrations/wazuh_decoder_rule_tool
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Generate SSL certificates:

mkdir -p certs
openssl req -x509 -newkey rsa:4096 -keyout certs/localhost.key -out certs/localhost.crt -days 365 -nodes -subj "/CN=localhost"

Start the application (with AI):

export OLLAMA_BASE_URL=http://localhost:11434/v1
export OLLAMA_MODEL=llama3.2:3b
uvicorn app.main:app --host 0.0.0.0 --port 8443 --ssl-certfile certs/localhost.crt --ssl-keyfile certs/localhost.key

Access the application via https://localhost:8443.

Connecting to Wazuh VM for wazuh-logtest

export WAZUH_SSH_HOST=192.168.56.10
export WAZUH_SSH_PORT=22
export WAZUH_SSH_USER=vagrant
export WAZUH_SSH_PASSWORD=vagrant

Example Scenario

Paste a log like: May 19 12:34:56 custom-server myapp[1234]: User 'admin' failed to authenticate from IP 192.168.1.100 due to invalid_password
Click Analyze to detect log type and extract fields
Select fields to extract (e.g., user, srcip)
Click Generate for programmatic decoder+rule generation
Click AI Generate for AI-assisted pattern improvement

…s and auto-enable split mode for CEF logs

… formats for more reliable extraction

…ue logs instead of truncating prefixes

…source pattern learning

…ders from all logs

…tterns and full preceding words instead of truncating prefixes

…d generalize them to \d+ to prevent brittle anchors

…on for decoders

…ecoder prefixes

… unavailable

…of rule conditions
- Add STATIC_FIELD_TAGS set (srcip, dstip, srcport, dstport, protocol, action, id, url, data, extra_data, status, system_name, user, hostname, program_name): known Wazuh tags rendered as direct XML children of <rule>
- Add _render_static_tags() to emit <tagname>value</tagname> instead of <field name="tagname">value</field> for static tags
- Add child_static_conditions to CandidateRequest, build_rule_xml, build_candidate, and derive_child_rule_conditions
- Enhance derive_child_rule_conditions() to use extract_fields, field_hints, and parsed_logtest_fields for smarter auto-detection:
  - field_hints take priority and lock fields to prevent override
  - Explicit patterns parsed: 'field X is Y', 'X equals Y'
  - extract_fields guide which field names to look for
  - IP addresses skipped as too specific for conditions
- Update RulePattern and parse_rule_file in decoder_ml.py to extract static field tags from real Wazuh rules for ML training
- Include static_conditions in ML rule suggestions and feature_text
- Update AI prompt with static field tag rules (prefer over <field name="">)
- Add Static Field Tags UI section in rule form with add/remove rows
- Update readRulePayload() to collect child_static_conditions
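The static-tag rendering described in this commit might look roughly like the sketch below. The tag set is copied from the commit message; `render_conditions` is a hypothetical stand-in for `_render_static_tags()`, whose real signature is not shown in the PR.

```python
from typing import Dict

# Tag names as listed in the commit message.
STATIC_FIELD_TAGS = frozenset({
    "srcip", "dstip", "srcport", "dstport", "protocol", "action", "id", "url",
    "data", "extra_data", "status", "system_name", "user", "hostname", "program_name",
})


def render_conditions(conditions: Dict[str, str]) -> str:
    """Render rule conditions, using direct child tags for static fields."""
    lines = []
    for name, value in conditions.items():
        if name in STATIC_FIELD_TAGS:
            # Static fields are direct children of <rule>.
            lines.append(f"  <{name}>{value}</{name}>")
        else:
            # Anything else is treated as a dynamic field.
            lines.append(f'  <field name="{name}">{value}</field>')
    return "\n".join(lines)
```

For example, `action` would be emitted as `<action>…</action>`, while a custom field such as `function_code` would still use `<field name="function_code">`.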

…on cleaning
- Add child_only parameter to build_rule_xml(): when True, it emits a single <rule> with if_sid=parent_rule_id instead of a parent+child pair
- Fix build_candidate() to use child_only=True when user sets parent_rule_id
- Improve clean_rule_description() to extract 'use description as X', 'description should be X', and 'create alert using NNNN parent rule by matching X' patterns

- Fix clean_rule_description to handle 'use the description as X' pattern
- Skip <regex> in child_only mode: child rules extending a parent should use <match> / <field> / static tags only
- Filter meta fields (program, program_name, hostname, decoder_name) from auto-detected conditions unless user explicitly mentions them
- Add 'use' to stopwords to prevent false <match> from 'User' in body
- Use \b word boundary check for match conditions (no substring-inside-word)
- Exclude description words from match condition auto-detection
- Pass clean_description to derive_child_rule_conditions
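The word-boundary check mentioned above can be illustrated with a short sketch. The helper name is hypothetical; the point is that `\b` on both sides prevents a candidate such as `use` from matching inside `User`.

```python
import re


def body_contains_word(body: str, word: str) -> bool:
    """True only if word occurs in body as a whole word (case-insensitive)."""
    pattern = rf"\b{re.escape(word)}\b"
    return re.search(pattern, body, flags=re.IGNORECASE) is not None


body = "User 'admin' failed to authenticate from IP 192.168.1.100"
whole_word = body_contains_word(body, "failed")   # a real word in the body
inside_word = body_contains_word(body, "use")     # only a substring of "User"
```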

- Add rule_description field to CandidateRequest (overrides auto-detected)
- Add rule_description input in rule form below rule requirement
- Update readRulePayload() to send rule_description
- build_candidate() uses rule_description when provided

…gs only

- child_rule is no longer auto-created from rule_requirement alone
- Only build child_rule when parent_rule_id is set or user provides explicit child_field_conditions / child_match_conditions / child_static_conditions
- Add description parameter to build_rule_xml() for parent rule
- Derive parent rule description from requirement when no child rule exists

… split regex Path 2
- escape_xml: only escape & and < (required by XML spec). html.escape also escapes > which breaks Wazuh regex patterns using -> arrow notation (e.g. \.+\s->\s(\d+.\d+.\d+.\d+))
- build_split_regexes_from_fields Path 2: use the raw prefix before the value in target_text and pass it through generalize_prefix_text. Previously the regex '\b(\w+\s*[=:]\s*)' would falsely match '4:' as a key:value separator in '1.2.3.4:1234' because \b matched between . and 4, producing \.+4:(\d+). Now it generalizes to \.+\d+.\d+.\d+.\d+:(\d+)
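A minimal sketch of the `escape_xml` behaviour this commit describes, compared against `html.escape`. The body is an assumption about how such a helper could be written; only the rule "escape `&` and `<`, leave `>` alone" comes from the commit message.

```python
import html


def escape_xml(text: str) -> str:
    """Escape only the characters XML requires in element content."""
    # '&' must be replaced first so the '&' in '&lt;' is not escaped again.
    return text.replace("&", "&amp;").replace("<", "&lt;")


pattern = r"\.+\s->\s(\d+.\d+.\d+.\d+)"
kept = escape_xml(pattern)      # the '->' arrow survives untouched
mangled = html.escape(pattern)  # html.escape rewrites '>' as '&gt;'
```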

… fields
- osregex_escape now escapes [ and ] in both inner (generalize_prefix_text) and outer (build_split_regexes_from_fields) definitions. Without this, suricata signature IDs like [1:2010935:2] in the body would be rendered as Wazuh character class [\d+:\d+:\d+] instead of escaped \[\d+:\d+:\d+\].
- parse_logtest_output now extracts phase-2 decoded_fields (e.g. srcip, dstip, protocol decoded by built-in suricata decoder).
- analyze_logs_impl computes effective_extract_fields by removing fields already decoded by the built-in decoder, avoiding unnecessary decoders, and adds skipped_decoded_fields / logtest_decoded_fields to the analysis output.

…ation
- validate_individual_fields() explains why each requested field is 'decoded' (built-in), 'skipped' (syslog pre-decoded like timestamp/hostname), 'pending' (will be extracted by custom decoder), or 'warning' (value not found in body).
- PREDECODED_SYSLOG_FIELDS frozenset defines fields consumed during syslog pre-decoding that cannot be re-decoded.
- parse_logtest_output now extracts phase-2 decoded_fields from logtest stdout, so the system knows which fields a built-in decoder already handles (e.g. suricata extracts srcip, dstip, protocol).
- analyze_logs_impl computes effective_extract_fields by filtering out already-decoded fields, avoiding redundant decoders.
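The four statuses above could be assigned along these lines. This is a sketch under stated assumptions: the function signature is hypothetical, and the frozenset here contains only the two fields the commit message names (the real PREDECODED_SYSLOG_FIELDS may hold more).

```python
from typing import Dict, Iterable, List

# Only timestamp and hostname are named in the commit message; the real set may differ.
PREDECODED_SYSLOG_FIELDS = frozenset({"timestamp", "hostname"})


def validate_individual_fields(
    requested: Iterable[str],
    values: Dict[str, str],
    body: str,
    logtest_decoded: Dict[str, str],
) -> Dict[str, str]:
    """Map each requested field to decoded / skipped / pending / warning."""
    report = {}
    for field in requested:
        if field in logtest_decoded:
            report[field] = "decoded"   # a built-in decoder already extracts it
        elif field in PREDECODED_SYSLOG_FIELDS:
            report[field] = "skipped"   # consumed during syslog pre-decoding
        elif values.get(field) and values[field] in body:
            report[field] = "pending"   # the custom decoder will extract it
        else:
            report[field] = "warning"   # value not found in the log body
    return report


def effective_extract_fields(requested: Iterable[str], report: Dict[str, str]) -> List[str]:
    """Keep only the fields that still need a custom decoder."""
    return [f for f in requested if report.get(f) == "pending"]
```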

…ckets as literal

Wazuh's OS_REGEX does not use [ ] for character classes or { } for quantifiers; they are literal characters. The previous commit's escaping of [ to \[ was wrong and produced invalid Wazuh regex patterns like \[\d+:\d+:\d+\] instead of the correct [\d+:\d+:\d+].

The result dict for parse_logtest_output was missing the 'decoded_fields' key initialization. When phase 2 decoded fields were found, the code tried result['decoded_fields'][fname] = fval which raised KeyError.
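The fix amounts to initialising the key before the parsing loop. The sketch below is not the PR's parser: it assumes wazuh-logtest prints phase headers beginning with `**Phase N` followed by indented `key: 'value'` lines, and the `decoder_name` key is a name chosen here for illustration.

```python
import re
from typing import Any, Dict

# Assumed line shape inside a phase block: <indent>key: 'value'
FIELD_LINE = re.compile(r"^\s+([\w.]+):\s*'(.*)'\s*$")


def parse_logtest_output(stdout: str) -> Dict[str, Any]:
    """Collect phase-2 decoded fields from wazuh-logtest style output."""
    # Initialising decoded_fields up front avoids the KeyError described above.
    result: Dict[str, Any] = {"decoder_name": None, "decoded_fields": {}}
    in_phase2 = False
    for line in stdout.splitlines():
        if line.startswith("**Phase 2"):
            in_phase2 = True
            continue
        if line.startswith("**Phase"):
            in_phase2 = False
            continue
        if not in_phase2:
            continue
        match = FIELD_LINE.match(line)
        if not match:
            continue
        fname, fval = match.groups()
        if fname == "name":
            result["decoder_name"] = fval
        elif fname != "parent":
            result["decoded_fields"][fname] = fval
    return result
```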

…d simplify func

Path 2 used target_text[:val_start] as the raw prefix which included ALL context before the value (e.g. full suricata signature [1:2010935:2] plus 'ET MALWARE ...'). Now it uses the same prefix-shortening regex as Path 3, extracting only the last 1-2 tokens before the captured value. This produces simpler, more robust regexes:

BEFORE: \.+[\d+:\d+:\d+] ET MALWARE ... {\S+} \d+.\d+.\d+.\d+:(\d+)
AFTER: \.+\S+} \d+.\d+.\d+.\d+:(\d+)

Removed unused simplify_escaped_prefix function.

…rouped'
- Removed trailing comma in <group name="custom,app_name,"> → "custom,app_name"
- rule_description now only sets the child rule description, never the parent.
- Parent description always falls back to "{log_source_name} messages grouped" when a child_rule exists. Only when there's NO child_rule does the rule_requirement influence the parent description.

Copilot

Pull request overview

This PR introduces a new wazuh_decoder_rule_tool integration: a FastAPI-based UI/API for analyzing pasted logs, optionally validating them via wazuh-logtest, and generating Wazuh decoder/rule XML. It also adds an “enhanced” ML decoder-similarity approach (TF‑IDF + SBERT) plus scripts/datasets to train a custom similarity model from Wazuh ruleset test data.

Changes:

Add the FastAPI app’s HTML/JS/CSS frontend and supporting backend utilities for decoder/rule generation workflows.
Add ML enhancements: ensemble similarity model wrapper, dataset builder + training script, and accompanying tests/docs.
Add local datasets and TLS artifacts for local HTTPS testing (currently including private keys).

Reviewed changes

Copilot reviewed 21 out of 26 changed files in this pull request and generated 12 comments.


| File | Description |
| --- | --- |
| integrations/wazuh_decoder_rule_tool/tests/test_ml_enhanced.py | Adds unit tests for enhanced ML similarity components. |
| integrations/wazuh_decoder_rule_tool/tests/test_integration.py | Adds a basic integration test for enhanced ML model loading. |
| integrations/wazuh_decoder_rule_tool/scripts/train_similarity.py | Adds SBERT contrastive training script for decoder similarity. |
| integrations/wazuh_decoder_rule_tool/scripts/build_dataset.py | Adds script to build training/validation datasets from Wazuh rules-testing suites + feedback. |
| integrations/wazuh_decoder_rule_tool/requirements.txt | Adds Python dependencies for running the tool (FastAPI/Uvicorn/ML libs). |
| integrations/wazuh_decoder_rule_tool/README.md | Documents local HTTPS run instructions, remote VM mode, and ML training workflow. |
| integrations/wazuh_decoder_rule_tool/ML_ENHANCEMENT_SUMMARY.md | Documents ML feature-engineering + ensemble approach and future tuning ideas. |
| integrations/wazuh_decoder_rule_tool/key.pem | Adds a private key file (should not be committed). |
| integrations/wazuh_decoder_rule_tool/generated/decoders/local_myapp_decoder_20260307094900.xml | Adds generated decoder XML output artifact. |
| integrations/wazuh_decoder_rule_tool/generated/decoders/local_myapp_decoder_20260307094544.xml | Adds generated decoder XML output artifact (duplicate-style). |
| integrations/wazuh_decoder_rule_tool/data/datasets/val.jsonl | Adds validation dataset records for ML training. |
| integrations/wazuh_decoder_rule_tool/data/datasets/feedback.jsonl | Adds feedback dataset examples used for training/tuning. |
| integrations/wazuh_decoder_rule_tool/data/datasets/feedback_rejections.jsonl | Adds rejected feedback examples for analysis/training workflows. |
| integrations/wazuh_decoder_rule_tool/certs/localhost.key | Adds a private TLS key for local HTTPS (should not be committed). |
| integrations/wazuh_decoder_rule_tool/certs/localhost.crt | Adds a self-signed TLS certificate for local HTTPS. |
| integrations/wazuh_decoder_rule_tool/cert.pem | Adds a certificate artifact for local HTTPS usage. |
| integrations/wazuh_decoder_rule_tool/app/wazuh_logtest.py | Adds a helper to run `wazuh-logtest` via SSH (currently hardcoded/inconsistent). |
| integrations/wazuh_decoder_rule_tool/app/templates/index.html | Adds the single-page HTML UI for the tool. |
| integrations/wazuh_decoder_rule_tool/app/static/styles.css | Adds styling for the UI. |
| integrations/wazuh_decoder_rule_tool/app/static/app.js | Adds UI logic for navigation, generate/test flows, ML status, AI generation, feedback, history. |
| integrations/wazuh_decoder_rule_tool/app/decoder_ml.py | Adds baseline TF‑IDF similarity models + parsing utilities for decoders/rules. |
| integrations/wazuh_decoder_rule_tool/app/decoder_ml_enhanced.py | Adds enhanced feature engineering + ensemble similarity model + compatibility wrapper. |
| integrations/wazuh_decoder_rule_tool/.gitignore | Adds ignores for venv/cache/model/repo directories. |


+function toggleConditionsRow() {
+  const req = document.getElementById('ruleRequirement').value.trim();
+  document.getElementById('ruleFieldConditionsRow').style.display = req ? 'flex' : 'none';
+  document.getElementById('ruleMatchConditionsRow').style.display = req ? 'flex' : 'none';


+    try:
+        # This might fail if no Wazuh repo is available, but that's OK for this test
+        model = ensure_ml_model_enhanced(force_refresh=False, use_ensemble=True)
+        # If we get here without exception, the function works
+        assert model is not None or model is None  # Either is fine
+        print("✓ ensure_ml_model_enhanced executed successfully")
+        return True
+    except Exception as e:
+        print(f"✗ ensure_ml_model_enhanced failed: {e}")
+        return False
+
+
+if __name__ == "__main__":
+    success = test_ensure_ml_model_enhanced()
+    if success:
+        print("Integration test passed!")
+    else:
+        print("Integration test failed!")


+            parts.extend([self.prematch] * int(prematch_weight))
+        if self.regex:
+            # Extract meaningful tokens from regex
+            regex_tokens = re.findall(r'\[\\w\+\\]|\\\\d\+|\\\\S\+|\\\\w\+', self.regex)


+
+        parts = []
+        if self.name:
+            parts.extend([self.name] * int(name_weight))
+        if self.program_name:
+            parts.extend([self.program_name] * int(program_weight))
+        if self.prematch:
+            parts.extend([self.prematch] * int(prematch_weight))
+        if self.regex:
+            # Extract meaningful tokens from regex
+            regex_tokens = re.findall(r'\[\\w\+\\]|\\\\d\+|\\\\S\+|\\\\w\+', self.regex)
+            parts.extend(regex_tokens * int(regex_weight))
+        if self.order:
+            parts.extend(self.order * int(order_weight))
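One thing worth noting about the hunk above: inside a raw string, `\\\\d\+` matches two literal backslashes followed by `d+`, so a decoder regex that contains the usual single-backslash `\d+` yields no tokens. A possible alternative, assuming the intent is to pull OS_Regex class tokens such as `\d+` and `\S+` out of the pattern (the constant and helper names are hypothetical):

```python
import re
from typing import List

# Matches one backslash, a class letter, and a trailing '+', e.g. \d+ or \S+.
OSREGEX_TOKEN = re.compile(r"\\[dDwWsSpP]\+")

# The pattern from the hunk, kept verbatim for comparison.
ORIGINAL_PATTERN = r'\[\\w\+\\]|\\\\d\+|\\\\S\+|\\\\w\+'


def extract_regex_tokens(regex: str) -> List[str]:
    """Return the OS_Regex class tokens found in a decoder regex, in order."""
    return OSREGEX_TOKEN.findall(regex)


decoder_regex = r"from (\d+.\d+.\d+.\d+) port (\S+)"
tokens = extract_regex_tokens(decoder_regex)
original_tokens = re.findall(ORIGINAL_PATTERN, decoder_regex)
```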


+-----BEGIN PRIVATE KEY-----
+MIIJQgIBADANBgkqhkiG9w0BAQEFAASCCSwwggkoAgEAAoICAQDeCJuheTkfwUSK
+shHW/6XR28sohDtaA+BgE5VQhA/dO0A0OD4Y+FHFvwqDZg4j74mZ1s4BBxdercSO
+l1NXmfTJvH0WhY09vSyS3g4N/T1unrtTFUTrC3Dc5ovLAxAUe2AHLGhQcXGWRbTq
+pEL1KEoYG89DSisTjSBOcoM3dE8fnU2Gc7YCvLUh8IpIaYLr0GOiQumAGhxIyWGq


+# Cache directories and ML models
+data/models/
+data/wazuh_repo/
+data/wazuh_ruleset_repo/
+


@@ -0,0 +1,3 @@
+{"log":"03-17 16:13:38.811  1702  2395 D WindowManager: printFreezingDisplayLogsopening app wtoken = AppWindowToken{9f4ef63 token=Token{a64f992 ActivityRecord{de9231d u0 com.tencent.qt.qtl/.activity.info.NewsDetailXmlActivity t761}}}, allDrawn= false, startingDisplayed =  false, startingMoved =  false, isRelaunching =  false","decoder":{"name":"myapp-event","parent":"myapp","prematch":"WindowManager:","regex":"(\\d+-\\d+ \\d+:\\d+:\\d+.\\d+)  \\d+  \\d+ \\S WindowManager: \\S+ \\S+ wtoken = (\\.+) token=(\\.+), allDrawn= (\\S+)","order":["logtime","wtoken","token","allDrawn"],"source_file":"feedback/windowmanager.json"}}
+{"log":"20171223-22:15:33:144|Step_SPUtils|30002312| getTodayTotalDetailSteps = 1514038440000##7013##548365##8661##12836##27176966","decoder":{"name":"myapp-event","parent":"myapp","prematch":"Step_SPUtils","regex":"(\\.+)\\|Step_SPUtils\\|30002312\\| getTodayTotalDetailSteps = (\\.+)","order":["logtime","getTodayTotalDetailSteps"],"source_file":"feedback/pipemetric.json"}}
+{"timestamp": "2026-05-16T08:56:11.647689Z", "approved": true, "log": "May 16 14:22:31 plc-gateway01 scada-engine[2241]: ALERT Modbus unauthorized write request detected from 10.10.50.24 function_code=0x10 register=40123", "extract_fields": ["srcip", "funtion_code"], "notes": "", "decoder": {"name": "myapp-event", "parent": "myapp", "prematch": "scada-engine", "regex": "ALERT\\s+Modbus\\s+unauthorized\\s+write\\s+request\\s+detected\\s+from\\s+(\\d+.\\d+.\\d+.\\d+)\\s+function_code=(\\d+x\\d+)\\s+register=\\d+", "order": ["srcip", "function_code"], "source_file": "feedback/myapp.json"}, "target_text": "myapp-event myapp scada-engine alert\\s+modbus\\s+unauthorized\\s+write\\s+request\\s+detected\\s+from\\s+(\\d+.\\d+.\\d+.\\d+)\\s+function_code=(\\d+x\\d+)\\s+register=\\d+ srcip function_code feedback/myapp.json"}


+{"timestamp": "2026-04-29T05:52:13.354712Z", "approved": false, "app_name": "myapp", "log": "[2026-04-29T04:29:06,056][INFO ][o.o.s.s.c.FlintStreamingJobHouseKeeperTask] [node-1] Starting housekeeping task for auto refresh streaming jobs.", "extract_fields": ["logtime", "loglevel", "message"], "notes": "[(\\d+-\\d+-\\S+:\\d+:\\d+,\\d+)][(\\S+)\\s][\\.+] [\\S+] (\\.+)"}
+{"timestamp": "2026-04-29T08:50:41.323760Z", "approved": false, "app_name": "myapp", "log": "[2026-04-29T04:29:06,056][INFO ][o.o.s.s.c.FlintStreamingJobHouseKeeperTask] [node-1] Starting housekeeping task for auto refresh streaming jobs.", "extract_fields": [], "notes": "It should be corrected like this"}
+{"timestamp": "2026-04-29T08:50:41.368350Z", "approved": false, "app_name": "myapp", "log": "[2026-04-29T04:29:06,056][INFO ][o.o.s.s.c.FlintStreamingJobHouseKeeperTask] [node-1] Starting housekeeping task for auto refresh streaming jobs.", "extract_fields": [], "notes": "It should be corrected like this"}
+{"timestamp": "2026-05-16T08:56:23.312599Z", "approved": false, "app_name": "myapp", "log": "May 16 14:22:31 plc-gateway01 scada-engine[2241]: ALERT Modbus unauthorized write request detected from 10.10.50.24 function_code=0x10 register=40123", "extract_fields": ["srcip", "funtion_code"], "notes": ""}


+For this workspace, the app now defaults to:
+
+```bash
+WAZUH_SSH_HOST=192.168.56.10
+WAZUH_SSH_PORT=22
+WAZUH_SSH_USER=vagrant
+WAZUH_SSH_PASSWORD=vagrant
+```


+WAZUH_HOST = "127.0.0.1"
+WAZUH_PORT = "2222"
+WAZUH_USER = "vagrant"
+
+# read from environment variable
+WAZUH_LOGTEST = os.getenv("WAZUH_LOGTEST_PATH", "/var/ossec/bin/wazuh-logtest")
+
+
+def run_logtest(log_line):
+    cmd = [
+        "ssh",
+        "-p", WAZUH_PORT,
+        f"{WAZUH_USER}@{WAZUH_HOST}",
+        f"sudo {WAZUH_LOGTEST}"
+    ]


- Hybrid AI generation: programmatic base XML (guaranteed correct) + AI review for regex improvement
- wazuh-logtest always checked before AI generation to determine parent strategy
- Parent decoder uses <program_name> when log has a decoded program name
- Fields already decoded by built-in decoders are skipped automatically
- AI prompt refocused on reviewing/improving regex patterns instead of writing XML from scratch
- Git subprocess calls now have timeouts to prevent startup hangs
- Updated README with AI provider setup and hybrid approach documentation

…ation
- Removed Decoder Generator and Rule Generator sections from HTML
- Moved input fields (appName, logsInput, extractFields, etc.) into AI view
- Removed 'Generate Decoder' and 'Generate Rule' sidebar nav items
- Made 'AI Generate' the default active view
- Cleaned up app.js: removed unused functions (showAnalysis, showXml, syncFeedback, readRulePayload, rule conditions UI, old button handlers)
- Updated history loading and test function to work without decoder view

- Added POST /api/install endpoint to write decoder/rule XML to Wazuh's custom decoders/rules directories (SSH or local)
- Added POST /api/uninstall endpoint to remove installed files
- Added POST /api/logtest/raw endpoint for running wazuh-logtest with arbitrary log samples and returning raw output + parsed fields
- Redesigned Test view with three cards: Installed Decoder (install/uninstall), Test Logs (editable sample input), and wazuh-logtest Output (raw stdout + parsed fields table)
- Added state management storing installed file paths in localStorage
- AI-generated XML is now persisted in JS so it can be installed from the Test view without re-running AI generation

…ailure
- Add generation_mode (auto/decoder_only/rule_only/both) to AI request
- Add validate_with_logtest flag and /api/ai/generate-validated endpoint
- Add _collect_ai_response, _extract_xml_from_ai_response helpers
- Add _validate_ai_decoder_with_logtest for auto-install+test validation
- Refactor _build_ai_prompt: shorter config block, concise ML/logtest context
- Add system prompt for Ollama (system+user roles), fix URL path
- Lower default temperature to 0.05 for more deterministic output
- Default model changed to wazuh-decoder
- UI: generation mode dropdown, validate checkbox, Generate & Validate button
- UI: show validation badge & details in AI output section
- UI: hide rule section when generation_mode=decoder_only

…ndpoint and automate rule group/static field sanitization

…coring, and sigmoid calibration
- Add log-type detection (_detect_log_type) with type-based boosting to bias results toward relevant decoder families (JSON, Windows, syslog, etc.)
- Add regex token overlap scoring (_regex_overlap_score) to boost patterns whose OS_Regex tokens match query log literals
- Add sigmoid confidence calibration for well-calibrated probabilities in [0,1]
- Tune ensemble weights: TF-IDF 0.3, SBERT 0.7 (semantic model is stronger for unseen formats)
- Raise minimum confidence gate to 0.15 to avoid low-confidence noise
- Add fine-tuned SBERT checkpoint loading with graceful fallback
- Enhance tokenizer to preserve more OS_Regex character classes
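Sigmoid calibration followed by a confidence gate might be sketched as follows. The 0.15 gate comes from the commit message; the `midpoint` and `steepness` values and both function names are assumptions chosen only to make the example concrete.

```python
import math
from typing import List, Tuple


def calibrate(raw_score: float, midpoint: float = 0.5, steepness: float = 10.0) -> float:
    """Map a raw similarity score to (0, 1) with a logistic curve.

    midpoint and steepness are illustrative values, not the tool's settings.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (raw_score - midpoint)))


def filter_candidates(
    scored: List[Tuple[str, float]], min_confidence: float = 0.15
) -> List[Tuple[str, float]]:
    """Calibrate each (name, raw_score) pair and drop low-confidence ones."""
    calibrated = [(name, calibrate(score)) for name, score in scored]
    return [(name, conf) for name, conf in calibrated if conf >= min_confidence]
```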

… Modelfile
- Lower temperature (0.05→0.02) and top_p (0.85→0.80) for more deterministic output
- Increase repeat_penalty (1.15→1.20) and lower top_k (20→15) to reduce repetition
- Add self-validation checklist to catch common errors before output
- Add JSON log decoder and DHCP/MAC address examples
- Fix sshd example to use same decoder name for multiple children
- Add instruction: 'No text before or after' the XML block

…lization
- Default OLLAMA_BASE_URL to http://localhost:11434/v1 so it works without env vars
- Normalize /v1 suffix to prevent double-/v1 404 errors in URL construction
- Add 60s timeout to streaming client with retry on ReadTimeout (up to 3 attempts)
- Add decoder rule: multiple child decoders must use exact same decoder name
- Fix IP regex guidance: do not escape dots in \d+.\d+.\d+.\d+
- Update top_k to 15 and repeat_penalty to 1.20 to match Modelfile tuning
- Improve error messages for network/server issues
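The `/v1` normalisation could be done roughly like this. The helper names are hypothetical; the default URL is the one given in the commit message, and `/v1/chat/completions` is the path of Ollama's OpenAI-compatible chat endpoint.

```python
from typing import Optional

DEFAULT_OLLAMA_BASE_URL = "http://localhost:11434/v1"


def normalize_base_url(url: Optional[str]) -> str:
    """Return a base URL that ends in exactly one /v1 and no trailing slash."""
    cleaned = (url or DEFAULT_OLLAMA_BASE_URL).strip().rstrip("/")
    if cleaned.endswith("/v1"):
        cleaned = cleaned[: -len("/v1")]
    return cleaned + "/v1"


def chat_completions_url(base_url: Optional[str]) -> str:
    """Build the chat endpoint without ever producing /v1/v1."""
    return normalize_base_url(base_url) + "/chat/completions"
```

With this shape, `http://localhost:11434`, `http://localhost:11434/v1`, and `http://localhost:11434/v1/` all resolve to the same endpoint.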

…to dataset builder
- Add load_rejection_records(): convert rejection notes with regex corrections into positive training pairs
- Add augment_with_dropout(): create robustness variants by randomly masking log tokens (15% prob)
- Rejection corrections teach SBERT to distinguish correct from broken regex patterns
- Dropout augmentation teaches model that partial log lines still map to same decoder
- Add structured logging of record counts throughout pipeline
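Token dropout of this kind can be illustrated with a short sketch. It assumes whitespace tokenisation and a `[MASK]` placeholder, neither of which is confirmed by the PR; only the 15% probability and the function name come from the commit message.

```python
import random
from typing import Optional


def augment_with_dropout(
    log: str,
    drop_prob: float = 0.15,
    rng: Optional[random.Random] = None,
    mask_token: str = "[MASK]",  # assumed placeholder; the PR may drop tokens instead
) -> str:
    """Return a copy of log with each whitespace token masked with drop_prob."""
    rng = rng or random.Random()
    tokens = log.split()
    masked = [mask_token if rng.random() < drop_prob else tok for tok in tokens]
    return " ".join(masked)


# Usage: a seeded generator makes the augmented variants reproducible.
variant = augment_with_dropout(
    "ALERT Modbus unauthorized write request detected from 10.10.50.24",
    rng=random.Random(0),
)
```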

…nting to SBERT training
- 5 epochs with best-checkpoint saving (by validation AUC)
- Larger batch size (64 configurable) for better in-batch negatives with MultipleNegativesRankingLoss
- Hard-negative augmentation: pair logs with categorically distinct decoders (30% ratio)
- Token dropout data augmentation for robustness on partial input
- Early stopping with patience=2 epochs
- Add binary evaluator with both positive and negative pairs for AUC measurement
- Configurable training device (default CPU to avoid MPS OOM with Ollama)
- Copy best checkpoint to 'final' directory for easy model loading
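The interaction between best-checkpoint selection and early stopping with patience=2 can be shown on a list of per-epoch validation AUCs. This is a simplified sketch with a hypothetical helper, not the training script's loop.

```python
from typing import Sequence, Tuple


def select_best_epoch(val_aucs: Sequence[float], patience: int = 2) -> Tuple[int, int]:
    """Return (best_epoch, last_epoch_run) under early stopping.

    Training stops once `patience` consecutive epochs fail to beat the best AUC.
    Both values are -1 when no epochs were run.
    """
    best_auc = float("-inf")
    best_epoch = -1
    last_epoch = -1
    stale = 0
    for epoch, auc in enumerate(val_aucs):
        last_epoch = epoch
        if auc > best_auc:
            # New best: this is where a checkpoint would be saved.
            best_auc, best_epoch, stale = auc, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, last_epoch
```

In the first test case below, the late improvement at epoch 4 is never reached because training stops after two stale epochs, which is the usual trade-off of a small patience value.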

The sidebar defaulted to AI Generate as active, but the corresponding #view-ai div was missing the 'active' class, so CSS display:none kept the entire AI generation page blank on initial load.

Hasitha9796 and others added 30 commits August 20, 2025 17:33

README.md

3451028

Create custom-ticketing

2386f02

Create custom-ticketing.py

cc6f037

Create README.md

78f12de

Create custom-email.py

4923e48

Create Vagrantfile

0dd69ae

Create inventory.ini

ea00869

Create README.md

767d3d4

Update README.md

2612226

Update README.md

ec4bd0c

Update README.md

c3f62c6

Delete Custom email template directory

e430af0

Merge branch 'wazuh:main' into main

8bb083b

Added integration: Microsoft Teams Using Ticketing as a Service

361f955

Added ansible + vagrant deployement steps.

9f49dee

Delete Wazuh + Microsoft Teams Ticketing as a service directory

fbe1354

Delete wazuh-deployment-ansible-vagrant directory

77a855f

updated the folder names

21bda76

Merge branch 'wazuh:main' into main

f2ff40c

Merge branch 'wazuh:main' into main

78a205f

feat: add wazuh decoder rule tool integration

e09e897

fix(decoder): ensure CEF split decoders use user requested field name…

a3ee16b

…s and auto-enable split mode for CEF logs

feat(decoder): enable split decoder generation by default for all log…

c0b9192

… formats for more reliable extraction

fix(decoder): dynamically extract full field keys for non-CEF key=val…

e50b588

…ue logs instead of truncating prefixes

style(ui): clarify log input section instructions to indicate single-…

d1dbe2d

…source pattern learning

fix(decoder): support multiple program names and aggregate child deco…

cb7dc0d

…ders from all logs

fix(decoder): improve fallback regex generation to use IP specific pa…

dc77e20

…tterns and full preceding words instead of truncating prefixes

fix(decoder): detect numeric dynamic fields (e.g. IPs) in prefixes an…

973f686

…d generalize them to \d+ to prevent brittle anchors

Update README with HTTPS setup instructions and refine regex generati…

bff6cb7

…on for decoders

Fix regex to correctly include spaces before punctuation in dynamic d…

f74c97a

…ecoder prefixes

Hasitha9796 added 16 commits May 23, 2026 16:09

fix: return clean error messages instead of 500 when wazuh-logtest is…

e57cb2e

… unavailable

fix: block decoder/rule generation when wazuh-logtest is unavailable

052fe6a

feat: status pill turns red when wazuh-logtest is not accessible

a5b626c

fix(rules): remove regex from child rules — use match/field/static ta…

56f6cb8

…gs only

fix: add missing decoded_fields initialization in parse_logtest_output

f14a574


nicolascurioni requested a review from Copilot May 25, 2026 13:13

Copilot started reviewing on behalf of nicolascurioni May 25, 2026 13:13

Copilot AI reviewed May 25, 2026


Hasitha9796 added 11 commits May 26, 2026 16:58

Improve AI generation pipeline, tune Ollama prompt, fix completions e…

0b42a17

…ndpoint and automate rule group/static field sanitization

fix(ui): add missing active class to AI view so it shows on page load

07fa10b


Hasitha9796 wants to merge 66 commits into wazuh:main from Hasitha9796:main


2 participants
