Merged
82 commits
3451028
README.md
Hasitha9796 Aug 20, 2025
2386f02
Create custom-ticketing
Hasitha9796 Aug 20, 2025
cc6f037
Create custom-ticketing.py
Hasitha9796 Aug 20, 2025
78f12de
Create README.md
Hasitha9796 Aug 20, 2025
4923e48
Create custom-email.py
Hasitha9796 Aug 20, 2025
0dd69ae
Create Vagrantfile
Hasitha9796 Aug 27, 2025
ea00869
Create inventory.ini
Hasitha9796 Aug 27, 2025
767d3d4
Create README.md
Hasitha9796 Aug 27, 2025
2612226
Update README.md
Hasitha9796 Aug 27, 2025
ec4bd0c
Update README.md
Hasitha9796 Aug 27, 2025
c3f62c6
Update README.md
Hasitha9796 Aug 27, 2025
e430af0
Delete Custom email template directory
Hasitha9796 Sep 22, 2025
8bb083b
Merge branch 'wazuh:main' into main
Hasitha9796 Sep 24, 2025
361f955
Added integration: Microsoft Teams Using Ticketing as a Service
Hasitha9796 Sep 24, 2025
9f49dee
Added ansible + vagrant deployement steps.
Hasitha9796 Sep 24, 2025
fbe1354
Delete Wazuh + Microsoft Teams Ticketing as a service directory
Hasitha9796 Sep 24, 2025
77a855f
Delete wazuh-deployment-ansible-vagrant directory
Hasitha9796 Sep 24, 2025
21bda76
updated the folder names
Hasitha9796 Sep 29, 2025
f2ff40c
Merge branch 'wazuh:main' into main
Hasitha9796 May 1, 2026
78a205f
Merge branch 'wazuh:main' into main
Hasitha9796 May 12, 2026
e09e897
feat: add wazuh decoder rule tool integration
Hasitha9796 May 17, 2026
a3ee16b
fix(decoder): ensure CEF split decoders use user requested field name…
Hasitha9796 May 18, 2026
c0b9192
feat(decoder): enable split decoder generation by default for all log…
Hasitha9796 May 18, 2026
e50b588
fix(decoder): dynamically extract full field keys for non-CEF key=val…
Hasitha9796 May 18, 2026
d1dbe2d
style(ui): clarify log input section instructions to indicate single-…
Hasitha9796 May 18, 2026
cb7dc0d
fix(decoder): support multiple program names and aggregate child deco…
Hasitha9796 May 18, 2026
dc77e20
fix(decoder): improve fallback regex generation to use IP specific pa…
Hasitha9796 May 18, 2026
973f686
fix(decoder): detect numeric dynamic fields (e.g. IPs) in prefixes an…
Hasitha9796 May 18, 2026
bff6cb7
Update README with HTTPS setup instructions and refine regex generati…
Hasitha9796 May 19, 2026
f74c97a
Fix regex to correctly include spaces before punctuation in dynamic d…
Hasitha9796 May 19, 2026
daaa1fe
Fix parent decoder XML formatting to avoid double newlines
Hasitha9796 May 19, 2026
7bbd2c6
fix(decoder): support user mapping hints and generalize dynamic prefi…
Hasitha9796 May 19, 2026
b2cee2b
feat(decoder): improve rule generation with log source name, regex ma…
Hasitha9796 May 20, 2026
07941a0
feat(ai): switch to Llama 3.3 70B free model with retry logic and Das…
Hasitha9796 May 20, 2026
4d2d9a5
feat: enhance ML model with ensemble approach for improved accuracy
Hasitha9796 May 23, 2026
f3f4f91
feat(ui): split decoder and rule generation into separate pages
Hasitha9796 May 23, 2026
95d540a
feat(rules): generate parent + child rule when rule_requirement is pr…
Hasitha9796 May 23, 2026
b7f3e44
fix(rules): child rule now uses condition-specific regex and clean de…
Hasitha9796 May 23, 2026
3bcb547
fix(rules): use osregex-compatible \.+ for child regex prefix/suffix
Hasitha9796 May 23, 2026
e57cb2e
fix: return clean error messages instead of 500 when wazuh-logtest is…
Hasitha9796 May 23, 2026
052fe6a
fix: block decoder/rule generation when wazuh-logtest is unavailable
Hasitha9796 May 23, 2026
a5b626c
feat: status pill turns red when wazuh-logtest is not accessible
Hasitha9796 May 23, 2026
27eb8d4
feat(rules): add static field tag support and improve auto-detection …
Hasitha9796 May 24, 2026
fbcc676
fix(rules): child-only mode when parent_rule_id is set, fix descripti…
Hasitha9796 May 24, 2026
4625892
fix(rules): clean up child-only rule output and auto-detection
Hasitha9796 May 24, 2026
2c6972c
feat(ui): add explicit rule_description input field
Hasitha9796 May 24, 2026
56f6cb8
fix(rules): remove regex from child rules — use match/field/static ta…
Hasitha9796 May 24, 2026
a553a8c
fix(rules): only create child rule when explicitly requested
Hasitha9796 May 25, 2026
5007797
fix(decoder): escape_xml no longer escapes >; generalize IP prefix in…
Hasitha9796 May 25, 2026
154d4a0
fix(decoder): escape [] in regex prefix; skip already-decoded decoder…
Hasitha9796 May 25, 2026
e4d63ee
feat(decoder): per-field validation with reasons; field-level verific…
Hasitha9796 May 25, 2026
c6e0349
fix: revert [] escaping in osregex_escape — Wazuh OS_REGEX treats bra…
Hasitha9796 May 25, 2026
f14a574
fix: add missing decoded_fields initialization in parse_logtest_output
Hasitha9796 May 25, 2026
3df5876
fix(decoder): shorten Path 2 prefixes to last 1-2 words; remove unuse…
Hasitha9796 May 25, 2026
36a91c2
fix(rules): group name trailing comma; parent desc always 'messages g…
Hasitha9796 May 25, 2026
d8277d7
feat(ai): hybrid decoder generation with wazuh-logtest integration
Hasitha9796 May 26, 2026
7e1bf1a
refactor(ui): remove decoder/rule generator views, keep only AI gener…
Hasitha9796 May 26, 2026
1492277
feat(test): real-time wazuh-logtest view with install/uninstall workflow
Hasitha9796 May 26, 2026
fca0963
feat(ai): add generate-validate endpoint with auto-retry on logtest f…
Hasitha9796 May 27, 2026
0b42a17
Improve AI generation pipeline, tune Ollama prompt, fix completions e…
Hasitha9796 May 27, 2026
5d38755
feat(ml): enhance ensemble model with log-type hints, regex overlap s…
Hasitha9796 Jun 2, 2026
e174ddf
feat(ai): tune Ollama parameters and add self-validation checklist to…
Hasitha9796 Jun 2, 2026
d0feda1
feat(ai): add streaming timeout handling and default Ollama URL norma…
Hasitha9796 Jun 2, 2026
048d707
feat(data): add rejection-based corrections and dropout augmentation …
Hasitha9796 Jun 2, 2026
0c0dfc3
feat(train): add hard-negative sampling, early stopping, and checkpoi…
Hasitha9796 Jun 2, 2026
07fa10b
fix(ui): add missing active class to AI view so it shows on page load
Hasitha9796 Jun 2, 2026
fb90732
fix(ai): strip escaped dots in AI-generated regex and strengthen OS_R…
Hasitha9796 Jun 2, 2026
ede0644
fix(ai): correct regex sanitization and add IP example to Modelfile
Hasitha9796 Jun 2, 2026
2154d23
fix(ui): add client-side OS_Regex sanitization for streaming AI output
Hasitha9796 Jun 2, 2026
3504e72
chore: bump static file version for cache bust
Hasitha9796 Jun 2, 2026
6b96bff
feat(train): add TrafficLog IP extraction example to finetuning data
Hasitha9796 Jun 2, 2026
598f91e
fix(ai): add OS_Regex IP dot sanitization and improve Modelfile IP re…
Hasitha9796 Jun 6, 2026
0a9ee35
feat(ai): remove programmatic XML from AI prompt; add bare-dot OS_Reg…
Hasitha9796 Jun 6, 2026
965c0b0
feat(ai): only show AI-generated output; add programmatic fallback an…
Hasitha9796 Jun 6, 2026
63e8687
fix(ai): inject programmatic regex into AI decoder XML instead of ban…
Hasitha9796 Jun 6, 2026
258d3d8
clean: remove all programmatic interference from AI generation
Hasitha9796 Jun 6, 2026
979500f
clean: remove all programmatic interference; add OS_Regex training sc…
Hasitha9796 Jun 6, 2026
d8174f5
feat: 806 training examples from official Wazuh decoder repo; silent …
Hasitha9796 Jun 7, 2026
1b1283c
fix: stop client-side sanitizeOsRegex (backend handles it); improve p…
Hasitha9796 Jun 7, 2026
bd7a758
fix: simplify prefix regex generation — use exact key= separator inst…
Hasitha9796 Jun 7, 2026
3563de0
feat: warn if log already matches existing decoder before AI generati…
Hasitha9796 Jun 8, 2026
3e9851f
fix: improve AI prompt with prematch/program_name strategy; fix ML re…
Hasitha9796 Jun 8, 2026
26 changes: 26 additions & 0 deletions integrations/wazuh_decoder_rule_tool/.gitignore
@@ -0,0 +1,26 @@
# Virtual environments
venv/
.venv/
test_venv/

# Python cache
__pycache__/
*.pyc

# Local scratch and temp folders
scratch/
.claude/

# Cache directories and ML models
data/models/
data/wazuh_repo/
data/wazuh_ruleset_repo/
.cache_decoders/

# Generated output and TLS materials
generated/
certs/
*.pem
*.key
*.crt
85 changes: 85 additions & 0 deletions integrations/wazuh_decoder_rule_tool/ML_ENHANCEMENT_SUMMARY.md
@@ -0,0 +1,85 @@
# ML Enhancement Summary for Wazuh Decoder Rule Tool

## Overview
This document summarizes the enhancements made to improve the accuracy of the ML model in the Wazuh decoder rule tool. The stated target is 100% accuracy, pursued through ensemble methods and advanced feature engineering.

## Key Improvements Made

### 1. Enhanced Feature Engineering (`decoder_ml_enhanced.py`)
- **EnhancedDecoderPattern class**: Extended the base `DecoderPattern` with weighted feature components:
  - Name: 3x weight
  - Program name: 2x weight
  - Prematch: 2.5x weight
  - Regex: 3x weight (with specialized token extraction)
  - Order: 1.5x weight
  - Source file: 1x weight
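One way to read the weights above is as multipliers on per-field term counts before a similarity score is computed. The sketch below assumes that interpretation; `FIELD_WEIGHTS`, `weighted_vector`, and the simple word tokenizer are hypothetical stand-ins, not the actual `EnhancedDecoderPattern` code, and plain cosine similarity over weighted counts substitutes for the tool's TF-IDF pipeline.

```python
import math
import re
from collections import Counter

# Hypothetical field weights mirroring the list above; the real
# EnhancedDecoderPattern may apply them differently.
FIELD_WEIGHTS = {
    "name": 3.0,
    "program_name": 2.0,
    "prematch": 2.5,
    "regex": 3.0,
    "order": 1.5,
    "source_file": 1.0,
}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a stand-in for the tool's real tokenizer."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def weighted_vector(decoder: dict[str, str]) -> Counter:
    """Sum per-field term counts, each scaled by its field weight."""
    vec: Counter = Counter()
    for field, weight in FIELD_WEIGHTS.items():
        for token in tokenize(decoder.get(field, "")):
            vec[token] += weight
    return vec


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse weighted vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

With these weights, a token that appears in both `name` and `program_name` (such as `sshd`) counts five times as much as an unweighted token, so agreement on identifying fields dominates the score.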

### 2. Ensemble Model Approach (`decoder_ml_enhanced.py`)
- **EnsembleDecoderSimilarityModel class**: Combines TF-IDF and SBERT to improve accuracy:
  - TF-IDF component: strong at exact token matching and specialized regex patterns
  - SBERT component: strong at semantic similarity and contextual understanding
  - Configurable weighting (default: 40% TF-IDF, 60% SBERT)
  - Enhanced tokenization that preserves important regex patterns such as `[\w+]`, `\d+`, `\s+`, and `\w+`
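A minimal sketch of the two ideas in this section, assuming each model returns a `{decoder_name: similarity}` map: a tokenizer that keeps the listed regex fragments as single tokens, and a weighted blend of the two scores using the 40/60 default. `tokenize_preserving_regex` and `ensemble_scores` are hypothetical names; the real `EnsembleDecoderSimilarityModel` may combine scores differently (for example, after normalizing each model's output).

```python
import re

# Regex fragments to keep as single tokens. These four come from the list
# above; the tool's real token list may be longer.
REGEX_TOKENS = [r"[\w+]", r"\d+", r"\s+", r"\w+"]

# Alternation that tries the literal regex fragments first, then plain words.
_TOKEN_RE = re.compile(
    "|".join(re.escape(tok) for tok in REGEX_TOKENS) + r"|[A-Za-z0-9_]+"
)


def tokenize_preserving_regex(pattern: str) -> list[str]:
    """Split a decoder regex into words plus whole regex fragments."""
    return _TOKEN_RE.findall(pattern)


def ensemble_scores(
    tfidf: dict[str, float], sbert: dict[str, float], tfidf_weight: float = 0.4
) -> list[tuple[str, float]]:
    """Blend two similarity maps; SBERT gets 1 - tfidf_weight. Best match first."""
    names = tfidf.keys() | sbert.keys()
    blended = {
        n: tfidf_weight * tfidf.get(n, 0.0) + (1 - tfidf_weight) * sbert.get(n, 0.0)
        for n in names
    }
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)
```

Because the fragments are tried before the word pattern, `(\d+)` yields the single token `\d+` rather than `d`. Fragments not in the list, such as `\S+`, still fall apart into pieces, so the list would need to grow with the patterns seen in practice.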

### 3. Integration with Existing Codebase (`main.py`)
- Replaced all `ensure_ml_model()` calls with `ensure_ml_model_enhanced(force_refresh=False, use_ensemble=True)`
- Updated ML status and refresh endpoints to use enhanced model
- Maintained backward compatibility through a wrapper class
- Preserved all existing functionality while improving accuracy

### 4. Comprehensive Test Suite (`tests/`)
- `test_ml_enhanced.py`: Unit tests for enhanced ML components
- `test_integration.py`: Integration tests for model loading
- Tests cover pattern creation, ensemble modeling, suggestion functionality, and backward compatibility

## How to Achieve 100% Accuracy

### 1. Data Quality Improvements
- Expand training dataset with more diverse log samples
- Implement active learning loop using user feedback
- Add negative examples to improve discrimination
- Regularly re-sync the Wazuh decoder repository used as a training source
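Negative examples help most when they are hard: decoders that look similar to the log but are wrong. A sketch of hard-negative selection, assuming each candidate decoder is summarized by a short text and using word-level Jaccard overlap as a placeholder similarity (the tool would more likely reuse its ensemble score); `jaccard` and `hard_negatives` are hypothetical helpers.

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard overlap; a placeholder for the real similarity model."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def hard_negatives(
    log: str, correct: str, candidates: dict[str, str], similarity=jaccard, k: int = 2
) -> list[str]:
    """Return the k wrong decoders whose descriptions look most like the log."""
    wrong = [name for name in candidates if name != correct]
    wrong.sort(key=lambda name: similarity(log, candidates[name]), reverse=True)
    return wrong[:k]
```

Pairing each training log with its correct decoder and these near misses gives the model examples of exactly the confusions it needs to learn to separate.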

### 2. Model Tuning Strategies
- **Weight Optimization**: Tune TF-IDF/SBERT weights based on validation performance
- **Threshold Calibration**: Optimize confidence thresholds for different log types
- **Ensemble Diversity**: Consider adding a third model (e.g., n-gram cosine similarity) for additional diversity
- **Hyperparameter Search**: Optimize TF-IDF parameters (ngram range, max_features, etc.)
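Weight optimization can be as simple as a grid search on a held-out validation set. This sketch assumes each validation sample is `(tfidf_scores, sbert_scores, gold_decoder)` and scores candidates by top-1 accuracy; the data layout and the names `top1_accuracy` and `tune_weight` are assumptions, not part of the tool.

```python
Sample = tuple[dict[str, float], dict[str, float], str]


def top1_accuracy(samples: list[Sample], tfidf_weight: float) -> float:
    """Fraction of samples whose best blended score is the gold decoder."""
    correct = 0
    for tfidf, sbert, gold in samples:
        names = tfidf.keys() | sbert.keys()
        best = max(
            names,
            key=lambda n: tfidf_weight * tfidf.get(n, 0.0)
            + (1 - tfidf_weight) * sbert.get(n, 0.0),
        )
        correct += best == gold
    return correct / len(samples)


def tune_weight(samples: list[Sample], steps: int = 10) -> float:
    """Try tfidf_weight in 0.0, 0.1, ..., 1.0 and keep the first best one."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda w: top1_accuracy(samples, w))
```

The chosen weight should then be checked on a separate test split, since tuning and reporting on the same data overstates accuracy.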

### 3. Advanced Techniques for 100% Target
- **Hierarchical Classification**: First predict the log type, then apply a specialized model
- **Confidence Calibration**: Use Platt scaling or isotonic regression for better probability estimates
- **Error Analysis Loop**: Systematically analyze mistakes and add targeted training examples
- **Model Distillation**: Distill the ensemble into a smaller, faster model for production use
- **Online Learning**: Continuously update the model with newly verified examples
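Of the two calibration methods named above, isotonic regression is the easier one to sketch without dependencies. The pool-adjacent-violators (PAV) algorithm below fits a non-decreasing map from raw ensemble score to the observed rate of correct predictions; it assumes `labels` are 1 when the top suggestion was correct and 0 otherwise. `isotonic_fit` and `calibrate` are hypothetical helpers; scikit-learn's `IsotonicRegression` is a ready-made alternative.

```python
def isotonic_fit(scores: list[float], labels: list[int]) -> list[tuple[float, float]]:
    """Pool-adjacent-violators fit of P(correct) as a non-decreasing step function.

    Returns (max_score_in_block, calibrated_probability) steps, sorted by score.
    """
    pairs = sorted(zip(scores, labels))
    blocks: list[list[float]] = []  # each block: [sum_of_labels, count, max_score]
    for score, label in pairs:
        blocks.append([float(label), 1, score])
        # Merge backwards while an earlier block has a higher mean than the last one.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c, m = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
            blocks[-1][2] = m
    return [(m, s / c) for s, c, m in blocks]


def calibrate(steps: list[tuple[float, float]], score: float) -> float:
    """Map a raw score to the probability of the first step that covers it."""
    for max_score, prob in steps:
        if score <= max_score:
            return prob
    return steps[-1][1]
```

The calibrated probabilities are what a confidence threshold should be applied to, since raw blended similarities are not probabilities.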

### 4. Implementation Recommendations
1. **Implement Confidence Thresholds**: Only accept predictions above a calibrated confidence threshold
2. **Add Fallback Mechanisms**: If ensemble confidence is low, fall back to rule-based heuristics
3. **Create Specialized Models**: Train separate models for different log types (syslog, JSON, CEF, etc.)
4. **Feature Importance Analysis**: Identify and enhance most discriminative features
5. **Cross-Validation**: Implement rigorous cross-validation to prevent overfitting

## Files Modified
1. `app/decoder_ml_enhanced.py` - New file with enhanced ML components
2. `app/main.py` - Updated to use enhanced ML model throughout
3. `tests/test_ml_enhanced.py` - Unit tests for enhanced components
4. `tests/test_integration.py` - Integration tests
5. `ML_ENHANCEMENT_SUMMARY.md` - This document

## Usage
The enhancements are automatically active. The tool now:
- Uses the ensemble model (TF-IDF + SBERT) when the advanced ML packages are available
- Falls back to the original TF-IDF model if those packages are unavailable
- Maintains full backward compatibility with the existing API
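The fallback described above could be implemented by checking whether the ensemble's optional dependencies can be imported before building the model. A minimal sketch, assuming the SBERT component depends on the `sentence-transformers` package (import name `sentence_transformers`); `select_backend` is a hypothetical name and the tool's actual check may differ.

```python
import importlib.util


def select_backend(use_ensemble: bool = True) -> str:
    """Return "ensemble" only if requested and its optional dependency is importable."""
    # Assumption: SBERT is provided by sentence-transformers. find_spec returns
    # None for a missing top-level module instead of raising ImportError.
    if use_ensemble and importlib.util.find_spec("sentence_transformers") is not None:
        return "ensemble"
    return "tfidf"
```

Checking with `find_spec` avoids importing the (large) package just to test for it; the actual import can then happen once, when the ensemble model is built.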

## Next Steps for 100% Accuracy
1. Collect and label more diverse training data
2. Implement confidence calibration using validation set
3. Add specialized models for different log formats
4. Implement active learning from user corrections
5. Create a continuous evaluation pipeline
6. Add model versioning and A/B testing capabilities

## Conclusion
These enhancements provide a strong foundation for achieving very high accuracy (>95%) through ensemble methods and improved feature engineering. Reaching 100% will require ongoing data collection, model tuning, and implementation of the advanced techniques outlined above.