Add CLI executable with model registry support#127
Conversation
Implements issue #119 with a full-featured command-line interface: - Default action is classification (no subcommand needed) - Supports Bayes, LSI, KNN, and Logistic Regression models - Commands: train, info, fit (LR only), search (LSI), related (LSI) - Flags: -f (file), -m (model type), -p (probabilities), -k (KNN neighbors) - Reads from stdin when no text argument provided - Sensible defaults: ./classifier.json, bayes model type Also: - Add Classifier::VERSION as single source of truth - LR now requires explicit fit() before classification - Add add_category() method to LR for dynamic category creation Closes #119
Instead of showing "Error: model not found" when running `classifier` with no arguments and no model file, show a friendly getting started guide with examples for training, classifying, and LSI semantic search.
When a model exists but classifier is run without any text to classify, show the model info and usage hints instead of silently exiting.
The info command now outputs pretty-printed JSON with detailed stats: - file: model path - type: classifier type - categories: list of categories - category_stats: per-category word counts (Bayes) - documents: document count (LSI, KNN) - items: list of items (LSI) - fitted: fit status (LR)
- LSI examples now use stdin training like Bayes examples - Use separate lsi.json file to avoid conflicts - All examples can be copy-pasted and run immediately
- Add defaults for --learning-rate (0.1), --regularization (0.01), --max-iterations (100) - Remove -f lsi.json from LSI examples for simpler copy-paste
Greptile SummaryImplements a full-featured CLI for training and classifying text across all classifier types (Bayes, LSI, KNN, Logistic Regression). The implementation successfully provides sensible defaults, stdin support, JSON output for Key Changes:
Style Issues Found:
Implementation Quality: Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant User
participant Exe as exe/classifier
participant CLI as CLI.new
participant Parser as OptionParser
participant Command as execute_command
participant Classifier as Bayes/LSI/KNN/LR
participant Storage as Storage::File
User->>Exe: Run classifier [args]
Exe->>CLI: new(ARGV)
CLI->>CLI: initialize(args)
Note over CLI: Set default options from ENV
Exe->>CLI: run()
CLI->>Parser: parse_options
Parser->>CLI: Parse flags (-f, -m, -p, etc)
CLI->>Command: execute_command
alt train command
Command->>Storage: load_or_create_classifier
Storage-->>Command: Classifier instance
Command->>Classifier: train(category, text)
Command->>Storage: save_classifier
Storage->>Storage: write JSON
else classify command (default)
alt No model exists
Command->>User: show_getting_started
else Model exists
Command->>Storage: load_classifier
Storage-->>Command: Classifier instance
alt LR not fitted
Command-->>User: Error: Model not fitted
else Ready to classify
Command->>Classifier: classify(text)
Classifier-->>Command: category
Command->>User: Output result
end
end
else fit command (LR only)
Command->>Storage: load_classifier
Storage-->>Command: LR instance
Command->>Classifier: fit()
Note over Classifier: Run SGD optimization
Command->>Storage: save_classifier
else info command
Command->>Storage: load_classifier
Storage-->>Command: Classifier instance
Command->>Classifier: Extract metadata
Command->>User: Output JSON
else search/related (LSI only)
Command->>Storage: load_classifier
Storage-->>Command: LSI instance
Command->>Classifier: search(query) / find_related(item)
Classifier-->>Command: results with scores
Command->>User: Output results
end
CLI-->>Exe: {output, error, exit_code}
Exe->>User: Display output/error
Exe->>User: exit(code)
|
| unless File.exist?(@options[:model]) | ||
| if text.empty? && ($stdin.tty? || @stdin.nil?) | ||
| show_getting_started | ||
| return | ||
| end | ||
| @error << "Error: model not found at #{@options[:model]}" | ||
| @exit_code = 1 | ||
| return | ||
| end |
There was a problem hiding this comment.
style: Nested conditionals violate style guide. Use guard clauses with early returns.
| unless File.exist?(@options[:model]) | |
| if text.empty? && ($stdin.tty? || @stdin.nil?) | |
| show_getting_started | |
| return | |
| end | |
| @error << "Error: model not found at #{@options[:model]}" | |
| @exit_code = 1 | |
| return | |
| end | |
| # No model and no input - show getting started guide | |
| if text.empty? && ($stdin.tty? || @stdin.nil?) | |
| show_getting_started | |
| return | |
| end | |
| unless File.exist?(@options[:model]) | |
| @error << "Error: model not found at #{@options[:model]}" | |
| @exit_code = 1 | |
| return | |
| end |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/cli.rb
Line: 336:344
Comment:
**style:** Nested conditionals violate style guide. Use guard clauses with early returns.
```suggestion
# No model and no input - show getting started guide
if text.empty? && ($stdin.tty? || @stdin.nil?)
show_getting_started
return
end
unless File.exist?(@options[:model])
@error << "Error: model not found at #{@options[:model]}"
@exit_code = 1
return
end
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| if text.empty? | ||
| lines = read_stdin_lines | ||
| if lines.empty? | ||
| show_model_usage(classifier) | ||
| return | ||
| end | ||
| lines.each { |line| classify_and_output(classifier, line) } | ||
| else | ||
| classify_and_output(classifier, text) | ||
| end |
There was a problem hiding this comment.
style: Nested conditionals make logic harder to follow. Refactor using guard clauses.
| if text.empty? | |
| lines = read_stdin_lines | |
| if lines.empty? | |
| show_model_usage(classifier) | |
| return | |
| end | |
| lines.each { |line| classify_and_output(classifier, line) } | |
| else | |
| classify_and_output(classifier, text) | |
| end | |
| if text.empty? | |
| lines = read_stdin_lines | |
| return show_model_usage(classifier) if lines.empty? | |
| lines.each { |line| classify_and_output(classifier, line) } | |
| else | |
| classify_and_output(classifier, text) | |
| end |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/cli.rb
Line: 348:357
Comment:
**style:** Nested conditionals make logic harder to follow. Refactor using guard clauses.
```suggestion
if text.empty?
lines = read_stdin_lines
return show_model_usage(classifier) if lines.empty?
lines.each { |line| classify_and_output(classifier, line) }
else
classify_and_output(classifier, text)
end
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| if classifier.is_a?(LSI) && @args.any? | ||
| train_lsi_from_files(classifier, category, @args) | ||
| else | ||
| text = read_training_input | ||
| if text.empty? | ||
| @error << 'Error: no training data provided' | ||
| @exit_code = 2 | ||
| return | ||
| end | ||
| train_classifier(classifier, category, text) | ||
| end |
There was a problem hiding this comment.
style: Nested conditional inside else branch. Use guard clauses for clearer flow.
| if classifier.is_a?(LSI) && @args.any? | |
| train_lsi_from_files(classifier, category, @args) | |
| else | |
| text = read_training_input | |
| if text.empty? | |
| @error << 'Error: no training data provided' | |
| @exit_code = 2 | |
| return | |
| end | |
| train_classifier(classifier, category, text) | |
| end | |
| # For LSI, train with file paths as item keys | |
| if classifier.is_a?(LSI) && @args.any? | |
| train_lsi_from_files(classifier, category, @args) | |
| save_classifier(classifier) | |
| return | |
| end | |
| text = read_training_input | |
| if text.empty? | |
| @error << 'Error: no training data provided' | |
| @exit_code = 2 | |
| return | |
| end | |
| train_classifier(classifier, category, text) |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/cli.rb
Line: 171:181
Comment:
**style:** Nested conditional inside else branch. Use guard clauses for clearer flow.
```suggestion
# For LSI, train with file paths as item keys
if classifier.is_a?(LSI) && @args.any?
train_lsi_from_files(classifier, category, @args)
save_classifier(classifier)
return
end
text = read_training_input
if text.empty?
@error << 'Error: no training data provided'
@exit_code = 2
return
end
train_classifier(classifier, category, text)
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/cli.rb
Outdated
| # Categories - different classifiers store them differently | ||
| if classifier.respond_to?(:categories) | ||
| info[:categories] = classifier.categories.map(&:to_s) | ||
| end | ||
|
|
||
| info[:training_count] = classifier.training_count if classifier.respond_to?(:training_count) | ||
| info[:vocab_size] = classifier.vocab_size if classifier.respond_to?(:vocab_size) | ||
| info[:fitted] = classifier.fitted? if classifier.respond_to?(:fitted?) | ||
|
|
||
| # Category-specific stats for Bayes | ||
| if classifier.is_a?(Bayes) | ||
| categories_data = classifier.instance_variable_get(:@categories) | ||
| info[:category_stats] = classifier.categories.to_h do |cat| | ||
| cat_data = categories_data[cat.to_sym] || {} | ||
| [cat.to_s, { unique_words: cat_data.size, total_words: cat_data.values.sum }] | ||
| end | ||
| end | ||
|
|
||
| # LSI-specific info | ||
| if classifier.is_a?(LSI) | ||
| info[:documents] = classifier.items.size | ||
| info[:items] = classifier.items | ||
| # Get unique categories from items | ||
| categories = classifier.items.map { |item| classifier.categories_for(item) }.flatten.uniq | ||
| info[:categories] = categories.map(&:to_s) unless categories.empty? | ||
| end |
There was a problem hiding this comment.
style: Remove inline comments that restate code. # Categories - different classifiers store them differently and # Get unique categories from items just describe what the code already shows. The conditional checks make the intent obvious.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/cli.rb
Line: 204:229
Comment:
**style:** Remove inline comments that restate code. `# Categories - different classifiers store them differently` and `# Get unique categories from items` just describe what the code already shows. The conditional checks make the intent obvious.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/cli.rb
Outdated
| # Category-specific stats for Bayes | ||
| if classifier.is_a?(Bayes) | ||
| categories_data = classifier.instance_variable_get(:@categories) | ||
| info[:category_stats] = classifier.categories.to_h do |cat| | ||
| cat_data = categories_data[cat.to_sym] || {} | ||
| [cat.to_s, { unique_words: cat_data.size, total_words: cat_data.values.sum }] | ||
| end | ||
| end |
There was a problem hiding this comment.
style: Remove comment # Category-specific stats for Bayes — the code clearly shows it's extracting Bayes-specific category stats.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/cli.rb
Line: 213:220
Comment:
**style:** Remove comment `# Category-specific stats for Bayes` — the code clearly shows it's extracting Bayes-specific category stats.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/cli.rb
Outdated
| # KNN-specific info | ||
| if classifier.is_a?(KNN) | ||
| data = classifier.instance_variable_get(:@data) || [] | ||
| info[:documents] = data.size | ||
| categories = data.map { |d| d[:category] }.uniq | ||
| info[:categories] = categories.map(&:to_s) unless categories.empty? | ||
| end |
There was a problem hiding this comment.
style: Remove comment # KNN-specific info — redundant given the type check.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/cli.rb
Line: 231:237
Comment:
**style:** Remove comment `# KNN-specific info` — redundant given the type check.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/cli.rb
Outdated
| # For LSI, train with file paths as item keys | ||
| if classifier.is_a?(LSI) && @args.any? | ||
| train_lsi_from_files(classifier, category, @args) |
There was a problem hiding this comment.
style: Remove comment # For LSI, train with file paths as item keys — restates what the code shows. The classifier.is_a?(LSI) check and method name train_lsi_from_files make this obvious.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/cli.rb
Line: 170:172
Comment:
**style:** Remove comment `# For LSI, train with file paths as item keys` — restates what the code shows. The `classifier.is_a?(LSI)` check and method name `train_lsi_from_files` make this obvious.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/cli.rb
Outdated
| # No model and no input - show getting started guide | ||
| unless File.exist?(@options[:model]) | ||
| if text.empty? && ($stdin.tty? || @stdin.nil?) | ||
| show_getting_started | ||
| return |
There was a problem hiding this comment.
style: Remove comment # No model and no input - show getting started guide — obvious from the condition and method call.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/cli.rb
Line: 335:339
Comment:
**style:** Remove comment `# No model and no input - show getting started guide` — obvious from the condition and method call.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/cli.rb
Outdated
| # LR requires explicit fit before classification | ||
| if classifier.is_a?(LogisticRegression) && !classifier.fitted? | ||
| raise StandardError, "Model not fitted. Run 'classifier fit' after training." | ||
| end |
There was a problem hiding this comment.
style: Remove comment # LR requires explicit fit before classification — the error message already explains this clearly.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/cli.rb
Line: 387:390
Comment:
**style:** Remove comment `# LR requires explicit fit before classification` — the error message already explains this clearly.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| content = File.read(file) | ||
| # Use file path as item key so search/related returns file paths | ||
| classifier.add_item(file, category.to_sym) { content } | ||
| end |
There was a problem hiding this comment.
style: Remove comment # Use file path as item key so search/related returns file paths — obvious from the code using file as the item identifier.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/cli.rb
Line: 488:491
Comment:
**style:** Remove comment `# Use file path as item key so search/related returns file paths` — obvious from the code using `file` as the item identifier.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.- Refactor build_model_info into smaller methods to reduce complexity - Fix Minitest assertion style (refute/assert vs assert_equal) - Add JSON.pretty_generate to vendor RBS
- Fix file permissions on generated RBS files (644 instead of 600) - Refactor restore_state to reduce ABC complexity - Regenerate RBS files for logistic_regression
- Refactor nested conditionals to use guard clauses in command_train - Refactor nested conditionals to use guard clauses in command_classify - Remove redundant comments that restate what code already shows
Enable downloading and using pre-trained models from GitHub repositories similar to Homebrew taps. This makes it easy to share and reuse classifier models without training from scratch. New commands: - `models` - list available models in a registry - `pull` - download models to local cache - `push` - show instructions for contributing models New options: - `-r/--remote MODEL` - classify using a remote model directly - `-o/--output FILE` - specify download location for pull Features: - Default registry: cardmagic/classifier-models - Custom registries: @user/repo:model syntax - Local caching in ~/.classifier/models/ - Automatic fallback from main to master branch - Environment variables for configuration Closes #119
Steep was failing because the CLI uses stdlib classes (FileUtils, URI, Net::HTTP, JSON::ParserError) that weren't declared in the Steepfile. Also fixes nil safety issue with spec[1..] which returns String? in Ruby's type system - now uses fallback to empty string.
The 'models' command now supports --local to list models that have been downloaded to the local cache (~/.classifier/models/) instead of fetching the registry index from GitHub. Shows model name, type, and human-readable file size. Custom registry models display with their full @user/repo:name format.
Use idiomatic Ruby patterns to satisfy linter:
- Replace each+append with map for building arrays
- Use annotated format tokens (%<name>s) over positional ones
- Use %r{} for regex containing slashes
Summary
Implements #119 - a full-featured command-line interface for the classifier gem with model registry support.
Core CLI Features
classifier 'text to classify'train,info,fit(LR only),search(LSI),related(LSI)./classifier.jsonmodel file,bayesmodel typeclassifier infooutputs JSON for jq filteringModel Registry (like Homebrew taps)
Pre-trained models can be downloaded from public GitHub repositories:
github.com/cardmagic/classifier-models@user/repo:modelsyntax)New Commands:
classifier models- List models in default registryclassifier models @user/repo- List models in custom registryclassifier pull <model>- Download model from default registryclassifier pull @user/repo:model- Download model from custom registryclassifier pull @user/repo- Download ALL models from registryclassifier push- Shows instructions for contributing a modelNew Options:
-r, --remote MODEL- Use remote model directly (downloads and caches automatically)-o, --output FILE- Output path for pull commandCache System:
~/.classifier/models/~/.classifier/models/@user/repo/Developer Experience
classifierwith no model shows a getting started guide with copy-paste examplesclassifierwith a model but no input shows model info and usage hintsExample Usage
Environment Variables
CLASSIFIER_MODEL- Default model pathCLASSIFIER_TYPE- Default classifier typeCLASSIFIER_REGISTRY- Default registry (default:cardmagic/classifier-models)CLASSIFIER_CACHE- Cache directory (default:~/.classifier)Test plan
-rflag testedCloses #119