Skip to content

fix: inherit Core from BaseEstimator for sklearn compatibility (Issue…#564

Open
CRAFTYPROGRAMMER826 wants to merge 1 commit into
dswah:mainfrom
CRAFTYPROGRAMMER826:sklearn-compat
Open

fix: inherit Core from BaseEstimator for sklearn compatibility (Issue…#564
CRAFTYPROGRAMMER826 wants to merge 1 commit into
dswah:mainfrom
CRAFTYPROGRAMMER826:sklearn-compat

Conversation

@CRAFTYPROGRAMMER826

Copy link
Copy Markdown

Fixes #422

Overview

This PR begins the foundational work to make pyGAM compatible with the
scikit-learn estimator API (v1.7+), enabling pyGAM models to be used
inside sklearn Pipelines, GridSearchCV, and cross_validate workflows
without manual wrappers.

Research & Audit

A full compatibility audit was performed on the pyGAM source code against
the scikit-learn estimator contract. The findings are documented in
compatibility_log.md which includes diagnostic test outputs, MRO analysis,
signature inspection, and check_estimator failure traces. Reviewers are
strongly encouraged to go through it for full context.

Root Cause Analysis

The audit identified the following structural incompatibilities:

1. Missing BaseEstimator in Inheritance Chain (Fixed in this PR)

The foundational class pygam.core.Core inherited directly from object,
meaning no pyGAM estimator was recognized by sklearn's internal tooling.
Refactoring Core to inherit from sklearn.base.BaseEstimator corrects
the MRO across all estimators (LinearGAM, LogisticGAM, PoissonGAM, etc.)
and provides get_params(), set_params(), and __sklearn_tags__()
through standard inheritance.

Verified after fix:

  • clone(LinearGAM())
  • Pipeline([StandardScaler(), LinearGAM()]).fit(X, y)

2. **kwargs in __init__ Signatures (Remaining — next PR)

Child estimators like LinearGAM pass parameters up to Core via
**kwargs. sklearn's introspection uses inspect.signature() to discover
parameters — anything inside **kwargs is invisible to it. This means
parameters like n_splines and lam cannot be read by get_params(),
cannot be set by set_params(), and cannot be tuned by GridSearchCV.
The fix requires replacing **kwargs with explicit, default-valued
parameters in each estimator's __init__.

3. Non-Standard Parameter Storage (Remaining — next PR)

Core.__init__ stores parameters with leading underscores
(e.g. self._name = name). sklearn requires a strict 1:1 mapping —
a parameter named name must be stored as self.name. The underscore
storage causes failures during cloning and hyperparameter tuning as
sklearn cannot locate the stored values.

4. Custom get_params / set_params Conflict (Remaining — next PR)

pyGAM defines its own get_params() and set_params() with non-standard
arguments (force=False, custom filtering for user-facing vs non-user-facing
attributes). These override the BaseEstimator versions and break nested
object handling inside Pipelines. These custom implementations need to be
removed or refactored to conform to the sklearn contract.

Status

This PR addresses the foundational inheritance issue. The remaining items
above are well-understood and scoped — they will be addressed in follow-up
PRs as part of the GSoC 2026 project proposal for Issue #422.

Please refer to compatibility_log.md for full diagnostic outputs, test traces, and analysis.

@CRAFTYPROGRAMMER826

Copy link
Copy Markdown
Author

@dswah could you please review and provide your valuable feedback on the changed classes and discovered gaps responsible for incompatibility

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature: Support for Scikit-Learn v1.7+ Compatibility

1 participant