Skip to content

Conversation

@kmontemayor2-sc
Copy link
Collaborator

Scope of work done

  • Support GS inputs and sampling for ABLP Loader
  • Fix some bugs in DistLoader to support "labeled homogeneous" mode
  • Add tests for iterating over both dataloader in with ABLP dataset

Where is the documentation for this feature?: N/A

Did you add automated tests or write a test plan?

Updated Changelog.md? NO

Ready for code review?: NO

@kmontemayor2-sc
Copy link
Collaborator Author

/e2e_test

@kmontemayor2-sc
Copy link
Collaborator Author

/integration_test

@kmontemayor2-sc
Copy link
Collaborator Author

/unit_test_py

@github-actions
Copy link
Contributor

github-actions bot commented Feb 11, 2026

GiGL Automation

@ 19:58:53UTC : 🔄 E2E Test started.

@ 21:23:22UTC : ✅ Workflow completed successfully.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 11, 2026

GiGL Automation

@ 19:58:54UTC : 🔄 Integration Test started.

@ 21:07:19UTC : ✅ Workflow completed successfully.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 11, 2026

GiGL Automation

@ 19:58:59UTC : 🔄 Python Unit Test started.

@ 21:10:08UTC : ✅ Workflow completed successfully.

Copy link
Collaborator

@mkolodner-sc mkolodner-sc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Kyle, left a few comments/questions

# Graph Store mode inputs
dict[int, tuple[torch.Tensor, torch.Tensor, Optional[torch.Tensor]]],
tuple[
NodeType,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to double check, how would this interplay if I have user as my anchor node type and item as my supervision node type?

Tbh it would feel a bit weird to specify {user: {0: (0, 1, 1)}} with the 0 corresponding to the user node type but the two 1 node types are items. Ideally, if a user is providing the positive and negative labels here, I feel like they should have to provide the node type in some way. This may be even more true once we will need to support multiple supervision edge types

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good callout!

Hmmm, wdyt about dict[int, tuple[tuple[NodeType, Tensor], tuple[NodeType, Tensor], Optional[tuple[NodeType, Tensor]]?

Or honest that's so complicated probably worth creating a dataclass to wrap all the data?

I know we want to avoid dataclasses here but frankly this is so complicated I think it's worth it. WDYT?

Copy link
Collaborator

@mkolodner-sc mkolodner-sc Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is growing complicated enough here honestly where I think a dataclass specifically explaining this input (and potentially having the RemoteDataset.get_ablp_inputs() return this dataclass) makes sense.

Re: dict[int, tuple[tuple[NodeType, Tensor], tuple[NodeType, Tensor], Optional[tuple[NodeType, Tensor]]

Not to add complexity unfortunately to this already complex object, but for multiple supervision edge types, we may need to make it list[tuple[NodeType, Tensor]] for the positive and negatives :')

Alternatively, another option could be to expose positive_nodes, negative_nodes, or label_nodes as direct arguments. It seems like it's a tradeoff between

  1. adhering to GLT by only having one input_nodes but having a fairly complex structure to provide as input :')
  2. having multiple arguments but now we've evolved the API beyond GLT/PyG and further added complexity to the API

I need to think more about which approach I prefer but these are my thoughts for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants