Skip to content

Bug/py icu#60

Open
LJSigersmith wants to merge 17 commits intoscribe-org:mainfrom
LJSigersmith:bug/PyICU
Open

Bug/py icu#60
LJSigersmith wants to merge 17 commits intoscribe-org:mainfrom
LJSigersmith:bug/PyICU

Conversation

@LJSigersmith
Copy link
Copy Markdown

@LJSigersmith LJSigersmith commented Apr 25, 2026

Contributor checklist

  • This pull request is on a separate branch and not the main branch
  • I have ran the ./pre-commit executable as well as make lint and have fixed all reported issues

Description

Fix PyICU import error and add emoji_keywords data type to update script

This PR fixes a runtime ImportError that caused the update_data.sh script to crash during emoji keyword generation, and adds emoji_keywords as a processed data type.

Files Changed

update_data.sh:

  • Added a System Dependencies section that installs libicu-dev, pkg-config, g++, and python3-dev before the virtual environment is created. Installing system deps after the venv caused a runtime import failure even when pip reported a successful install.
  • Added pip install --force-reinstall --no-binary :all: PyICU after pip install -e . to ensure PyICU is always compiled from source against the locally installed ICU library rather than using a prebuilt wheel that may link against a different ICU version.
  • Added emoji_keywords to DATA_TYPES, ordered after nouns and verbs as emoji_keywords generation requires the scribe_data_json_export/<language>/ directory to exist, which is created when nouns are processed first.

.github/workflows/update_scribe_data.yml:

  • Updated Python version from 3.11 to 3.12, as Scribe-Data requires >=3.12.
  • Added a "Verify SQLite output" step that prints the tables and a sample emoji_keywords row after the script runs to confirm the output is valid before packaging.

Testing

Tested locally on a Raspberry Pi 4 running Ubuntu using a fork of Scribe-Data (LJSigersmith/Scribe-Data, branch fix/emoji-keywords-sqlite-generation) which includes fixes to emoji keyword SQLite generation. The full script ran successfully, generating and migrating emoji_keywords for English, with 3,393 emoji keywords written to the database. Also validated via GitHub Actions on the fork. The CI run confirmed the SQLite output step printed correct tables and a sample row.

Note: This PR depends on fixes in Scribe-Data being merged upstream.

Related issue

@github-actions
Copy link
Copy Markdown

Thank you for the pull request! 💙🩵

The Scribe-Server team will do our best to address your contribution as soon as we can. The following are some important points:

  • Those interested in developing their skills and expanding their role in the community should read the mentorship and growth section of the contribution guide
  • If you're not already a member of our public Matrix community, please consider joining!
    • We'd suggest that you use the Element client as well as Element X for a mobile app
    • Join the General and Data rooms once you're in
  • Also consider attending our bi-weekly Saturday developer syncs!
    • Details are shared in the General room on Matrix each Wednesday before the sync
    • It would be great to meet you 😊

Note

Scribe uses Conventional Comments in reviews to make sure that communication is as clear as possible.

@github-actions
Copy link
Copy Markdown

Maintainer Checklist

The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

  • Tests for changes have been written and the continuous integration (CI) workflows within the PR checks do not indicate new errors in the files changed

  • The CHANGELOG has been updated with a description of the changes for the upcoming release and the corresponding issue (if necessary)

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First PR Commit Check

  • The commit messages for the remote branch should be checked to make sure the contributor's email is set up correctly so that they receive credit for their contribution
  • The contributor's name and icon in remote commits should be the same as what appears in the PR
  • If there's a mismatch, the contributor needs to make sure that the email they use for GitHub matches what they have for git config user.email in their local Scribe-Server repo (can be set with git config --global user.email "GITHUB_EMAIL")

@DeleMike
Copy link
Copy Markdown
Collaborator

Hi @LJSigersmith , I have done my review. Please check.

Also @axif0, can you look at this too?

Comment thread update_data.sh Outdated
# MARK: System Dependencies

log "🔧 Installing system dependencies for PyICU..."
sudo apt-get install -y libicu-dev pkg-config g++ python3-dev || {
Copy link
Copy Markdown
Member

@axif0 axif0 Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Toolforge containers do not provide sudo or root privileges, so installing system packages at runtime is not supported.

Image

@LJSigersmith
Copy link
Copy Markdown
Author

@axif0 I've updated the PR to move the dependencies needed to build PyICU to the workflow script

@axif0
Copy link
Copy Markdown
Member

axif0 commented Apr 28, 2026

@LJSigersmith Actually it appears, that changes won't work.

image

@axif0
Copy link
Copy Markdown
Member

axif0 commented Apr 28, 2026

The build system uses pkg-config or icu-config to auto-detect ICU, but neither tool was installed on this Toolforge server.
we need to mannually tell the build system where ICU is via env vars like -

export ICU_VERSION=76
export PYICU_LFLAGS="-L/usr/lib/x86_64-linux-gnu -licui18n -licuuc -licudata"
export PYICU_CFLAGS="-I/usr/include"
pip install PyICU
image

@DeleMike can you please reverify the commands if it actually installs the PyICU ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ImportError: Could not import required PyICU functionality — PyICU installs successfully but fails at runtime during scribe-data get

3 participants