add support for base64 embedded images by blamster19 · Pull Request #56 · bauripalash/tkhtmlview

blamster19 · 2025-09-30T15:54:14Z

The HTML <img> tag supports images embedded directly into the src attribute using base64 encoding; see, for example, this question on SO. I've added that trivial functionality to the parser since I needed it in my own project using tkhtmlview.

Summary by Sourcery

New Features:

Support decoding and rendering of base64-encoded JPEG, PNG, and GIF images in <img> tags

The HTML <img> tag supports images embedded directly into the HTML body using base64 encoding. This commit adds support for the MIME types image/jpeg, image/png and image/gif.

sourcery-ai · 2025-09-30T15:54:21Z

Reviewer's Guide

Adds support for base64-encoded images (JPEG, PNG, GIF) in the HTML parser by detecting, decoding, loading, and caching embedded image data before existing cache and file lookups.

Sequence diagram for handling base64-encoded images in HTML parser

sequenceDiagram
    participant Parser
    participant ImageLib as "Image.open()"
    participant Base64 as "base64.b64decode()"
    participant Cache as "cached_images"
    Parser->>Parser: handle_starttag(<img>, attrs)
    alt src starts with data:image/jpeg/png/gif;base64,
        Parser->>Base64: decode base64 from src
        Base64-->>Parser: decoded bytes
        Parser->>ImageLib: open(BytesIO(decoded bytes))
        ImageLib-->>Parser: image object
        Parser->>Cache: cache image with src as key
    else src in cached_images
        Parser->>Cache: retrieve cached image
    else src is file path
        Parser->>ImageLib: open(src file)
        ImageLib-->>Parser: image object
        Parser->>Cache: cache image
    end
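
The branch order in the diagram can be sketched in plain Python. This is an illustration only: `resolve_image`, `load_data_url` and `load_file` are hypothetical names rather than tkhtmlview functions, and the two loaders are passed in as callables so the sketch does not depend on PIL.

```python
from typing import Callable, Dict, Optional, TypeVar

T = TypeVar("T")

# The three prefixes named in the diagram.
DATA_PREFIXES = (
    "data:image/jpeg;base64,",
    "data:image/png;base64,",
    "data:image/gif;base64,",
)


def resolve_image(
    src: str,
    cache: Dict[str, T],
    load_data_url: Callable[[str], Optional[T]],  # hypothetical loader
    load_file: Callable[[str], Optional[T]],      # hypothetical loader
) -> Optional[T]:
    """Follow the diagram's branch order: data URL, then cache, then file."""
    if src.startswith(DATA_PREFIXES):
        image = load_data_url(src)
    elif src in cache:
        return cache[src]
    else:
        image = load_file(src)
    if image is not None:
        cache[src] = image  # the src string itself is the cache key
    return image
```

Because the data-URL branch comes first, an embedded image is decoded again each time it is encountered, even when it is already in the cache.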

Class diagram for updated image handling in HTML parser

classDiagram
    class HTMLParser {
        +cached_images: dict
        handle_starttag(tag, attrs)
    }
    class Image
    class base64
    class BytesIO
    HTMLParser --> Image : uses
    HTMLParser --> base64 : uses
    HTMLParser --> BytesIO : uses
    HTMLParser : +handle_starttag() now supports base64 src
    HTMLParser : +caches images from base64 src

File-Level Changes

Change	Details	Files
Support base64-encoded images in src attributes	Detect the data:image/jpeg;base64 prefix, decode the payload, open it with PIL, and cache the image; detect the data:image/png;base64 and data:image/gif;base64 prefixes, decode the payload, open it with PIL, and cache the image; insert the new handling logic before the existing cache/file checks; wrap decoding and image loading in try/except to avoid failures	`tkhtmlview/html_parser.py`

sourcery-ai

Hey there - I've reviewed your changes - here's some feedback:

Consider unifying the duplicate base64‐decode blocks by dynamically splitting on the first comma rather than hardcoding offsets for each mime type.
Avoid using bare excepts—catch specific exceptions (e.g. binascii.Error, PIL.UnidentifiedImageError) and consider logging a warning when decoding fails.
To make this more extensible, parse the MIME type from the data URL (everything before ‘;base64’) instead of hardcoding jpeg/png/gif support.
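
A minimal sketch of how these three points could fit together, using only the standard library. `split_data_url` is a hypothetical helper, not part of tkhtmlview; opening the decoded bytes with PIL is left to the caller.

```python
import base64
import binascii
from typing import Optional, Tuple


def split_data_url(src: str) -> Optional[Tuple[str, bytes]]:
    """Return (mime_type, decoded_bytes) for a base64 data URL, else None.

    Hypothetical helper for illustration; not part of tkhtmlview.
    """
    if not src.startswith("data:"):
        return None
    # Split on the first comma: header on the left, payload on the right.
    header, sep, payload = src[len("data:"):].partition(",")
    if not sep:
        return None
    # The header looks like "image/png;base64"; parameters are ';'-separated.
    mime_type, *params = header.split(";")
    if "base64" not in params:
        return None
    try:
        # validate=True rejects characters outside the base64 alphabet.
        data = base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        # Malformed base64: report failure instead of raising.
        return None
    return mime_type.lower(), data
```

The caller can then decide which MIME types to accept, for example by checking that the returned type starts with `image/`, instead of listing one prefix per format.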

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- Consider unifying the duplicate base64‐decode blocks by dynamically splitting on the first comma rather than hardcoding offsets for each mime type.
- Avoid using bare excepts—catch specific exceptions (e.g. binascii.Error, PIL.UnidentifiedImageError) and consider logging a warning when decoding fails.
- To make this more extensible, parse the MIME type from the data URL (everything before ‘;base64’) instead of hardcoding jpeg/png/gif support.

## Individual Comments

### Comment 1
<location> `tkhtmlview/html_parser.py:547-556` </location>
<code_context>
                     except:
                         pass

+            if attrs[HTML.Attrs.SRC].startswith(("data:image/jpeg;base64,")):
+                try:
+                    image = Image.open(
+                        BytesIO(base64.b64decode(attrs[HTML.Attrs.SRC][23:].encode("utf-8")))
+                    )
+                    self.cached_images[attrs[HTML.Attrs.SRC]] = deepcopy(image)
+                except:
+                    pass
+
+            if attrs[HTML.Attrs.SRC].startswith(("data:image/png;base64,", "data:image/gif;base64,")):
+                try:
+                    image = Image.open(
</code_context>

<issue_to_address>
**suggestion:** The code for handling base64 image decoding is duplicated for each image type.

Refactor the base64 decoding logic into a shared helper to simplify maintenance and future extensions.

Suggested implementation:

```python
from collections import OrderedDict
import requests
from io import BytesIO
import base64
from PIL import Image
from copy import deepcopy

def decode_base64_image(src: str) -> "Image.Image|None":
    """
    Decodes a base64-encoded image from a data URI.
    Supports JPEG, PNG, and GIF formats.
    Returns a PIL Image or None if decoding fails.
    """
    prefixes = {
        "data:image/jpeg;base64,": 23,
        "data:image/png;base64,": 22,
        "data:image/gif;base64,": 22,
    }
    for prefix, offset in prefixes.items():
        if src.startswith(prefix):
            try:
                image_data = base64.b64decode(src[offset:].encode("utf-8"))
                return Image.open(BytesIO(image_data))
            except Exception:
                return None
    return None

```

```python
            image = decode_base64_image(attrs[HTML.Attrs.SRC])
            if image is not None:
                self.cached_images[attrs[HTML.Attrs.SRC]] = deepcopy(image)

```
</issue_to_address>

### Comment 2
<location> `tkhtmlview/html_parser.py:544` </location>
<code_context>
                     except:
                         pass

</code_context>

<issue_to_address>
**issue (bug_risk):** Bare except statements are used, which can hide unexpected errors.

Catching all exceptions makes it harder to identify and address real issues. Please catch only the relevant exceptions to improve error handling.
</issue_to_address>

### Comment 3
<location> `tkhtmlview/html_parser.py:550` </location>
<code_context>
+            if attrs[HTML.Attrs.SRC].startswith(("data:image/jpeg;base64,")):
+                try:
+                    image = Image.open(
+                        BytesIO(base64.b64decode(attrs[HTML.Attrs.SRC][23:].encode("utf-8")))
+                    )
+                    self.cached_images[attrs[HTML.Attrs.SRC]] = deepcopy(image)
</code_context>

<issue_to_address>
**suggestion:** The code uses hardcoded offsets for slicing the base64 string.

Hardcoded offsets are brittle and may fail if the prefix changes or new formats are introduced. Use string splitting to extract the base64 data for better reliability.
</issue_to_address>

sourcery-ai · 2025-09-30T15:54:53Z

+            if attrs[HTML.Attrs.SRC].startswith(("data:image/jpeg;base64,")):
+                try:
+                    image = Image.open(
+                        BytesIO(base64.b64decode(attrs[HTML.Attrs.SRC][23:].encode("utf-8")))
+                    )
+                    self.cached_images[attrs[HTML.Attrs.SRC]] = deepcopy(image)
+                except:
+                    pass
+
+            if attrs[HTML.Attrs.SRC].startswith(("data:image/png;base64,", "data:image/gif;base64,")):


suggestion: The code for handling base64 image decoding is duplicated for each image type.

Refactor the base64 decoding logic into a shared helper to simplify maintenance and future extensions.

Suggested implementation:

from collections import OrderedDict
import requests
from io import BytesIO
import base64
from PIL import Image
from copy import deepcopy

def decode_base64_image(src: str) -> "Image.Image|None":
    """
    Decodes a base64-encoded image from a data URI.
    Supports JPEG, PNG, and GIF formats.
    Returns a PIL Image or None if decoding fails.
    """
    prefixes = {
        "data:image/jpeg;base64,": 23,
        "data:image/png;base64,": 22,
        "data:image/gif;base64,": 22,
    }
    for prefix, offset in prefixes.items():
        if src.startswith(prefix):
            try:
                image_data = base64.b64decode(src[offset:].encode("utf-8"))
                return Image.open(BytesIO(image_data))
            except Exception:
                return None
    return None

            image = decode_base64_image(attrs[HTML.Attrs.SRC])
            if image is not None:
                self.cached_images[attrs[HTML.Attrs.SRC]] = deepcopy(image)

sourcery-ai · 2025-09-30T15:54:53Z

@@ -543,6 +544,24 @@ def handle_starttag(self, tag, attrs):
                    except:


issue (bug_risk): Bare except statements are used, which can hide unexpected errors.

Catching all exceptions makes it harder to identify and address real issues. Please catch only the relevant exceptions to improve error handling.
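
One way to act on this is to catch only the errors that base64 decoding is expected to raise, and to log them. The sketch below uses only the standard library; `decode_payload` and the logger name are hypothetical. In the PR's context, the clause around `Image.open` would presumably also need `PIL.UnidentifiedImageError`, which the overall review comments mention.

```python
import base64
import binascii
import logging
from typing import Optional

logger = logging.getLogger("html_parser_example")  # hypothetical logger name


def decode_payload(payload: str) -> Optional[bytes]:
    """Decode base64 text, returning None (and logging) on bad input."""
    try:
        return base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError) as exc:
        # Only decoding errors are swallowed; anything else (e.g. a TypeError
        # from passing None) still propagates and stays visible.
        logger.warning("could not decode base64 image data: %s", exc)
        return None
```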

sourcery-ai · 2025-09-30T15:54:53Z

+            if attrs[HTML.Attrs.SRC].startswith(("data:image/jpeg;base64,")):
+                try:
+                    image = Image.open(
+                        BytesIO(base64.b64decode(attrs[HTML.Attrs.SRC][23:].encode("utf-8")))


suggestion: The code uses hardcoded offsets for slicing the base64 string.

Hardcoded offsets are brittle and may fail if the prefix changes or new formats are introduced. Use string splitting to extract the base64 data for better reliability.
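
To illustrate the brittleness: the offset 23 matches only a prefix that is exactly 23 characters long, so any other header shifts the slice. A small comparison, with a made-up payload and a hypothetical `payload_of` helper:

```python
import base64

payload = base64.b64encode(b"example").decode("ascii")  # made-up payload

jpeg_src = "data:image/jpeg;base64," + payload     # prefix is 23 characters
png_src = "data:image/png;base64," + payload       # prefix is 22 characters
svg_src = "data:image/svg+xml;base64," + payload   # prefix is 26 characters


def payload_of(src: str) -> str:
    """Return everything after the first comma of a data URL."""
    return src.partition(",")[2]


# A fixed offset works only for the prefix it was counted for.
assert jpeg_src[23:] == payload
assert png_src[23:] != payload   # first payload character is lost
assert svg_src[23:] != payload   # part of the header leaks into the payload

# Splitting on the first comma is independent of the header length.
assert payload_of(jpeg_src) == payload
assert payload_of(png_src) == payload
assert payload_of(svg_src) == payload
```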

add support for base64 embedded images

0f232f5

The HTML <img> tag supports images embedded directly into the HTML body using base64 encoding. This commit adds support for the MIME types image/jpeg, image/png and image/gif.

blamster19 wants to merge 1 commit into bauripalash:main from blamster19:patch-base64
