bug: df.count() returns 0 when Comet native scan is disabled and when Comet parquet conversion is enabled

### Describe the bug

The workaround for the issue https://github.com/apache/datafusion-comet/issues/4747 is to disable `spark.comet.scan.enabled` and enable `spark.comet.convert.parquet.enabled` so that all the other transformations can go through comet. This workaround was also mentioned here at https://github.com/apache/datafusion-comet/issues/4751

But using these settings if we read a parquet file from azure storage bucket and execute `df.count()` the result is zero, even though the parquet file contains records, which can be confirmed by running `df.show()`.

Attached a screenshot of the query plan that shows that the number of rows scanned is **1178**, but in the next stage where i assume the conversion happens the number of input rows is **0**.

<img width="1889" height="765" alt="Image" src="https://github.com/user-attachments/assets/b5926b0e-d174-4fc9-82d6-49320ad3fe5e" />


### Steps to reproduce

The steps to reproduce are the exact same as https://github.com/apache/datafusion-comet/issues/4747, except `conf spark.comet.scan.enabled` is set to `false` and `spark.comet.convert.parquet.enabled` is set to `true`, But the steps have been repeated here to maintain consistency

| Category  | Details |
| ------------- | ------------- |
| Spark  | 3.5.6 |
| Comet | 0.16.0 (https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.5_2.12/0.16.0/comet-spark-spark3.5_2.12-0.16.0.jar) |
| Cluster Manager | Kubernetes (Azure Kubernetes Service) |
| Storage | Azure cloud storage with necessary ABFS drivers in the spark driver and setted up to be authenticated using Workload Identity attached to the kubernetes service account |

- Deploy Spark cluster (3.5.6) with Comet 0.16.0 enabled on an Azure Kubernetes Service (AKS) cluster.
- Ensure that all ABFS related connectors and comet jar are present in the classpath.
- Configure Azure Workload Identity (which injects AZURE_CLIENT_ID, AZURE_TENANT_ID, and AZURE_FEDERATED_TOKEN_FILE env vars).
- In the configurations for the `spark-submit`, attach the following
```
     --conf spark.comet.enabled=true
     --conf spark.comet.exec.enabled=true
     --conf spark.comet.scan.enabled=false
     --conf spark.comet.convert.parquet.enabled=true
 
     --conf spark.comet.expression.regexp.allowIncompatible=true
     --conf spark.comet.expression.RegExpReplace.allowIncompatible=true
     --conf spark.comet.expression.RegExpExtract.allowIncompatible=true
     --conf spark.comet.caseConversion.enabled=true
 
     --conf spark.comet.exec.shuffle.enabled=true
     --conf spark.comet.exec.shuffle.mode=auto
     --conf spark.comet.batchSize=8192
     --conf spark.comet.memoryOverhead=$COMET_MEMORY_OVERHEAD
 
     --conf spark.memory.offHeap.enabled=true
     --conf spark.memory.offHeap.size=$OFFHEAP_SIZE
     --conf spark.comet.explainFallback.enabled=true
     --conf spark.comet.logFallbackReasons.enabled=true
 
     ## Azure related configurations
     --conf spark.hadoop.fs.azure.account.oauth2.msi.tenant="$TENANT_ID"
     --conf spark.hadoop.fs.azure.account.auth.type."$STORAGE_ACCOUNT".dfs.core.windows.net=OAuth
     --conf spark.hadoop.fs.azure.account.oauth2.client.id"$STORAGE_ACCOUNT".dfs.core.windows.net="$CLIENT_ID"
     --conf spark.hadoop.fs.azure.impl=org.apache.hadoop.fs.azure.NativeAzureFileSystem
     --conf spark.hadoop.fs.azure.account.oauth2.token.file."$STORAGE_ACCOUNT".dfs.core.windows.net=/var/run/secrets/azure/tokens/azure-identity-token
     --conf spark.hadoop.fs.azure.account.oauth.provider.type."$STORAGE_ACCOUNT".dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.WorkloadIdentityTokenProvider

    ## There are other configurations related to kubernetes and spark which are skipped for the sake of brevity
```
- Attempt to read a Parquet file stored in an Azure storage bucket as a spark dataframe using an `abfss://` URI.
- After reading the parquet file, execute `df.count()` to check if the result is 0.
- Execute `df.show()` to confirm that the parquet file contains valid data.

### Expected behavior

After reading the parquet file with the above `df.count()` should have returned the correct number of rows instead of 0.

### Additional context

_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

bug: df.count() returns 0 when Comet native scan is disabled and when Comet parquet conversion is enabled #4793

Describe the bug

Steps to reproduce

Expected behavior

Additional context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Category	Details
Spark	3.5.6
Comet	0.16.0 (https://repo1.maven.org/maven2/org/apache/datafusion/comet-spark-spark3.5_2.12/0.16.0/comet-spark-spark3.5_2.12-0.16.0.jar)
Cluster Manager	Kubernetes (Azure Kubernetes Service)
Storage	Azure cloud storage with necessary ABFS drivers in the spark driver and setted up to be authenticated using Workload Identity attached to the kubernetes service account

Uh oh!

bug: df.count() returns 0 when Comet native scan is disabled and when Comet parquet conversion is enabled #4793

Description

Describe the bug

Steps to reproduce

Expected behavior

Additional context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions