4 changes: 3 additions & 1 deletion Dockerfile
@@ -43,5 +43,7 @@ EXPOSE 10000
EXPOSE 10001
# Table Storage Port
EXPOSE 10002
# DFS (ADLS Gen2) Port
EXPOSE 10004

CMD ["azurite", "-l", "/data", "--blobHost", "0.0.0.0","--queueHost", "0.0.0.0", "--tableHost", "0.0.0.0"]
CMD ["azurite", "-l", "/data", "--blobHost", "0.0.0.0", "--dfsHost", "0.0.0.0", "--queueHost", "0.0.0.0", "--tableHost", "0.0.0.0"]
4 changes: 3 additions & 1 deletion Dockerfile.Windows
@@ -67,9 +67,11 @@ EXPOSE 10000
EXPOSE 10001
# Table Storage Port
EXPOSE 10002
# DFS (ADLS Gen2) Port
EXPOSE 10004

ENTRYPOINT "cmd.exe /S /C"

WORKDIR C:\\Node\\node-v22.12.0-win-x64\\

CMD azurite -l c:/data --blobHost 0.0.0.0 --queueHost 0.0.0.0 --tableHost 0.0.0.0
CMD azurite -l c:/data --blobHost 0.0.0.0 --dfsHost 0.0.0.0 --queueHost 0.0.0.0 --tableHost 0.0.0.0
13 changes: 10 additions & 3 deletions README.md
@@ -186,6 +186,8 @@ Following extension configurations are supported:

- `azurite.blobHost` Blob service listening endpoint, by default 127.0.0.1
- `azurite.blobPort` Blob service listening port, by default 10000
- `azurite.dfsHost` DFS service listening endpoint, by default 127.0.0.1
- `azurite.dfsPort` DFS service listening port, by default 10004
- `azurite.blobKeepAliveTimeout` Blob service keep alive timeout in seconds, by default 5
- `azurite.queueHost` Queue service listening endpoint, by default 127.0.0.1
- `azurite.queuePort` Queue service listening port, by default 10001
@@ -214,17 +216,18 @@ Following extension configurations are supported:
> Note. Find more docker image tags in <https://mcr.microsoft.com/v2/azure-storage/azurite/tags/list>

```bash
docker run -p 10000:10000 -p 10001:10001 -p 10002:10002 mcr.microsoft.com/azure-storage/azurite
docker run -p 10000:10000 -p 10004:10004 -p 10001:10001 -p 10002:10002 mcr.microsoft.com/azure-storage/azurite
```

`-p 10000:10000` will expose blob service's default listening port.
`-p 10004:10004` will expose dfs service's default listening port.
`-p 10001:10001` will expose queue service's default listening port.
`-p 10002:10002` will expose table service's default listening port.

Or just run blob service:

```bash
docker run -p 10000:10000 mcr.microsoft.com/azure-storage/azurite azurite-blob --blobHost 0.0.0.0
docker run -p 10000:10000 -p 10004:10004 mcr.microsoft.com/azure-storage/azurite azurite-blob --blobHost 0.0.0.0 --dfsHost 0.0.0.0
```

#### Run Azurite V3 docker image with customized persisted data location
@@ -317,6 +320,7 @@ You can customize the listening address per your requirements.

```cmd
--blobHost 127.0.0.1
--dfsHost 127.0.0.1
--queueHost 127.0.0.1
--tableHost 127.0.0.1
```
@@ -325,13 +329,14 @@ You can customize the listening address per your requirements.

```cmd
--blobHost 0.0.0.0
--dfsHost 0.0.0.0
--queueHost 0.0.0.0
--tableHost 0.0.0.0
```

### Listening Port Configuration

Optional. By default, Azurite V3 will listen to 10000 as blob service port, and 10001 as queue service port, and 10002 as the table service port.
Optional. By default, Azurite V3 listens on port 10000 for the blob service, 10004 for the dfs service, 10001 for the queue service, and 10002 for the table service.
You can customize the listening port per your requirements.

> Warning: After using a customized port, you need to update connection string or configurations correspondingly in your Storage Tools or SDKs.
@@ -341,6 +346,7 @@ You can customize the listening port per your requirements.

```cmd
--blobPort 8888
--dfsPort 8889
--queuePort 9999
--tablePort 11111
```
@@ -349,6 +355,7 @@ You can customize the listening port per your requirements.

```cmd
--blobPort 0
--dfsPort 0
--queuePort 0
--tablePort 0
```
187 changes: 187 additions & 0 deletions docs/designs/ADLS-gen2-parity.md
@@ -0,0 +1,187 @@
# ADLS Gen2 Parity Implementation Plan

## Context

Azurite currently has a **thin DFS proxy layer** (port 10004) that translates a small subset of ADLS Gen2 DFS REST API calls to Blob REST API calls via HTTP proxying (axios). This covers only filesystem (container) create/delete/HEAD and account listing. Full ADLS Gen2 parity requires native support for path (file/directory) operations, the append-then-flush write pattern, rename/move, ACLs, and list paths — none of which can be achieved by simple query-parameter rewriting.

## Architectural Decision: Hybrid (Native DFS Handlers + Shared Stores)

Replace the HTTP proxy with a **native Express pipeline** in the DFS server that directly accesses `IBlobMetadataStore` and `IExtentStore` — the same store instances used by the blob server.

```
Port 10000 (Blob API) → Blob Handlers → IBlobMetadataStore + IExtentStore
Port 10004 (DFS API) → DFS Handlers → same IBlobMetadataStore + IExtentStore
```

**Why not keep proxying?** DFS operations like List Paths, Create Directory, Rename, ACLs, and append-then-flush have no single blob API equivalent. Proxying would require multi-call orchestration, lose atomicity, and add latency.

### Directory Model

Directories are stored as **zero-length BlockBlobs with `hdi_isfolder=true` metadata**, matching Azure's real internal behavior. No separate directory table is needed.
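A minimal sketch of the marker convention, using illustrative field names rather than Azurite's actual `BlobModel`:

```typescript
// Illustrative record shape; field names are assumptions, not the real BlobModel.
interface StoredBlob {
  name: string;
  contentLength: number;
  metadata?: Record<string, string>;
}

// A directory is a zero-length blob carrying hdi_isfolder=true metadata.
function makeDirectoryMarker(name: string): StoredBlob {
  return { name, contentLength: 0, metadata: { hdi_isfolder: "true" } };
}

function isDirectoryMarker(blob: StoredBlob): boolean {
  return blob.contentLength === 0 && blob.metadata?.hdi_isfolder === "true";
}
```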

### ACL Storage

New fields on `BlobModel`: `dfsAclOwner`, `dfsAclGroup`, `dfsAclPermissions`, `dfsAcl`. LokiJS is schemaless (just add fields); SQL needs ALTER TABLE.

---

## Phase 0: Foundation — Shared Store Access & HNS Flag

**Goal:** Wire DFS server to share stores with blob server; enable HNS mode.

| File | Change |
|------|--------|
| `src/blob/utils/constants.ts` | Set `EMULATOR_ACCOUNT_ISHIERARCHICALNAMESPACEENABLED = true` (or make configurable) |
| `src/blob/DfsProxyServer.ts` → rename to `DfsServer.ts` | Accept `IBlobMetadataStore` + `IExtentStore` in constructor |
| `src/blob/DfsProxyConfiguration.ts` → rename to `DfsConfiguration.ts` | Remove upstream host/port fields (no longer proxying) |
| `src/blob/BlobServer.ts` | Expose `metadataStore` and `extentStore` via public getters |
| `src/azurite.ts` | Pass shared stores to both BlobServer and DfsServer |
| `src/blob/main.ts` | Same wiring for standalone blob+dfs mode |
| `src/blob/DfsRequestListenerFactory.ts` | Rewrite: replace axios proxy with native Express pipeline + DFS routing |
| `src/blob/IBlobEnvironment.ts`, `BlobEnvironment.ts`, `src/common/Environment.ts`, `VSCEnvironment.ts` | Add `--enableHierarchicalNamespace` option |

**Deliverable:** DFS server starts, shares data with blob, existing filesystem tests pass via direct store access.
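The shared-store wiring can be sketched as follows; the interface and class names follow the plan above, but the constructor signatures and interface bodies are assumptions, not Azurite's actual API:

```typescript
// Illustrative store interfaces (bodies elided).
interface IBlobMetadataStore { init(): Promise<void>; }
interface IExtentStore { init(): Promise<void>; }

class BlobServer {
  constructor(
    public readonly metadataStore: IBlobMetadataStore, // exposed via public getter
    public readonly extentStore: IExtentStore
  ) {}
}

class DfsServer {
  constructor(
    public readonly metadataStore: IBlobMetadataStore,
    public readonly extentStore: IExtentStore
  ) {}
}

// In azurite.ts: create the stores once, hand the same instances to both servers,
// so data written on port 10000 is immediately visible on port 10004.
const metadataStore: IBlobMetadataStore = { init: async () => {} };
const extentStore: IExtentStore = { init: async () => {} };
const blob = new BlobServer(metadataStore, extentStore);
const dfs = new DfsServer(blob.metadataStore, blob.extentStore);
```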

---

## Phase 1: Path CRUD + List Paths

**Goal:** Create/delete/read files and directories, list paths — the core operations most ADLS Gen2 SDKs depend on.

### New files to create

| File | Purpose |
|------|---------|
| `src/blob/dfs/DfsContext.ts` | DFS request context (account, filesystem, path) — analogous to `BlobStorageContext` |
| `src/blob/dfs/DfsOperation.ts` | Enum of DFS operations for dispatch |
| `src/blob/dfs/DfsDispatchMiddleware.ts` | Routes requests by `resource` param, `action` param, method, and headers |
| `src/blob/dfs/DfsErrorFactory.ts` | JSON error responses (`PathNotFound`, `DirectoryNotEmpty`, etc.) |
| `src/blob/dfs/DfsSerializer.ts` | JSON response serialization (DFS uses JSON, not XML) |
| `src/blob/dfs/handlers/FilesystemHandler.ts` | Filesystem ops → container store operations |
| `src/blob/dfs/handlers/PathHandler.ts` | Path create/delete/read/getProperties + listPaths |

### Operations implemented

- **Create Path** (`PUT ?resource=file|directory`): Creates zero-length BlockBlob; directories get `hdi_isfolder=true` metadata; auto-creates intermediate directories
- **Delete Path** (`DELETE`): Files → `deleteBlob()`; directories with `recursive=true` → delete all blobs with prefix; `recursive=false` → 409 if non-empty
- **Get Path Properties** (`HEAD`): Returns `x-ms-resource-type: file|directory` header
- **Read Path** (`GET`): Streams file content via `downloadBlob()` (follows `BlobHandler.download()` pattern)
- **List Paths** (`GET ?resource=filesystem&directory=...&recursive=true|false`): JSON response with `paths` array; uses `listBlobs()` with prefix/delimiter; supports continuation via `x-ms-continuation`
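The dispatch rules above can be sketched as a pure routing function, a simplified stand-in for `DfsDispatchMiddleware` (the operation names are illustrative):

```typescript
type DfsOperation =
  | "CreateFile" | "CreateDirectory" | "DeletePath"
  | "GetPathProperties" | "ReadPath" | "ListPaths"
  | "Append" | "Flush" | "Unknown";

// Route by HTTP method plus the `resource` / `action` query parameters,
// following the DFS REST conventions listed above.
function dispatchDfs(
  method: string,
  query: { resource?: string; action?: string }
): DfsOperation {
  if (method === "PUT" && query.resource === "file") return "CreateFile";
  if (method === "PUT" && query.resource === "directory") return "CreateDirectory";
  if (method === "PATCH" && query.action === "append") return "Append";
  if (method === "PATCH" && query.action === "flush") return "Flush";
  if (method === "GET" && query.resource === "filesystem") return "ListPaths";
  if (method === "GET") return "ReadPath";
  if (method === "HEAD") return "GetPathProperties";
  if (method === "DELETE") return "DeletePath";
  return "Unknown";
}
```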

### Existing files modified

| File | Change |
|------|--------|
| `src/blob/persistence/IBlobMetadataStore.ts` | Add `dfsResourceType`, ACL fields to `BlobModel` / `IBlobAdditionalProperties` |
| `src/blob/persistence/LokiBlobMetadataStore.ts` | No schema changes needed (schemaless) |
| `src/blob/persistence/SqlBlobMetadataStore.ts` | Add columns: `dfsResourceType`, `dfsAclOwner`, `dfsAclGroup`, `dfsAclPermissions`, `dfsAcl` |

### Tests

Extend `tests/blob/dfsProxy.test.ts`:
- Create file / directory, verify as blob
- Delete file / empty dir / non-empty dir with recursive
- Get properties with `x-ms-resource-type`
- Read file content
- List paths recursive and non-recursive
- Cross-API: create via DFS → read via Blob API and vice versa

---

## Phase 2: Append-Flush Write Pattern

**Goal:** Implement the DFS file write model (create empty → append chunks → flush to commit).

### Key insight

DFS append-then-flush maps directly to existing **BlockBlob uncommitted blocks** infrastructure: each `action=append` becomes a `stageBlock()`, and `action=flush` becomes `commitBlockList()`. No new persistence methods needed.

### Changes to `src/blob/dfs/handlers/PathHandler.ts`

- **`updatePath_Append(position, body)`**: Write body to `IExtentStore` as extent chunk; record as uncommitted block via `metadataStore.stageBlock()`; validate `position` matches current append offset; return 202
- **`updatePath_Flush(position, close)`**: Commit all staged blocks via `metadataStore.commitBlockList()`; update content length to `position`; return 200 with updated ETag
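The append-then-flush semantics can be modeled in isolation. This in-memory sketch stands in for the `stageBlock`/`commitBlockList` calls and is not Azurite's store API:

```typescript
// In-memory model: `staged` plays the role of uncommitted blocks (stageBlock),
// flush() plays commitBlockList. Error strings are illustrative.
class DfsFileModel {
  private committed = Buffer.alloc(0);
  private staged: Buffer[] = [];

  private stagedLength(): number {
    return this.staged.reduce((n, b) => n + b.length, 0);
  }

  // action=append: position must equal the current end of file (committed + staged).
  append(position: number, chunk: Buffer): void {
    if (position !== this.committed.length + this.stagedLength()) {
      throw new Error("400 InvalidAppendPosition");
    }
    this.staged.push(chunk);
  }

  // action=flush: commit all staged blocks; position must equal the new length.
  flush(position: number): void {
    const data = Buffer.concat([this.committed, ...this.staged]);
    if (position !== data.length) {
      throw new Error("400 InvalidFlushPosition");
    }
    this.committed = data;
    this.staged = [];
  }

  read(): string {
    return this.committed.toString("utf8");
  }
}
```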

### Tests

- Create → append 3 chunks → flush → read back, verify content
- Append with wrong position → 400
- Large file (multi-MB) append

---

## Phase 3: Rename/Move Path

**Goal:** Atomic rename for files and directories.

### New persistence methods

| Method | Description |
|--------|-------------|
| `IBlobMetadataStore.renameBlob(src, dest)` | Atomic rename of single blob (metadata-only, no extent copy) |
| `IBlobMetadataStore.renameBlobsByPrefix(srcPrefix, destPrefix)` | Atomic rename of all blobs matching prefix (for directory rename) |

### PathHandler addition

- **`renamePath(x-ms-rename-source)`**: Parse source header → for files: `renameBlob()`; for directories: `renameBlobsByPrefix()`. Supports cross-filesystem rename and conditional headers.

### Persistence implementations

- **LokiJS**: Update document `containerName` and `name` properties
- **SQL**: `UPDATE ... SET name = REPLACE(name, oldPrefix, newPrefix) WHERE name LIKE 'prefix%'` in transaction
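The LokiJS variant amounts to a prefix rewrite over matching records; a standalone sketch with a hypothetical record shape:

```typescript
// Illustrative record shape; the real LokiJS documents carry more fields.
interface BlobRecord { containerName: string; name: string; }

// Rename every blob under src.prefix into dest.prefix, mirroring the
// metadata-only approach: update containerName/name, never touch extents.
function renameBlobsByPrefix(
  blobs: BlobRecord[],
  src: { container: string; prefix: string },
  dest: { container: string; prefix: string }
): number {
  let moved = 0;
  for (const b of blobs) {
    if (b.containerName === src.container && b.name.startsWith(src.prefix)) {
      b.containerName = dest.container;
      b.name = dest.prefix + b.name.slice(src.prefix.length);
      moved++;
    }
  }
  return moved;
}
```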

### Tests

- Rename file within filesystem / across filesystems
- Rename directory (verify children moved)
- Rename non-existent → 404
- Rename with conditional headers

---

## Phase 4: ACL Operations

**Goal:** POSIX ACL get/set for emulator parity.

### PathHandler additions

- **`getAccessControl()`**: Read ACL fields from blob record → return as `x-ms-owner`, `x-ms-group`, `x-ms-permissions`, `x-ms-acl` headers. Defaults: `$superuser`/`$superuser`/`rwxr-x---`
- **`setAccessControl(owner, group, permissions, acl)`**: Validate ACL format → update blob record
- **`setAccessControlRecursive(mode, acl)`**: `mode` = set|modify|remove; iterate blobs under prefix; support continuation; return JSON with `directoriesSuccessful`, `filesSuccessful`, `failureCount`
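A validation sketch for the `x-ms-acl` wire format described above (the entry and field names here are illustrative, not Azurite types):

```typescript
interface AclEntry {
  scope: "access" | "default";
  entityType: string;   // user | group | mask | other
  entityId: string;     // empty string for the owning user/group
  permissions: string;  // e.g. "rwx", "r-x", "---"
}

// Parse a header like "user::rwx,group::r-x,other::---,default:user::rwx".
// Entries prefixed with "default:" are default ACLs (directories only).
function parseAcl(header: string): AclEntry[] {
  return header.split(",").map(entry => {
    const parts = entry.split(":");
    const isDefault = parts[0] === "default";
    const [entityType, entityId, permissions] = isDefault ? parts.slice(1) : parts;
    if (!/^[r-][w-][x-]$/.test(permissions)) {
      throw new Error(`400 InvalidAclFormat: ${entry}`);
    }
    return { scope: isDefault ? "default" : "access", entityType, entityId, permissions };
  });
}
```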

### Tests

- Set/get ACL on file and directory
- Recursive ACL set on directory tree
- Default ACL values on new paths

---

## Phase 5: Polish & Remaining Operations

- **Set Filesystem Properties** (`PATCH ?resource=filesystem`) → `setContainerMetadata()`
- **`x-ms-properties` encoding/decoding** — new `src/blob/dfs/DfsPropertyEncoding.ts` utility (base64 key=value pairs)
- **DFS JSON error format**: `{"error":{"code":"...","message":"..."}}`
- **Lease support** on DFS paths (reuse blob lease infrastructure)
- **SAS validation** on DFS endpoints (reuse existing authenticators)
- **Content-MD5/CRC64 validation** on append
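A possible shape for the `DfsPropertyEncoding` utility, assuming the comma-separated `key=base64(value)` wire format noted above:

```typescript
// Encode user metadata for the x-ms-properties header: "key=base64(value),..."
function encodeDfsProperties(props: Record<string, string>): string {
  return Object.entries(props)
    .map(([k, v]) => `${k}=${Buffer.from(v, "utf8").toString("base64")}`)
    .join(",");
}

// Decode the header back into a key/value map; keys may not contain "=",
// so the first "=" in each pair is the separator.
function decodeDfsProperties(header: string): Record<string, string> {
  const out: Record<string, string> = {};
  if (header === "") return out;
  for (const pair of header.split(",")) {
    const i = pair.indexOf("=");
    out[pair.slice(0, i)] = Buffer.from(pair.slice(i + 1), "base64").toString("utf8");
  }
  return out;
}
```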

---

## Verification Plan

1. **Unit tests**: Extend `tests/blob/dfsProxy.test.ts` per phase
2. **Cross-API tests**: Verify DFS-created data is visible via Blob API and vice versa
3. **SDK integration**: Test with `@azure/storage-file-datalake` Node.js SDK against the emulator
4. **Manual smoke test**: Run Azurite, use Azure Storage Explorer with DFS endpoint
5. **Existing blob tests**: Ensure `npm test` still passes (no regression)

---

## Critical Reference Files

- `src/blob/handlers/ContainerHandler.ts` — pattern for handler ↔ store interaction
- `src/blob/handlers/BlockBlobHandler.ts` — `stageBlock`/`commitBlockList` for append-flush reuse
- `src/blob/handlers/BlobHandler.ts` — `download()` pattern for Read Path
- `src/blob/persistence/IBlobMetadataStore.ts` — store interface to extend
- `src/blob/generated/handlers/` — handler interface patterns
- `src/blob/middlewares/blobStorageContext.middleware.ts` — context extraction pattern for DfsContext