Merge: improve autocomplete, fix crawling API #164
Merged
21 commits
- 66e6b6d fix: remove repo/query (GulSam00)
- 8bd72f8 chore: remove the temporary crawlYoutubeTemp and crawlYoutubeUbuntu files and their scripts (GulSam00)
- 35bd241 feat: replace isValidKYExistNumber with AI-based song verification and add the validateSongMatch util (GulSam00)
- ef80587 refactor: clean up the crawling log system and fix the replaceSupabaseFailed bug (GulSam00)
- 928d98b doc: add packages/crawling CLAUDE.md (GulSam00)
- b654aa8 refactor: switch the crawlYoutubeVerify file checkpoint to the verify_ky_songs DB table (GulSam00)
- f183afb refactor: comment out logData file checkpoint usage (GulSam00)
- 2a8f4b4 Merge pull request #158 from GulSam00/refactor/crawling (GulSam00)
- 5959ace fix: useSearchSong autocomplete substitutes the alias only when the label matches exactly (GulSam00)
- 63d3bce chore: update artist alias and Korean-Japanese mapping data (GulSam00)
- 9f91863 chore: exclude Claude Code local settings files in .gitignore (GulSam00)
- a72ccba Merge pull request #159 from GulSam00/fix/autoComplete (GulSam00)
- 8bfda37 fix: make the yml action file receive environment variables (GulSam00)
- 8ed06fd Merge pull request #160 from GulSam00/fix/env (GulSam00)
- 9aa7634 chore: switch the ky-valid script to ky-verify (GulSam00)
- 6d05c1e chore: add a manually triggered GitHub Actions workflow for ky-verify (GulSam00)
- bdeef97 Merge pull request #161 from GulSam00/chore/kyVerifyAction (GulSam00)
- 50b9252 fix: add the Puppeteer sandbox option for crawlYoutubeVerify on Ubuntu (GulSam00)
- d59423a Merge pull request #162 from GulSam00/chore/kyVerifyAction (GulSam00)
- 1b50f6f fix: adjust layout spacing (GulSam00)
- f927cc4 Merge pull request #163 from GulSam00/fix/layout (GulSam00)
@@ -0,0 +1,41 @@

    name: Verify ky by Youtube

    on:
      workflow_dispatch:

    permissions:
      contents: write # required for push permission

    jobs:
      run-npm-task:
        runs-on: ubuntu-latest

        steps:
          - name: Checkout branch
            uses: actions/checkout@v4

          - name: Use Node.js 20
            uses: actions/setup-node@v4
            with:
              node-version: "20"

          - name: Install pnpm
            uses: pnpm/action-setup@v2
            with:
              version: 9
              run_install: false

          - name: Install dependencies
            working-directory: packages/crawling
            run: pnpm install

          - name: Create .env file
            working-directory: packages/crawling
            run: |
              echo "SUPABASE_URL=${{ secrets.SUPABASE_URL }}" >> .env
              echo "SUPABASE_KEY=${{ secrets.SUPABASE_KEY }}" >> .env
              echo "OPENAI_API_KEY=${{ secrets.OPENAI_API_KEY }}" >> .env

          - name: run verify script - packages/crawling
            working-directory: packages/crawling
            run: pnpm run ky-verify
@@ -0,0 +1,113 @@

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

A collection of one-off data collection and processing scripts. No build output is deployed; the scripts are run directly with `tsx`.

## Commands

```bash
pnpm ky-open     # Collect KY numbers via the Open API (Kumyoung)
pnpm ky-youtube  # Collect KY numbers by crawling YouTube + AI verification
pnpm ky-verify   # Re-verify that existing KY numbers actually exist (checkpoint support)
pnpm ky-update   # Run ky-youtube and ky-verify in parallel
pnpm trans       # Translate Japanese artist names to Korean and save them to the DB
pnpm test        # Run vitest
pnpm lint        # ESLint
```

Scripts must be run from the **`packages/crawling/`** directory, because the log file and assets paths are relative paths.

## Environment Variables

A `.env` file is required (located in `packages/crawling/`, not the repository root):

```
SUPABASE_URL=
SUPABASE_KEY=
OPENAI_API_KEY=
```
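
A script could fail fast when one of these variables is absent. The following is a minimal sketch under that assumption; `findMissingEnvVars` and `assertEnv` are hypothetical helper names and are not part of the repository, and only the three variable names come from the listing above.

```typescript
// Variable names taken from the .env listing; everything else is illustrative.
const REQUIRED_ENV_VARS = ["SUPABASE_URL", "SUPABASE_KEY", "OPENAI_API_KEY"] as const;

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are missing or blank.
function findMissingEnvVars(env: Env): string[] {
  return REQUIRED_ENV_VARS.filter((name) => !env[name]?.trim());
}

// Throws a readable error up front instead of failing in the middle of a crawl.
function assertEnv(env: Env): void {
  const missing = findMissingEnvVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```

In a script this would typically be called once at startup as `assertEnv(process.env)`, after the `.env` file has been loaded.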

## Architecture

### Data flow

All scripts revolve around the **Supabase `songs` table**.

```
[songs table]
title, artist, num_tj (TJ number), num_ky (KY number)

Main goal: fill in the KY number for songs whose num_ky is null
```

**KY number collection (main pipeline)**

```
crawlYoutube.ts
  └─ getSongsKyNullDB()                  # fetch songs whose num_ky is null
  └─ search YouTube @KARAOKEKY channel   # scrape the number with puppeteer + cheerio
  └─ isValidKYExistNumber()              # check on kysing.kr that the number really exists
  └─ validateSongMatch()                 # OpenAI gpt-4o-mini judges whether title/artist match
  └─ updateSongsKyDB()                   # on success, update the DB
  └─ postInvalidKYSongsDB()              # on failure, record in the invalid_ky_songs table
```
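
The control flow of the outline above can be sketched as follows. This is an illustration under assumptions, not the code in `crawlYoutube.ts`: only the function names come from the outline, while every signature, the `searchYoutubeForKy` name, and the `Song` and `YoutubeCandidate` shapes are hypothetical. Dependencies are injected so the flow can be read and exercised without Supabase, Puppeteer, or OpenAI.

```typescript
// Hypothetical shapes; the real table columns and return types may differ.
interface Song { id: string; title: string; artist: string }
interface YoutubeCandidate { numKy: string; title: string; artist: string }

interface CollectDeps {
  getSongsKyNullDB(): Promise<Song[]>;
  searchYoutubeForKy(song: Song): Promise<YoutubeCandidate | null>; // hypothetical name
  isValidKYExistNumber(numKy: string): Promise<boolean>;
  validateSongMatch(song: Song, candidate: YoutubeCandidate): Promise<boolean>;
  updateSongsKyDB(songId: string, numKy: string): Promise<void>;
  postInvalidKYSongsDB(song: Song): Promise<void>;
}

// Walks every song without a KY number and either stores a verified number
// or records the song as a failure.
async function fillMissingKyNumbers(deps: CollectDeps): Promise<{ updated: number; failed: number }> {
  let updated = 0;
  let failed = 0;
  for (const song of await deps.getSongsKyNullDB()) {
    const candidate = await deps.searchYoutubeForKy(song);
    // Short-circuits: later checks run only if the earlier ones pass.
    const ok =
      candidate !== null &&
      (await deps.isValidKYExistNumber(candidate.numKy)) &&
      (await deps.validateSongMatch(song, candidate));
    if (candidate !== null && ok) {
      await deps.updateSongsKyDB(song.id, candidate.numKy);
      updated += 1;
    } else {
      await deps.postInvalidKYSongsDB(song);
      failed += 1;
    }
  }
  return { updated, failed };
}
```

Ordering the checks from cheapest to most expensive means the OpenAI call is made only for numbers that already passed the existence check.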

**KY number verification (re-checking existing data)**

```
crawlYoutubeVerify.ts
  └─ getSongsKyNotNullDB()               # fetch songs that have a num_ky
  └─ getVerifyKySongsDB()                # load already-verified IDs (checkpoint)
  └─ isValidKYExistNumber()              # re-check on the KY site that the number exists
  └─ if valid, postVerifyKySongsDB()     # insert into the verify_ky_songs table
  └─ if invalid, reset num_ky to null
```
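
The verification loop with its database-backed checkpoint can be sketched like this. Signatures are assumptions, and `resetSongKyDB` is a hypothetical name for the step that sets `num_ky` back to null; the other function names come from the outline above.

```typescript
// Hypothetical row shape for songs that already have a KY number.
interface KySong { id: string; num_ky: string }

interface VerifyDeps {
  getSongsKyNotNullDB(): Promise<KySong[]>;
  getVerifyKySongsDB(): Promise<string[]>; // IDs verified in earlier runs
  isValidKYExistNumber(numKy: string): Promise<boolean>;
  postVerifyKySongsDB(songId: string): Promise<void>;
  resetSongKyDB(songId: string): Promise<void>; // hypothetical: sets num_ky = null
}

// Skips songs verified in a previous run, so an interrupted job can resume.
async function verifyKyNumbers(deps: VerifyDeps): Promise<{ skipped: number; valid: number; reset: number }> {
  const verified = new Set(await deps.getVerifyKySongsDB());
  const stats = { skipped: 0, valid: 0, reset: 0 };
  for (const song of await deps.getSongsKyNotNullDB()) {
    if (verified.has(song.id)) {
      stats.skipped += 1;
      continue;
    }
    if (await deps.isValidKYExistNumber(song.num_ky)) {
      await deps.postVerifyKySongsDB(song.id); // becomes part of the checkpoint
      stats.valid += 1;
    } else {
      await deps.resetSongKyDB(song.id);
      stats.reset += 1;
    }
  }
  return stats;
}
```

Because each valid song is written to the checkpoint table as soon as it is confirmed, a rerun only repeats work for songs that were never reached.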

**Open API approach (auxiliary)**

```
findKYByOpen.ts
  └─ query the Kumyoung API directly with getSong() from @repo/open-api
  └─ match the KY number by comparing title + artist strings
```
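
The title and artist comparison might look like the sketch below. The normalization rules (NFKC, lowercase, whitespace removal) and the `KyCandidate` shape are assumptions chosen for illustration; the source only states that the strings are compared, not how.

```typescript
// Hypothetical shape of one result returned by the Open API lookup.
interface KyCandidate { num: string; title: string; artist: string }

// Folds full-width characters, case, and whitespace so cosmetic differences do not block a match.
function normalizeForMatch(text: string): string {
  return text.normalize("NFKC").toLowerCase().replace(/\s+/g, "");
}

// Returns the KY number of the first candidate whose title and artist both match, else null.
function findKyNumber(target: { title: string; artist: string }, candidates: KyCandidate[]): string | null {
  const wantedTitle = normalizeForMatch(target.title);
  const wantedArtist = normalizeForMatch(target.artist);
  const hit = candidates.find(
    (c) => normalizeForMatch(c.title) === wantedTitle && normalizeForMatch(c.artist) === wantedArtist,
  );
  return hit ? hit.num : null;
}
```

Strict string comparison misses transliterated or aliased names, which is presumably why this path is only auxiliary to the YouTube pipeline with AI verification.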

**Japanese translation**

```
postTransDictionary.ts
  └─ getSongsJpnDB()                     # filter songs that contain Japanese
  └─ transChatGPT()                      # translate artist names with GPT-4-turbo
  └─ postTransDictionariesDB()           # save to the trans_dictionaries table
```
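
How the Japanese filter works is not shown in the source. One possible approach, sketched here purely as an assumption, is to test for kana characters and to drop names that were already attempted; both function names are hypothetical.

```typescript
// Hiragana (U+3040-U+309F) and Katakana (U+30A0-U+30FF). Kanji is deliberately left out
// because those code points are shared with Chinese text and Korean hanja.
const KANA_PATTERN = /[\u3040-\u309F\u30A0-\u30FF]/;

function containsJapaneseKana(text: string): boolean {
  return KANA_PATTERN.test(text);
}

// Returns unique kana-containing artist names that have not been attempted yet.
function collectUntranslatedArtists(artists: string[], alreadyTried: Set<string>): string[] {
  const unique = new Set(artists.filter(containsJapaneseKana));
  return Array.from(unique).filter((name) => !alreadyTried.has(name));
}
```

A kana-only test skips names written entirely in kanji, so a real filter would need an additional rule for those.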

### Core pattern: saving progress (checkpoints)

So that a long-running script does not start over from the beginning when it is restarted after an interruption, progress is recorded in text files under `src/assets/`.

| File                                      | Purpose                                                            |
| ----------------------------------------- | ------------------------------------------------------------------ |
| `src/assets/transList.txt`                | Japanese artist names for which translation was already attempted |
| `src/assets/crawlKYValidList.txt`         | (title-artist) pairs that have been verified                       |
| `src/assets/crawlKYYoutubeFailedList.txt` | YouTube crawling failures                                          |

Managed through the `save*` / `load*` functions in `logData.ts`. At script start the file is loaded and converted to a `Set`, so already-processed items are skipped with an O(1) lookup.
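
A minimal sketch of this file-based checkpoint pattern, assuming one entry per line. The function names below are hypothetical and do not reproduce the actual `save*` / `load*` functions in `logData.ts`.

```typescript
import { appendFileSync, existsSync, readFileSync, rmSync } from "node:fs";

// Loads one entry per line into a Set; a missing file simply means no progress yet.
function loadCheckpoint(filePath: string): Set<string> {
  if (!existsSync(filePath)) return new Set<string>();
  const lines = readFileSync(filePath, "utf8")
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return new Set<string>(lines);
}

// Appends a finished entry immediately, so progress survives a crash.
function saveCheckpoint(filePath: string, entry: string): void {
  appendFileSync(filePath, `${entry}\n`, "utf8");
}

// Deletes the checkpoint file to force a full rerun.
function clearCheckpoint(filePath: string): void {
  rmSync(filePath, { force: true });
}

// Keeps only the items that are not recorded in the checkpoint.
function pendingItems(items: string[], done: Set<string>): string[] {
  return items.filter((item) => !done.has(item));
}
```

Appending after each item, rather than rewriting the whole file at the end, is what lets a restarted run resume where the previous one stopped.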

### Path Alias

`@/` → `src/` (configured through `paths` in tsconfig)

### Supabase tables

| Table                | Purpose                                     |
| -------------------- | ------------------------------------------- |
| `songs`              | Main song data (including TJ/KY numbers)    |
| `invalid_ky_songs`   | Songs for which KY number collection failed |
| `trans_dictionaries` | Japanese → Korean translation dictionary    |

### AI utils

- `utils/validateSongMatch.ts`: uses `gpt-4o-mini` to judge whether two (title, artist) pairs refer to the same song. `temperature: 0`, `max_tokens: 20`; the API call is skipped when the pairs match exactly.
- `utils/transChatGPT.ts`: uses `gpt-4-turbo` to translate Japanese → Korean.
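
The exact-match shortcut described for `validateSongMatch` can be sketched as follows. The model name, `temperature`, and `max_tokens` values are taken from the notes above; the prompt wording, the yes/no answer parsing, and the injected `askModel` function are assumptions for illustration, not the contents of `utils/validateSongMatch.ts`.

```typescript
interface SongPair { title: string; artist: string }

// Request settings quoted from the notes above.
const MATCH_REQUEST = { model: "gpt-4o-mini", temperature: 0, max_tokens: 20 } as const;

// Hypothetical injection point: a real version would send `prompt` with these
// settings to the chat completions API and return the reply text.
type AskModel = (prompt: string, settings: typeof MATCH_REQUEST) => Promise<string>;

async function validateSongMatchSketch(a: SongPair, b: SongPair, askModel: AskModel): Promise<boolean> {
  // Exact match: no need to spend an API call.
  if (a.title === b.title && a.artist === b.artist) return true;
  const prompt =
    `Do these refer to the same song? Answer "yes" or "no".\n` +
    `1) ${a.title} / ${a.artist}\n` +
    `2) ${b.title} / ${b.artist}`;
  const answer = await askModel(prompt, MATCH_REQUEST);
  return answer.trim().toLowerCase().startsWith("yes");
}
```

With `temperature: 0` and a small `max_tokens`, the reply stays short and largely repeatable, which suits a yes/no judgment.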
1. Ky-update premature exit
🐞 Bug ⛯ Reliability