DISDIKBUD PASER
Framework Code Igniter 4
Frontend (User) • Browser (Bootstrap 5 UI)
↕
Backend (CI4 - PHP)
• Controller:
o Upload PDF → Parser → Chunking → Store ke DB
o Chat Query → buat embedding pertanyaan → similarity search
• Services:
o PDF Parser & Chunker
o Embedding API client
o LLM API client
o Enrichment
Database (MySQL/MariaDB)
• Tabel:
o documents (info PDF)
o chunks (potongan teks PDF)
o embeddings (vektor teks JSON/longtext)
o chat_history (multi-turn chat)
Layanan Pihak Ketiga
• HuggingFace Inference API → generate embeddings
• Groq API → generate jawaban
• Wikipedia REST API → enrichment tambahan
───────────────────────────────
🔄 Alur Data
1. Upload PDF → CI4 parser → potong jadi chunk → embedding (HuggingFace) → simpan ke DB.
2. User bertanya → buat embedding pertanyaan → similarity search (cosine similarity manual).
3. Ambil top-N context chunks → compose prompt → kirim ke Groq API.
4. Jika skor similarity rendah → panggil Wikipedia API → gabungkan ke prompt.
5. Jawaban ditampilkan ke user (Bootstrap).