SIH 2024 Winner · ML · AI/ML

SHAKTI

Vision-Language RAG System for Military Intelligence


Problem

Defense intelligence work relies on dense, multi-page, mixed-format documents, such as scans, maps and captioned imagery. Traditional OCR plus keyword search misses much of their content. Analysts under time pressure need a system that retrieves semantically across both text and imagery, rather than only by exact keyword match.

Approach

Vision-Language RAG. A custom OCR pipeline (79% accuracy on military document samples) extracts text from degraded scans. A vision-language (VL) backbone then produces multimodal embeddings, which are indexed in a vector store. Retrieval is cross-modal: a text query can surface captioned imagery, and a text snippet can surface its source document. The system is served through a FastAPI service layer, and LangChain makes the LLM backends swappable.
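Below is a minimal sketch of cross-modal retrieval over a single index. It assumes the VL backbone (for example, a CLIP-style model) maps text and images into one shared vector space. The three-dimensional vectors are hand-written stand-ins for real embeddings. `IndexedItem`, `CrossModalIndex` and the document IDs are hypothetical names. A brute-force cosine scan takes the place of a real vector store.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class IndexedItem:
    doc_id: str           # source document the chunk or image came from
    modality: str         # "text" or "image"
    payload: str          # text chunk, or image caption
    vector: list[float]   # embedding in the (assumed) shared VL space


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CrossModalIndex:
    """Brute-force stand-in for a vector store holding both modalities."""

    def __init__(self) -> None:
        self.items: list[IndexedItem] = []

    def add(self, item: IndexedItem) -> None:
        self.items.append(item)

    def search(self, query_vec: list[float], k: int = 3) -> list[tuple[float, IndexedItem]]:
        # Score every item against the query, regardless of its modality.
        scored = [(cosine(query_vec, item.vector), item) for item in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Toy usage: hand-written vectors stand in for real VL embeddings.
index = CrossModalIndex()
index.add(IndexedItem("doc-17", "text", "Convoy route along the river crossing", [0.9, 0.1, 0.0]))
index.add(IndexedItem("doc-17", "image", "Aerial photo: bridge at the river crossing", [0.8, 0.3, 0.1]))
index.add(IndexedItem("doc-42", "text", "Supply depot inventory list", [0.0, 0.2, 0.9]))

query = [0.85, 0.2, 0.05]  # pretend text embedding of "river crossing"
for score, item in index.search(query, k=2):
    print(f"{score:.3f} {item.modality:5} {item.doc_id} {item.payload}")
```

Text chunks and images share one index. The text query therefore returns the aerial image alongside the matching text chunk, with no separate image-search path. A production vector store would replace the linear scan with an approximate nearest-neighbour index.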

At a glance

  • OCR accuracy: 79%
  • Modality: Vision + Language
  • Recognition: SIH 2024 Winner (Smart India Hackathon)
  • Domain: Defense / Military
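The page does not say how the 79% OCR figure was measured. One common definition is character accuracy, 1 - CER. The character error rate (CER) is the Levenshtein edit distance between OCR output and ground truth, divided by the ground-truth length. The sketch below assumes that definition. It is illustrative only and may not match the metric behind the reported number.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion: ref char missing from hyp
                curr[j - 1] + 1,         # insertion: extra char in hyp
                prev[j - 1] + (r != h),  # substitution, free when chars match
            ))
        prev = curr
    return prev[-1]


def char_accuracy(ref: str, hyp: str) -> float:
    """1 - CER, clipped at 0 because CER exceeds 1 when hyp is much longer."""
    if not ref:
        return 1.0 if not hyp else 0.0
    cer = levenshtein(ref, hyp) / len(ref)
    return max(0.0, 1.0 - cer)


# Two typical degraded-scan confusions: "I" read as "1", and "1" read as "I".
print(round(char_accuracy("GRID REF 4521", "GR1D REF 452I"), 3))  # → 0.846
```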

Tech decisions

  • Custom OCR over Tesseract

    Military fonts and degraded scans pushed standard OCR pipelines below a usable accuracy threshold.

  • Vision-Language embeddings

    A single index serves both text and image queries, so there is no separate image search to maintain.

  • RAG over fine-tuning

    Analyst queries are open-ended and document corpora rotate; retrieval beats memorization.

  • LangChain abstraction

    Makes it easy to swap LLM providers as the program of record evolves.
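The point of the LangChain abstraction is that pipeline code depends on an interface rather than on a specific provider. The framework-free sketch below illustrates that pattern with a `typing.Protocol`; it is not LangChain's API. `LLMBackend`, `EchoBackend`, `UppercaseBackend`, `BACKENDS` and `answer` are hypothetical names, and the two toy backends stand in for real model providers.

```python
from typing import Callable, Protocol


class LLMBackend(Protocol):
    """Anything that turns a prompt into a completion."""

    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    """Hypothetical offline stub that echoes the question line."""

    def complete(self, prompt: str) -> str:
        return f"[echo] {prompt.splitlines()[-1]}"


class UppercaseBackend:
    """Hypothetical second provider, standing in for a hosted model."""

    def complete(self, prompt: str) -> str:
        return prompt.splitlines()[-1].upper()


# Registry keyed by a config string, so changing provider is a config edit.
BACKENDS: dict[str, Callable[[], LLMBackend]] = {
    "echo": EchoBackend,
    "upper": UppercaseBackend,
}


def answer(question: str, context: list[str], backend_name: str) -> str:
    """Assemble a RAG-style prompt and send it to the configured backend."""
    backend = BACKENDS[backend_name]()  # raises KeyError for unknown names
    prompt = "\n".join(["Context:", *context, f"Question: {question}"])
    return backend.complete(prompt)


print(answer("Where is the bridge?", ["doc-17: bridge at river crossing"], "echo"))
```

Swapping providers becomes a change to the `backend_name` config value. The retrieval and prompt-assembly code stays untouched.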

Stack

PyTorch · RAG · OCR · FastAPI · Transformers · LangChain