A research-grade Python application designed to digitize, segment, and recognize ancient Tamil script from degraded stone inscriptions and palm-leaf manuscripts. This project leverages advanced image preprocessing techniques and a custom-trained Convolutional Neural Network (CNN) to achieve high-accuracy character recognition, complete with explainable AI (Grad-CAM) visualization.
The custom CNN (model_stone.pth) is trained specifically on ancient Tamil character shapes (44 classes), achieving ~96% validation accuracy.
git clone https://github.com/Ricthi/Ai-heritage-.git
cd Ai-heritage-
pip install -r requirements.txt
(Ensure you have streamlit, torch, torchvision, opencv-python, numpy, pillow, pandas, and streamlit-drawable-canvas installed).
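To confirm the packages listed above are available before launching the app, a small check like the following may help. It is a minimal sketch and not part of the repository. The mapping from pip names to import names (for example opencv-python to cv2, pillow to PIL) reflects how those packages are normally imported, and the helper name missing_packages is hypothetical.

```python
from importlib.util import find_spec

# pip package name -> module name used in `import` statements.
REQUIRED = {
    "streamlit": "streamlit",
    "torch": "torch",
    "torchvision": "torchvision",
    "opencv-python": "cv2",
    "numpy": "numpy",
    "pillow": "PIL",
    "pandas": "pandas",
    "streamlit-drawable-canvas": "streamlit_drawable_canvas",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

Any name it reports can be installed with pip install followed by the package name.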
Launch the Streamlit dashboard:
cd tamil_heritage_ai/Model-Creation
streamlit run main_app.py
The app will open automatically in your browser at http://localhost:8501.
If you have new labeled datasets of ancient characters, you can easily retrain the CNN.
Place the labeled character folders in tamil_heritage_ai/Labels/.
Split the data into train and val sets:
python prepare_dataset.py
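For readers curious what a 70/30 split of per-class folders can look like, here is a minimal sketch using only the standard library. It is not the code in prepare_dataset.py. The function name split_class_folders, the train and val subfolder names, and the fixed seed are all assumptions made for illustration.

```python
import random
import shutil
from pathlib import Path


def split_class_folders(src, dst, train_frac=0.7, seed=42):
    """Copy each class folder in `src` into dst/train and dst/val.

    Returns {class_name: (n_train, n_val)}. The layout (one folder per
    class, train/ and val/ outputs) is an assumption, not the repo's API.
    """
    src, dst = Path(src), Path(dst)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    counts = {}
    for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_train = round(len(files) * train_frac)
        for split, subset in (("train", files[:n_train]), ("val", files[n_train:])):
            out_dir = dst / split / class_dir.name
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in subset:
                shutil.copy2(f, out_dir / f.name)  # copy; source stays intact
        counts[class_dir.name] = (n_train, len(files) - n_train)
    return counts
```

A hypothetical call would be split_class_folders("tamil_heritage_ai/Labels", "tamil_heritage_ai/Model-Creation/data"). Splitting per class rather than over the pooled files keeps every character represented in both sets.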
Train the CNN to produce model_stone.pth:
python train_stone_cnn.py
Once training finishes, the app can use the new model_stone.pth. Select it from the sidebar dropdown!
tamil_heritage_ai/Labels/: Raw dataset containing folders of labeled ancient characters.
tamil_heritage_ai/Model-Creation/data/: The training and validation datasets (70/30 split).
tamil_heritage_ai/Model-Creation/train_stone_cnn.py: PyTorch training script for the CNN.
tamil_heritage_ai/Model-Creation/prepare_dataset.py: Data aggregation and splitting script.
tamil_heritage_ai/Model-Creation/main_app.py: The primary Streamlit frontend application.
tamil_heritage_ai/Model-Creation/model_stone.pth: The trained weights and label mappings for the neural network.
Built to preserve and digitize the rich epigraphical history of the Tamil language.