Emotion-Conditioned Image Captioning For Visual Artworks Using Affective Visual Encoders

Source Code Ready
IEEE Standard

Technologies Used

Deep Learning
Natural Language Processing (NLP)
RetinaFace
DeepFace
Machine Learning

Project Price

₹Price on Request

  • Includes all taxes
  • Instant estimate based on inputs
  • Transparent pricing structure

Project Overview & Abstract

This project, "Image Captioning System Using LSTM and VGG16", is designed for exploring and enhancing image captions. The application loads an image from a local dataset (such as Flickr5k) and retrieves its existing, human-annotated caption from a corresponding text file.

The core feature is an optional, on-the-fly "emotional enhancement". If the retinaface and deepface libraries are installed, the user can click a button to run face detection and extract the dominant emotion from the image. The detected emotion (e.g., "happy") is then inserted into the original caption using spaCy's Natural Language Processing: the system identifies the grammatical subject of the sentence and attaches the emotion to it, producing a context-aware caption such as "a happy man" instead of just "a man".

The application degrades gracefully, so it keeps working even if the optional AI libraries are not installed. It also includes Matplotlib visualizations for analyzing the dataset's word frequency, and uses displaCy to visualize the grammatical structure of the enhanced captions.

🚀 Key Features & Technologies

  • GUI: Built with Tkinter and ttk widgets for a styled, responsive interface.
  • Core Libraries: Uses Pillow (PIL) for image display and OpenCV for backend image processing.
  • Data Retrieval: Loads pre-computed captions from a local captions.txt file (Flickr5k dataset).
  • Face Detection: Integrates RetinaFace, a deep learning model, to find faces in the image.
  • Emotion Recognition: Uses DeepFace to analyze detected faces and determine the dominant emotion.
  • NLP Enhancement: Employs spaCy to parse the caption's grammar and insert the emotion next to the grammatical subject.
  • Robustness: Falls back gracefully, allowing the app to run even if the optional AI libraries (spaCy, DeepFace, retinaface) are not installed.
  • Visualizations: Uses Matplotlib for dataset statistics (word frequency, file sizes) and spaCy's displaCy to visualize the grammatical structure of the enhanced caption.
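
The data retrieval and word-frequency steps could look roughly like the sketch below. The layout of captions.txt is not described here, so the code assumes one "image_name,caption" pair per line with an optional "image,caption" header row. The function names parse_captions, load_captions, word_frequencies and plot_frequencies are hypothetical and are not taken from the project's source.

```python
import re
from collections import Counter, defaultdict


def parse_captions(lines):
    """Map image file name -> list of captions.

    Assumption: each line is 'image_name,caption text', optionally preceded
    by an 'image,caption' header row. Lines without a comma are skipped.
    """
    captions = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or line.lower().startswith("image,caption"):
            continue
        # Split on the first comma only, so commas inside the caption survive.
        name, sep, text = line.partition(",")
        if not sep:
            continue
        captions[name.strip()].append(text.strip())
    return dict(captions)


def load_captions(path):
    """Read a captions file from disk (the path is supplied by the caller)."""
    with open(path, encoding="utf-8") as fh:
        return parse_captions(fh)


def word_frequencies(captions, top_n=10):
    """Return the top_n (word, count) pairs across an iterable of captions."""
    counts = Counter()
    for caption in captions:
        counts.update(re.findall(r"[a-z']+", caption.lower()))
    return counts.most_common(top_n)


def plot_frequencies(pairs):
    """Bar chart of (word, count) pairs; Matplotlib is imported lazily so the
    rest of the module still works when it is not installed."""
    import matplotlib.pyplot as plt

    words, counts = zip(*pairs)
    plt.bar(words, counts)
    plt.xlabel("word")
    plt.ylabel("count")
    plt.tight_layout()
    plt.show()
```

Keeping the parsing and counting separate from the plotting means the dataset statistics can be computed and checked without a display or a Matplotlib installation.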
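
The optional emotion-detection step with a graceful fallback might be sketched as follows. This is one plausible arrangement, not the project's confirmed code: it lets DeepFace call RetinaFace through its detector_backend option rather than calling the retinaface package directly, and it assumes that the first detected face decides the emotion. The helper names pick_dominant and detect_emotion are hypothetical.

```python
try:
    from deepface import DeepFace
except Exception:  # package missing or its backend failed to import
    DeepFace = None


def pick_dominant(results):
    """Normalize DeepFace.analyze() output to a single emotion label.

    Depending on the DeepFace version, analyze() returns either one dict or
    a list of dicts (one per face). Assumption: the first face wins.
    """
    if isinstance(results, dict):
        results = [results]
    labels = [face.get("dominant_emotion") for face in results or []]
    labels = [label for label in labels if label]
    return labels[0] if labels else None


def detect_emotion(image_path):
    """Return a label such as 'happy', or None when detection is unavailable."""
    if DeepFace is None:
        return None  # graceful fallback: the caller keeps the original caption
    try:
        results = DeepFace.analyze(
            img_path=image_path,
            actions=["emotion"],
            detector_backend="retinaface",
        )
    except Exception:  # for example, no face was found in the image
        return None
    return pick_dominant(results)
```

Returning None in every failure case lets the GUI treat "library not installed" and "no face found" the same way and simply show the unmodified caption.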
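
The caption enhancement itself can be illustrated with a minimal sketch. It assumes the spaCy model en_core_web_sm and treats the first token tagged nsubj or nsubjpass as the grammatical subject. When spaCy or the model is unavailable, it falls back to a simple heuristic (insert after a leading article). Both the heuristic and the function names are illustrative assumptions, not the project's actual logic.

```python
import re

try:
    import spacy
    _NLP = spacy.load("en_core_web_sm")  # assumed model name
except Exception:  # spaCy or the model is not installed
    _NLP = None


def subject_offset_spacy(caption):
    """Character offset of the first grammatical subject, or None."""
    if _NLP is None:
        return None
    for token in _NLP(caption):
        if token.dep_ in ("nsubj", "nsubjpass"):
            return token.idx
    return None


def subject_offset_fallback(caption):
    """Heuristic: position right after a leading article, else position 0."""
    match = re.match(r"\s*(?:(?:a|an|the)\s+)?", caption, flags=re.IGNORECASE)
    return match.end()


def insert_emotion(caption, emotion, offset):
    """Insert the emotion word at offset and repair a preceding 'a'/'an'."""
    before, after = caption[:offset], caption[offset:]
    article = re.search(r"\b(a|an)(\s+)$", before, flags=re.IGNORECASE)
    if article:
        fixed = "an" if emotion[0].lower() in "aeiou" else "a"
        if article.group(1)[0].isupper():
            fixed = fixed.capitalize()
        before = before[:article.start(1)] + fixed + article.group(2)
    return f"{before}{emotion} {after}"


def enhance_caption(caption, emotion):
    """Use the spaCy parse when available, otherwise the heuristic."""
    offset = subject_offset_spacy(caption)
    if offset is None:
        offset = subject_offset_fallback(caption)
    return insert_emotion(caption, emotion, offset)
```

For example, insert_emotion("A man smiles", "angry", 2) yields "An angry man smiles". The sketch does not adjust capitalization when the emotion lands at the very start of a sentence, and DeepFace labels such as "fear" or "surprise" are nouns, so a real implementation would likely map them to adjectives first.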