Video Lecture Summarizer
A Python/Flask web application that takes recorded Google Meet lecture videos, transcribes the audio via SpeechRecognition, and generates concise bullet-point summaries using LexRank extractive summarization, helping students review long lectures efficiently.
Problem
Online lectures often run 60–90 minutes with no structured notes. Students either re-watch the full recording or miss key content. No tool existed that could automatically summarize lecture recordings into digestible takeaways.
Solution
A Flask web interface accepts video uploads. PyDub extracts and preprocesses the audio. SpeechRecognition transcribes the speech to text. NLTK tokenizes and cleans the transcript. LexRank extractive summarization then identifies the most important sentences and outputs a structured summary.
Architecture
The Flask web UI accepts Google Meet recording uploads (MP4). PyDub extracts the audio track and splits it into manageable chunks for processing.
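The chunking step can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the 60-second chunk length, and the mono 16 kHz WAV output are all assumptions. The chunk arithmetic is kept separate so it runs without PyDub installed; split_recording imports PyDub only when called and relies on ffmpeg being available to decode the MP4.

```python
from typing import List, Tuple


def chunk_bounds(total_ms: int, chunk_ms: int = 60_000) -> List[Tuple[int, int]]:
    """Return (start, end) millisecond offsets that cover [0, total_ms)."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [(start, min(start + chunk_ms, total_ms))
            for start in range(0, total_ms, chunk_ms)]


def split_recording(video_path: str, out_dir: str,
                    chunk_ms: int = 60_000) -> List[str]:
    """Extract the audio track of a video and write it out as WAV chunks."""
    import os
    from pydub import AudioSegment  # third-party; needs ffmpeg for MP4 input

    audio = AudioSegment.from_file(video_path, format="mp4")
    # Mono 16 kHz is a common speech-recognition input (an assumption here).
    audio = audio.set_channels(1).set_frame_rate(16_000)

    os.makedirs(out_dir, exist_ok=True)
    paths = []
    # len(audio) is the duration in milliseconds; slicing is also in ms.
    for i, (start, end) in enumerate(chunk_bounds(len(audio), chunk_ms)):
        path = os.path.join(out_dir, f"chunk_{i:04d}.wav")
        audio[start:end].export(path, format="wav")
        paths.append(path)
    return paths
```

For a 150-second recording and the default chunk length, chunk_bounds(150_000) yields two full 60-second windows followed by a 30-second remainder.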
The SpeechRecognition library processes the audio chunks and produces a raw text transcript. Background noise is handled by tuning the recognizer's energy threshold.
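A hedged sketch of the transcription step is shown below. The source does not say which recognition backend the project uses, so recognize_google is an assumption, as are the helper names and the threshold value of 300. The joining logic takes any per-chunk callable, which keeps it usable with a stub when the SpeechRecognition package is not installed.

```python
from typing import Callable, Iterable


def transcribe_chunks(chunk_paths: Iterable[str],
                      transcribe_one: Callable[[str], str]) -> str:
    """Transcribe chunks in order and join the non-empty pieces."""
    pieces = (transcribe_one(path).strip() for path in chunk_paths)
    return " ".join(piece for piece in pieces if piece)


def make_transcriber(energy_threshold: int = 300) -> Callable[[str], str]:
    """Build a per-chunk transcriber backed by SpeechRecognition."""
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    recognizer = sr.Recognizer()
    # energy_threshold is the level Recognizer.listen() uses to separate
    # speech from silence. record() below reads the whole chunk regardless,
    # so this setting matters only if phrases are segmented with listen().
    recognizer.energy_threshold = energy_threshold
    recognizer.dynamic_energy_threshold = False  # keep the tuned value fixed

    def transcribe_one(path: str) -> str:
        with sr.AudioFile(path) as source:
            audio = recognizer.record(source)
        try:
            # Assumed backend: the free Google Web Speech API.
            return recognizer.recognize_google(audio)
        except sr.UnknownValueError:
            return ""  # nothing intelligible in this chunk
        except sr.RequestError as exc:
            raise RuntimeError(f"speech service unavailable: {exc}") from exc

    return transcribe_one
```

A typical call would be transcribe_chunks(paths, make_transcriber()), where paths is the ordered list of chunk files. Chunks with no recognizable speech contribute nothing to the transcript instead of aborting the run.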
An NLTK pipeline tokenizes the transcript, removes stopwords, and cleans the text. LexRank, a graph-based extractive summarizer, then ranks sentences by centrality.
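To make the ranking idea concrete, here is a simplified, standard-library-only sketch of thresholded LexRank. It is not the project's implementation. A regex tokenizer and a tiny stopword set stand in for NLTK, self-links are left out of the graph, and the threshold, damping factor, and iteration count are illustrative assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Tiny illustrative stopword set; a real pipeline would use NLTK's corpus.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "the", "to"}


def tokenize(sentence: str) -> List[str]:
    """Lowercase, split into words, and drop stopwords."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]


def tfidf_vectors(sentences: List[str]) -> List[Dict[str, float]]:
    """Build one sparse TF-IDF vector per sentence."""
    docs = [Counter(tokenize(s)) for s in sentences]
    doc_freq = Counter(word for doc in docs for word in doc)
    idf = {w: math.log(len(docs) / df) + 1.0 for w, df in doc_freq.items()}
    return [{w: tf * idf[w] for w, tf in doc.items()} for doc in docs]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank_scores(sentences: List[str], threshold: float = 0.1,
                   damping: float = 0.85, iterations: int = 50) -> List[float]:
    """Score sentences by centrality in a thresholded similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vecs = tfidf_vectors(sentences)
    # Link two different sentences when their cosine similarity is high enough.
    graph = [[1.0 if i != j and cosine(vecs[i], vecs[j]) >= threshold else 0.0
              for j in range(n)] for i in range(n)]
    # Row-normalize; a sentence with no links spreads its weight uniformly.
    for row in graph:
        total = sum(row)
        for j in range(n):
            row[j] = row[j] / total if total else 1.0 / n
    # Power iteration with damping, as in PageRank.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [(1.0 - damping) / n
                  + damping * sum(scores[i] * graph[i][j] for i in range(n))
                  for j in range(n)]
    return scores


def summarize(sentences: List[str], top_n: int = 3) -> List[str]:
    """Return the top_n highest-scoring sentences in their original order."""
    scores = lexrank_scores(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i],
                  reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]
```

Returning the chosen sentences in transcript order, not in score order, keeps the summary readable as a sequence. A sentence that shares no vocabulary with the rest of the lecture has no incoming links and ends up with the lowest score.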
The top-ranked sentences are formatted into a structured bullet-point summary, displayed in the Flask UI, and optionally downloadable as a text file.
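The output step might look roughly like the sketch below. The route paths, the function names, the title text, and the summary.txt filename are hypothetical. The formatter is plain Python; create_app imports Flask only when called, and it serves the summary as plain text where the real UI presumably renders an HTML page.

```python
from typing import Iterable


def format_summary(sentences: Iterable[str],
                   title: str = "Lecture Summary") -> str:
    """Render ranked sentences as a plain-text bullet list."""
    bullets = [f"- {s.strip()}" for s in sentences if s.strip()]
    return "\n".join([title, "=" * len(title), *bullets]) + "\n"


def create_app(summary_sentences):
    """Build a small Flask app that shows the summary and offers a download."""
    from flask import Flask, Response  # third-party: pip install flask

    app = Flask(__name__)

    @app.route("/summary")
    def show_summary():
        # Displayed inline by the browser.
        return Response(format_summary(summary_sentences),
                        mimetype="text/plain")

    @app.route("/summary/download")
    def download_summary():
        # Content-Disposition tells the browser to save the text as a file.
        headers = {"Content-Disposition": 'attachment; filename="summary.txt"'}
        return Response(format_summary(summary_sentences),
                        mimetype="text/plain", headers=headers)

    return app
```

Sharing one formatter between the display route and the download route keeps the exported file identical to what the student sees on screen.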
Highlights
- End-to-end pipeline: video upload → audio extraction → transcription → summarization.
- PyDub audio preprocessing with chunk-based processing for long recordings.
- LexRank extractive summarization: graph-based sentence ranking for key content.
- Flask web interface with summary display and text export.