YouTube Live Chat Sentiment Analysis
A real-time NLP pipeline that pulls YouTube live chat messages through the YouTube Data API and classifies each message's sentiment with machine-learning models. It tracks the audience's emotional tone over the course of a stream and visualizes engagement patterns, which is useful for streamers and content analysts.
Problem
YouTube live streams can generate thousands of chat messages per hour. Content creators have no built-in way to gauge audience sentiment in real time or to review emotional trends after a stream ends; all they see is raw chat.
Solution
The pipeline fetches live chat messages in real time through the YouTube Data API v3. NLTK preprocessing cleans and tokenizes the text. Scikit-learn classifiers, trained on labeled sentiment data, predict positive, negative, or neutral sentiment for each message. Time-series charts track sentiment trends across the stream.
Architecture
The pipeline polls the YouTube Data API v3 live chat endpoint at configurable intervals, fetching message batches with author and timestamp metadata.
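A minimal sketch of what such a polling loop might look like. The function name `poll_live_chat`, its parameters, and the injected `fetch_page` callable are illustrative assumptions, not the project's actual code. The response fields (`items`, `nextPageToken`, `pollingIntervalMillis`, `snippet`, `authorDetails`) follow the `liveChatMessages.list` response format.

```python
import time
from typing import Callable, Dict, Iterator, Optional


def poll_live_chat(
    fetch_page: Callable[[Optional[str]], dict],
    min_interval_s: float = 2.0,
    max_pages: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict[str, str]]:
    """Yield chat messages page by page (hypothetical helper).

    fetch_page takes a page token (None for the first call) and returns
    one liveChatMessages.list response as a dict.
    """
    page_token = None
    pages = 0
    while True:
        response = fetch_page(page_token)
        for item in response.get("items", []):
            snippet = item["snippet"]
            yield {
                "author": item["authorDetails"]["displayName"],
                "published_at": snippet["publishedAt"],
                "text": snippet.get("displayMessage", ""),
            }
        page_token = response.get("nextPageToken")
        pages += 1
        if max_pages is not None and pages >= max_pages:
            return
        # Wait for the configured interval, but never poll faster than
        # the interval the API suggests in its response.
        server_hint_s = response.get("pollingIntervalMillis", 0) / 1000
        sleep(max(min_interval_s, server_hint_s))


# With google-api-python-client, fetch_page could be built like this
# (API_KEY and LIVE_CHAT_ID are placeholders you would supply):
#
#   from googleapiclient.discovery import build
#   youtube = build("youtube", "v3", developerKey=API_KEY)
#   fetch_page = lambda token: youtube.liveChatMessages().list(
#       liveChatId=LIVE_CHAT_ID, part="snippet,authorDetails", pageToken=token
#   ).execute()
```

Passing the fetch function in as an argument keeps the loop testable without network access; that is a design choice of this sketch rather than something the project description states.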
NLTK pipeline: tokenization, stopword removal, lemmatization, and TF-IDF vectorization for feature extraction.
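The preprocessing steps above could be sketched roughly as follows. The helper names (`make_preprocessor`, `nltk_preprocessor`, `vectorize`) are hypothetical, the choice of NLTK's `TweetTokenizer` is an assumption that suits short chat messages, and TF-IDF is done here with scikit-learn's `TfidfVectorizer` because the description does not say which implementation the project uses. `nltk_preprocessor` assumes the NLTK `stopwords` and `wordnet` corpora have already been downloaded.

```python
from typing import Callable, Iterable, List, Set

from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import TweetTokenizer
from sklearn.feature_extraction.text import TfidfVectorizer


def make_preprocessor(
    stop_words: Set[str], lemmatize: Callable[[str], str]
) -> Callable[[str], List[str]]:
    """Build a text -> tokens function (hypothetical helper)."""
    # reduce_len shortens character runs such as "loooool" to three repeats.
    tokenizer = TweetTokenizer(preserve_case=False, reduce_len=True)

    def preprocess(text: str) -> List[str]:
        tokens = tokenizer.tokenize(text)
        # Keep alphabetic tokens only (drops punctuation, numbers, emoji),
        # remove stopwords, then lemmatize what is left.
        return [
            lemmatize(tok)
            for tok in tokens
            if tok.isalpha() and tok not in stop_words
        ]

    return preprocess


def nltk_preprocessor() -> Callable[[str], List[str]]:
    """Preprocessor backed by NLTK's English stopwords and WordNet.

    Requires nltk.download("stopwords") and nltk.download("wordnet").
    """
    return make_preprocessor(
        set(stopwords.words("english")), WordNetLemmatizer().lemmatize
    )


def vectorize(messages: Iterable[str], preprocess: Callable[[str], List[str]]):
    """Fit TF-IDF on messages, using preprocess as the analyzer."""
    vectorizer = TfidfVectorizer(analyzer=preprocess)
    matrix = vectorizer.fit_transform(messages)
    return vectorizer, matrix
```

In use, `vectorize(messages, nltk_preprocessor())` would return the fitted vectorizer and a sparse matrix with one row per message.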
Scikit-learn models (logistic regression, Naive Bayes, SVM) classify each message as positive, negative, or neutral. The best model is selected by cross-validated F1 score.
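As an illustration of model selection by cross-validated F1, here is a small sketch. It makes several assumptions: macro-averaged F1 (the description does not specify the averaging), default hyperparameters, `MultinomialNB` and `LinearSVC` as the Naive Bayes and SVM variants, and a tiny made-up dataset that stands in for the project's labeled training data. The names `candidate_models` and `compare_models` are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up toy examples for demonstration only, not real training data.
TOY_TEXTS = [
    "love this stream", "great play well done", "this is awesome", "best stream ever",
    "this is boring", "terrible audio quality", "worst game ever", "hate this lag",
    "what game is this", "when does it start", "hello from brazil", "first time here",
]
TOY_LABELS = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 4


def candidate_models():
    """Return the classifiers to compare, keyed by a readable name."""
    return {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "naive_bayes": MultinomialNB(),
        "linear_svm": LinearSVC(),
    }


def compare_models(texts, labels, n_splits=3):
    """Score each model by mean macro-F1 over stratified folds."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = {}
    for name, clf in candidate_models().items():
        # Vectorizer sits inside the pipeline so it is refit on each
        # training fold and never sees the held-out fold.
        pipeline = make_pipeline(TfidfVectorizer(), clf)
        fold_scores = cross_val_score(
            pipeline, texts, labels, cv=cv, scoring="f1_macro"
        )
        scores[name] = float(fold_scores.mean())
    best_name = max(scores, key=scores.get)
    return best_name, scores


best_name, scores = compare_models(TOY_TEXTS, TOY_LABELS)
```

Scores on twelve toy messages carry no meaning; the point is the comparison loop, which would run unchanged on a real labeled corpus.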
Matplotlib time-series plots track sentiment ratio over stream duration. Aggregate charts show total sentiment distribution and peak engagement windows.
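A rough sketch of how the sentiment ratio per time window might be computed and plotted. The function names, the 60-second default window, and the input shape (pairs of seconds since stream start and predicted label) are assumptions made for illustration.

```python
from collections import Counter, defaultdict

import matplotlib

matplotlib.use("Agg")  # headless backend so the plot can be saved without a display
import matplotlib.pyplot as plt

LABELS = ("positive", "neutral", "negative")


def sentiment_ratios(records, window_seconds=60):
    """Map each window start (in seconds) to the share of each label.

    records is an iterable of (seconds_since_start, label) pairs.
    """
    counts = defaultdict(Counter)
    for seconds, label in records:
        window_start = int(seconds // window_seconds) * window_seconds
        counts[window_start][label] += 1
    ratios = {}
    for window_start in sorted(counts):
        total = sum(counts[window_start].values())
        ratios[window_start] = {
            label: counts[window_start][label] / total for label in LABELS
        }
    return ratios


def plot_ratios(ratios, path="sentiment.png"):
    """Save a line chart with one line per sentiment label."""
    starts = list(ratios)
    minutes = [start / 60 for start in starts]
    fig, ax = plt.subplots(figsize=(8, 3))
    for label in LABELS:
        ax.plot(minutes, [ratios[s][label] for s in starts], marker="o", label=label)
    ax.set_xlabel("Minutes since stream start")
    ax.set_ylabel("Share of messages")
    ax.set_ylim(0, 1)
    ax.legend()
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_ratios(sentiment_ratios(records))` writes the chart to `sentiment.png`. Peak engagement windows could be found from the same per-window counts by taking the windows with the most messages.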
Highlights
- Real-time YouTube API ingestion with configurable polling intervals.
- NLTK NLP pipeline: tokenization, stopwords, lemmatization, TF-IDF.
- Multiple Scikit-learn classifiers compared by cross-validated F1 score.
- Time-series sentiment tracking across full stream duration.
- GitHub: DarakhTech/yt-chat-analysis.