MarketPulse: Sentiment-Driven Market Movement Predictor

Harnessing Transformer Models for Sentiment Analysis and Market Prediction

Introduction

MarketPulse is a tool for predicting stock market movements from sentiment analysis. The project combines data collected from social media, financial news, and stock price feeds with transformer-based models such as FinBERT to measure market sentiment. The goal is to forecast stock price movements using both the resulting sentiment signals and ensemble predictive models.

  • Core Models: Sentiment Analysis (FinBERT) and Price Prediction (Ensemble Learning)
  • Data Sources: Twitter, Reddit, News APIs, and Stock Price APIs
  • Libraries: Transformers, PyTorch, pandas, scikit-learn, Plotly, Dash
  • Visualization: Interactive dashboards with sentiment trends and price forecasts

Technical Workflow

1. Data Collection

The project collects real-time and historical data from multiple sources, each handled by a dedicated collector:

  • Twitter Collector: Scrapes tweets that mention cashtags such as $TSLA and $AAPL, along with other stock tickers.
  • Reddit Collector: Fetches posts from r/WallStreetBets and other finance-related subreddits.
  • News Collector: Collects headlines and articles from major financial news APIs.
  • Stock Data Collector: Pulls OHLCV (Open, High, Low, Close, Volume) data from stock market APIs.
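Filtering tweets by cashtag, and tagging posts from other sources with the tickers they mention, depends on recognizing cashtags in raw text. The following is a minimal sketch using a regular expression; the function name `extract_cashtags` and the assumption that tickers are 1 to 5 letters are illustrative choices, not the project's actual collector code.

```python
import re

# Assumed pattern: "$" followed by 1-5 letters, not preceded by a word character
# (so "abc$TSLA" is skipped) and ending at a word boundary (so "$GOOGLE" is skipped).
CASHTAG_RE = re.compile(r"(?<!\w)\$([A-Za-z]{1,5})\b")


def extract_cashtags(text: str) -> list[str]:
    """Return unique, upper-cased tickers in order of first appearance."""
    seen: dict[str, None] = {}  # dict keeps insertion order, acting as an ordered set
    for match in CASHTAG_RE.finditer(text):
        seen.setdefault(match.group(1).upper(), None)
    return list(seen)


print(extract_cashtags("Loading up on $TSLA and $aapl, not $5 calls. $TSLA to the moon"))
```

Lower-case cashtags are accepted and normalized because social media users often type them that way; a stricter collector might require upper case to reduce false positives.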

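Example code snippet from the Reddit Collector (uses PRAW; the credentials are placeholders):

import praw
reddit = praw.Reddit(client_id="YOUR_CLIENT_ID", client_secret="YOUR_CLIENT_SECRET", user_agent="marketpulse")
subreddit = reddit.subreddit('wallstreetbets')
for post in subreddit.top(limit=100):  # Top 100 posts, all-time by default
    print(post.title, post.selftext)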

2. Data Processing

Raw data from various sources undergoes preprocessing to ensure clean and uniform inputs:

  • Text Cleaning: Removes special characters, URLs, and stopwords using text_cleaner.py.
  • Tokenization: Converts text into tokens using the BERT tokenizer.
  • Stock Data Preparation: Aggregates OHLCV stock data into 1-hour or daily bars.

Snippet for text preprocessing:

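import re

def clean_text(text):
    text = re.sub(r'http\S+', '', text)  # Remove URLs
    text = re.sub(r'[^a-zA-Z0-9 ]', ' ', text)  # Replace special characters (incl. newlines) with spaces
    text = re.sub(r'\s+', ' ', text).strip()  # Collapse repeated whitespace
    return text.lower()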
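The Stock Data Preparation step can be sketched with pandas `resample`, which builds each coarser bar from the first open, highest high, lowest low, last close, and summed volume of the finer bars inside it. This minimal sketch assumes the raw bars sit in a DataFrame with a `DatetimeIndex` and columns named `Open`, `High`, `Low`, `Close`, and `Volume`; the function name `resample_ohlcv` is hypothetical.

```python
import pandas as pd


def resample_ohlcv(bars: pd.DataFrame, rule: str = "60min") -> pd.DataFrame:
    """Aggregate finer-grained OHLCV bars (DatetimeIndex) into coarser bars.

    rule="60min" gives hourly bars; rule="1D" gives daily bars.
    """
    agg = {
        "Open": "first",   # first price in the interval
        "High": "max",     # highest price in the interval
        "Low": "min",      # lowest price in the interval
        "Close": "last",   # last price in the interval
        "Volume": "sum",   # total volume traded in the interval
    }
    # Intervals with no bars (e.g. overnight) come back with a NaN Open; drop them.
    return bars.resample(rule).agg(agg).dropna(subset=["Open"])


# Usage with synthetic 30-minute bars
idx = pd.date_range("2024-01-02 09:30", periods=4, freq="30min")
bars = pd.DataFrame(
    {"Open": [10, 11, 12, 13], "High": [11, 12, 13, 14],
     "Low": [9, 10, 11, 12], "Close": [11, 12, 13, 14],
     "Volume": [100, 200, 300, 400]},
    index=idx,
)
print(resample_ohlcv(bars))
```

By default the hourly buckets align to clock hours (09:00, 10:00, ...), so the 09:30 bar lands in the 09:00 bucket; pandas' `offset` argument to `resample` can shift the buckets to align with the market open instead.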

3. Sentiment Analysis

The FinBERT model analyzes sentiment in tweets, Reddit posts, and news headlines. FinBERT is a BERT model fine-tuned specifically for financial sentiment classification.

  • Labels: Positive, Neutral, Negative
  • Integration: The model takes cleaned text as input and outputs a sentiment label with a confidence score.

Example usage of FinBERT:

from transformers import pipeline
finbert = pipeline("sentiment-analysis", model="ProsusAI/finbert")
sentiment = finbert("Stock prices are looking bullish for Tesla!")
print(sentiment)
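The call above returns only the top label and its probability. One common way to turn FinBERT output into a single numeric feature is to take P(positive) minus P(negative), which yields a value between -1 and 1. This sketch assumes the pipeline has been asked for all class scores (for example via `top_k=None` in recent Transformers versions); because the exact output nesting varies between versions, the hypothetical helper `signed_sentiment` operates on a plain list of label/score dictionaries.

```python
def signed_sentiment(class_scores: list[dict]) -> float:
    """Map FinBERT class probabilities to one score in [-1, 1].

    class_scores: e.g. [{"label": "positive", "score": 0.7}, {"label": "negative", ...}, ...]
    Returns P(positive) - P(negative); probability mass on "neutral" pulls the score toward 0.
    """
    probs = {item["label"].lower(): float(item["score"]) for item in class_scores}
    return probs.get("positive", 0.0) - probs.get("negative", 0.0)


# Example with hand-written probabilities (not real model output)
example = [
    {"label": "positive", "score": 0.70},
    {"label": "neutral", "score": 0.20},
    {"label": "negative", "score": 0.10},
]
print(round(signed_sentiment(example), 2))  # 0.6
```

Using the full probability distribution rather than only the winning label keeps information about how confident the model was, which tends to matter when many scores are later averaged together.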

Sentiment scores are aggregated hourly or daily to detect trends across social media and news sources.
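The hourly or daily aggregation can be sketched as a pandas resample over per-post scores. This sketch assumes a DataFrame with a `timestamp` column and a numeric `sentiment_score` column (the names used in the dashboard example later on this page); the function name `aggregate_sentiment` is hypothetical. It also keeps the number of posts per bucket, so quiet periods (count 0, mean NaN) can be told apart from genuinely neutral ones.

```python
import pandas as pd


def aggregate_sentiment(scores: pd.DataFrame, rule: str = "60min") -> pd.DataFrame:
    """Average per-post sentiment into time buckets.

    scores: columns 'timestamp' (datetime) and 'sentiment_score' (float).
    rule: "60min" for hourly buckets, "1D" for daily buckets.
    Returns one row per bucket with the mean score and the number of posts.
    """
    series = scores.set_index("timestamp")["sentiment_score"].sort_index()
    return series.resample(rule).agg(["mean", "count"])


posts = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-02 09:10", "2024-01-02 09:40", "2024-01-02 10:05"]),
    "sentiment_score": [0.5, -0.1, 0.2],
})
print(aggregate_sentiment(posts))
```

To track trends per ticker or per source, the same resample can be applied inside a `groupby` on a ticker or source column.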

4. Price Prediction Using Ensemble Models

The prediction model uses an ensemble approach combining:

  • Random Forest: Captures general price trends.
  • XGBoost: Models non-linear relationships and provides feature-importance estimates.
  • Linear Regression: Serves as a simple baseline for trend analysis.

The sentiment scores are used as features alongside OHLCV data in the prediction pipeline.
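Combining the two data streams requires aligning them on time. A minimal sketch, assuming one row per trading day in each table and a shared `date` column (both assumptions), is a left join from prices to sentiment so that every trading day is kept. The function name `build_feature_table` and the choice to give days without posts a neutral score of 0.0 are illustrative.

```python
import pandas as pd


def build_feature_table(prices: pd.DataFrame, daily_sentiment: pd.DataFrame) -> pd.DataFrame:
    """Join daily prices with daily sentiment on the calendar date.

    prices: columns 'date', 'Close', 'Volume' (one row per trading day).
    daily_sentiment: columns 'date', 'sentiment_score' (one row per day with posts).
    Days without any posts get a neutral score of 0.0 (an assumption).
    """
    merged = prices.merge(daily_sentiment, on="date", how="left")  # keep every trading day
    merged["sentiment_score"] = merged["sentiment_score"].fillna(0.0)
    return merged.sort_values("date").reset_index(drop=True)


prices = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
    "Close": [100.0, 101.5, 99.8],
    "Volume": [1000, 1200, 900],
})
sentiment = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-04"]),
    "sentiment_score": [0.3, -0.4],
})
print(build_feature_table(prices, sentiment))
```

Filling gaps with 0.0 treats silence as neutral; carrying the previous day's score forward is an equally plausible alternative, and the better choice depends on how sparse the sentiment data is.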

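# Example ensemble pipeline (Random Forest member shown):
from sklearn.ensemble import RandomForestRegressor
# df: one row per trading day, sorted by date, with 'sentiment_score', 'Close', 'Volume'
df['NextDayClose'] = df['Close'].shift(-1)  # Target: the following day's close
df = df.dropna(subset=['NextDayClose'])  # The last day has no next-day target
X = df[['sentiment_score', 'Close', 'Volume']].values
y = df['NextDayClose'].values
model = RandomForestRegressor(random_state=42)  # Fixed seed for reproducibility
model.fit(X, y)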
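The snippet above trains only the Random Forest member. One way to combine all three models listed earlier is scikit-learn's `VotingRegressor`, which fits each estimator and averages their predictions. In this sketch `GradientBoostingRegressor` stands in for XGBoost so that it runs with scikit-learn alone (`xgboost.XGBRegressor` implements the scikit-learn estimator interface and could be substituted). The equal weighting, hyperparameters, function name `build_ensemble`, and synthetic data are assumptions, not the project's actual configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression


def build_ensemble(random_state: int = 42) -> VotingRegressor:
    """Average the predictions of three regressors (equal weights by default)."""
    return VotingRegressor(estimators=[
        ("rf", RandomForestRegressor(n_estimators=200, random_state=random_state)),
        # Stand-in for XGBoost; xgboost.XGBRegressor follows the same fit/predict interface.
        ("gb", GradientBoostingRegressor(random_state=random_state)),
        ("lr", LinearRegression()),
    ])


# Synthetic demo data: columns play the role of [sentiment_score, Close, Volume]
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(-1, 1, 200),          # sentiment_score
    rng.normal(100, 5, 200),          # Close
    rng.integers(1_000, 5_000, 200),  # Volume
])
y = X[:, 1] + 2 * X[:, 0] + rng.normal(0, 0.5, 200)  # synthetic next-day close
model = build_ensemble()
model.fit(X[:150], y[:150])     # train on the earlier rows
preds = model.predict(X[150:])  # predict the later rows, preserving time order
print(preds[:3])
```

`VotingRegressor` also accepts a `weights` list if validation shows one member should count more than the others.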

Predictions are compared with actual values using error metrics such as RMSE (root mean squared error) and MAE (mean absolute error).
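For price series the evaluation split should respect time order, so the model is never tested on days that precede its training data. The sketch below shows a chronological split and the two metrics, computing RMSE as the square root of scikit-learn's `mean_squared_error` so it works across library versions; the helper names are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error


def chronological_split(X, y, test_fraction: float = 0.2):
    """Split time-ordered arrays so the test set is strictly later than the training set."""
    cut = int(len(X) * (1 - test_fraction))
    return X[:cut], X[cut:], y[:cut], y[cut:]


def regression_metrics(y_true, y_pred) -> dict:
    """Return RMSE and MAE for a set of predictions."""
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    mae = float(mean_absolute_error(y_true, y_pred))
    return {"rmse": rmse, "mae": mae}


print(regression_metrics([100.0, 101.0, 102.0], [100.0, 101.0, 104.0]))
```

A useful sanity check is to compute the same metrics for a naive forecast that predicts tomorrow's close equals today's close; a model that does not beat that baseline is not adding predictive value.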

5. Visualization and Dashboard

An interactive dashboard built with Plotly and Dash displays the following insights:

  • Sentiment trends over time for specific stocks.
  • Actual vs Predicted stock prices.
  • Market volatility and trading volume visualization.
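The volatility view needs a volatility series to plot. A common choice, sketched here as an assumption rather than the dashboard's documented metric, is the rolling standard deviation of daily log returns, annualized by the square root of 252 trading days. The function name `rolling_volatility` and the 20-day default window are illustrative.

```python
import numpy as np
import pandas as pd


def rolling_volatility(close: pd.Series, window: int = 20, periods_per_year: int = 252) -> pd.Series:
    """Annualized rolling volatility from daily closing prices.

    Uses the standard deviation of log returns over `window` days, scaled by
    sqrt(periods_per_year) (252 is the usual trading-day convention).
    """
    log_returns = np.log(close / close.shift(1))  # first value is NaN
    return log_returns.rolling(window).std() * np.sqrt(periods_per_year)


close = pd.Series([100.0, 101.0, 102.0, 101.0, 103.0])
print(rolling_volatility(close, window=3))
```

The first `window` values are NaN because each estimate needs `window` complete returns; the resulting series can be passed straight to `px.line` for the dashboard.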

Example for visualizing sentiment:

import plotly.express as px
fig = px.line(df, x="timestamp", y="sentiment_score", title="Sentiment Over Time")
fig.show()

Results and Key Insights

  • Sentiment analysis identifies significant positive or negative trends influencing price movements.
  • Ensemble prediction models improve accuracy by integrating sentiment features.
  • Visualization provides actionable insights for traders and analysts.

The combined approach highlights how market sentiment significantly correlates with stock price movement during high volatility periods.
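A correlation claim like this can be checked directly. A minimal sketch, assuming daily, aligned sentiment and closing-price series, computes the Pearson correlation between each day's sentiment and the following day's return; applying the same function only to days whose rolling volatility exceeds a chosen threshold tests the high-volatility case specifically. The function name is hypothetical, and the data below is synthetic, constructed so that returns track sentiment exactly.

```python
import pandas as pd


def sentiment_return_correlation(sentiment: pd.Series, close: pd.Series) -> float:
    """Pearson correlation between day-t sentiment and the day-t -> t+1 return.

    Both series must share the same (daily, sorted) index.
    """
    next_return = close.shift(-1) / close - 1  # return from day t to day t+1
    return float(sentiment.corr(next_return))  # pairs with NaN (last day) are skipped


idx = pd.date_range("2024-01-02", periods=5, freq="D")
sentiment = pd.Series([0.1, -0.2, 0.3, 0.0, 0.5], index=idx)
prices = [100.0]
for s in sentiment.iloc[:-1]:
    prices.append(prices[-1] * (1 + 0.1 * s))  # synthetic prices driven by sentiment
close = pd.Series(prices, index=idx)
print(round(sentiment_return_correlation(sentiment, close), 3))  # 1.0
```

Correlating with the next day's return, rather than the same day's, avoids crediting sentiment for price moves that had already happened when the posts were written.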

Key Highlights

  • Seamless integration of FinBERT for sentiment classification.
  • Ensemble predictive models for stock price forecasting.
  • Interactive dashboards for visual analysis.
  • Data collection pipelines from multiple real-time sources.

GitHub Repository

Access the complete project and codebase here: GitHub Repository

Built with passion for financial technology and machine learning. Explore more projects on GitHub.