Harnessing Transformer Models for Sentiment Analysis and Market Prediction
MarketPulse is a tool for predicting market movements from sentiment analysis. The project combines data collected from social media, news, and stock feeds with transformer-based models such as FinBERT to analyze market sentiment. The goal is to forecast stock price movements using both sentiment signals and ensemble predictive models.
The project collects real-time and historical data from multiple sources. Each source has its own dedicated data collector:
Example code snippet from Reddit Collector:
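import praw
# Placeholder credentials: replace with your own Reddit API keys
reddit = praw.Reddit(client_id="YOUR_CLIENT_ID", client_secret="YOUR_CLIENT_SECRET", user_agent="marketpulse")
subreddit = reddit.subreddit('wallstreetbets')
for post in subreddit.top(limit=100):  # top 100 posts of all time
    print(post.title, post.selftext)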
Raw data from the various sources is preprocessed to ensure clean, uniform inputs. Text cleaning is handled in text_cleaner.py.
Snippet for text preprocessing:
import re

def clean_text(text):
    text = re.sub(r'http\S+', '', text)  # Remove URLs
    text = re.sub(r'[^a-zA-Z0-9 ]', '', text)  # Remove special characters
    return text.lower()  # Normalize case
The FinBERT model is used to analyze sentiment in tweets, Reddit posts, and news headlines. FinBERT is a BERT model fine-tuned specifically for financial sentiment classification.
Example usage of FinBERT:
from transformers import pipeline
finbert = pipeline("sentiment-analysis", model="ProsusAI/finbert")
sentiment = finbert("Stock prices are looking bullish for Tesla!")
print(sentiment)
Sentiment scores are aggregated hourly/daily to detect trends across social media and news sources.
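The aggregation step can be sketched with pandas resampling. This is a minimal illustration rather than the project's actual code: it assumes each classified post is a dict with 'label' and 'score' keys (the shape the sentiment pipeline returns), and the LABEL_SIGN mapping, the signed_score and aggregate_sentiment helpers, and the toy records are all hypothetical.

```python
import pandas as pd

# Hypothetical mapping from FinBERT labels to a sign (assumption for this sketch)
LABEL_SIGN = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}

def signed_score(result: dict) -> float:
    """Turn one {'label': ..., 'score': ...} result into a value in [-1, 1]."""
    return LABEL_SIGN[result["label"].lower()] * result["score"]

def aggregate_sentiment(df: pd.DataFrame, freq: str = "D") -> pd.DataFrame:
    """Average the signed scores and count the posts in each time bucket."""
    indexed = df.set_index("timestamp").sort_index()
    return indexed["sentiment_score"].resample(freq).agg(["mean", "count"])

# Toy records standing in for classified posts
records = [
    ("2024-01-01 09:15", {"label": "positive", "score": 0.9}),
    ("2024-01-01 09:45", {"label": "negative", "score": 0.5}),
    ("2024-01-01 14:00", {"label": "neutral", "score": 0.8}),
    ("2024-01-02 10:30", {"label": "negative", "score": 0.7}),
]
posts = pd.DataFrame({
    "timestamp": pd.to_datetime([ts for ts, _ in records]),
    "sentiment_score": [signed_score(r) for _, r in records],
})

daily = aggregate_sentiment(posts, "D")        # one row per calendar day
hourly = aggregate_sentiment(posts, "60min")   # one row per hour
print(daily)
```

Keeping the post count next to the mean makes it easier to spot buckets where the average rests on only one or two posts.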
The prediction model uses an ensemble approach combining:
The sentiment scores are integrated as features alongside OHLCV data for a comprehensive prediction pipeline.
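# Example ensemble pipeline:
# df is assumed to be a DataFrame holding sentiment, OHLCV, and NextDayClose columns
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(random_state=42)  # fixed seed for reproducible results
X = df[['sentiment_score', 'Close', 'Volume']].values  # features
y = df['NextDayClose'].values  # target: the next day's closing price
model.fit(X, y)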
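A feature table with the sentiment_score, Close, Volume, and NextDayClose columns could be assembled roughly as follows. This is a sketch under assumptions: the build_features helper, the neutral fill for days without posts, and the toy prices are illustrative and not taken from the project.

```python
import pandas as pd

def build_features(prices: pd.DataFrame, daily_sentiment: pd.Series) -> pd.DataFrame:
    """Join daily sentiment onto OHLCV rows and add the next-day close as target."""
    df = prices.join(daily_sentiment.rename("sentiment_score"), how="left")
    # Days without any posts get a neutral score (an assumption of this sketch)
    df["sentiment_score"] = df["sentiment_score"].fillna(0.0)
    # Target: the close of the following trading day
    df["NextDayClose"] = df["Close"].shift(-1)
    # The last row has no following day, so it cannot be used for training
    return df.dropna(subset=["NextDayClose"])

# Toy OHLCV rows indexed by trading day
dates = pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"])
prices = pd.DataFrame(
    {
        "Open": [100.0, 102.0, 101.0],
        "High": [103.0, 104.0, 105.0],
        "Low": [99.0, 100.0, 100.5],
        "Close": [102.0, 101.0, 104.0],
        "Volume": [1000, 1200, 900],
    },
    index=dates,
)
daily_sentiment = pd.Series([0.4, -0.2], index=dates[:2])

features = build_features(prices, daily_sentiment)
print(features[["sentiment_score", "Close", "Volume", "NextDayClose"]])
```

Shifting Close by one row keeps the target strictly in the future relative to the features, which avoids leaking the value being predicted into the inputs.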
Predictions are compared to actual values to evaluate accuracy metrics like RMSE and MAE.
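For reference, both metrics can be computed directly with NumPy. The rmse and mae helper names and the sample values are illustrative only and are not results from the project.

```python
import numpy as np

def rmse(y_true, y_pred) -> float:
    """Root mean squared error: square root of the mean squared difference."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def mae(y_true, y_pred) -> float:
    """Mean absolute error: mean of the absolute differences."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))

# Made-up closing prices and predictions, for illustration only
actual = [101.0, 104.0, 103.0, 106.0]
predicted = [100.0, 105.0, 103.0, 104.0]
print(f"RMSE={rmse(actual, predicted):.3f}  MAE={mae(actual, predicted):.3f}")
```

RMSE weights large misses more heavily than MAE, so reporting both gives a sense of whether the error comes from a few big misses or many small ones.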
An interactive dashboard built with Plotly and Dash displays the following insights:
Example for visualizing sentiment:
import plotly.express as px
# df is assumed to contain "timestamp" and "sentiment_score" columns
fig = px.line(df, x="timestamp", y="sentiment_score", title="Sentiment Over Time")
fig.show()
The combined approach suggests that market sentiment correlates with stock price movement, most notably during high-volatility periods.
Access the complete project and codebase here: GitHub Repository
Built with passion for financial technology and machine learning. Explore more projects on GitHub.