Numeral Understanding in Financial Tweets for
Fine-grained Crowd-based Forecasting
Chung-Chi Chen, Hen-Hsen Huang, Yow-Ting Shiue, Hsin-Hsi Chen
Numeral Taxonomy
Monetary
Percentage
Option
Data Distribution
- Options are a popular instrument and are frequently discussed.
- To capture the implications of investors’ opinions, we propose two subcategories for the Option category: “exercise price” and “maturity date”.
- $XLU long April $44 calls
- $MSFT those APR.22 CALLS were getting hot..
- The Monetary category contains the following 8 subcategories:
- “money”, “quote” and “change”
- “buy price”, “sell price”, “forecast”, “stop loss” and “support or resistance”.
- The identification of “buy price” and “sell price” can help us understand the performance of the writer.
- $SPY Long 1/2 position 137.89
- Some investors “forecast” the price of the instruments depending on their analysis results.
- The concepts of support and resistance are commonly discussed in technical analysis.
- A numeral that indicates a proportion of a certain amount is classified as “absolute”.
- A numeral that stands for a change relative to the original amount is classified as “relative”.
- ¥en up almost 10% since Q1 and €uro up around 7.5%, much more $ for $AAPL pocket. Remember 23% of Apple revenues comes from this two @jimcramer
- 10% and 7.5% are annotated as “relative”
- 23% stands for “absolute”.
Error Analysis
As with human annotators, most of the model's prediction errors occur between “quote” and the other subcategories. This shows how challenging it is for a machine to understand numerals in the Monetary category.
Product/ Version Number
Quantity
Temporal
Indicator
- Quantity information can tell us the position of an investor, so we can give larger weight to the opinions of investors who hold large positions.
- The version of a product may contain numerals. We can use the product information to compare the importance of different tweets.
- For example, tweets that discuss iPhone 7 may be more important than tweets that discuss iPhone 4.
- Temporal information is also important in the financial domain.
- The days most investors focus on are those with high volatility.
- We classify the Temporal category into two subcategories, “date” and “time”.
- This category captures the parameters of the technical indicators.
- Different investors may use dissimilar parameters for the same indicator. In order to capture the price most investors pay attention to, we should identify the parameters being used.
- $ATHX riding 5dma higher, dropping to 13dma at the dips, sign of a healthy advancing stock that stays above 20dma
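As a rough illustration of identifying indicator parameters, the sketch below pulls the moving-average window sizes out of the example tweet. The regular expression and the function name `moving_average_parameters` are assumptions made for this example; they are not the classification models described on this poster and cover only the "NNdma" notation.

```python
import re

# Hypothetical pattern: an integer followed by "dma" (day moving average),
# optionally separated by whitespace or a hyphen. Illustrative only.
DMA_PATTERN = re.compile(r"\b(\d+)\s*-?\s*dma\b", re.IGNORECASE)


def moving_average_parameters(tweet):
    """Return the moving-average window sizes mentioned in a tweet."""
    return [int(window) for window in DMA_PATTERN.findall(tweet)]


tweet = ("$ATHX riding 5dma higher, dropping to 13dma at the dips, "
         "sign of a healthy advancing stock that stays above 20dma")
print(moving_average_parameters(tweet))  # [5, 13, 20]
```

A rule like this only recognizes one surface form; numerals written in other ways would need the learned classifiers.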
Task Setting and Experiment
Outline
Task Setting
Introduction
- To the best of our knowledge, this is the first paper focusing on understanding the meaning of numerals in financial social media data.
- Example of Financial Tweet:
$TSLA 256 Break-out thru 50 & 200- DMA (197-230) upper head res (274-279) Short squeeze in progress Nr term obj: 310 Stop loss:239.
- In this paper, we
- propose a fine-grained numeral taxonomy for financial social media data
- conduct comprehensive experiments to compare the performance of different classification models in coarse-grained and fine-grained tasks.
- attempt to leverage the numeric opinions made by the crowd by understanding the meanings of numerals
Task 1: Classify a numeral into 7 categories, i.e., Monetary, Percentage, Option, Indicator, Temporal, Quantity and Product/Version Number.
Task 2: Extend the classification task to the subcategory level, and classify numerals into 17 classes: Indicator, Quantity, Product/Version Number, and all subcategories of Monetary, Percentage, Option and Temporal.
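The label sets of the two tasks can be sketched as a plain dictionary built from the categories and subcategories listed on this poster. The names `TAXONOMY`, `task1_labels` and `task2_labels`, and the "Category/subcategory" label format, are assumptions made for illustration.

```python
# Category -> subcategories, as listed on this poster.
# An empty list means the category has no subcategories and counts as one class.
TAXONOMY = {
    "Monetary": ["money", "quote", "change", "buy price", "sell price",
                 "forecast", "stop loss", "support or resistance"],
    "Percentage": ["absolute", "relative"],
    "Option": ["exercise price", "maturity date"],
    "Indicator": [],
    "Temporal": ["date", "time"],
    "Quantity": [],
    "Product/Version Number": [],
}


def task1_labels(taxonomy):
    """Coarse-grained labels: one per category."""
    return list(taxonomy)


def task2_labels(taxonomy):
    """Fine-grained labels: subcategories, or the category itself if it has none."""
    labels = []
    for category, subcategories in taxonomy.items():
        if subcategories:
            labels.extend(f"{category}/{sub}" for sub in subcategories)
        else:
            labels.append(category)
    return labels


print(len(task1_labels(TAXONOMY)))  # 7
print(len(task2_labels(TAXONOMY)))  # 17
```

The count of 17 follows from 8 Monetary, 2 Percentage, 2 Option and 2 Temporal subcategories plus the three categories without subcategories.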
Data Annotation
Crowd Opinion vs. Analysts’ Opinion
- Introduction
- Numeral Taxonomy
- Data Annotation
- Task Setting
- Methods
- Experimental Results
- Discussion
- Application - Crowd vs. Analyst
Methods
- 707 unique tweets containing numerals from the dataset of SemEval-2017 Task5.
- The dataset in this paper is annotated by three experts with financial domain knowledge.
- The Kappa agreement between each pair of annotators is 70.30%, 69.75% and 67.07%, which is considered substantial agreement.
- The subcategories in the Monetary category are the hardest to assign, especially between the “quote” subcategory and the other subcategories.
- In total, 1,341 numerals are annotated in the proposed dataset, FinNum 1.0. (FinNum 2.0, the NTCIR-14 Shared Task, contains 5,315 tweets with 8,868 numerals.)
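For readers unfamiliar with pairwise agreement scores, the sketch below computes Cohen's kappa for two annotators. It assumes the reported Kappa is Cohen's kappa, which the poster does not state, and the label lists are toy values invented for the example, not FinNum annotations.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy annotations (hypothetical, for illustration only).
annotator_1 = ["quote", "quote", "forecast", "date"]
annotator_2 = ["quote", "forecast", "forecast", "date"]
print(round(cohen_kappa(annotator_1, annotator_2), 4))  # 0.6364
```

With three annotators, the function would be applied to each of the three pairs, giving three scores as reported above.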
Features
Experimental Results
- Word Vector: Pre-trained on 184,050 tweets
- Character-based & Word-based
- CNN
- LSTM
- Bidirectional LSTM
“maturity date” + “exercise price” + call(put)
“Quantity” + Noun
Forecast Price
Trading Strategy
- Long, if the forecast price is higher than the close price at the end of the month
- Short, if the forecast price is lower than the close price at the end of the month
- Close position
- If the unrealized loss of certain stock reaches 7%
- If the close price reaches the forecast price
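The rules above can be sketched as a small simulation. Several details are assumptions made for this example and are not stated on the poster: the position is entered at the month-end close, the exit conditions are checked once per day on closing prices, and the 7% loss is measured against the entry price. The function names are hypothetical.

```python
def open_position(forecast_price, month_end_close):
    """Long if the forecast is above the month-end close, short if below."""
    if forecast_price > month_end_close:
        return "long"
    if forecast_price < month_end_close:
        return "short"
    return None  # forecast equals the close: no position


def simulate(forecast_price, entry_price, closes, stop_loss=0.07):
    """Walk through later daily closes and apply the two closing rules.

    Returns (side, exit_day, reason, return_on_entry), or None if no
    position is opened. Entry at the month-end close is an assumption.
    """
    side = open_position(forecast_price, entry_price)
    if side is None:
        return None
    direction = 1 if side == "long" else -1
    ret = 0.0
    for day, close in enumerate(closes):
        ret = direction * (close - entry_price) / entry_price
        if ret <= -stop_loss:
            return side, day, "stop loss", ret
        if direction * (close - forecast_price) >= 0:
            return side, day, "forecast reached", ret
    return side, None, "still open", ret


print(simulate(110, 100, [102, 105, 111]))  # long, closed on day 2 at the forecast
print(simulate(110, 100, [98, 92]))         # long, stopped out on day 1
```

Checking the stop loss before the forecast condition is a design choice of this sketch; with daily closes the two conditions cannot both hold on the same day for a sensible forecast, so the order rarely matters.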
Hybrid
Conclusion
- We address a new opinion mining challenge: capturing the views of investors on social media platforms through a fine-grained taxonomy for numerals in financial tweets.
- Based on the forecast prices of individual investors, we provide a trading strategy that carries both bullish/bearish information and the price level at which to close the position.
- With our dataset and models, many extended application scenarios can be addressed.
- We release the annotated dataset, FinNum, as a resource for research purposes.