I'll blog again tomorrow, but I've just finished a community webinar where my students fought off two trolls who stayed on for my entire presentation!
I can't seem to shake off my excitement at my latest Python abomination.
Natural Language Processing is an advanced part of data science that allows us to write programs that can gauge market sentiment towards a stock.
I took some code fragments and spent two days figuring out how to do the following:
a) Parse Yahoo Finance's data feed, which comes in XML format, and extract the link to each article.
The code came from an old data science textbook, and I had to rewrite it to get it running.
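For anyone curious what that parsing step looks like, here is a minimal sketch using Python's built-in `xml.etree.ElementTree`. The two-item feed below is made-up sample data in the general shape of an RSS feed, not a real Yahoo Finance response:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like an RSS feed (an XML dialect);
# a real feed would be fetched over HTTP instead.
feed = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <item>
      <title>Coffee chain reports strong quarter</title>
      <link>https://example.com/article-1</link>
    </item>
    <item>
      <title>Analysts cut price targets</title>
      <link>https://example.com/article-2</link>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(feed)
# Each <item> element carries a headline and the link to the full article.
links = [item.findtext("link") for item in root.iter("item")]
print(links)
```

Each link can then be fed to the next stage, which downloads and scores the article.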
After this, it gets really exciting.
a) My program has to read an investment article and determine whether it is "good" or "bad".
b) To do this, we must find a way to strip out all the "useless" words in the English language, like "the" and "an".
c) Interestingly, linguists at Princeton University maintain a library (WordNet) that lists synonyms of words like "good" or "increasing", so coding this was much easier than I thought.
d) We can compare the frequency of good words versus bad words to determine whether the article is bullish or bearish on a counter.
e) Combing through the Internet, we can count the number of favourable articles versus unfavourable ones and use the ratio as a measure of sentiment towards a stock.
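Steps b) to d) boil down to a word-counting exercise. Here is a minimal sketch; the stop-word list and the "good"/"bad" synonym sets are tiny hand-rolled stand-ins for illustration (the real project drew its synonyms from Princeton's library), and `sentiment` is a hypothetical helper name:

```python
# Step b): a tiny stand-in for a proper stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "on"}

# Step c): seed sentiment words plus a few synonyms, standing in
# for a full synonym-library lookup.
GOOD = {"good", "strong", "increasing", "growth", "surge", "bullish"}
BAD = {"bad", "weak", "falling", "decline", "fraud", "bearish"}

def sentiment(article: str) -> str:
    # Strip stop words, then compare good-word vs bad-word frequency (step d).
    words = [w for w in article.lower().split() if w not in STOP_WORDS]
    good = sum(w in GOOD for w in words)
    bad = sum(w in BAD for w in words)
    if good > bad:
        return "bullish"
    if bad > good:
        return "bearish"
    return "neutral"

print(sentiment("Revenue is increasing and growth is strong"))  # bullish
print(sentiment("The decline is bad and fraud is alleged"))     # bearish
```

Scoring each downloaded article this way and tallying the bullish versus bearish verdicts gives the overall sentiment measure in step e).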
Sadly, what we think might work in theory does not always work in practice.
I wanted to try sentiment analysis on a truly shitty stock so I chose Luckin Coffee.
Yet my program flagged 13 favourable articles against only 3 unfavourable ones.
Well, we can't win all the time. I will have to adjust the program's vocabulary. Obviously, unlike a human being, it can't detect irony or read the context of a situation.
Still, I am so happy I am writing a blog article few would understand after a busy evening conducting a webinar. I imagine levelling up in NLP may equip me with the ability to summarise legal judgments in the future.
I promise an article in simple English tomorrow!