Smart Feed

I wanted to automate my routine of checking new articles, saving them to Pocket, reading them carefully, and moving the good ones to a favorites category. That is why the Smart Feed function was created.

First, this function needs RSS URLs: it reads each feed and notifies you when a new article appears. So I made the awesome-feeds repository. I thought it would be convenient to manage the RSS feeds of my favorite websites with Git, and I wanted to build an awesome-style list that collects many good feeds.

Now that the RSS feeds are ready, all that remains is to get notified when a new article is published!

I used feedparser here.

    import arrow
    import feedparser

    # Parse the feed and sort its entries, newest first
    f = feedparser.parse(feed_url)
    f.entries = sorted(
        f.entries, key=lambda x: x.get("updated_parsed", 0), reverse=True
    )

    # get Latest Feed
    noti_list = []
    if feed_url in cache_data:
        previous_update_date = arrow.get(cache_data[feed_url])
        for e in f.entries:
            e_updated_date = arrow.get(e.updated_parsed)
            # Keep only entries newer than the last update seen for this feed
            if e_updated_date > previous_update_date:
                noti_list.append(self.__make_entry_tuple(category, e, feed_name))
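The selection step above can be isolated into a small function. The following is a minimal sketch, not Kino's actual code: it assumes entries are dict-like objects whose `updated_parsed` value is a UTC `time.struct_time` (the form feedparser produces), and `select_new_entries` is a hypothetical name. Unlike the excerpt, it skips entries that have no `updated_parsed` instead of failing on them.

```python
import calendar
import time


def select_new_entries(entries, previous_update_ts):
    """Return entries updated after previous_update_ts, newest first.

    entries: iterable of dict-like objects; "updated_parsed" is expected to be
        a UTC time.struct_time (assumption, mirrors feedparser's output).
    previous_update_ts: Unix timestamp of the last update seen for this feed.
    """
    dated = []
    for entry in entries:
        parsed = entry.get("updated_parsed")
        if parsed is None:
            continue  # no usable date, so it cannot be compared
        dated.append((calendar.timegm(parsed), entry))

    # Sort by timestamp only, so the entries themselves are never compared.
    dated.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for ts, entry in dated if ts > previous_update_ts]


# Usage with hand-made entries (hypothetical data)
entries = [
    {"title": "old post", "updated_parsed": time.gmtime(1_000)},
    {"title": "new post", "updated_parsed": time.gmtime(3_000)},
    {"title": "undated post"},
]
new_titles = [e["title"] for e in select_new_entries(entries, 2_000)]
```

Here only "new post" is newer than the cached timestamp, so it is the only entry that would be turned into a notification.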

A schedule can be specified for a function, as described in Part 2 Skill & Scheduler. Checking the feeds every minute is a lot of overhead. When I tested it, an interval of 20 minutes felt sufficient.

    def __excute_feed_schedule(self, interval):
        schedule.every(interval).minutes.do(
            self.__run_threaded,
            self.function_runner,
            {
                "repeat": True,
                "func_name": "feed_notify",
                "params": {},
                "day_of_week": [0],
                "not_holiday": False,
            },
        )
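The `__run_threaded` helper is not shown in the excerpt. A common way to keep a slow job (such as fetching many feeds) from blocking a scheduler loop is to start it on its own thread. The sketch below is an assumption about what such a helper could look like, using only the standard library; `run_threaded` and `feed_notify` here are stand-ins, not Kino's implementation.

```python
import threading


def run_threaded(job_func, *args, **kwargs):
    """Start job_func on a new thread and return the thread immediately."""
    job_thread = threading.Thread(target=job_func, args=args, kwargs=kwargs)
    job_thread.start()
    return job_thread


results = []


def feed_notify(params):
    # Stand-in for the real job; it only records that it ran.
    results.append(("feed_notify", params))


# A scheduler would call run_threaded(...) every `interval` minutes;
# here it is called once directly and joined so the result can be inspected.
thread = run_threaded(feed_notify, {"repeat": True})
thread.join()
```

Because the helper returns right away, a long feed check does not delay other scheduled jobs.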

Now Kino can notify me of the latest RSS articles. That is already useful, but I wanted to go one step further: automatically saving articles from sources I already trust to Pocket!

This requires a connection to Pocket, and a simple classification algorithm can make it smarter. The most important thing in machine learning is data, and this data can be created from the raw logs. First, all of the articles that the feed function has notified me about can be treated as the full dataset. If only the articles saved to Pocket are given a label of 1, the dataset is divided into articles I am interested in and articles I am not. In addition, if the category of the article or the name of the website is provided as a feature, you can build a simple but useful decision tree.

Decision Tree From http://ccg.doc.gold.ac.uk/

For example, when a new article is published on the Google AI Blog, if I have seen five articles from that site so far and saved four of them to Pocket, the new one can also be considered something I am interested in.
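That counting rule can be written down directly. The sketch below is not the decision tree itself, only the per-site intuition behind it: it assumes the log has already been reduced to `(site, saved)` pairs, and the function names and the 0.5 threshold are illustrative choices.

```python
from collections import defaultdict


def save_ratios(history):
    """Map each site to the fraction of its notified articles that were saved.

    history: iterable of (site, saved) pairs, where saved is 1 if the article
        was put in Pocket and 0 otherwise.
    """
    seen = defaultdict(int)
    saved_count = defaultdict(int)
    for site, saved in history:
        seen[site] += 1
        saved_count[site] += saved
    return {site: saved_count[site] / seen[site] for site in seen}


def is_interesting(site, ratios, threshold=0.5):
    """Treat a site as interesting when more than `threshold` was saved."""
    # Sites with no history default to "not interesting".
    return ratios.get(site, 0.0) > threshold


# Five Google AI Blog articles seen, four of them saved (the example above)
history = [("Google AI Blog", 1)] * 4 + [("Google AI Blog", 0)]
ratios = save_ratios(history)
```

With four of five articles saved, the ratio is 0.8, so `is_interesting("Google AI Blog", ratios)` is true, while a site with no history is treated as not interesting.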

You can use Decision Tree very easily with scikit-learn.

    from sklearn import tree


    class FeedClassifier:
        def __init__(self):
            train_X = FeedData().train_X
            train_y = FeedData().train_y

            model = tree.DecisionTreeClassifier()
            model.fit(train_X, train_y)  # Training
            self.clf = model

        def predict(self, link, category):
            # category_id is presumably derived from category;
            # the conversion is not shown in this excerpt
            result = self.clf.predict(category_id)[0]
            if result == FeedDataLoader.TRUE_LABEL:
                ...
            else:
                ...

Online Learning

The next important thing is online learning. The kinds of RSS articles I put in Pocket change over time. The model must detect these changes and make judgments with the latest information. The method used for this is online learning.

Keep models up to date by continuously applying new data to them.

Kino’s Smart Feed gets smarter this way. Online learning is made possible by a cycle like the one below.

Logging:

Log all data on the feeds that were notified, and which of them were put in Pocket.

Data Processing:

Parse the log into categories, titles, dates, links, etc., and add labels. (0: Do not put in Pocket / 1: Put in Pocket)

Model:

Fit the prepared data to the model. (Training)

Predict:

Using the trained model, decide whether or not a new feed should be put in Pocket. Then feedback is provided for the model’s wrong predictions, so that the correct labels are stored.

If training in real time becomes a bottleneck here, one option is to retrain the model once a day.
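The cycle above can be sketched end to end. This is a toy illustration under stated assumptions, not Kino's implementation: a per-category counter stands in for the decision tree so that the example needs only the standard library, and the class and method names are hypothetical. The point is the loop: every notification is logged with a label, and feedback overwrites a wrong label before the next fit.

```python
from collections import Counter


class ToyFeedModel:
    """Predicts 1 (put in Pocket) for categories that were mostly saved."""

    def __init__(self):
        self.log = {}  # link -> (category, label)
        self.counts = {}  # category -> Counter of labels

    def record(self, link, category, label):
        """Logging + feedback: store or correct the label for a link."""
        self.log[link] = (category, label)

    def fit(self):
        """Model: rebuild the counts from the full, corrected log."""
        self.counts = {}
        for category, label in self.log.values():
            self.counts.setdefault(category, Counter())[label] += 1

    def predict(self, category):
        """Predict: majority label for the category, 0 if unseen or tied."""
        votes = self.counts.get(category, Counter())
        return 1 if votes[1] > votes[0] else 0


model = ToyFeedModel()
model.record("a", "ml", 1)
model.record("b", "ml", 1)
model.record("c", "news", 0)
model.fit()
first = model.predict("news")  # nothing from "news" has been saved yet

# Feedback: "c" was actually saved by hand, so its label is corrected,
# a new saved article arrives, and the model is refit with the latest data.
model.record("c", "news", 1)
model.record("d", "news", 1)
model.fit()
second = model.predict("news")
```

Calling `fit()` after every `record()` corresponds to learning in real time; calling it from a daily scheduled job corresponds to the once-a-day option.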