16 Mar 2018

Formatting tweets: a look at Extended tweets, Retweets and Quotes

One thing I’ve noticed on thefeed.press is that the conversations (the tweets) surrounding shared links are sometimes more interesting than the links themselves. Placing proper emphasis on these tweets means displaying them wherever necessary, the email digest for example. And displaying them means formatting them properly.

Introduction

A tweet needs to be well formatted to display properly. This means identifying and linking entities like usernames, hashtags and URLs. In simple terms, it means converting a typical tweet object like this:

    {
      "created_at": "Mon Mar 05 21:16:46 +0000 2018",
      "id": 970770116043595800,
      "text": "Wish I have some time to curate #WeAreNigerianCreatives. Someone please do.",
      "entities": {
        "hashtags": [{
          "text": "WeAreNigerianCreatives",
          "indices": [32, 55]
        }],
        "symbols": [],
        "user_mentions": [],
        "urls": []
      }
    }

to this:

Wish I have some time to curate #WeAreNigerianCreatives. Someone please do.

Notice that the tweet object’s text is plain unformatted text but there is an additional entities object with necessary details for formatting. You probably won’t need to write a library to match and replace the entities in the text though. Twitter provides Twitter Text, an amazing library to do this.

Here is how that looks in Node.js.

    // twitter-text already installed with
    // `npm install twitter-text`
    // ...
    const twitter = require('twitter-text'),
      tweet = {
        "created_at": "Mon Mar 05 21:16:46 +0000 2018",
        "id": 970770116043595800,
        "text": "Wish I have some time to curate #WeAreNigerianCreatives. Someone please do.",
        "entities": {
          "hashtags": [{
            "text": "WeAreNigerianCreatives",
            "indices": [32, 55]
          }],
          "symbols": [],
          "user_mentions": [],
          "urls": []
        }
      };

    // Link the entities in the text using the positions Twitter supplies
    console.log(twitter.autoLinkWithJSON(tweet.text, tweet.entities));
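To get a feel for what the library does with the entities, here is a minimal hand-rolled sketch that links hashtags only. The function name linkHashtags and the link URL pattern are assumptions for illustration, not part of Twitter Text, and HTML escaping is skipped for brevity. It also assumes the indices count Unicode code points, which is why the text is split with Array.from instead of being sliced directly.

```javascript
// Hypothetical helper: links hashtags using the indices Twitter supplies.
// Not part of twitter-text; skips HTML escaping and other entity types.
function linkHashtags(text, hashtags) {
  // Indices are assumed to count code points, so split by code point first
  const chars = Array.from(text);
  // Work from the last entity backwards so earlier indices stay valid
  const sorted = [...hashtags].sort((a, b) => b.indices[0] - a.indices[0]);
  for (const tag of sorted) {
    const [start, end] = tag.indices;
    const link = `<a href="https://twitter.com/hashtag/${encodeURIComponent(tag.text)}">#${tag.text}</a>`;
    // Replace the hashtag's characters with a single element holding the link
    chars.splice(start, end - start, link);
  }
  return chars.join('');
}

const tweet = {
  text: 'Wish I have some time to curate #WeAreNigerianCreatives. Someone please do.',
  entities: { hashtags: [{ text: 'WeAreNigerianCreatives', indices: [32, 55] }] }
};

console.log(linkHashtags(tweet.text, tweet.entities.hashtags));
```

Mentions, URLs and symbols would each need the same treatment, which is a good reason to lean on the library instead.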

Say hello to extended tweets

For tweets over 140 characters, the tweet object only returns 140 characters of text by default. In this compatibility mode:

- text is truncated to 140 characters
- truncated is set to true for tweets that are more than 140 characters
- entities only include those in the available 140-character text range

Here is an example tweet object:

    {
      "created_at": "Sat Mar 10 18:12:17 +0000 2018",
      "id": 972535628742078500,
      "text": "I kind of hate how with most web development/new frameworks etc., I start out with the intention “I’d like to spend… https://t.co/A10WmSzVeL",
      "truncated": true,
      "entities": {
        "hashtags": [],
        "symbols": [],
        "user_mentions": [],
        "urls": [{
          "url": "https://t.co/A10WmSzVeL",
          "expanded_url": "https://twitter.com/i/web/status/972535628742078469",
          "display_url": "twitter.com/i/web/status/9…",
          "indices": [117, 140]
        }]
      }
    }

Formatting that will give this:

I kind of hate how with most web development/new frameworks etc., I start out with the intention “I’d like to spend… https://twitter.com/i/web/status/972535628742078469 …

compared to the original tweet:

I kind of hate how with most web development/new frameworks etc., I start out with the intention “I’d like to spend 20 minutes learning X today,” and have to invest an additional 60 minutes just setting up the appropriate environment.

Mode: Extended

How to get the full text? Simple. Add the parameter tweet_mode=extended to any endpoint you are querying. So instead of https://api.twitter.com/1.1/statuses/show/972535628742078469.json, let’s try https://api.twitter.com/1.1/statuses/show/972535628742078469.json?tweet_mode=extended
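Appending the parameter by hand gets fiddly once a URL already has a query string. Here is a small sketch using the standard URL class. The helper name withExtendedMode is an assumption for illustration, and the request itself (with its authentication) is left out.

```javascript
// Hypothetical helper: adds tweet_mode=extended to any API URL.
function withExtendedMode(endpoint) {
  const url = new URL(endpoint);
  // set() replaces an existing tweet_mode value instead of duplicating it
  url.searchParams.set('tweet_mode', 'extended');
  return url.toString();
}

console.log(withExtendedMode('https://api.twitter.com/1.1/statuses/show/972535628742078469.json'));
```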

    {
      "created_at": "Sat Mar 10 18:12:17 +0000 2018",
      "id": 972535628742078500,
      "full_text": "I kind of hate how with most web development/new frameworks etc., I start out with the intention “I’d like to spend 20 minutes learning X today,” and have to invest an additional 60 minutes just setting up the appropriate environment.",
      "truncated": false,
      "display_text_range": [0, 234],
      "entities": {
        "hashtags": [],
        "symbols": [],
        "user_mentions": [],
        "urls": []
      }
    }

Yeah, that simple. Notice that:

- full_text replaces text
- truncated is false
- display_text_range identifies the start and end of the displayable content of the tweet

You can then go ahead and format using full_text and entities.

    const twitter = require('twitter-text'),
      tweet = {
        "created_at": "Sat Mar 10 18:12:17 +0000 2018",
        "id": 972535628742078500,
        "full_text": "I kind of hate how with most web development/new frameworks etc., I start out with the intention “I’d like to spend 20 minutes learning X today,” and have to invest an additional 60 minutes just setting up the appropriate environment.",
        "truncated": false,
        "display_text_range": [0, 234],
        "entities": {
          "hashtags": [],
          "symbols": [],
          "user_mentions": [],
          "urls": []
        }
      };

    // In extended mode the text lives in full_text rather than text
    console.log(twitter.autoLinkWithJSON(tweet.full_text, tweet.entities));
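Code that may receive tweets in either mode can read the text defensively. The sketch below uses hypothetical helper names (tweetText, displayText) and assumes that display_text_range counts Unicode code points, as the entity indices do. It falls back to the whole text when the range is absent. The sample tweet is the quote tweet that appears later in this post.

```javascript
// Hypothetical helpers, not part of twitter-text.

// Prefer full_text (extended mode) and fall back to text (compatibility mode)
function tweetText(tweet) {
  return tweet.full_text !== undefined ? tweet.full_text : tweet.text;
}

// Return only the displayable part of the text, per display_text_range
function displayText(tweet) {
  const chars = Array.from(tweetText(tweet)); // split by code point
  const [start, end] = tweet.display_text_range || [0, chars.length];
  return chars.slice(start, end).join('');
}

const extended = {
  full_text: 'Added tweets to the daily newsletter for better context. https://t.co/Q46O3husnz',
  display_text_range: [0, 56]
};
const compat = { text: 'Short tweet' };

console.log(displayText(extended)); // the text without the trailing link
console.log(displayText(compat));   // unchanged, no range to apply
```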

Here is a retweet requested in extended mode.

    {
      "created_at": "Sun Mar 11 12:00:27 +0000 2018",
      "id": 972804442667003900,
      "full_text": "RT @jasongorman: As a physics grad, I understand how snooker works at a level I imagine a lot of pro snooker players don't. But I suck at s…",
      "truncated": false,
      "display_text_range": [0, 140],
      "entities": {
        "hashtags": [],
        "symbols": [],
        "user_mentions": [{
          "screen_name": "jasongorman",
          "name": "jasongorman",
          "id": 18771008,
          "id_str": "18771008",
          "indices": [3, 15]
        }],
        "urls": []
      },
      "retweeted_status": { ... }
    }

Notice how full_text is truncated even though truncated says false. What could be wrong? Well, the text of a retweet is prefixed with RT @username: and if the resulting text is more than 140 characters, it will be truncated.
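The effect can be reproduced by hand. This sketch uses the original tweet's text as it appears in retweeted_status further down. It assumes a truncation rule of keeping the first 139 characters and appending an ellipsis. That rule matches this example, but it is my guess rather than documented behaviour, and truncateLikeRetweet is a hypothetical name.

```javascript
const LIMIT = 140;
const prefix = 'RT @jasongorman: ';
const original = "As a physics grad, I understand how snooker works at a level I imagine a lot of pro snooker players don't. But I suck at snooker. Understanding != ability.";

// Assumed truncation rule: keep LIMIT - 1 characters, then append an ellipsis
function truncateLikeRetweet(text, limit = LIMIT) {
  const chars = Array.from(text); // count by code point
  if (chars.length <= limit) return text;
  return chars.slice(0, limit - 1).join('') + '…';
}

const combined = prefix + original;
console.log(combined.length > LIMIT);       // the prefixed text is over the limit
console.log(truncateLikeRetweet(combined)); // ends in "But I suck at s…"
```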

What to do? Use the retweeted_status instead. The retweeted_status object contains the full text and entities you need.

    {
      "created_at": "Sun Mar 11 12:00:27 +0000 2018",
      "id": 972804442667003900,
      "full_text": "RT @jasongorman: As a physics grad, I understand how snooker works at a level I imagine a lot of pro snooker players don't. But I suck at s…",
      "truncated": false,
      "display_text_range": [ ... ],
      "entities": { ... },
      "retweeted_status": {
        "created_at": "Sun Mar 11 08:10:46 +0000 2018",
        "id": 972746641957642200,
        "full_text": "As a physics grad, I understand how snooker works at a level I imagine a lot of pro snooker players don't. But I suck at snooker. Understanding != ability.",
        "truncated": false,
        "display_text_range": [0, 155],
        "entities": {
          "hashtags": [],
          "symbols": [],
          "user_mentions": [],
          "urls": []
        }
      }
    }

Just check if retweeted_status exists and use that instead.

    // Get tweet
    // ...

    // For retweets, format the original tweet rather than the truncated "RT @username: ..." text
    if (tweet.retweeted_status) tweet = tweet.retweeted_status;

    const formatted = twitter.autoLinkWithJSON(tweet.full_text, tweet.entities);

Quotes :/

Quotes are in an entirely different world of their own. You need to see what a quoted tweet looks like to understand.

    {
      "created_at": "Sat Dec 16 04:04:36 +0000 2017",
      "id": 941881722685284400,
      "full_text": "Added tweets to the daily newsletter for better context. https://t.co/Q46O3husnz",
      "truncated": false,
      "display_text_range": [0, 56],
      "entities": {
        "hashtags": [],
        "symbols": [],
        "user_mentions": [],
        "urls": [{
          "url": "https://t.co/Q46O3husnz",
          "expanded_url": "https://twitter.com/thefeedpress/status/941880801087680512",
          "display_url": "twitter.com/thefeedpress/s…",
          "indices": [57, 80]
        }]
      },
      "quoted_status": { ... }
    }

The full_text does not tell the complete story. It does not include the tweet that was quoted. The quoted tweet is hidden in quoted_status. And unlike retweets, where you can replace the tweet with the retweeted status, you need both the quoted tweet and the additional tweet to make complete sense of a quote. Here is what quoted_status looks like:

    {
      "created_at": "Sat Dec 16 04:00:56 +0000 2017",
      "id": 941880801087680500,
      "full_text": "New newsletter screenshot https://t.co/HQmJumZfhN",
      "truncated": false,
      "display_text_range": [0, 25],
      "entities": { ... },
      "extended_entities": { ... }
    }

So what do we do in this case? What we need to achieve is something like this:

Added tweets to the daily newsletter for better context @thefeedpress:

New newsletter screenshot pic.twitter.com/HQmJumZfhN

And it seems we just need to format the quoted tweet and additional tweet separately and show them together.

    const twitter = require('twitter-text');

    // Get tweet
    // ..

    let text = twitter.autoLinkWithJSON(tweet.full_text, tweet.entities);

    if (tweet.quoted_status) {
      let qt = twitter.autoLinkWithJSON(tweet.quoted_status.full_text, tweet.quoted_status.entities);
      text += `<blockquote><a href="https://twitter.com/${tweet.quoted_status.user.screen_name}">@${tweet.quoted_status.user.screen_name}</a>:<br>${qt}</blockquote>`;
    }

    console.log(text);

Added tweets to the daily newsletter for better context. https://twitter.com/thefeedpress/status/941880801087680512 … @thefeedpress:

New newsletter screenshot pic.twitter.com/HQmJumZfhN

Looks pretty close. But the additional tweet has a link to the embedded quote. Can we remove this link though? Let’s try.

Since we know the link to the quoted status will always end the additional tweet text, we can match a link of the form https://twitter.com/[quoted_status_user_username]/status/[0-9]+ at the end of the text and remove it. There are a couple of issues with this though. If we match the unformatted text, the URL will still be in the format https://t.co/\w+ (unexpanded) and not https://twitter.com/[quoted_status_user_username]/status/[0-9]+ (expanded). If we match after formatting, the link will have been expanded but will contain HTML tags that will break our regular expression.
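Both problems are easy to reproduce. In this sketch the pattern and sample strings come from the quote example above. The formatted string is a simplified stand-in for linked HTML, not actual Twitter Text output.

```javascript
// Pattern for the expanded link to the quoted tweet, anchored at the end of the text
const expanded = /https:\/\/twitter\.com\/thefeedpress\/status\/[0-9]+$/;

// 1. The unformatted text still carries the short t.co link
const raw = 'Added tweets to the daily newsletter for better context. https://t.co/Q46O3husnz';
console.log(expanded.test(raw)); // false

// 2. The formatted text has the expanded link, but wrapped in HTML
//    (simplified stand-in markup, not real twitter-text output)
const html = 'Added tweets to the daily newsletter for better context. ' +
  '<a href="https://t.co/Q46O3husnz">https://twitter.com/thefeedpress/status/941880801087680512</a>';
console.log(expanded.test(html)); // false, the text now ends with </a>

// The expanded URL is available cleanly on the URL entity itself
const entity = {
  url: 'https://t.co/Q46O3husnz',
  expanded_url: 'https://twitter.com/thefeedpress/status/941880801087680512'
};
console.log(expanded.test(entity.expanded_url)); // true
```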

Well, since we know the link will always end the text, we can remove the link's entry from the URL entities so that it is left as plain text when formatting, then strip any link that ends the text.

    let text;

    if (tweet.retweeted_status) tweet = tweet.retweeted_status;

    if (tweet.quoted_status) {
      if (tweet.entities && tweet.entities.urls) {
        // Drop the URL entity that points at the quoted tweet so it is not linked
        let re = new RegExp('https://twitter\\.com/\\w+/status/' + tweet.quoted_status.id_str);
        tweet.entities.urls = tweet.entities.urls.filter(url => !re.test(url.expanded_url));
      }

      text = twitter.autoLinkWithJSON(tweet.full_text, tweet.entities);
      let qt = twitter.autoLinkWithJSON(tweet.quoted_status.full_text, tweet.quoted_status.entities);

      // The quoted tweet's t.co link is now plain text at the end, so strip it
      text = text.replace(/https:\/\/t\.co\/[^\/]+$/, '');
      text += `<blockquote><a href="https://twitter.com/${tweet.quoted_status.user.screen_name}">@${tweet.quoted_status.user.screen_name}</a><br>${qt}</blockquote>`;
    } else {
      text = twitter.autoLinkWithJSON(tweet.full_text, tweet.entities);
    }
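An alternative to the trailing regular expression is to cut the text at the position the entity itself reports. This is a different technique from the code above, sketched under assumptions: the helper name stripQuoteLink is hypothetical, the indices are taken to count code points, and the quote link is assumed to be the last thing in the text. Matching uses id_str rather than id because the numeric id loses precision in JavaScript. The examples in this post show 941880801087680500 where the URL has 941880801087680512.

```javascript
// Hypothetical helper: returns a copy of the tweet with the link to the
// quoted tweet removed from both the text and the URL entities.
function stripQuoteLink(tweet) {
  const quoted = tweet.quoted_status;
  const urls = (tweet.entities && tweet.entities.urls) || [];
  if (!quoted) return tweet;

  // Find the URL entity whose expanded form points at the quoted tweet
  const suffix = '/status/' + quoted.id_str;
  const link = urls.find(u => u.expanded_url.endsWith(suffix));
  if (!link) return tweet;

  // Cut the text just before the link (by code point) and drop the entity
  const chars = Array.from(tweet.full_text);
  return {
    ...tweet,
    full_text: chars.slice(0, link.indices[0]).join('').trimEnd(),
    entities: { ...tweet.entities, urls: urls.filter(u => u !== link) }
  };
}

const tweet = {
  full_text: 'Added tweets to the daily newsletter for better context. https://t.co/Q46O3husnz',
  entities: {
    urls: [{
      url: 'https://t.co/Q46O3husnz',
      expanded_url: 'https://twitter.com/thefeedpress/status/941880801087680512',
      indices: [57, 80]
    }]
  },
  quoted_status: { id_str: '941880801087680512' }
};

const cleaned = stripQuoteLink(tweet);
console.log(cleaned.full_text);            // text with the quote link removed
console.log(cleaned.entities.urls.length); // no URL entities left
```

Because the helper returns a copy, the original tweet object is left untouched and both versions can be formatted if needed.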

Conclusion

This is probably all you will need, though there is more that could be done. What about displaying media (pictures, videos) within the tweet? Quotes within quotes? Threaded replies?

Formatting tweets can get complex if you go all the way. But you don’t have to do it unless it is necessary. You can use embedded tweets instead.