“The network structure is so dense that any misinformation spreads almost instantaneously within one group, and so segregated that it does not reach the other.” Filippo Menczer

(Update 20/1/2019: added links for how to use advanced search operators.)

This guide will show you some beginner techniques for discovering media network associations, especially of disinformation sites which pretend to be “independent”, called ‘grey’ propaganda sites. You can also use some of the same techniques positively to find more media sources you want to follow.

I am not (yet) an OSINT or SMINT expert, but I believe it’s valuable for more of the general public to know some basic OSINT techniques which require neither special technical skills (programming) nor special or paid-for tools.

The methods I use and recommend rely primarily on social network metadata (data about data), rather than on information content or themes. I do that for two reasons: i) we’re much less likely to fall into confirmation bias, for or against, if we rely primarily on objective metadata about the network associations of sources than if we try to analyse their contents first; ii) it’s much more efficient, so you have a far better chance of getting a mental handle on how much you should trust each media source in your network by using OSINT methods than by trying to process everything at the content level.

One change in our media environment over the last ten years: in the old, broadcast-dominated environment, adults could and mostly did keep a mental map of how much they trusted each media source, which sections of the newspaper more or less, and on which subjects. In the social media environment, sources proliferate so fast that no one can keep track of or map how much they trust each source, so most people have given up trying and go with their feeling about each post. That makes it too easy for disinformation sources to recruit and then convert.

Before getting to the methods for checking sources, I’ll briefly explain what ‘disinformation’ means, or should mean, because it’s not as simple as the term ‘fake news’ implies: ‘fake news’ treats the difference as a categorical variable, but disinformation is really a scalar one. I also think it’s worthwhile even in a beginner guide to introduce some of the psychology needed to understand how we’re being manipulated, but that means reviewing our concept of human rationality, which is too long for this guide.

I don’t believe that democratic societies’ institutional responses, and planned responses, to the crisis of authoritarian regimes exploiting the vulnerabilities of the new social media-based public sphere are even potentially adequate. Institutionally, we’re just reacting to the reactions, not yet thinking ahead. The populist era has occurred not only because of the 2008 financial crisis, or because authoritarian regimes reacted to the liquefaction of boundaries between formerly national information spaces or public spheres by developing new information warfare strategies, but also because core implicit assumptions aligned with populism are built into the structure of social media platforms. I intend to explain this fully in another article, but can’t here for reasons of space.

I believe we need to consciously and consensually restructure our social and broadcast media environment so that it more closely matches a) the shape of environment our social cognitive mechanisms evolved to fit, and b) the underlying mechanisms by which information is selected, ranked and filtered to us. I think that can only come about (i) if we create public social media companies to compete with the private ones, much as public broadcasters compete with private broadcasters and set more public-interested standards, and (ii) if we legislate that user-generated social graph data from a set date becomes partly user property and partly public property, so that users (or ‘actors’, as I prefer) can transport their social graph data to new and other platforms, enabling genuine competition in the social media platform market. But that would take another article to explain fully, would probably take 10–15 years to carry out, and requires collective action, which is too slow for now.

In the meantime, until we have adequate collective understanding of our problems and willingness to do something relevant and adequately radical, the best I can recommend is for individuals to learn how to see and understand the networked information environment we’re in more accurately.

Humans are hyper-social and our primary sense is visual. I think most of our basic mental representations are visual too; in other words, we use visual metaphors to think. A metaphor is a relatively simple model of complex realities that ‘carries us across’ (the root meaning of ‘metaphor’) from our limited mental capacities to the infinite complexity of the real world. In my experience, it’s hugely advantageous to switch your mental metaphor of the information environment we now live in: from visualising it the way current social media platforms present it, as a dynamic array of content items with no representation of how the items or sources are associated, to representing it as a social network graph, which means focusing as much or more on the connections as on the items. I think this view of the information environment is more adaptive and efficient because it pays as much or more attention to network associations and connections as to the content: the objects, posts, articles and actors in the network.

I have surprised myself before, both by finding sites to be connected which I didn’t expect to be, and by failing to find any confirmation in the data for sites I did think were connected. It’s a good sign that the methodology is robust when it doesn’t always confirm our expectations.

What exactly is ‘disinformation’?

Disinformation is not simply “fake news,” although ‘fake news’ is now commonly used by the media as if it were an adequate synonym.

‘Fake news’ implies misinformation (simply false information), but disinformation is a strategic mixture of truths, half-truths and lies used to manipulate political behaviour towards desired aims. Sometimes total, blatant bullshit can also be used strategically, to shift the norms of political discourse, but almost all disinformation uses some factual truths as starting material.

Disinformation is a technique of information-psychological warfare, which is part of global hybrid warfare: an integrated strategy of information-psychological and kinetic (physical, explosive) warfare tactics. ‘War’ I define as any non-consensual means of gaining (or defending) political control of a territory. At global scale, information warfare now leads and kinetic warfare follows, or supports it. A ‘territory’ in this practical context means any environment in which a political community can form; it doesn’t have to be a geographical space.

A new layer of territory has emerged from social interaction and the social construction of the environment on the internet. It sits, as it were, on top of the geographical layer of territory, but the political communities that form at the level of the globalised information space or public sphere do not neatly align with national borders. Political communities formed on the internet cohere more with themselves than with the old, primarily national political communities that identified with a particular nation-State and made their collective decisions sufficiently in common through that nation-State.

The liquefaction effect of internet and social media globalization on the boundaries of the information environment, in which political communities form, fission and evolve, is the same for both democratic and authoritarian regimes. Authoritarian regimes were more directly threatened by losing control of their information spaces, so they reacted first. Belatedly, democratic states are now reacting to the authoritarian regimes’ reactions, but no one with enough power to do anything potentially adequate has yet even acknowledged the problems underlying the reactionary problems. The international order based on nation-states is dissolving, and no one, as far as I have seen, has even begun thinking about or planning a better new system.

Secondarily, some disinformation involves private profit motives. Kleptocratic regimes which use authoritarianism are also more privatized than nationally representative or participative, so they often rely on private profit motives to retain collaborators, but disinformation is primarily used for political aims.

Disinformation is built up in layers. The first ingredient is always some truth, even if it’s just an apparently similar event from long ago with no real causal connection or logical relevance to the new events being reported; something that will be generally perceived as absolutely, obviously true is used as an anchoring point. The next layer reframes it, using moral-emotional framing terms to activate the right set of moral-political emotions in the target audience and lead them towards the desired conclusion. Then the previous, true events are reframed within a big geopolitical narrative, typically an extremely reductive narrative or theory which claims to explain everything in a simple, totalizing story. Possibly some entirely false ‘facts’, or some very misleadingly exaggerated or miscontextualized real facts, are added in, and the whole thing closes with the conclusion aimed for.

Disinformation strategy relies on some facts about human cognitive psychology which we usually prefer not to notice —

We very rarely do the slow, individually independent, conscious kind of thinking to make our decisions. Almost all our decisions are made using social cognitive heuristics: quick, efficient approximating mechanisms, for example the anchoring and adjustment heuristic and the majority or threshold heuristic, which I think are the ones most often manipulated in propaganda strategies.

The anchoring and adjustment heuristic means that, on complex issues where we can’t gather enough information to decide independently and logically for ourselves, we often decide by asking 1) is the anchoring point of the claim or argument probably true?, and 2) does the adjustment, the distance between my old perception (the anchor) and the new perception which the argument claims to be true, seem reasonable or credible? Anchoring and adjustment is very efficient and adaptive; without it we probably wouldn’t survive, but it can easily be manipulated.

‘Bandwagon’, or what I prefer to call the threshold heuristic, is also basically highly adaptive (which is why I don’t like the name ‘bandwagon’: it passes a negative judgement before even describing what the heuristic does). Imagine a situation in the human ancestral environment, the environment of evolutionary adaptedness: living in semi-nomadic hunter-gatherer groups of 30–150 people, frequently in inter-tribal feuds with other groups, usually over women and paternity rights or farming or hunting territories. A pair of men return to the village saying they’ve seen the other tribe approaching, then another, and then another three come and say the same. At some point, the balance of risks and costs of believing them without really knowing whether it’s true means it’s adaptive to err on the side of caution and prepare for an attack. Manipulations of the bandwagon or threshold heuristic very often evoke fear first, because fear makes people much more susceptible to this tactic. Now recall a situation where you’ve seen comment wars underneath a media source’s post on social media. In that environment, naive people who know no more about the topic than what they’ve just read or watched usually don’t know much about who is bloc-commenting or where those commenters come from socially and politically. In the ancestral environment, we would have known many intimate details about who was telling us what, and our filtering mechanisms would have worked out, most of the time, who was credible, and made those efficient survival decisions in conflict situations.

I’m trying not to make this section too long, but the two paragraphs above also briefly indicate the evolutionary mismatch between i) the structure of social media platforms as they’re designed now and ii) the underlying social graph algorithms which select, rank and recommend what we see (e.g. EdgeRank), on the one hand, and iii) the structure of the information environment which our social cognitive heuristics are still mostly adapted to, on the other; our social environment has changed much faster than the genetic evolution of our social psychological traits can catch up with. A heuristic in a mismatching environment becomes a systematic cognitive bias.

Disinformation is a complicated mixture of truths and lies, so dissecting and unpicking how it was woven together takes far more time than producing more of it. Volume, repetition and recirculation are how they are winning. Debunking it bit by bit is far too slow and costly to even keep up.

Even though there is almost always some truth used in the making of disinformation, the contextual and emotional reframing makes it appear to mean something very different. This is one of three reasons why a piecemeal approach to debunking disinfo cannot work efficiently enough.

Framing

Framing refers to the way the terminology used to characterise and categorise events or people evokes certain moral-political emotions, without us necessarily being conscious of the process. I imagine humans processing moral-emotional frames rather as ants smell scent trails: was this trail laid by my nest-mates, my colony, my species? If yes, they (and we) tend to accept it; if not, we tend to reject it. We don’t know where the info-trail goes, we just know who it socially smells like and how much we trust them.

It’s probably impossible to introspectively observe directly how framing language evokes a moral-emotional cognitive response in us; we can only ‘see’ the output, not the process itself. The cognitive ‘modules’ influenced by framing language (or, more broadly and accurately, by all forms of symbolism) probably evolved long before language and operate prior to conscious thought, but experimental psychologists have inferred some indirect observations and general principles from how people respond to framing stimuli.

What people actually remember most and react to is the moral-emotional framing, not nearly so much its factual or apparently ‘factual’ contents.

That’s why, even if you debunk the fake ‘factual’ content and someone consciously agrees with you, they’ll often unconsciously revert to repeating the same disinformation the next day, because it felt more convincing than reality. Disinfo feels hyper-real. This is reason two of three why piecemeal debunking cannot work adequately or fast enough.

To change that, you have to reframe it so that they feel significantly and memorably different about the topic and then next time they’ll recall the factual details which made them feel differently. I think one of the best ways to do that is to point out the frame itself and how it is manipulative. That might even be the primary adaptive function of the sense of betrayal.

Five basic techniques to check sources and discover networks

1. Check what else the site or Page posts, especially on political topics; you can use advanced search operators to make this more efficient.
2. Check one or more known Russian regime automated media content aggregator sites: do they amplify this site, author, meme, etc.?
3. Check the referrals network, using e.g. the Similarweb or Alexa tools.
4. Check plagiarism, using free tools, to see where else the text has been shared and where its first occurrence on the internet probably was.
5. Check photos or images with reverse image search tools to see where they’ve come from and where their first occurrence probably was.

There are much more advanced, systematic and efficient techniques and tools available too, but the simple tests above will probably help you catch 95%.

I’m now going to explain how to do each of these five tests.

Check what else the site or Page posts

You can just eyeball and scroll, but using advanced search operators is far more efficient, so you’re more likely to have the patience to get a representative sample of content and to do it consistently. You can get very fancy with advanced search operators, but knowing even a few basic ones will make your searching habits hugely more efficient.

The simplest and most useful one is just site search, e.g.

site:thefreethoughtproject.com Syria

In the search results, you can easily see that this site echoes Russian regime propaganda lines about Syria. If you’re not familiar with those lines, compare RT or Sputnik. You may also find content copied verbatim from RT to TFTP.
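If you want to run the same site search across several suspect domains or topics without typing each query by hand, a few lines of scripting can help (this is optional and goes beyond the no-programming promise above). A minimal sketch, assuming Python 3; the domain and topic lists are purely illustrative placeholders, and all it does is build and open ordinary Google site: searches.

```python
import webbrowser
from urllib.parse import quote_plus

# Purely illustrative placeholders; swap in the domains and topics you are checking.
suspect_sites = ["thefreethoughtproject.com"]
topics = ["Syria", "White Helmets", "NATO"]

for site in suspect_sites:
    for topic in topics:
        query = f"site:{site} {topic}"
        url = "https://www.google.com/search?q=" + quote_plus(query)
        print(query, "->", url)
        webbrowser.open_new_tab(url)  # comment this out if you only want the URLs printed
```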

Guides to learning advanced search operators:

Check aggregators

This is similar to the previous step, but means searching on known automated media content aggregators which are obviously associated with, and probably controlled by, the Russian regime. An automated media content aggregator, or simply ‘aggregator’, is a site which automatically collects media content from a set of sources determined by filters on sources and terms. I don’t know to what extent the filters are automated and how much human intervention there is.

I usually start by doing a Google site search on The Russophile, a very obvious, major Russian regime-associated media content aggregator site. It consistently lists media content which aligns with, and is approved by, the Russian regime. So if you find a site, article or author there frequently, it’s a fairly good indicator that they are, knowingly or unknowingly, part of the Russian regime’s information warfare campaigns.

E.g. to search whether a site is aggregated there:

site:www.therussophile.org “commondreams.org”

This example illustrates the point above: even true or mostly true content can be used in overall manipulative persuasion. I don’t think Common Dreams is a site controlled by the Russian regime, but the regime finds some of its content useful.

You can also just scroll down that aggregator and see the names of the sites and the variety of audiences they target, from overt neonazi sites to ‘Natural Health’ hippy Leftist sites, plus loads of conspiracy-theory hobbyist sites (why). The range of sites with such shared third sources and shared text is not plausibly explainable by chance or by natural affinities. You might feel like washing your eyes out with soap afterwards, and almost all of us have been manipulated this way (including me), but it’s better to know.
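If you’d rather not scroll by hand, you can pull the outbound links from an aggregator’s front page and count which external domains it amplifies most. A rough sketch, assuming the page serves ordinary static HTML with plain anchor links (it may not, and page layouts change); it needs the requests and beautifulsoup4 packages installed.

```python
from collections import Counter
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

# Assumption: the aggregator front page is plain HTML with ordinary <a href> links.
AGGREGATOR = "https://www.therussophile.org/"

html = requests.get(AGGREGATOR, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

outbound = Counter()
for link in soup.find_all("a", href=True):
    domain = urlparse(link["href"]).netloc.lower()
    if domain.startswith("www."):
        domain = domain[4:]
    if domain and "therussophile" not in domain:
        outbound[domain] += 1

# The most frequently linked external domains are the sources being amplified.
for domain, count in outbound.most_common(30):
    print(f"{count:4d}  {domain}")
```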

Check referrals network

‘Referrals’ means the inward and outward HTML links on web pages and sites. This is not a reliable test on its own: you’ll get a lot of false negatives, but almost certainly no false positives. (My impression is that it has become a less sensitive test over the last few years since we started using this method; I guess they reacted by redesigning their sites, so meaningless junk now comes up in the top 5 results of a referrals network test more often than before.)

Simply put the name of the site you’re investigating into one of the social marketing intelligence search engines (e.g. Similarweb, which provides a lot of functionality for free) and see what you get in the referrals section.

Similarweb is a Search Engine Optimisation (SEO) tool designed for website marketers, but you can also use bits of it for OSINT. You don’t need to register and the free functionality is enough (although a Pro account would be fun to try, please); just type or copy-paste the site name in and click. The most relevant section for us is Referrals: sites which refer into the site you’ve searched on, and sites which that site refers out to.

Alexa is like Similarweb, better in some ways, but without a free trial version.

An easy starting point for comparison, to learn to notice covert disinfo sites in the Referrals analysis results of a site you’ve searched on, is the network graph of referrers to and from ZeroHedge.com compiled by Alexander Reid Ross, who gathered the data using the Alexa search engine; it’s just one example.

The real scale of the Russian disinformation grey-site network is vast. Judging by the number of sites found per unit of search effort, my educated guess is about 40,000 covertly associated sites in their network globally. I think they keep many in reserve, appearing innocuous and gathering followers, ready to replace sites and Pages that get de-platformed by the social media companies.

This list of Kremlin associated sites was built using basically the same methods as explained in this guide —

(I disagree with removing sites from the list just because the site owners or editors deny the association. Some of them may indeed not have been consciously, intentionally connected, but consequences matter more than intentions in this context, and they were certainly closely associated.)

You’ll get much quicker at interpreting referrals network analysis results with practice when you’ve learned the names of more covert disinfo sites.

If you search on Similarweb for a site you suspect and don’t find any obviously dodgy results in the referrals section, click through to a few of the site names you don’t recognise and look at their referrals networks; often it’s only one or two transitive steps until you hit an obvious one like Sputnik or Russia Today. Sites don’t have to have referrals, the free version of Similarweb only shows you the top 5, and I think they may be adjusting to this method we’ve been using to reveal their covert connections by deleting their hyperlinks, so the absence of other covert disinfo sites in the referrals network of a site you’re investigating doesn’t necessarily mean it’s unconnected.
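That “one or two transitive steps” idea can be made more systematic: treat referral domains as edges and walk outward from the site you’re investigating until you hit a known outlet. The sketch below is only a skeleton; Similarweb’s referral data isn’t freely available in bulk, so get_referrals() is a placeholder you would fill in by hand from the Referrals section (or from whatever data source you have), and the list of known outlets is illustrative, not exhaustive.

```python
from collections import deque

# Illustrative seed list of overt outlets; extend it as you learn more names.
KNOWN_PROPAGANDA = {"rt.com", "sputniknews.com", "therussophile.org"}

def get_referrals(domain):
    """Placeholder: return the top referral domains for `domain`.

    Fill this in manually from Similarweb's Referrals section, or from any
    referral dataset you have access to. Returning [] means 'unknown'.
    """
    return []

def trace(start, max_hops=2):
    """Breadth-first walk of referral links, up to max_hops steps outward."""
    seen = {start}
    queue = deque([(start, 0)])
    hits = []
    while queue:
        domain, hops = queue.popleft()
        if domain in KNOWN_PROPAGANDA and domain != start:
            hits.append((domain, hops))
            continue
        if hops >= max_hops:
            continue
        for ref in get_referrals(domain):
            if ref not in seen:
                seen.add(ref)
                queue.append((ref, hops + 1))
    return hits

print(trace("example-news-site.com"))  # hypothetical domain, for illustration only
```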

Check plagiarism

Russian regime trolls are lazy copy-writers, because they don’t need to produce unique content when so few people check sources or check where information has actually come from; copy-pasting is so much easier. Plagiarism analysis tools designed for checking students’ essays will also help you discover covert networks of disinformation sites which pretend to be “independent” but are actually clearly associated with the Russian regime.

It works quicker if you select a chunk of text from a political news article which you find suspicious. You can just use Google with a quoted (“…”) search, but Quetext is a bit more convenient and efficient —

I use it to discover the first occurrence of a text on the internet and to visualise the network graph of its dispersal trajectory. Like the referrals network test, it’s not reliable on its own, because you will get a lot of false negative results.
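The quoted search part can also be semi-automated: pick the longest, most distinctive sentences from the suspect article and turn each into an exact-phrase Google query. A small sketch, not a substitute for Quetext; the example text is a placeholder.

```python
import re
from urllib.parse import quote_plus

def phrase_queries(text, n=3, min_words=8):
    """Pick the n longest sentences and build exact-phrase search URLs for them."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    candidates = [s for s in sentences if len(s.split()) >= min_words]
    candidates.sort(key=len, reverse=True)
    for sentence in candidates[:n]:
        query = f'"{sentence}"'
        yield query, "https://www.google.com/search?q=" + quote_plus(query)

# Placeholder text; paste a chunk of the suspicious article here instead.
suspect_text = """Paste a chunk of the suspicious article here.
It should be a few sentences long so that the phrases are distinctive."""

for query, url in phrase_queries(suspect_text):
    print(query)
    print(url, end="\n\n")
```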

Checking photos and videos

Finding that the main factual claims in a report correspond to some real events, somewhere, at some time, involving someone, does not necessarily mean it is not disinformation: what happened and what was reported may have happened in a different place and time, even if photos or video are used. Video is such an emotionally powerful medium that if the factual contents of a video are reframed within a false context to create a false meaning, it has an enormous and persistent effect in convincing people that bullshit is true.

For example, they recycle photos from an old case of child organ harvesting and human trafficking, misrepresented as if it were recent and in a different place, depending on who they want to demonise this week. Who did what to whom, where and when, will be deliberately mixed up or made ambiguous; or they re-use photos from autopsies on car-crash victims, misrepresented as evidence of organ trafficking, in order to smear the nationality, race or religious identity falsely attributed to the traffickers. Paedophilia is another highly emotive topic they often use in disinfo smear campaigns; it’s often used as a thinly veiled homophobic slur or precursor.

Another favourite theme of disinfo photo memes is the false equivalence: reducing several different, complex situations to one simple story which makes viewers feel themselves to be the centre of their universe, whether always the good guy or always the bad guy. It works on narcissism, on the fact that we would rather be the centre of attention even if it’s bad. False-equivalence, reductive, narcissistic narratives also imaginatively erase the autonomy and agency of the people directly affected by the issue.

If you’re investigating a post with a photo you suspect is being misrepresented in another context to make it appear to mean something else, you can do a Google reverse image search: in Chrome, just right-click on the photo and select “Search Google for this image”. It’s possible in other browsers too, it just takes a few more clicks.

Better than Google reverse image search but a tiny bit more effort, is —

A bit surprisingly and perhaps a bit worryingly (why would they invest so much in it?), Yandex reverse image search is the best, better than Google —
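If the image is already online (for example attached to a public post), you can jump straight to the reverse image search pages instead of right-clicking. A small sketch; the URL patterns below for Google and Yandex are assumptions based on how those services currently behave, not documented APIs, so they may change or redirect.

```python
from urllib.parse import quote_plus

def reverse_image_searches(image_url):
    """Build reverse image search URLs for one publicly reachable image URL.

    These query-string patterns are assumptions about current behaviour,
    not documented APIs; adjust them if they stop working.
    """
    encoded = quote_plus(image_url)
    return {
        "google": f"https://www.google.com/searchbyimage?image_url={encoded}",
        "yandex": f"https://yandex.com/images/search?rpt=imageview&url={encoded}",
    }

# Hypothetical image URL; replace it with the photo you are checking.
for engine, url in reverse_image_searches("https://example.com/suspect-photo.jpg").items():
    print(engine, url)
```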

For videos on YouTube, there is this reverse search tool —

Unfortunately this tool only works for videos on YouTube, so far, but sometimes you can find the same video on YouTube and then at least show its network on there.

Sometimes they get complacent about how easy it is to fool people who desperately want to be fooled (because it’s more comfortable and seems to absolve them of human responsibility), and then they do something like this:

They re-used photos from a school they had bombed in Idlib as if the photos showed a school in West Aleppo which they claimed was bombed by rebels, but they forgot to change the AFP (Agence France-Presse) pop-up tag on the photos they nicked from the internet, so the description of the original context was still there. Sloppy.

Also, google the name of the supposed author. Are they in a position to have plausibly done the kind of investigative research journalism which they claim to rely on in their article? If not, where do the factual claims come from?

A couple of common assumptions which don’t match the environment

When people share a post on Facebook or any other social media platform, the structure of the environment leads them to assume that their decision is just about sharing that post or not. They don’t realise that every interaction influences what they, and everyone who interacts with them, will see selected and ranked in their newsfeeds next. Sources also have affinity factors: if you share post X from source A, and source B has a high affinity score with source A, then you and everyone who interacted with post X will also start to see more posts from source B. So if you post something from ‘Collective Evolution’, you will probably also start to see more from Russia Today.

Basic explanation of how Facebook Edgerank algorithm works—

Technical explanation from Facebook —

More comprehensive explanation of how Facebook newsfeed algorithm works —
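To make the affinity idea concrete, here is a deliberately simplified toy model of affinity-weighted ranking. It is not Facebook’s actual algorithm (EdgeRank itself was long ago replaced by a much more complex machine-learning system); the numbers and the update rule are invented purely to illustrate how interacting with source A can pull posts from an associated source B up your feed.

```python
from dataclasses import dataclass

@dataclass
class Post:
    source: str
    base_weight: float  # stands in for post type, recency, engagement, etc.

# Toy affinity scores between the user and each source, updated by interactions.
user_affinity = {"A": 0.2, "B": 0.1, "C": 0.1}

# Toy source-to-source affinities (how closely the platform associates them).
source_affinity = {("A", "B"): 0.8, ("A", "C"): 0.1}

def interact(shared_source):
    """Sharing a post from one source also raises affinity with associated sources."""
    user_affinity[shared_source] += 0.3
    for (a, b), strength in source_affinity.items():
        if a == shared_source:
            user_affinity[b] += 0.3 * strength

def rank(posts):
    return sorted(posts, key=lambda p: user_affinity[p.source] * p.base_weight, reverse=True)

feed = [Post("A", 1.0), Post("B", 1.0), Post("C", 1.0)]
print([p.source for p in rank(feed)])  # before sharing
interact("A")                          # user shares a post from source A
print([p.source for p in rank(feed)])  # source B now outranks source C
```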

Overtly connected propaganda sources and content are termed ‘white’ propaganda. Sources and content which pretend to be ‘independent’ but are actually closely associated or controlled are termed ‘grey’ propaganda. ‘Black’ propaganda means sources or content which pretend to be from the opposite side; usually that’s a few authors inside reputable mainstream sources.

The overt (‘white’) propaganda sites like Russia Today are a tiny proportion of the disinformation “news” sites associated with authoritarian regimes; most of them are ‘grey’. ‘Black’ propaganda is hard to prove from open source data alone.

The idea that everyone is an individual and we shouldn’t judge them by association is good when it’s an individual matter of justice, but it’s really far too cognitively expensive to process all political information that way. It’s impossible, so people actually do judge by association, unconsciously. I’m merely proposing that we do it more consciously, explicitly and carefully.

Reframing is more effective, emotionally and politically, than debunking. People don’t get motivated by factual debunking even if, on a superficial ‘rational’ cognitive level, they accept it; the moral-emotional framing sticks longer than the ‘surface level’ information content (because human cognition is embodied, not rational in the Cartesian sense; see Lakoff and Johnson, 1999). People tend to carry on fitting new information into a misrepresentative frame, used to mislead and manipulate their political decision-making, even after they’ve been shown that it is misrepresentative. Human rationality is not as individual as it’s traditionally been theorised to be: we mostly make decisions based on social cues, which for humans are mainly linguistic moral-emotional frames. Moral emotions occur in our whole body, so they are more memorable than particular information content.

Generating bullshit is so much quicker and less costly than debunking it. If we try to debunk each article, each repetition of that content, bit by bit, page by page, site by site, there is absolutely no chance of matching the efficiency of the barrage-of-bullshit or ‘firehose of falsehood’ strategy. This is reason three of three why the piecemeal debunking strategy cannot work.

The only strategy which can match or outrun them in efficiency is to look at and map networks of information sources and judge provisionally by association. You can still suspend definitive judgement if you think guilt by association is too unfair, but at least take into account the circumstantial evidence of association with known propagandists or propaganda sites.

If the social networked-ness of information sources were represented graphically, in a way human beings can understand intuitively and instantly, then we could, and probably would, use the attention and time saved to scrutinise content, framing and themes more thoroughly and independently. More public attention capacity would be freed up for in-depth reading and interpretation of the news.

Currently, the social graph APIs of the major social media companies are commercially private, and what is represented to us, misleadingly, is an atomised linear sequence of posts. That must change. I have no prejudice against private profit, but the public externalities of Google’s and Facebook’s current business models and algorithms are simply too high to tolerate. They can change for the better and still be profitable, perhaps even more so.

With my recommendation of judging by association before content, there is indeed always a risk of false positives and of occasionally being unfair to individuals and to new media sites. But the public risks of not erring on the cautiously suspicious side are far more severe (e.g. when disinformation leads to mass murder), whereas very little real harm is done by tentatively misidentifying a new or alternative media site as a covert disinfo site; if they object and show why, the accusation is dropped.

Social bees and ants do not wait to find out whether larvae which smell of oleic acid are really infected with parasitic Varroa destructor mites by letting them hatch out to check individually; if they smell of oleic acid, they are removed from the nest before any mite eggs hatch and infect others.

A few strategic narrative themes which occur so frequently in Kremlin disinfo sites you can use them as indicators:

“MSM!” — functionally equivalent to Goebbels’ tactic of “Lügenpresse!”, it works to isolate their followers from the rest of the public and make it easier to manipulate people through groupshift and deindividuation processes.

“But WW3 with Russia!” — Putin is obsessed with hyping up the perception of external threats to Russia, because defending against external threats is one of the only things his own people see him as good at. For international audiences, this line maintains public support for foreign policies of limitless appeasement in the face of Russia’s increasingly aggressive global hybrid warfare strategies.

Smearing the rescuers — the Syrian Civil Defence ‘White Helmets’ and any medics working in opposition territory in Syria make Russian actions look bad, so they must be discredited. This is such a major obsession for Kremlin-allied disinfo sites that you can use it to identify them. MSF Sea also get incredible amounts of harassment on Twitter, much of it led by Kremlin bots.

Mirror Accusation tactic — a defensive propaganda technique which works by pre-emptively accusing the other side of what you just did, so that when they accuse you of it, it seems less credible and they may not even try. For example, the Assad regime really did enable and cooperate with Al-Qaeda in Iraq, and really did set up and still supports Daesh (ISIS) in Syria, so to defend against those true accusations they use the big bold lie tactic of falsely accusing all their international enemies of doing what they actually did. If you see “US created ISIS” bullshit (why), that site is either directly connected to them or a useful idiot for them. For tyrants globally now, the most convenient international propaganda line is to call all their political opponents “terrorists” or “terrorist supporters”.

For more advanced OSINT tools and training, see:

Articles showing how much you can investigate and see for yourself using only open source data and public tools:

On google search ranking manipulation using SEO techniques plus bot-nets:

Longer reflective piece about the ethical implications and consequences —