whether the server is SoftEther VPN and not some other HTTPS server. Because the active probers do not send the preceding GET request, we can distinguish them from legitimate clients.

The TLS fingerprint of the SoftEther probes differs strikingly from that of the actual SoftEther VPN client software, which has more and newer ciphersuites, and various extensions.

Application Layer—AppSpot.

The special Host header of the AppSpot probe type is a dead giveaway of its purpose: discovering servers capable of fronting access to Google App Engine. All the probes we saw carry a fairly distinctive and specific User-Agent string, which is probably spoofed, as the rest of the header is inconsistent with its purported version of Chromium. The declared version of the browser was originally released in Apr. 2014, and superseded just two weeks later by a new update. We found a small number of real web requests using this User-Agent, but the great majority were active probers. The first AppSpot probes arrived in Sep. 2014.

Among other header inconsistencies, the probes set the header Accept-Encoding: identity, which forces the server to send the response body uncompressed. We used this characteristic to weed out the small number of non-prober requests that happened to use the same User-Agent string. Those requests, coming from a real Chromium browser, would have set Accept-Encoding: gzip, and the server would have compressed its response. We can therefore identify active probes in our server logs because the number of transferred bytes is greater than it would be for a compressed response.
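The byte-count heuristic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tooling used in the study: the log record fields (`user_agent`, `path`, `bytes_sent`), the `gzip_sizes` baseline table, and the `slack` tolerance are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    user_agent: str
    path: str
    bytes_sent: int  # response body bytes recorded by the server (assumed field)


def looks_like_probe(entry, probe_user_agent, gzip_sizes, slack=1.1):
    """Flag a request that carries the probers' User-Agent but received
    noticeably more bytes than a gzip-compressed response would need.

    gzip_sizes maps a path to the expected compressed body size; this
    baseline is an assumption of the sketch and must be measured separately.
    """
    if entry.user_agent != probe_user_agent:
        return False
    expected = gzip_sizes.get(entry.path)
    if expected is None:
        return False  # no baseline for this path, so no decision
    return entry.bytes_sent > expected * slack
```

A request with the suspicious User-Agent whose transferred size matches the compressed baseline is treated as a real browser; one that is clearly larger is treated as a probe.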

The TLS signature of AppSpot probes entirely differs from that of the claimed version of Chromium. The probes almost certainly reflect use of a custom program that merely imitates a web browser.

5.6 Characteristics of the Probing System

We designed our Counterprobe experiment (Section 4.4) to illuminate multiple features of both the active probing sensors and its probing network. We find clear evidence that the sensor responsible for triggering probes operates in a single-sided fashion, meaning that it only considers unidirectional flows. Our experiments showed that an unacknowledged series of a SYN segment, followed by an ACK, and finally data (i.e., Tor's TLS client hello) suffices to trigger a probe. The following subsections discuss additional findings.

The sensor does not process stateless segments.

Some DPI sensors are stateless, i.e., they process TCP segments in isolation, without considering the TCP connection state. To learn if the active probing sensor is stateless, we set out to attract a probe in two ways: once after establishing a three-way handshake and once, on a different port, without a prior handshake. The stateful data triggered a probe and the stateless data did not. This matches our understanding of the behavior of the Great Firewall. However, it differs from the Great Cannon [17], which has been used to inject malicious JavaScript into web pages and which acts on naked packets.
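The observed behavior (single-sided, yet stateful) can be captured in a toy model. The sketch below is purely illustrative and is not a description of the real sensor's implementation: the class name, the flag strings, and the `is_trigger` predicate are hypothetical, and only client-to-server segments are modeled.

```python
class OneSidedSensor:
    """Toy model of a sensor that sees only client-to-server segments
    and requires SYN, then ACK, before it inspects payload."""

    def __init__(self, is_trigger):
        self.is_trigger = is_trigger  # predicate on payload bytes (assumed)
        self.state = {}               # flow 4-tuple -> "SYN" or "ESTABLISHED"

    def observe(self, flow, flags, payload=b""):
        """Return True if this segment would trigger an active probe."""
        if "S" in flags and "A" not in flags:
            self.state[flow] = "SYN"  # client SYN opens tracking for the flow
            return False
        if self.state.get(flow) == "SYN" and "A" in flags:
            # The client's ACK is enough; the server's SYN-ACK is never checked.
            self.state[flow] = "ESTABLISHED"
        if self.state.get(flow) == "ESTABLISHED" and payload:
            return self.is_trigger(payload)
        return False  # data on an untracked flow is ignored
```

In this model, data sent after a SYN and an ACK triggers a probe even if the server never answered, while the same data sent without a handshake is ignored.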

The sensor does not seem to robustly reassemble TCP.

Next, we tried to establish if the sensor is reassembling TCP streams. In the first step, we sent the triggering data in a single TCP segment after establishing a TCP connection, which, as expected, attracted an active probe. In the next step, we split the triggering data across packets in 20-byte increments, again after establishing a TCP connection. The fragmented data did not trigger an active probe, which differs from the GFW [13].

This behavior was already observed by Winter and Lindskog [32, §5.2] in 2012. There are, however, reports stating that the active probing sensor used to reassemble TCP streams at some point [31].
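The segmentation step can be illustrated with a short sketch. It only computes how a payload would be cut into 20-byte pieces and which TCP sequence number each piece would carry; actually transmitting the segments is out of scope. The function name and the `start_seq` parameter are assumptions made for this example.

```python
def split_payload(payload, start_seq, chunk=20):
    """Return (sequence_number, data) pairs that cover payload in
    chunk-byte pieces, with sequence numbers wrapping modulo 2**32."""
    segments = []
    for offset in range(0, len(payload), chunk):
        seq = (start_seq + offset) % 2**32
        segments.append((seq, payload[offset:offset + chunk]))
    return segments
```

A sensor that reassembles TCP streams would see the concatenation of all pieces; one that matches per segment sees only fragments of at most 20 bytes, none of which contains the full triggering pattern.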

Traceroute to the sensors.

We sent response-triggering packet trains with the TTL encoded in the port selection, and also performed a similar traceroute to locate the Great Firewall, from both a Unicom server and a CERNET server. Unicom's sensor appears to operate on the same link as the GFW, but the CERNET sensor appears one hop closer to our server.
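One way to encode the TTL in the port selection is sketched below. The text does not give the exact mapping, so the base port, the TTL range, and the function names are hypothetical. The reasoning assumed here is the usual traceroute logic: a segment whose TTL expires before the sensor cannot trigger a probe, so the smallest TTL whose port later attracts a probe indicates the sensor's hop distance.

```python
BASE_PORT = 30000  # hypothetical base; port BASE_PORT + ttl carries that TTL
MAX_TTL = 64       # hypothetical upper bound on the TTLs tested


def port_for_ttl(ttl):
    """Map a TTL to the port used for the packet train sent with that TTL."""
    if not 1 <= ttl <= MAX_TTL:
        raise ValueError("ttl out of range")
    return BASE_PORT + ttl


def ttl_from_port(port):
    """Recover the TTL from a port, or None if the port is not ours."""
    ttl = port - BASE_PORT
    return ttl if 1 <= ttl <= MAX_TTL else None


def first_triggering_hop(probed_ports):
    """Smallest TTL among the ports that attracted a probe, if any."""
    ttls = [t for t in map(ttl_from_port, probed_ports) if t is not None]
    return min(ttls) if ttls else None
```

Running the same procedure with data that triggers the Great Firewall's RST injection instead of a probe would allow the two hop counts to be compared.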

Together, these three tests suggest that the active prober's sensor is distinct from both the RST-injecting portion of the Great Firewall and the sensor in the Great Cannon.

Inferring the physical infrastructure.

Section 5.5 suggests that there is clearly a substantial amount of centralization, as probes from a diverse range of IP addresses share both TCP timestamps and initial sequence number patterns. But what is the nature of the IP addresses from which the probes originate? We envision three possibilities:

1. A network of distributed proxies that simply forwards raw packets, and is centrally controlled by the active probing system.
2. A few centralized packet injection devices that extract the probed server's reply via passive monitoring.
3. A few centralized man-in-the-middle devices that selectively intercept traffic, temporarily hijacking end-system IP addresses, in a manner similar to the Great Cannon.

Our solution to distinguish these three possibilities was to deploy a system that responds to incoming probes with a series of TTL-limited packets, effectively acting as a traceroute. Our responses included:

• SYN-ACK packets, encoding the hop in the sequence number.
• UDP packets, encoding the hop in the IP ID field.
• UDP packets to the probe's source.
• SYN-ACK (with the hop encoded in both the port and sequence number) and UDP packets to the topologically next IP address.
• SYN-ACK and UDP packets to the topologically next subnet.
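The hop encodings in the list above can be illustrated with a small sketch. The bit layout is an assumption made for this example (hop in the low 8 bits, an arbitrary tag in the remaining bits); the actual layout used in the experiment is not specified here. Such encodings are useful because ICMP time-exceeded messages quote the original IP header and at least the first 8 bytes of its payload, which cover the IP ID as well as the TCP ports and sequence number.

```python
def encode_seq(hop, tag):
    """Pack a hop count (8 bits) and a tag (24 bits) into a 32-bit
    TCP sequence number. The layout is hypothetical."""
    if not 0 <= hop < 2**8 or not 0 <= tag < 2**24:
        raise ValueError("hop or tag out of range")
    return (tag << 8) | hop


def decode_seq(seq):
    """Return (tag, hop) from a sequence number built by encode_seq."""
    return seq >> 8, seq & 0xFF


def encode_ip_id(hop, tag):
    """Pack a hop count (8 bits) and a tag (8 bits) into the 16-bit
    IP ID field. The layout is hypothetical."""
    if not 0 <= hop < 2**8 or not 0 <= tag < 2**8:
        raise ValueError("hop or tag out of range")
    return (tag << 8) | hop


def decode_ip_id(ip_id):
    """Return (tag, hop) from an IP ID built by encode_ip_id."""
    return ip_id >> 8, ip_id & 0xFF
```

During post-processing, the hop recovered from each quoted header identifies which TTL-limited packet elicited a given ICMP reply.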

We triggered probes by sending requests from our server in China, and we sent our responses blindly, only capturing packet traces for post-processing analysis.