Roundup Here's a quick summary of what's been happening in machine learning lately, beyond what we've already reported.

Newsflash! Facial recognition systems are still racist: The latest benchmarking tests performed by the National Institute of Standards and Technology reveal that facial recognition algorithms made by a French startup for immigration screening by the US government struggle to identify women and people with darker skin.

The latest NIST test results [PDF], published this month, show that AI software from Idemia used to scan cruise ship passengers coming to the US suffers from racial biases. Its models are least accurate when tasked with identifying black women.

Idemia’s software misidentified black women ten times more frequently than white women. Thankfully, the algorithms aren’t available for commercial use yet, according to Wired.

The issue of racial bias in these machine learning systems is a well-known flaw, and is at the heart of all the controversy surrounding the technology. Demographic problems were raised in two recent congressional hearings on facial recognition. The NIST results are just another reminder that this type of technology still isn’t good enough to use yet, if at all.

New self-driving dataset from Lyft: If you need more data to train your algorithms to drive cars autonomously, then look no further.

Lyft, the ride-hailing service, has published a dataset complete with visual inputs processed by the cameras and LiDAR on its self-driving cars, as well as maps of the road.

Important objects, such as other cars and pedestrians, are highlighted with carefully hand-annotated bounding boxes. You can download it here.

Anonymous data isn’t ever really anonymous: A new research paper published in Nature this month reveals methods that can defeat data anonymisation processes by re-identifying the people behind the data.

Researchers from Imperial College London and the Université catholique de Louvain in Belgium have found that a whopping 99.98 per cent of Americans would be correctly re-identified in any dataset that used up to 15 demographic attributes.

To understand how, you’ll have to wade through the mathematical proofs in the paper. The results are pretty startling nonetheless. “They suggest that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model,” the researchers wrote.
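The intuition behind the result is simple: the more demographic attributes an "anonymised" record keeps, the more likely it is to be unique in the population, and a unique record can be linked back to a person. Here's a toy sketch of that effect; it is not the paper's actual method (which fits a generative copula model to estimate re-identification likelihood), and the attribute values are made up.

```python
import random

# Toy illustration: how the fraction of unique records grows as more
# demographic attributes (age bracket, gender, ZIP prefix, ...) are kept.
# Each attribute here takes one of five values, purely for illustration.

random.seed(0)

def make_population(n, num_attrs):
    return [tuple(random.randrange(5) for _ in range(num_attrs))
            for _ in range(n)]

def fraction_unique(records):
    counts = {}
    for r in records:
        counts[r] = counts.get(r, 0) + 1
    return sum(1 for r in records if counts[r] == 1) / len(records)

pop = make_population(10_000, 15)
for k in (3, 6, 10, 15):
    # Keep only the first k attributes of each record.
    frac = fraction_unique([r[:k] for r in pop])
    print(f"{k:2d} attributes -> {frac:.1%} of records unique")
```

With three attributes almost every record has many lookalikes; by 15 attributes essentially every record is one of a kind, which is what makes "release-and-forget" anonymisation so fragile.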

Here’s how to compress your convolutional neural network: The best computer vision models, trained on large datasets, are so big that it’s difficult to cram them onto low-powered devices.

A team of researchers from Facebook AI Research and the University of Rennes in France have devised a new method that compresses these models so they take up less memory. The structured quantization algorithm works on the “reconstruction of activations, not on the weights themselves,” Facebook explained this month.

The algorithm managed to compress a ResNet-50 model trained on ImageNet, with 76.1 per cent accuracy, down to 5MB of memory, as well as a Mask R-CNN model down to 6MB. That's 20 and 26 times smaller than the original models, respectively.

You can read the paper here [PDF] and see the code here.
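To see where savings like that come from, here's a minimal sketch of codebook quantization, the basic idea underlying this family of compression methods: replace each 32-bit float weight with a small index into a shared codebook of centroid values. Facebook's actual technique uses product quantization trained to preserve layer activations; this toy version just quantizes raw weights against a uniform codebook, and all the numbers are illustrative.

```python
import random

random.seed(0)
# Fake layer weights, roughly Gaussian like trained network weights.
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]

def build_codebook(values, n_centroids=16):
    # Uniformly spaced centroids spanning the weight range.
    lo, hi = min(values), max(values)
    step = (hi - lo) / (n_centroids - 1)
    return [lo + i * step for i in range(n_centroids)]

def quantize(values, codebook):
    # Replace each weight with the index of its nearest centroid.
    return [min(range(len(codebook)), key=lambda i: abs(v - codebook[i]))
            for v in values]

def dequantize(indices, codebook):
    return [codebook[i] for i in indices]

codebook = build_codebook(weights)
indices = quantize(weights, codebook)
restored = dequantize(indices, codebook)

# 16 centroids need only 4 bits per weight instead of 32, plus a tiny codebook.
bits_before = 32 * len(weights)
bits_after = 4 * len(weights) + 32 * len(codebook)
err = max(abs(w - r) for w, r in zip(weights, restored))
print(f"compression: {bits_before / bits_after:.1f}x, max error: {err:.4f}")
```

Real schemes get bigger ratios by quantizing whole sub-vectors of weights at once (product quantization) and by fine-tuning so the quantized layers reproduce the original activations.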

Contractors working on Apple’s Siri have heard you having sex: Oh dear. Human contractors reviewing the digital assistant's audio recordings have apparently listened in on illicit drug deals, private medical information, and people having sex, all picked up by the voice-activated software. Yes, Apple keeps Siri's audio recordings of you, in case you forgot.

These people are employed by the Silicon Valley giant to investigate technical errors, for example if the AI bot incorrectly hears "Hey, Siri!" and responds when it wasn’t explicitly activated, or if its replies to requests are unsatisfactory. But in between all that, contractors regularly hear the more intimate details of people’s private lives picked up by a Siri device's microphone. Just the sound of someone undoing a zip can activate the personal assistant, it is claimed.

A whistleblower working as a contractor to Apple told The Guardian: “There have been countless instances of recordings featuring private discussions between doctors and patients, business deals, seemingly criminal dealings, sexual encounters and so on. These recordings are accompanied by user data showing location, contact details, and app data.”

The anonymous contractor believed Apple wasn’t being transparent enough about who could be listening in and what they might be hearing.

Microsoft wheels out trendy ol' AI for Defender: Microsoft has described some of the machine-learning techniques it has apparently injected into its cloud-based Defender ATP enterprise antivirus to stay one step ahead of malware makers.

Trojan and worm writers typically run their creations through scanning software like Defender, and modify their code until the security tools fail to catch the new nasties. So Microsoft has started using something called monotonic models, based on computer science research [PDF] by the University of California, Berkeley, to inspect files and identify malware samples in a new way.

For a start the monotonic models run in Microsoft's cloud, so if a malware developer wants to try their latest strain against the scanner, they'll have to upload their samples to Redmond, rather than testing them on an offline machine. This means the Windows giant is automatically tipped off with a load of useful information about the fledgling malware.

Microsoft has been using three different monotonic classifiers running alongside its traditional antivirus software as part of its Microsoft Defender ATP package since 2018, we're told. The machine-learning technology can block 95 per cent of malicious files, apparently. One of them blocks nasty code on an average of 200,000 devices every month, Redmond claimed this month.


Another way attackers trick antivirus software is by signing their nasty code with a trusted certificate so that it looks legit. Since monotonic models analyse only a file's features, and don’t consider its certificate, this method of faking certificates is useless against them.

Another increasingly common trick is to surround malware with large chunks of legitimate code to trick the scanner system into thinking the trojan or worm is a harmless normal program. However, Microsoft's monotonic model can apparently see through such obfuscation techniques.
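The reason padding fails against a monotonic model follows from its defining property: the score can only go up, never down, as features are added. Here's a minimal sketch assuming a simple linear scorer with non-negative weights; the feature names, weights, and threshold are all made up for illustration, and real monotonic classifiers are typically gradient-boosted trees constrained to be monotone.

```python
# Every suspicious feature carries a non-negative weight; benign features
# implicitly get weight zero. So bolting extra (benign) code onto a trojan
# can never lower its score below the detection threshold.
SUSPICIOUS_WEIGHTS = {
    "writes_to_system32": 2.0,
    "disables_defender": 3.0,
    "contacts_known_c2": 4.0,
    "packed_executable": 1.0,
}

THRESHOLD = 4.5

def monotonic_score(features):
    # Non-negative weights make the score monotone in the feature set:
    # adding any feature can only increase (or leave unchanged) the total.
    return sum(SUSPICIOUS_WEIGHTS.get(f, 0.0) for f in set(features))

malware = {"disables_defender", "contacts_known_c2"}
# The same sample padded with legitimate-looking code and features.
padded = malware | {"imports_gui_library", "big_resource_section",
                    "valid_looking_cert"}

print(monotonic_score(malware), monotonic_score(padded))
```

An ordinary (non-monotonic) classifier might learn that GUI libraries and large resource sections correlate with benign software and subtract from the score, which is exactly the loophole the padding trick exploits.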

"Monotonic models are just the latest enhancements to Microsoft Defender ATP’s Antivirus," said the Defender research team.

"We continue to evolve machine learning-based protections to be more resilient to adversarial attacks. More effective protections against malware and other threats on endpoints increases defense across the entire Microsoft Threat Protection.” ®