This machine-learning upstart trained software to snare online drug dealers. Now it's going after fake coronavirus test equipment peddlers – The Register
Machine-learning software to snare scammers hawking fake COVID-19 test kits on social media is being built by a tiny startup funded by the US National Institutes of Health.
S-3 Research, founded by Timothy Mackey, an associate professor at the school of health sciences at the University of California, San Diego, was focused on sniffing out drug dealers online using artificial intelligence.
The startup trawled the internet searching for ads touting opioids and marijuana, and trained a classifier using this advert text. The resulting system could, with the help of various natural language processing techniques, automatically identify illegal drug ads on the internet, alerting websites, regulators, and the cops so the material could be removed.
Then the COVID-19 coronavirus broke loose. People in China started talking about the mysterious respiratory illness on Weibo, a Chinese microblogging site. Mackey kept a close watch, and as the virus infected more and more regions and countries, word spread on social media like wildfire.
In March, the World Health Organization declared the disease a pandemic. Scammers began taking advantage of people’s panic, peddling bogus COVID-19 treatments, vaccines, and test kits online. S-3 Research decided to turn its technology on these dodgy ads, to help thwart the scumbags before they caused any serious harm.
“At first, it was all pretty harmless stuff,” Mackey told The Register. “People were promoting Chinese herbs or Vitamin C drips claiming that they boosted your immune system. It’s definitely not going to help you, but it won’t really harm you either.”
As governments struggled to contain the outbreak – with a lack of testing kits and hesitation in issuing stay-at-home orders and travel bans – the internet exploded with tricksters touting gizmos capable of screening for COVID-19 using finger pricks, urine samples, or saliva. The kit is all fake. Real tests detect infections by probing the cavity between the nose and mouth using a six-inch swab, which is sent to a lab to process using specialist equipment.
“These fake tests could make the outbreak worse,” Mackey said. “What if they come out negative? People might be less inclined to follow social distancing guidelines when they could be carriers of the disease.”
America’s medicines and trade watchdogs, the FDA and FTC, respectively, have since issued warnings to seven companies selling bogus COVID-19 products. “The FDA considers the sale and promotion of fraudulent COVID-19 products to be a threat to the public health. We have an aggressive surveillance program that routinely monitors online sources for health fraud products, especially during a significant public health issue such as this one,” said FDA Commissioner Stephen Hahn.
Sham adverts selling COVID-19 test kits can be found in all sorts of languages, from English, Spanish, and Russian, to Chinese and Japanese. “Like the pandemic, it’s a global problem,” Mackey said.
To retrain its software – an effort that’s a work in progress – S-3 Research scraped more than 80 million posts from Twitter, as well as material from Reddit and LinkedIn, mentioning the virus. The text in each post was then analyzed by unsupervised learning algorithms to identify clusters of words related to the coronavirus and selling. Then patterns of suspicious text – such as “testing kits”, “rapid results”, and a link to a shopping website – were pinpointed as indicators of a malicious ad for fake coronavirus gear. These patterns will then be used to teach a classifier so that it can automatically flag up internet postings that appear to be flogging bogus test equipment.
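To give a rough feel for the pattern-spotting step, here is a minimal sketch in Python. It is not S-3 Research's code: the startup has not published its features, and the function names, phrase list, and matching rule below are all assumptions made for illustration. It simply checks a post for a virus mention, a sales-style phrase like the ones quoted above, and a link, the sort of signals that could later be fed to a trained classifier.

```python
import re

# Hypothetical phrase list. "testing kit" and "rapid results" echo the
# article's examples; "test kit" is an illustrative assumption.
SUSPICIOUS_PHRASES = ("testing kit", "test kit", "rapid results")
VIRUS_TERMS = ("covid", "coronavirus")
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def suspicious_signals(post: str) -> dict:
    """Extract simple hand-written signals from one post's text."""
    text = post.lower()
    return {
        "mentions_virus": any(term in text for term in VIRUS_TERMS),
        "sales_phrases": [p for p in SUSPICIOUS_PHRASES if p in text],
        "has_link": bool(URL_PATTERN.search(post)),
    }


def looks_like_fake_kit_ad(post: str) -> bool:
    """Flag a post only when all three signals are present (assumed rule)."""
    signals = suspicious_signals(post)
    return (
        signals["mentions_virus"]
        and bool(signals["sales_phrases"])
        and signals["has_link"]
    )


if __name__ == "__main__":
    examples = [
        "COVID-19 testing kits, rapid results! Buy now: https://example.com/shop",
        "The FDA issued warnings about fake coronavirus test kits today.",
    ]
    for post in examples:
        print(looks_like_fake_kit_ad(post), "|", post)
```

A fixed rule like this would be easy for scammers to dodge, which is presumably why the plan is to use such patterns as training material for a classifier rather than as the filter itself.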
Once finished, the trained system would benefit regulators such as the FDA, and it could help social networks too, we’re told.
“We’re still trying to figure out the exact features to train the classifier,” Mackey said. “Nobody is doing a good job removing bad content right now. Platforms can do it themselves, they have much more resources, but they’re not really focused on filling that public health need and it’s quite difficult from a technical standpoint.
“It’s very episodic: it’s all based on the news and what information people are sharing. For example, if the FDA suddenly approved a new kind of test kit, scammers will begin touting that exact model.”
Taking down sellers of fake gear is like a game of whack-a-mole, though: if they’re booted off a website or app, crooks will just register new accounts. It can’t be solved by technology alone: people need to be educated and told the reality of the tests and treatments, so that they don’t buy bogus stuff online. However, we fear the kind of person who believes a random tweet about a miracle cure won’t listen to reason from experts.
Stronger, clearer advice from authorities, describing how real tests and treatments work, could help, along with systems to detect and block the spread of misinformation and baseless, harmful conspiracy theories about fake cures, causes, and testing.
“We need to be able to identify and characterize what’s going on to provide the public with the correct information on products or drugs they should be using. The coronavirus is not only a pandemic, it’s an infodemic, and we need technical solutions to combat that. It’s not like this is going to be the last pandemic we see.”
S-3 is due to issue a paper detailing its work this month. We’ll let you know when it’s ready. ®