Microsoft data leaks and the importance of open-source intelligence

December 22, 2023

Security — Image: Adobe Stock / Michael Traitov

Vaidotas Šedys, Head of Risk Management at Oxylabs, discusses how companies can bolster their cyber resilience and stay ahead of the game.

Interconnected digital technology advances at a rapid pace, and so do the tactics and strategies employed by malicious individuals, criminal groups, and even nation-states. The World Economic Forum predicts that the cost of global cybercrime will reach $10.5 trillion a year by 2025, forcing businesses and governments to look for next-generation solutions against emerging digital threats.

Unfortunately, deliberate criminal activity is only part of the challenge in this data-driven era. Costly leaks of sensitive data can also result from simple human error. In September, Microsoft’s data was leaked twice, disclosing not only the company’s plans for the next-gen Xbox but also private employee data. At least one of these incidents was caused by an accidentally misconfigured URL.

Raising public awareness, educating employees, and implementing standard security measures (such as data encryption, multi-factor authentication, or routing traffic through VPNs) are good recommendations for increased organisational security. However, they are hardly enough today if one does not employ open-source intelligence.

What is open-source intelligence?

Open-source intelligence, or OSINT, refers to collecting, analysing and using information from publicly available sources, including forums, libraries, open databases, and even the dark web. Though OSINT can be used to gather commercially important business information and perform market analysis, at Oxylabs, we usually use it in the context of cyber threat intelligence.

Cybersecurity companies that employ open-source intelligence crawl through thousands of sites, forum messages, and dark web marketplaces, looking for stolen personal credentials and other confidential information, such as source code or trade secrets. Monitoring these sources also helps identify insecure databases and domain squatting.
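To make the domain squatting check more concrete, here is a minimal Python sketch. It assumes a list of observed domain names is already available (for example, from newly registered domain feeds) and flags those within a small edit distance of a brand domain. The function names, the threshold, and the sample domains are illustrative assumptions, not a description of any particular vendor's tooling.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def find_lookalikes(brand_domain: str, observed: list[str],
                    max_distance: int = 2) -> list[str]:
    """Return observed domains that are close to, but not equal to, the brand domain.

    max_distance is a hypothetical threshold; real tooling would tune it and
    add further checks (homoglyphs, alternative TLDs, and so on).
    """
    brand = brand_domain.lower()
    hits = []
    for domain in observed:
        candidate = domain.lower().strip()
        if 0 < edit_distance(brand, candidate) <= max_distance:
            hits.append(candidate)
    return hits


suspicious = find_lookalikes(
    "example.com",
    ["example.com", "examp1e.com", "exampel.com", "unrelated.org"],
)
```

In this sketch the legitimate domain itself is skipped (distance zero), while a character swap such as "examp1e.com" is flagged for human review.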

It might sound counterintuitive, but organisations often do not suspect that some of their sensitive data is lurking somewhere in the open cyberspace. As such, OSINT helps organisations find both unintentional data leaks and criminal data breaches. It can also aid in identifying insecure devices and outdated applications.

The breakthrough that OSINT brings to the cybersecurity landscape mostly comes from the fact that it uses publicly available information, relieving cybersecurity organisations of the legally fraught need to scour classified or restricted sources for criminal evidence. Moreover, modern data scraping solutions, combined with artificial intelligence (AI) and machine learning (ML), allow them to pull and analyse raw cyber intelligence in real time.

OSINT ‘starter’ pack

To gather cyber threat intelligence, cybersecurity providers must scan thousands of URLs looking for specific client data. This can include corporate email addresses or phone numbers, company names, employee information, and technical details, such as access tokens or IP addresses. The company can then be alerted as soon as compromised data appears in the public domain or on the dark web.
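As a rough illustration of this kind of monitoring, the Python sketch below scans the text of an already-fetched page for email addresses on watched corporate domains and for watched IP addresses. The `Finding` class, the `scan_page` function, the regular expressions, and the sample values are all assumptions made for the example; production systems would cover many more indicator types and formats.

```python
import re
from dataclasses import dataclass

# Simplified patterns for illustration; real-world matching needs to be stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


@dataclass(frozen=True)
class Finding:
    kind: str        # "email" or "ip"
    value: str       # the matched indicator
    source_url: str  # where it was seen


def scan_page(text: str, source_url: str,
              watched_domains: set[str], watched_ips: set[str]) -> list[Finding]:
    """Return every watched email address or IP address found in the page text."""
    findings = []
    for match in EMAIL_RE.finditer(text):
        # group(1) is the domain part of the address
        if match.group(1).lower() in watched_domains:
            findings.append(Finding("email", match.group(0).lower(), source_url))
    for match in IPV4_RE.finditer(text):
        if match.group(0) in watched_ips:
            findings.append(Finding("ip", match.group(0), source_url))
    return findings


page_text = "dump: alice@corp.example:hunter2 bob@other.test 203.0.113.7 10.0.0.1"
alerts = scan_page(
    page_text,
    "https://paste.example/abc",
    watched_domains={"corp.example"},
    watched_ips={"203.0.113.7"},
)
```

Each `Finding` keeps the source URL, so an alert can point the affected company to where the data was seen.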

It is important to note that companies might monitor not only data directly related to their business and employees but also their clients’ data, alerting those clients if their passwords or other sensitive information have been breached.

The biggest challenges here are those of scale and anti-scraping measures. First of all, the global ‘surface’ web hosts about 6 billion websites, which is only the tip of the iceberg. The deep web, which isn’t indexed by search engines, is estimated to be 400 to 550 times as large. Scraping at such a scale requires powerful automation and ML-driven solutions to structure what is otherwise a massive mess of unstructured data in various formats and languages.

Furthermore, threat actors today are technically advanced professionals, employing anti-bot measures that can include anything from honeypots serving erroneous data to IP blocking that disrupts real-time data flow. This means that cybersecurity companies have to employ resilient proxy networks together with adaptive scraping solutions to circumvent the blocks. With this in mind, it is well worth leaving OSINT efforts to cybersecurity professionals, especially if they involve monitoring the dark web.
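One common way to keep data flowing despite IP blocking is to rotate requests through a pool of proxies and back off between retries. The Python sketch below shows the idea under stated assumptions: `fetch` is a caller-supplied function (hypothetical here) that performs the actual request through a given proxy and raises `Blocked` when it is refused, and the delay values are arbitrary.

```python
import time
from itertools import cycle
from typing import Callable, Iterable


class Blocked(Exception):
    """Raised by the fetch function when a request is refused or throttled."""


def fetch_with_rotation(url: str, proxies: Iterable[str],
                        fetch: Callable[[str, str], str],
                        max_attempts: int = 5, base_delay: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Try the URL through successive proxies, doubling the wait after each block."""
    proxy_list = list(proxies)
    if not proxy_list:
        raise ValueError("at least one proxy is required")
    pool = cycle(proxy_list)  # round-robin over the pool
    last_error = None
    for attempt in range(max_attempts):
        proxy = next(pool)
        try:
            return fetch(url, proxy)
        except Blocked as error:
            last_error = error
            if attempt < max_attempts - 1:
                # exponential backoff: base_delay, 2x, 4x, ...
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"all {max_attempts} attempts failed for {url}") from last_error
```

Passing `fetch` and `sleep` in as parameters keeps the retry logic independent of any particular HTTP client and makes it easy to exercise without touching the network.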

Diving into the dark

The dark web is a part of the deep web that is inaccessible to ordinary browsers and hidden by multiple proxy layers. Although there are legitimate actors that use this part of the internet, e.g. investigative journalists, law enforcement actors, and intelligence agencies, the dark web is mostly employed by criminals. This is where stolen private data, intellectual property, confidential information, drugs, and illegal weapons are sold.

As in the case of the surface web, dark web monitoring is performed with the help of custom crawlers and scraper bots. Surveilling the dark web yields valuable information about fresh data breaches and new cyber attack methods and vectors. It enables a faster incident response, closing the time gap between the data breach and the moment an organisation becomes aware of it. For cybersecurity researchers, dark web monitoring also allows deep-diving into the newest cybercrime strategies.

However, even if your organisation has suffered a breach, it is definitely not recommended to scour the dark web looking for that data yourself. Firstly, the dark web is difficult to navigate without prior experience. Secondly, even if you’re armed with proxy servers and VPNs, the risk of exposing your organisation to malware and cyber attacks is still high. If such a search cannot be avoided, always use ‘burner computers’ for the task instead of devices connected to your corporate network.

Final recommendations

Powered by modern scraping solutions and ML technology, open-source intelligence today allows cybersecurity companies to take a proactive approach to incident management and prevention. OSINT speeds up the detection of data leaks, cyber threat hunting, and research into the newest criminal strategies.

However, it is important to stress that, although it is becoming imperative for cybersecurity, OSINT cannot and shouldn’t replace standard security measures. Businesses should first of all ensure their sensitive data is actually safe. Removing unused access, updating passwords, using multi-factor authentication, working with reliable proxy and VPN providers, and periodically educating employees are the best ways to make sure that your business data doesn’t end up as a Black Friday deal on some dark web marketplace.

The same applies to the recent hype around monitoring the dark web. Dark web surveillance undeniably opens up opportunities for professional cybersecurity researchers and threat hunters. For ordinary businesses, however, pulling valuable information from the surface web and integrating digital security best practices and standards into daily operations might be a more rewarding path to follow.
