Medical Data Being Exposed via Public GitHub Repositories

Research recently published in a collaborative report by security researcher Jelle Ursem and DataBreaches.net has revealed healthcare organizations and their business associates are inadvertently leaking patient data online via unprotected public GitHub repositories.

GitHub is a Git repository hosting service that is used for software development and version control. The platform is used for code sharing, publication, and collaboration and also acts as a social networking site for programmers. The platform has access controls and allows code to be privately hosted or loaded into public repositories that can be accessed by anyone.

Jelle Ursem, a security researcher from the Netherlands, had previously examined online databases and GitHub repositories and identified more than 400 data breaches by many private sector companies, Fortune 500 firms and government agencies.

Ursem decided to search GitHub to see whether the platform had been used in connection with medical information, which is highly sensitive data protected under several regulations such as the EU’s General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA) in the U.S.

It took Ursem around 10 minutes using simple search terms such as “companyname password” to identify one such breach. Further investigation identified another 8 companies that had inadvertently leaked credentials that allowed access to patient health information.

GitHub repositories are searched by cybercriminals as they can contain sensitive information such as hard-coded usernames and passwords in source code. Those credentials are then used in password reuse attacks against companies. There have been many such attacks, with exposed credentials on GitHub responsible for the 2016 data breaches at Uber and Lynda.com.

Ursem found hard-coded credentials in source code that could be used to gain access to Microsoft Office 365 accounts, G Suite, and secure FTP accounts. He tested those credentials against the corresponding services and was able to connect. He connected to G Suite and Office 365 accounts and gained access to contracts, user data, internal documents and agendas, team chats, address books and much more, including sensitive patient data. During the course of his investigations, he found the protected health information of between 150,000 and 200,000 patients.

The credentials allowed Ursem to view data as if he were an employee of each company. While Ursem had no malicious intentions, other individuals were discovered to have accessed and made copies of some of the repositories. Their actions are unlikely to be benign.

Ursem attempted to notify the entities concerned to get them to secure their data but his emails were ignored or not acted upon in several cases. Ursem sought assistance from DataBreaches.net, which has previously helped to get sensitive medical data in exposed databases secured. In most cases, but not all, the exposed patient information was eventually secured.

The companies found to be leaking data were:

  • Xybion
  • MedPro
  • Texas Physician House Calls
  • VirMedica
  • MaineCare
  • Waystar
  • Shields Healthcare Group
  • Acc-q Data Network
  • An unnamed entity that still has not secured the exposed data

In most cases, the leaks were due to third-party developers embedding hard-coded credentials in code stored in public rather than private repositories. In many cases, the data had been exposed for several months or years, and some developers had simply abandoned repositories when they were no longer needed, without deleting data. The entities concerned may not have realized that GitHub was being used, but they compounded the problem by failing to implement two-factor authentication, which allowed the hard-coded credentials to be used to access email and other accounts.
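One common way to keep credentials out of source code is to read them from the environment at run time. The following is a minimal sketch in Python, not a description of what any of the named entities or their developers did. The variable names SFTP_USERNAME and SFTP_PASSWORD and the function load_credentials are hypothetical, chosen only for illustration.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential is not set in the environment."""


def load_credentials(env=None):
    """Return (username, password) read from environment variables.

    The variable names are hypothetical. Nothing secret is written in
    this file, so publishing it does not by itself expose a credential.
    """
    env = os.environ if env is None else env
    try:
        return env["SFTP_USERNAME"], env["SFTP_PASSWORD"]
    except KeyError as missing:
        # Name the missing variable, but never echo any secret value.
        raise MissingCredentialError(
            f"environment variable {missing.args[0]} is not set"
        ) from None
```

With this approach the secret lives in the deployment environment rather than in the repository, so an accidentally public or abandoned repository contains only the variable names.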

Healthcare organizations may not realize GitHub is being used and data is being exposed. All it takes is for a developer to make a mistake for a data breach to occur. In some cases, contracted developers have continued to make the same mistakes with client after client, most likely unaware of the regulations covering healthcare data and the potential consequences of data exposure. One such developer, dubbed The Typhoid Mary of Data Leaks, appeared to have made just about every security mistake that was possible, across multiple clients.

The research highlights the importance of auditing vendors such as software developers, even if they have signed a business associate agreement, to make sure that they are not inadvertently exposing data or giving hackers easy access to their employers’ systems. It is also important for organizations to act on attempts by researchers to notify them of data breaches through responsible disclosures. In at least three cases, Ursem’s attempts to alert the entities to the breaches were not acted on because the recipients believed they were social engineering attempts. As a result of the inaction, data were exposed for even longer, giving malicious actors even more time to find and steal sensitive information.
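As a rough illustration of what one small part of such an audit could look like, the sketch below scans source text for lines that assign a quoted literal to a name that looks like a credential. It is an assumption-laden example, not the method used in the report: the pattern and the function find_hardcoded_secrets are hypothetical, and the pattern will miss many real leaks and flag some harmless lines.

```python
import re

# Hypothetical heuristic: a name containing a credential-like keyword,
# then "=" or ":", then a quoted literal of at least four characters.
PATTERN = re.compile(
    r"""
    \b[\w.]*(?:password|passwd|pwd|secret|api_?key|token)[\w.]*
    \s*[:=]\s*
    (["'])[^"']{4,}\1
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_hardcoded_secrets(source):
    """Return (line_number, stripped_line) for each suspicious line."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        if PATTERN.search(line):
            findings.append((number, line.strip()))
    return findings


if __name__ == "__main__":
    sample = (
        'user = "alice"\n'
        'db_password = "hunter2!"\n'
        'password = os.environ["DB_PASSWORD"]\n'
    )
    for number, line in find_hardcoded_secrets(sample):
        print(f"line {number}: {line}")
```

Only the second line of the sample is flagged, because the third reads the value from the environment instead of embedding it. A check like this covers only the current contents of a file. Credentials that were committed and later deleted can remain in a repository's history, which this sketch does not examine.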

Details of the data exposures and risk of data breaches via GitHub have been published in the report, No Need to Hack When It’s Leaking.