Amazon Developed a New System for the De-identification of Medical Images

Amazon recently announced the launch of a new system that identifies protected health information (PHI) in medical images and automatically redacts it, so that patients can no longer be identified from the images.

Medical images usually include patients’ PHI, such as names and birth dates, which typically appears as text within the image. Before medical images can be used for research, either patients must give their authorization or all identifying PHI must be removed from the images. Removing PHI from medical images is typically a manual process that is costly and time-consuming, particularly when large numbers of images need to be de-identified.

Amazon’s Rekognition machine-learning service can identify the text in images and extract it as machine-readable text. The extracted text is then passed to Amazon Comprehend Medical, which determines whether it contains any PHI. Combined with Python code, the two services make it possible to rapidly redact PHI from the images. The system works with images in PNG, JPEG, and DICOM formats.
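The two-service flow described above can be sketched roughly as follows. This is a minimal illustration, not Amazon's published code: the function names `find_phi_boxes` and `box_to_pixels` are hypothetical, and the clients are passed in so that real ones (for example `boto3.client("rekognition")` and `boto3.client("comprehendmedical")`) can be supplied by the caller. It assumes the image is already JPEG or PNG bytes, since Rekognition's `detect_text` call accepts only those formats; a DICOM image would need converting first.

```python
def find_phi_boxes(rekognition, comprehend_medical, image_bytes):
    """Return bounding boxes of text lines that appear to contain PHI.

    `rekognition` and `comprehend_medical` are client objects exposing
    `detect_text` and `detect_phi` (assumed to be boto3 clients).
    """
    response = rekognition.detect_text(Image={"Bytes": image_bytes})
    boxes = []
    for detection in response["TextDetections"]:
        # Rekognition reports both LINE and WORD items; use whole lines
        # so that Comprehend Medical sees each phrase in context.
        if detection["Type"] != "LINE":
            continue
        phi = comprehend_medical.detect_phi(Text=detection["DetectedText"])
        if phi["Entities"]:
            boxes.append(detection["Geometry"]["BoundingBox"])
    return boxes


def box_to_pixels(box, image_width, image_height):
    """Convert a relative bounding box (0..1 ratios) to pixel corners.

    Returns (left, top, right, bottom), the rectangle a drawing library
    would fill in order to black out the text.
    """
    left = round(box["Left"] * image_width)
    top = round(box["Top"] * image_height)
    right = round((box["Left"] + box["Width"]) * image_width)
    bottom = round((box["Top"] + box["Height"]) * image_height)
    return left, top, right, bottom
```

The redaction itself would then amount to filling each returned rectangle with a solid color using whichever imaging library is at hand.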

For each entity it identifies, the system returns a confidence score indicating how confident it is that the entity was identified correctly; this score serves as the basis for evaluating whether PHI has been correctly identified. The user can specify a confidence threshold from 0.00 to 1.00. With a threshold of 0.00, all text identified as potential PHI is redacted, regardless of confidence.
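The effect of the threshold can be illustrated with a short sketch. It assumes entities shaped like the `Entities` list returned by Comprehend Medical's `detect_phi` call, each carrying a `Score` between 0 and 1; the function name `entities_to_redact` and the choice of an inclusive comparison are assumptions made for illustration.

```python
def entities_to_redact(entities, threshold=0.0):
    """Keep only the PHI entities whose confidence meets the threshold.

    A threshold of 0.0 keeps every detected entity; higher values
    redact less but risk leaving low-confidence PHI in place.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.00 and 1.00")
    return [entity for entity in entities if entity["Score"] >= threshold]
```

For de-identification, a low threshold errs on the side of redacting too much rather than too little, which is usually the safer failure mode.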

Amazon states that the system enables healthcare companies to de-identify large numbers of images quickly, efficiently, and at low cost, and that it can be used for batch processing of very large numbers of images. The system can also be set up with a Lambda function so that PHI is automatically redacted from new images as soon as they are uploaded to an Amazon S3 bucket.
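A Lambda function triggered by S3 uploads might look something like the sketch below. The event layout follows the standard S3 event notification format; `redact_image` is a hypothetical placeholder for the download, redact, and re-upload step and is not part of any AWS library.

```python
from urllib.parse import unquote_plus


def s3_objects_from_event(event):
    """Yield (bucket, key) for every object named in an S3 event."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded (spaces become '+'), so decode them.
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def redact_image(bucket, key):
    """Hypothetical placeholder for the actual redaction work.

    A real version would fetch the object, redact the PHI, and store
    the result; here it only reports what it was asked to process.
    """
    return {"bucket": bucket, "key": key, "status": "redaction not implemented"}


def lambda_handler(event, context):
    """Entry point that Lambda invokes for each S3 notification."""
    return [redact_image(bucket, key)
            for bucket, key in s3_objects_from_event(event)]
```

In a setup like this, redacted images are best written to a different bucket or prefix than the one that triggers the function, so that the output does not invoke the function again.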