How to Ensure Data Integrity and Compliance of Your Data Analytics Systems

Data Analytics
Oct 31, 2018  |  7 min read

Making sure your data and processes from research and development through to production are compliant is essential in today's highly regulated life science, biopharma, pharmaceutical and food industries. But it's no easy task. Following all of the required steps and ensuring the integrity of your data at every stage is easier and more successful when you use a product designed to keep your data compliant.

What does a compliant data analytics solution look like? Let's take a look at the components and essential elements of a solution that meets all of the regulatory requirements for data integrity and validation.


Elements of Compliance

Compliance awareness and efficient quality assurance procedures are important cornerstones of a data analytics tool that meets regulatory guidelines. At the heart of this process is awareness of Good Automated Manufacturing Practice (GAMP) and the FDA regulation 21 CFR Part 11. GAMP 5 requires a system to use compliant validation processes, and compliance with 21 CFR Part 11 requires electronic signatures so that any change can be logged and attributed to an individual.

Compliance overview chart: GAMP 5 and 21 CFR Part 11 compliance requires following a series of steps, including electronic signatures for any changes and an audit trail for all electronic records.

What is GAMP 5?

Production systems for the pharmaceutical and food industries have to comply with ever-stricter legislation, including regulations of the European Medicines Agency (EMA) and the Food and Drug Administration (FDA). Although GAMP is not legislation, it is an important guideline for companies involved in the development and/or implementation of automated production systems.

GAMP is both a technical subcommittee of the International Society for Pharmaceutical Engineering (ISPE) and a set of guidelines for manufacturers and users of automated systems in the pharmaceutical industry. It describes a set of principles and procedures that help ensure that pharmaceutical products have the required quality. One of the core principles of GAMP is that quality cannot be tested into a batch of product but must be built into each stage of the manufacturing process. As a result, GAMP covers all aspects of production, from the raw materials, facility and equipment to the training and hygiene of staff.

What is Title 21 CFR Part 11?

Title 21 CFR Part 11 is the part of Title 21 of the Code of Federal Regulations that establishes the United States Food and Drug Administration (FDA) regulations on electronic records and electronic signatures (ERES). Part 11, as it is commonly called, defines the criteria under which electronic records and electronic signatures are considered trustworthy, reliable, and equivalent to paper records.

Audit Trail for All Data

An important element of compliance is ensuring that electronic records are supported by an audit trail in which the Event, the User, a Timestamp, and a required comment are logged (the electronic signature). Being 21 CFR Part 11 compliant also means having a firm grip on data integrity.
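As a minimal illustration of such an audit trail record (a sketch for this article, not SIMCA®-online's actual implementation; the class and field names are hypothetical):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class AuditEntry:
    """One immutable audit trail record: who did what, when, and why."""
    event: str           # what was done, e.g. "ModelUpdated"
    user: str            # authenticated user who performed the action
    timestamp: datetime  # when it was done (UTC)
    comment: str         # required justification entered by the user

def log_change(trail: list, event: str, user: str, comment: str) -> None:
    """Append a record; requiring a comment mirrors the e-signature rule."""
    if not comment.strip():
        raise ValueError("A comment is required for every change (21 CFR Part 11)")
    trail.append(AuditEntry(event, user, datetime.now(timezone.utc), comment))
```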

Data Integrity

The importance of data integrity is emphasized in an FDA guidance document for industry entitled “Data Integrity and Compliance With CGMP”. A driving force behind this guidance was the growing number of data integrity violations the FDA encountered during inspections. The document is structured as a Q&A and explains what the FDA expects from industry.

Data integrity means that data remain complete, consistent, accurate and reliable over their entire lifecycle.

An acronym often seen in this context is ALCOA, which the FDA uses to define five data integrity attributes corresponding to good manufacturing practices.

What is ALCOA?

The five data integrity attributes as defined by the FDA (illustrated in the sketch after this list):

  • Attributable. Data must be stored so that it can be connected to the individual who produced it. Every piece of data entered into the record must be fully traceable in time.
  • Legible. Data must be traceable, permanent, readable, and understandable by anyone using the record. This also applies to any metadata attached to the record.
  • Contemporaneous. Data must be fully documented at the time they are generated or acquired.
  • Original. Data must be the original record or a certified copy. The data record should include the first data entered and all successive data entries required to fully understand the data.
  • Accurate. Data must be correct, truthful, complete, valid and reliable.
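As a hypothetical illustration of how the ALCOA attributes might map onto a data record's schema (the class and field names below are illustrative, not from any FDA or Umetrics® specification):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)  # frozen: the record is permanent once written
class MeasurementRecord:
    value: float            # Accurate: the measured value itself
    unit: str               # Legible: metadata needed to understand the value
    operator_id: str        # Attributable: who produced the data
    recorded_at: datetime   # Contemporaneous: captured at acquisition time
    source_instrument: str  # Original: where the first record came from
```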

In today’s marketplace, companies need to feel confident that there is no loss of quality or information when using software solutions. So how do the solutions in the Umetrics® Suite of Data Analytics Solutions live up to the standards expected for sound data integrity? Let's take a closer look at SIMCA®-online as an example.

Data Integrity and SIMCA®-online

SIMCA®-online enables multivariate process monitoring and control using SIMCA® models and data taken in real time from a data source, such as a process historian. SIMCA®-online is particularly valuable to customers in regulated industries, such as biopharma, where 21 CFR Part 11 and CGMP are important factors.

Over the last 20 years, SIMCA®-online has been continuously improved to meet customer and regulatory demands. SIMCA®-online is validated software; each release is thoroughly tested for correctness against previous versions and against the SIMCA® software for offline modeling. The core features of SIMCA®-online that support regulatory compliance include:

  • Data sources. The data used in SIMCA®-online comes from a data source (e.g. a database or a historian), such as an OSIsoft PI server. Acting as a layer, the data source “owns” the data and guarantees its integrity. The SIMCA®-online server samples data from the data source at regular intervals and executes SIMCA® projects using the data (see the sampling sketch after this list). Sampled data and results from the multivariate models are stored. The server is robust and uses database transactions and other techniques to ensure data integrity.
  • Data storage. SIMCA®-online stores its data uncompressed in an internal database. Examples of stored items are raw data, metadata, configurations, models and copies of the SIMCA® projects that contain the multivariate models. This internal database is not accessible from the outside with third-party tools; it can only be reached through the SIMCA®-online desktop and web clients and through the Web API, which keeps access controlled in line with regulatory expectations.
  • Backups/encryption. The SIMCA®-online server is installed on a Windows server machine. Security and backup procedures on that machine are important for the integrity of the server, and thus for data integrity in a larger context. Network traffic between the SIMCA®-online server and the desktop client is encrypted. Web clients and the Web API can be protected by TLS (Transport Layer Security, the cryptographic protocol that provides communications security when you use HTTPS in a web browser to connect to a web server). This means that the communication cannot be tampered with.
  • Security/authentication. To access SIMCA®-online, a user needs to authenticate via user accounts in SIMCA®-online or via Active Directory authentication. Groups and roles can be used to control access to project configurations and to restrict which tasks (such as deleting data) a user is allowed to perform (see the access-control sketch after this list). This supports compliance with 21 CFR Part 11, as any change requires a user comment and password.
  • Audit trails. User-initiated actions in SIMCA®-online are audited on the server, both in a server audit trail and in per-project-configuration audit trails. These show when something was done, what was done, and by whom. Electronic signatures (providing a user name and password) are required for potentially destructive actions. Furthermore, server logs let administrators verify data integrity by showing exactly what the server did when analyzing potential issues.
  • Revisions log. Revisions of project configurations are stored, and there is built-in comparison of revisions. Rollback to earlier versions is possible.
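To make the sampling loop described in the data sources bullet concrete, here is a heavily simplified sketch in Python. The historian API, table layout and polling interval are assumptions for illustration, not SIMCA®-online internals:

```python
import sqlite3
import time

POLL_INTERVAL_S = 60  # assumed sampling interval, not a SIMCA-online setting

def sampling_loop(historian, db_path: str = "monitoring.db") -> None:
    """Poll a data source and store each sample transactionally."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS samples (ts REAL, tag TEXT, value REAL)")
    while True:
        data = historian.current_values()  # hypothetical historian API
        now = time.time()
        # One transaction per sample time: either all tags for this
        # timestamp are stored, or none are, preserving consistency.
        with db:
            db.executemany(
                "INSERT INTO samples VALUES (?, ?, ?)",
                [(now, tag, value) for tag, value in data.items()],
            )
        time.sleep(POLL_INTERVAL_S)
```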
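Likewise, a minimal sketch of the role and electronic-signature checks described in the security and audit trail bullets; the action names, role name and password flag are hypothetical, not SIMCA®-online's actual access model:

```python
DESTRUCTIVE_ACTIONS = {"delete_data", "remove_configuration"}  # illustrative names

def authorize(user_roles: set, action: str, comment: str, password_ok: bool) -> bool:
    """Permit a destructive action only with the right role, a comment,
    and a freshly verified password (the electronic signature)."""
    if action not in DESTRUCTIVE_ACTIONS:
        return True
    if "administrator" not in user_roles:
        return False
    # The e-signature: a required comment plus password re-entry.
    return bool(comment.strip()) and password_ok
```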

Quality Assurance and Validation

The Umetrics quality assurance team works full time on testing and validating our software products. All team members have attended GAMP and 21 CFR Part 11 related training. The test phase includes testing of new functionality, graphical and numerical regression testing, and robustness testing; a minimal sketch of numerical regression testing follows below. Bugs found during testing are addressed by the development team. The test phase is complete when all scheduled tests have been run, and re-run as required, and the software conforms to the quality goals. The test period is usually 3-12 months, depending on the software.
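As a generic illustration of numerical regression testing (a sketch under assumed tolerances and made-up quantity names, not Umetrics' actual test harness), a new release's numerical output can be compared against reference results from the previous validated version:

```python
import math

def numerical_regression(reference: dict, candidate: dict, rel_tol: float = 1e-9) -> list:
    """Return the names of quantities whose values drifted beyond tolerance
    compared with the previous validated release."""
    failures = []
    for name, ref_value in reference.items():
        new_value = candidate.get(name)
        if new_value is None or not math.isclose(ref_value, new_value, rel_tol=rel_tol):
            failures.append(name)
    return failures

# Example: model scores from the previous release vs. the release under test
previous = {"t[1]": 1.2345678901, "DModX": 0.9876543210}
current = {"t[1]": 1.2345678901, "DModX": 0.9876543210}
assert numerical_regression(previous, current) == []
```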

The validation phase takes place after the code has been locked for changes. Validation of the offline software covers new functionality as well as graphical and numerical validation against the previous version or specification. The online software is validated against the current version of the corresponding offline software.

Validation documentation is assembled in a validation package with a summarizing validation report. The validation is then approved and signed by the Managing Director before it is made publicly available. Backups of the software source code and all original validation documents are stored securely by a third party.

Want to Know More?

Download the Quality Overview Document from Sartorius Data Analytics showing the development lifecycle and test plan for Umetrics® Suite solutions.

Download the Quality Overview Guide