WP8 – Data Management
The CARE Work Package 8 (WP8) Data Management team focuses on the implementation, operation and administration of two databases, including a LabKey data management platform that hosts immunological test data from WP5 and clinical trial data from WP7 so that the Inserm SISTM team can perform statistical analysis. WP8 Data Management Task Lead Panthéa Tzourio (SISTM, Inserm) shares her views on this part of the CARE project.
What has your team delivered so far and what are you still working on during the remaining time in CARE?
SISTM is a data science team based in Bordeaux and directed by Prof. Rodolphe Thiébaut. It specialises in developing statistical and modelling tools to describe the immune response to infections and vaccines using clinical data and a variety of complex biological datasets. SISTM is involved at several levels in CARE: statistical analysis of immune data for Work Package 5 (WP5), clinical trial methodology support for Work Package 7 (WP7) and data management in a dedicated data warehouse system in Work Package 8 (WP8).
Our main activity so far has been to manage the implementation, operation and administration of a LabKey data management platform to host immunological test data from WP5 and data from the clinical trials conducted in WP7, so that the SISTM team can perform statistical analysis. To reach this goal, we contributed to the preparation of the CARE Data Management Plan (DMP). This document, created during the preliminary phase of the project, is a living document that describes the research data management pipeline and, of course, requires updates whenever either the data to be transferred or the repository system evolves.
In the case of LabKey, we expanded the capacity of the data platform by upgrading to a Primary edition of LabKey to better address specific research requirements. This included a direct connection between the data stored in LabKey and the RStudio server analysis environment (in full GDPR compliance), and the ability to store and share the biostatistical R analysis results and visualizations with the research team members within the secure study project environment in LabKey.
We also acquired and set up integrated analysis modules for laboratory experimental data files (e.g. NAb, ELISA, ELISpot, Luminex, flow cytometry). These updates led to modifications of the initial DMP submitted to IMI for approval.
We are glad to say that LabKey is ready to work with CARE data. We have already uploaded the French COVID-19 cohort data to our LabKey platform, analyzed the data and published the results (Levy et al., 2021). We have also created in-house statistical methods (differential gene expression analysis, intra-cellular staining analysis, etc.), for which we developed a set of integrated visualization tools. At this stage, our work for CARE continues with the development of data visualization tools, the refinement of statistical methods and, of course, the maintenance of the data warehouse. We are fully ready to receive the data from the laboratories involved in WP5 investigations and the data from clinical trials.
What have you personally learned so far through the experience of working in CARE?
As the IT project manager at SISTM, I found CARE to be my first experience of a research project of this magnitude. I find it truly enriching to see the commitment of so many teams and industries to collaborate and advance research in anticipation of future pandemics. Of course, competition is still present, but it was really nice to see how the IMI funding scheme helped define a common ground on which to build new knowledge and paths for future developments.
What has surprised you about working on the CARE project?
As I said above, the magnitude of the project and the quality of the profiles involved are impressive. It is amazing that a “big machine” like the CARE consortium, which brings together a variety of countries and both public and private organizations, works and, even better, is delivering. I can say I am very glad to be part of this adventure.
What makes the work for CARE special for you and your team?
For SISTM, which has been involved in the analysis and modeling of COVID data almost since the beginning of the pandemic, CARE is an opportunity to work with the best representatives of the scientific and industrial community and thus keep contributing to the global effort. On a more personal note, I like the idea of keeping research and innovation focused on the risk of future COVID outbreaks: providing more tools so that we never have to pay such a heavy toll again.
What are the biggest challenges you have encountered with respect to your task activities? (and how did you overcome them?)
Without hesitation, I would say the biggest challenge we encountered in our activities was linked to the data management of some patients’ clinical and biological data collected during the COVID pandemic that we received from a variety of sources.
For most of these transferred data, we were not involved in the initial design of the data or of the protocol and, understandably, during the pandemic crisis the hospitals collected data in haste and did not have the time or resources to harmonize their practices. This led to the creation of heterogeneous cohorts with missing data or discrepancies in file structures, names of common variables, data column formats, etc.
This is why, with this experience in mind, we now try to take an upstream approach whenever possible and participate in protocol design, so that we can understand or propose the proper structure for the different data to be collected and define the proper information to put in the Data Transfer Agreement.
Although this is fairly common practice in industry, where the different services are integrated, there is still some room for improvement in academic structures. This is definitely something on which we are making progress!