July 2026 - Issue 20
Building the Future of Research Data at UCSF
Executive Council recently endorsed a faculty-led initiative to improve research data discovery, reproducibility, and AI-enabled science, while launching a pilot project supported by a $50,000 Chancellor's Fund award.
Every year, UCSF investigators generate thousands of research datasets representing substantial investments of public funding, scientific expertise, advanced technologies, and access to patients and biological samples. However, without consistent metadata standards, much of that information remains difficult to discover, integrate, reuse, and sustain—an increasingly urgent challenge as FAIR principles and NIH data-sharing policies reshape expectations for research data stewardship. While these problems may seem local, collectively they impose a substantial tax on scientific progress by slowing discovery, limiting collaboration, and reducing the long-term value of publicly funded research.

These challenges are not simply information technology problems—they are scholarly communication problems. As the Senate committee charged with advising on scholarly communication, open scholarship, and the accessibility and dissemination of research, the Committee on Library and Scholarly Communication (COLASC) recognized that modern scholarship increasingly extends beyond journal articles to include datasets, software, workflows, and digital repositories. To address the need for resources to remain discoverable, interoperable, reusable, and sustainable over time, COLASC established the Research Data and Metadata Standardization (RDMS) Task Force in late 2024.
The Task Force's review identified four major institutional challenges at UCSF:
- Research datasets are often difficult to discover because metadata standards are inconsistent and there is no effective way to search across the many independent repositories, databases, and data lakes that exist throughout UCSF.
- Information describing how data were generated, processed, and analyzed is documented differently across research groups, making datasets difficult to interpret, compare, and integrate.
- UCSF lacks shared infrastructure that enables efficient collaboration and consensus-building around data analysis.
- The separation of publications, datasets, and analytical code contributes to broader concerns regarding research reproducibility.
To better understand current practices and institutional needs, the RDMS Task Force conducted a survey between April and June 2025, which generated 181 responses across all UCSF schools, 35 departments, and 34 institutes, centers, and organized research units. The results revealed that metadata standardization remains uncommon across the institution: 67% of respondents do not use standardized nomenclature to describe their data, 70% do not use metadata standardization tools, and 84% are unaware of organizations that develop metadata standards. In addition, 65% indicated that they believe a reproducibility crisis exists in biomedical research, and only 17% believed UCSF is adequately addressing the issue institutionally. Nearly 80% supported greater institutional investment in open science, with many calling for centralized support, improved infrastructure, stronger data stewardship practices, and additional data science expertise.

The figure below illustrates the long-term vision of the Research Data and Metadata Standardization (RDMS) initiative, which combines standardized metadata, AI-assisted curation, Starfish indexing, and secure storage to transform research datasets into discoverable institutional assets that support collaboration, reproducibility, and AI-enabled discovery.

Another factor distinguishing this effort from previous discussions about research data management is the availability of new institutional infrastructure. UCSF’s Information Technology has already invested in Starfish, a metadata-driven research data management platform used by major research institutions, universities, and life science organizations to catalog and manage large-scale research data environments, and is already using it to scan files within the Facility for Advanced Computing (FAC). The RDMS initiative provides the missing ingredient: a standardized metadata framework that allows these discovery capabilities to be fully realized. Together, FAC, Starfish, and the metadata standards developed through RDMS could create the foundation for a searchable research data ecosystem in which investigators can identify relevant datasets across participating repositories without needing to know where those data are stored. In this model, FAC provides secure storage and computing infrastructure, Starfish serves as the metadata catalog and discovery layer, and RDMS provides the standards, governance, and AI-assisted tools that make meaningful dataset discovery possible.
The Task Force concluded that the need for action is becoming increasingly urgent as the research landscape evolves. National and international initiatives, including FAIR data principles and the NIH Data Management and Sharing Policy, are raising expectations for how research data should be managed, documented, preserved, and shared. At the same time, leading research institutions are increasingly treating datasets and software as first-class scholarly products alongside traditional publications. Metadata standardization is therefore not simply a compliance exercise; it is an academic excellence strategy for the twenty-first century.
Artificial Intelligence. The emergence of artificial intelligence (AI) adds a new dimension to the challenge, as these technologies offer unprecedented opportunities for data synthesis, pattern recognition, workflow automation, code optimization, and scientific discovery. However, AI systems are only as effective as the data they can access and understand. Poorly organized or inconsistently described datasets limit the ability of AI tools to identify meaningful relationships across studies and can amplify existing data quality problems—the familiar principle of "garbage in, garbage out." At the same time, the growing use of AI in research raises important questions related to data security, intellectual property, and the increasingly blurred distinction between computer-generated and human-generated scientific outputs. The Task Force concluded that metadata standardization is foundational to responsible AI-enabled science because AI requires structured, machine-readable data to achieve its full potential while ensuring that research data remain trustworthy, interpretable, and reusable. To that end, one of the pilot project's major goals is the development of an AI-powered metadata standardization wizard that will help researchers annotate datasets using standardized terminology and metadata frameworks. By automating portions of the curation process, the tool aims to reduce administrative burden while improving consistency, discoverability, and reuse, making high-quality data stewardship easier and more valuable for researchers.
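For readers curious what "annotating datasets using standardized terminology" could look like in practice, the following is a minimal illustrative sketch, not the Task Force's tool. It uses simple rule-based matching as a stand-in for the AI-assisted step. The vocabulary, the field names (`title`, `assay_type`, `organism`), and the functions `normalize_term` and `curate` are all hypothetical, invented here for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary: preferred term -> accepted synonyms.
# Illustrative only; not an RDMS or community standard.
CONTROLLED_VOCAB = {
    "RNA-seq": {"rna-seq", "rnaseq", "rna sequencing"},
    "16S rRNA sequencing": {"16s", "16s rrna", "16s rrna sequencing"},
}

# Hypothetical minimum set of fields every dataset record must carry.
REQUIRED_FIELDS = ("title", "assay_type", "organism")


@dataclass
class CurationResult:
    record: dict                      # the record after normalization
    issues: list = field(default_factory=list)  # problems for a human to review


def normalize_term(value):
    """Return the preferred vocabulary term for a free-text value, or None."""
    key = " ".join(value.lower().split())  # lowercase and collapse whitespace
    for preferred, synonyms in CONTROLLED_VOCAB.items():
        if key == preferred.lower() or key in synonyms:
            return preferred
    return None


def curate(record):
    """Check required fields and map assay_type onto the vocabulary."""
    result = CurationResult(record=dict(record))
    for name in REQUIRED_FIELDS:
        if not str(record.get(name, "")).strip():
            result.issues.append(f"missing required field: {name}")
    assay = record.get("assay_type")
    if assay:
        preferred = normalize_term(assay)
        if preferred is None:
            result.issues.append(f"unrecognized assay_type: {assay}")
        else:
            result.record["assay_type"] = preferred
    return result
```

In this sketch, a record whose assay type is entered as "RNA  Sequencing" would be rewritten to the preferred term "RNA-seq", while a record with an unknown assay type or a missing organism would be flagged for human review rather than silently changed.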
Timeline. To address these challenges, the Task Force proposed a phased strategy, leveraging the FAC and the Starfish metadata platform, while focusing initially on harmonizing omics datasets from the UCSF Data Library, OMICON, and the Benioff Microbiome Center to build a more structured and discoverable research data ecosystem at UCSF:
- Year One will establish metadata standards, AI-assisted tools, pilot dataset harmonization, Starfish integration, and governance.
- Year Two will expand harmonization, integrate with Project One, enhance search capabilities, and establish incentives for data stewardship.
- Year Three and beyond will expand the framework beyond omics datasets, create institution-wide dataset discovery capabilities, pursue certification of a UCSF data repository, strengthen connections between datasets and code repositories, and establish sustainable long-term stewardship practices.

Chancellor’s Fund. At its June meeting, the Academic Senate Executive Council unanimously endorsed the RDMS Task Force recommendations and approved a $50,000 Chancellor's Fund award to launch the pilot phase of the initiative. The funding will support Year 1 activities, while positioning UCSF to pursue larger external funding opportunities, including a planned National Library of Medicine R01 proposal. Consistent with the Chancellor's Fund mission of supporting faculty scholarship and faculty life, the initiative responds directly to concerns raised by UCSF investigators by reducing barriers to data discovery, organization, sharing, and reuse, allowing researchers to spend less time managing data and more time conducting research.
Governance. Beginning in the 2026–27 academic year, COLASC will establish a new subcommittee to oversee implementation of the pilot project and operationalize the strategy outlined in the Task Force report. The subcommittee will include faculty and staff representatives associated with the pilot repositories, as well as representatives from the UCSF Library, Information Technology, and data security and compliance offices.
The challenge is no longer producing research data. UCSF investigators already generate thousands of datasets each year through extraordinary investments of funding, expertise, technology, and collaboration. The challenge is ensuring that those data can be found, understood, reused, and transformed into new discoveries. Through the RDMS initiative, COLASC, the Academic Senate, and campus partners are working to build the infrastructure, standards, governance, and support needed to help UCSF researchers do exactly that. By reducing barriers to collaboration, strengthening reproducibility, enabling responsible use of artificial intelligence, and treating research data as a long-term institutional asset, UCSF has an opportunity to position itself as a national leader in modern scholarly practice while accelerating its mission to advance health worldwide.

