System Of Registries
- Basic Information
The System of Registries (SOR) is an umbrella of interrelated tools, services, data repositories, and system components. These are intended primarily as a resource for developers and enterprise architects, but may be used by others.
This umbrella system includes registries for the following areas:
- Registry of EPA Applications, Models and Data Warehouses
- Environmental Dataset Gateway
- Reusable Component Services
- Data Element Registry Services
- Terminology Services
- Substance Registry Services
- Facility Registry Services
The System of Registries is supported by a staff of data management and environmental professionals who assist users by facilitating the development of data standards and terminology, promoting identification of reusable components, and supporting the stewardship of system inventories, data dictionaries, and other important EPA metadata resources.
The first registry within the System of Registries was a data registry that contained the metadata (data about data) describing standard and programmatic data elements found in EPA systems and applications. Next, a system inventory was created to centralize the metadata about EPA systems and applications. This inventory evolved into the “official” EPA system list (READ, the Registry for EPA Applications, Models and Datasets), and is used for many purposes. Other specialized registries for lists of chemicals and facilities were added over time. A terminology registry, similar to a dictionary, was also incorporated to better serve the need for data transfer and transformation. Subsequently, eXtensible Markup Language (XML) schemas were collected and registered in a system co-managed by EPA and its partners in the Environmental Information Exchange Network.
Over the past few years, many of these systems have been replaced with commercial metadata products and new customized front ends have been developed to meet EPA needs. A service component repository is being added to assist developers looking for web services and other reusable components. Web services are also being incorporated directly into the System of Registries, to allow the direct use of environmental metadata in multiple EPA and partner systems. This serves to greatly improve the quality of information about EPA data. It “brings meaning to data” as it is used for analysis and decision-making.
- Roles in Data Quality
A primary purpose of the System of Registries is to support data quality throughout the EPA. Quality is determined by six measures: consistency, accuracy, uniqueness, validity, completeness, and timeliness.
The System of Registries plays a role in each of these areas.
Management structures inherent in the System of Registries contribute to quality assurance. The capability of the Registries to supply translation and automatic data validation is used in quality control.
Comprehensive metadata reduces redundancy, improves accuracy, and ensures that data is understood. Complete documentation of information about EPA data and its meaning makes the data valid, accurate, and understandable. Thus, comprehensive metadata is used in measuring consistency, accuracy, uniqueness, and completeness.
Comprehensive information about EPA information assets allows them to be reused appropriately. Reuse increases efficiency and helps to assure accuracy. Automated translation and validation of data reduce errors in data received by EPA, which in turn reduces the cost of correcting poor quality data after receipt. Since such services are automated, delays are minimized and timeliness is increased. Thus, reusability is used to measure consistency, accuracy, validity, and completeness.
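As an illustration of the automated translation and validation described above, the following is a minimal sketch under stated assumptions, not a description of how EPA services are implemented. The code set `STATE_CODE_SET`, the function `validate_and_translate`, and the sample records are all hypothetical.

```python
# Hypothetical code set: values a partner might submit, mapped to one standard code.
STATE_CODE_SET = {
    "VA": "VA", "Va.": "VA", "Virginia": "VA",
    "MD": "MD", "Maryland": "MD",
}


def validate_and_translate(records, field, code_set):
    """Split records into accepted (with translated codes) and rejected.

    A record is accepted when its value for `field` appears in the code set;
    the value is then replaced by the standard code. Input records are not
    modified.
    """
    accepted, rejected = [], []
    for record in records:
        value = record.get(field)
        if value in code_set:
            accepted.append({**record, field: code_set[value]})
        else:
            rejected.append(record)
    return accepted, rejected


# Hypothetical incoming submission.
incoming = [
    {"facility": "Plant A", "state": "Virginia"},
    {"facility": "Plant B", "state": "MD"},
    {"facility": "Plant C", "state": "Atlantis"},
]

accepted, rejected = validate_and_translate(incoming, "state", STATE_CODE_SET)
print("accepted:", accepted)
print("rejected:", rejected)
```

Rejecting a record at receipt, rather than storing it and correcting it later, is the step that avoids the after-the-fact correction cost mentioned above.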
The Registries facilitate reuse and ease of access to EPA information assets. Reuse of metadata, models, tools, and services by EPA staff, contractors, and partners reduces time, cost, and error in the systems development process. Thus, use of the Registries in systems development can provide measures of consistency across systems, timeliness and accuracy in communications between systems, and validity and completeness of information contained within systems.
- Vision & Planning
The System of Registries supports EPA’s business by contributing to its architecture, its system development, and its ability to understand and exchange environmental information among its various programs and with its partners. The System of Registries will help promote reuse of data, metadata, and Service Oriented Architecture (SOA) components.
For example, the System of Registries contains a registry of chemical identifiers to ensure consistent use of chemical information across EPA. This registry maps the various ways a chemical could be identified (e.g., common name, chemical formula, chemical name) to a single EPA-wide authoritative identifier.
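The mapping described above can be pictured as a lookup from many identifier forms to one key. The following is a minimal sketch, not EPA's implementation: the class `SubstanceRegistry`, the identifier `EPA-0001`, and the sample entry are hypothetical. One assumption is made explicit in the code: names are compared without regard to case, while formulas are compared exactly, because formulas such as "CO" and "Co" denote different substances.

```python
class SubstanceRegistry:
    """Toy registry: maps several identifier forms to one authoritative ID."""

    def __init__(self):
        self._index = {}  # (kind, normalized value) -> authoritative ID

    @staticmethod
    def _key(kind, value):
        value = value.strip()
        # Names compare case-insensitively; formulas do not ("CO" is not "Co").
        return (kind, value if kind == "formula" else value.lower())

    def register(self, authoritative_id, names=(), formulas=()):
        pairs = [("name", n) for n in names] + [("formula", f) for f in formulas]
        for kind, value in pairs:
            key = self._key(kind, value)
            existing = self._index.get(key)
            if existing is not None and existing != authoritative_id:
                raise ValueError(f"{value!r} already maps to {existing}")
            self._index[key] = authoritative_id

    def resolve(self, kind, value):
        """Return the authoritative ID, or None if the identifier is unknown."""
        return self._index.get(self._key(kind, value))


registry = SubstanceRegistry()
# Hypothetical authoritative ID with a common name, a chemical name, and a formula.
registry.register(
    "EPA-0001",
    names=["table salt", "sodium chloride"],
    formulas=["NaCl"],
)

print(registry.resolve("name", "Sodium Chloride"))
print(registry.resolve("formula", "NaCl"))
print(registry.resolve("name", "unknown substance"))
```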
The System of Registries is a tool which allows EPA data management staff and data owners to better document, organize, and manage data. The importance of the registries lies in their ability to facilitate services dedicated to improving data access and quality.
A major focus of the System of Registries is to help prepare EPA for Semantic Web technologies. This transition will require a major shift in the way data is managed and, in fact, in the way data is considered. The underlying meaning of data must become its critical characteristic rather than its format or the terms relating to it.
The Semantic Web is not a separate Web but an extension of the current World Wide Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. For the Semantic Web to function, computers must have automated access to collections of information and to rules that they can use to process them.
Just as current search engines can now search across documents on the web and find words that have been specified as the search criteria, Semantic Web search engines will be able to find information in databases as well as documents based on specified meanings and not just search terms. In order for this to occur, data must be associated with its meaning in such a way that automated tools and machines can process it without direct human intervention.
Terms and concepts are described by definitions. All three of these (terms, concepts, and definitions) are used to assist us in managing meanings. Ultimately, data objects with the same specific meaning should be associated with each other and with that meaning. Managing terms and concepts ensures that items with the same meaning, regardless of how they are represented, will be related. It will also ensure that items that are not the same will not be confused with each other.
Used together, all the registries help to move beyond the restrictions of data names and codes to the essential meaning of the data. This allows for discovery and use of data that relate to the same concepts despite widely varying names, definitions, codes, and types.
A key functionality of the System of Registries is an ability to register, map, and manage concepts used in EPA and partner systems. These concepts represent various objects and characteristics that are used within EPA and its community of partners. Through the associations of like concepts, searching and understanding of items within the System of Registries will be enhanced. Since many concepts are represented by multiple terms, it is necessary to document the meaning of each concept. It is also necessary to document each concept’s various relationships.
Documenting and organizing EPA information assets via their concepts will span the continuum from values in a database to aggregations of information in documents. It is a powerful approach to information management which was impossible before the advent of modern registry tools.
Understanding and documenting the meanings of EPA information assets enables appropriate reuse. This reuse ranges from well-formed data definitions to models and to standards and services. This enabling of reusability enhances efficiency and quality and reduces cost.
Understanding the meanings also assists in minimizing misuse of EPA information assets. The System of Registries will help system developers, architects, and other users to find things to reuse. Registries provide information about meaning, quality, and intended purpose of an asset. They provide you with sufficient information to know whether or not the asset should be reused for your specific purpose.
- What is a Registry?
A Registry provides the ability to register, map, and manage information important to EPA and its partners. Registries like the Facility Registry and the Substance Registry maintain information about business objects common throughout EPA. Some registries, like the Reusable Component Registry, make quality assets available for reuse. Others, like the Data Registry, collect information about EPA systems and data stored in them. In all cases, registries serve as organizing structures for the purpose of facilitating discovery of and access to EPA information resources.
A registry is a directory to EPA metadata services and information. It contains information which makes EPA data accessible and understandable.
A registry is not a storehouse of the information assets themselves. Instead, it contains information about the assets, such as:
- What they are
- Where to find them
- How to use them
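The three kinds of information listed above can be pictured as the fields of a catalog record that points to an asset without containing it. The following is a minimal sketch; the `RegistryEntry` type, the `find_entries` helper, and the sample entries (including their placeholder locations) are hypothetical and do not reflect the actual structure of any EPA registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """Metadata about an asset; the asset itself lives elsewhere."""
    name: str
    description: str  # what it is
    location: str     # where to find it
    usage_notes: str  # how to use it


def find_entries(entries, keyword):
    """Return entries whose name or description mentions the keyword."""
    needle = keyword.lower()
    return [
        entry for entry in entries
        if needle in entry.name.lower() or needle in entry.description.lower()
    ]


# Hypothetical sample entries; the locations are placeholders, not real URLs.
catalog = [
    RegistryEntry(
        name="Facility lookup service",
        description="Web service returning facility records",
        location="https://example.invalid/facility",
        usage_notes="Call with a facility identifier",
    ),
    RegistryEntry(
        name="Air quality model",
        description="Model for estimating pollutant dispersion",
        location="https://example.invalid/models/air",
        usage_notes="See the model's user guide",
    ),
]

for entry in find_entries(catalog, "facility"):
    print(entry.name, "->", entry.location)
```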
Exchange Network partners use the:
- Data Element Registry to register code sets and data dictionaries
- Terminology Services for vocabularies
- Substance Registry Services (SRS) for substance lists and substance identification information; e.g., for chemicals
- Facility Registry Services (FRS) for information about facilities
- Reusable Component Services for reusable assets that are used in exchange of information across the Network