The Linked Data Modeling Language, or LinkML, is a YAML-based data modeling framework that aims to "bring semantic web standards to the masses, simplifying the production of FAIR, ontology-ready data."[1] It is used to specify data schemas, which can in turn be used to validate data via formats such as JSON-LD and ShEx.
Description
The Linked Data Modeling Language is an object-oriented data modeling framework that aims to simplify the production of FAIR data. It is intended for schematizing many kinds of data, ranging from simple, flat, checklist-style standards to complex, interrelated, normalized data that use polymorphism and inheritance.[1]
LinkML structure
An overview of a LinkML schema, including the metadata for the file itself, the namespaces used, dependencies on other schemas, and the actual model, with links to RDF URIs (here exemplified by schema.org).
The basic structure is a schema with associated metadata (including a mapping from namespace prefixes to URIs) and a set of classes with their attributes. Classes follow object-oriented semantics rather than OWL semantics, so a class can also be a metaclass, which enables the modeling of design patterns used in some computational ontologies. Each element in the schema can be assigned a URI from an existing vocabulary, which increases interoperability.[1]
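The structure described above can be sketched in Python. This is a minimal illustration, not LinkML's own implementation: the schema is written as the dictionary a YAML parser would produce, the example.org URIs and the `Person` class are hypothetical, and `expand_curie` and `element_uris` are helper names invented here to show how prefixed names resolve to full URIs.

```python
# Hypothetical schema, written as the dict a YAML parser would yield
# for a small LinkML-style schema file.
schema = {
    "id": "https://example.org/person-schema",  # assumed schema URI
    "name": "person_schema",
    "prefixes": {  # namespace prefix -> URI mapping
        "linkml": "https://w3id.org/linkml/",
        "schema": "http://schema.org/",
        "ex": "https://example.org/person-schema/",
    },
    "default_prefix": "ex",
    "imports": ["linkml:types"],  # dependency on another schema
    "classes": {
        "Person": {
            "class_uri": "schema:Person",
            "attributes": {
                "name": {"slot_uri": "schema:name", "range": "string"},
                "age": {"range": "integer"},  # no explicit URI assigned
            },
        },
    },
}


def expand_curie(curie: str, prefixes: dict) -> str:
    """Expand a compact URI such as 'schema:Person' using the prefix map."""
    prefix, sep, local = curie.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return curie  # unknown prefix or already a full URI: leave unchanged


def element_uris(schema: dict) -> dict:
    """Return the full URI of every class and attribute in the schema.

    Elements without an explicit URI fall back to the default namespace.
    """
    prefixes = schema["prefixes"]
    default_ns = prefixes[schema["default_prefix"]]
    uris = {}
    for class_name, cls in schema["classes"].items():
        class_curie = cls.get("class_uri")
        uris[class_name] = (
            expand_curie(class_curie, prefixes) if class_curie else default_ns + class_name
        )
        for attr_name, attr in cls.get("attributes", {}).items():
            slot_curie = attr.get("slot_uri")
            uris[f"{class_name}.{attr_name}"] = (
                expand_curie(slot_curie, prefixes) if slot_curie else default_ns + attr_name
            )
    return uris


uris = element_uris(schema)
for element, uri in uris.items():
    print(element, "->", uri)
```

The fallback to the default namespace is a design choice of this sketch; it keeps every element addressable by a URI even when the schema author has not mapped it to an existing vocabulary.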
A LinkML schema tries to anchor the meaning of free-text strings by establishing identity via resolvable URIs. Technically, the framework allows models to use both open-world and closed-world assumptions; when operating under a closed-world assumption, it provides ways to validate schema instances and their relations using modeling paradigms such as JSON Schema and SQL DDL. The language itself reuses terms from other vocabularies, such as the Simple Knowledge Organization System (SKOS). The formalism allows users to extend or reuse existing object definitions while mapping data to existing standards where appropriate (for example, a "gene" object in one LinkML schema can be mapped directly to another LinkML schema’s representation of a "gene" via the "skos:exactMatch" predicate).[1]
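What closed-world validation means in practice can be illustrated with a short Python sketch. It is a simplified stand-in under stated assumptions, not the validator LinkML ships: the `Gene` class definition, the `PYTHON_TYPES` table, and the function `validate_closed` are all hypothetical. Under a closed-world reading, an instance may use only the attributes its class declares, so an undeclared key is an error rather than extra information.

```python
# Hypothetical class definition with three declared attributes.
GENE_CLASS = {
    "attributes": {
        "id": {"range": "string", "required": True},
        "symbol": {"range": "string", "required": True},
        "chromosome": {"range": "integer"},
    }
}

# Assumed mapping from schema ranges to Python types for this sketch.
# Note: Python treats bool as a subclass of int; the sketch ignores this.
PYTHON_TYPES = {"string": str, "integer": int}


def validate_closed(instance: dict, cls: dict) -> list:
    """Return a list of error messages; an empty list means valid."""
    errors = []
    attributes = cls["attributes"]
    # Closed world: anything not declared in the class is rejected.
    for key in instance:
        if key not in attributes:
            errors.append(f"undeclared attribute: {key}")
    for name, spec in attributes.items():
        if name not in instance:
            if spec.get("required"):
                errors.append(f"missing required attribute: {name}")
            continue
        expected = PYTHON_TYPES[spec["range"]]
        if not isinstance(instance[name], expected):
            errors.append(f"{name}: expected {spec['range']}")
    return errors


valid = {"id": "ex:g1", "symbol": "BRCA1", "chromosome": 17}
extra = {"id": "ex:g2", "symbol": "TP53", "alias": "p53"}
incomplete = {"id": "ex:g3", "chromosome": "17"}

print(validate_closed(valid, GENE_CLASS))
print(validate_closed(extra, GENE_CLASS))
print(validate_closed(incomplete, GENE_CLASS))
```

Under an open-world assumption, the `alias` key in the second instance would simply be an additional statement about the gene and would not make the instance invalid.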
Generators
Tools referred to as LinkML generators translate the schema YAML into a growing number of other formats, including JSON Schema, JSON-LD, SQL DDL, ShEx, GraphQL, Python data classes, Markdown, and UML diagrams.[1] The LinkML runtime provides loaders and dumpers to convert instances of the schema between these formats.
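The idea behind a generator can be shown with a toy translation from a class definition to JSON Schema. This sketch does not reproduce the actual LinkML generators or their output; the function `class_to_json_schema`, the `RANGE_TO_JSON_TYPE` table, and the `Sample` class are assumptions made for illustration.

```python
import json

# Assumed mapping from schema ranges to JSON Schema types.
RANGE_TO_JSON_TYPE = {
    "string": "string",
    "integer": "integer",
    "float": "number",
    "boolean": "boolean",
}


def class_to_json_schema(name: str, cls: dict) -> dict:
    """Translate one class definition into a JSON Schema object (toy version)."""
    properties = {}
    required = []
    for attr_name, spec in cls.get("attributes", {}).items():
        prop = {"type": RANGE_TO_JSON_TYPE[spec.get("range", "string")]}
        if spec.get("multivalued"):
            # A multivalued attribute becomes an array of the base type.
            prop = {"type": "array", "items": prop}
        properties[attr_name] = prop
        if spec.get("required"):
            required.append(attr_name)
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "title": name,
        "type": "object",
        "properties": properties,
        "required": required,
        "additionalProperties": False,  # closed-world style: no extra keys
    }


# Hypothetical class used only for this example.
sample_class = {
    "attributes": {
        "name": {"range": "string", "required": True},
        "aliases": {"range": "string", "multivalued": True},
        "depth": {"range": "float"},
    }
}

json_schema = class_to_json_schema("Sample", sample_class)
print(json.dumps(json_schema, indent=2))
```

The same class definition could be fed to analogous functions that emit SQL DDL or Python data classes, which is the sense in which a single schema serves as the source for several target formats.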
LinkML is also being discussed as a way to model and store microscopy research data in schemas that allow export to diverse data formats, including JSON-LD and SQL, particularly in the context of the German NFDI4BIOIMAGE consortium.[5][6][7]
It has been used in basic research on biocuration and the semantic web, e.g., in the context of data integration using Shape Expressions,[8] in the development of standards for mapping ontologies,[9] and in the integration of LLMs into biocuration practice.[10]
This page contains text under CC-BY 4.0 from Sierra Moxon and colleagues, from the publication "The Linked Data Modeling Language (LinkML): A General-Purpose Data Modeling Framework Grounded in Machine-Readable Semantics", in the proceedings of the International Conference on Biomedical Ontologies in 2021.[1]