LinkML

The Linked Data Modeling Language, or LinkML, is a data modeling framework based on YAML that aims to "bring semantic web standards to the masses, simplifying the production of FAIR, ontology-ready data."[1] Schemas written in LinkML can be used to validate data and can be translated into formats such as JSON-LD and ShEx.

Description

The Linked Data Modeling Language is an object-oriented data modeling framework that aims to simplify the production of FAIR data. It is intended for schematizing many kinds of data, ranging from simple flat checklist-style standards to complex, interrelated, normalized data that use polymorphism and inheritance.[1]

LinkML structure

An overview of a LinkML schema, including the metadata for the file itself, the namespaces used, dependencies on other schemas and the actual model, with links to RDF URIs (here exemplified by schema.org).

LinkML is designed to align with the standards used by modern software developers and database engineers, including JSON files, relational databases, document stores and object models in Python, while providing a semantic web underpinning by mapping all elements to Resource Description Framework (RDF) Uniform Resource Identifiers (URIs). LinkML’s formal RDF-based framework aims to abstract away this complexity and allow semantics to "hide in plain sight".[1]

The basic structure is a schema with associated metadata (including namespace-to-URI mappings) and a set of classes with their attributes. Classes follow object-oriented semantics rather than OWL semantics, allowing classes to also be metaclasses, which enables the modeling of design patterns used in some computational ontologies. Each element in the schema can be assigned URIs from existing vocabularies, allowing for increased interoperability.[1]
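The structure described above can be sketched in Python. This is a minimal illustration, not the LinkML toolchain: the schema is written as a plain dictionary that mirrors the shape of a LinkML YAML file, and the helpers `expand_curie` and `inherited_attributes` are hypothetical names invented for this example. The `NamedThing` and `Person` classes and the example.org identifiers are likewise assumptions; schema.org is used for the URIs, as in the figure.

```python
# A toy schema shaped like a LinkML YAML file, written as a Python dict.
# All class names, the example.org ids, and both helpers are illustrative.
SCHEMA = {
    "id": "https://example.org/personinfo",
    "name": "personinfo",
    "prefixes": {
        "schema": "http://schema.org/",
        "personinfo": "https://example.org/personinfo/",
    },
    "default_prefix": "personinfo",
    "classes": {
        "NamedThing": {
            "class_uri": "schema:Thing",
            "attributes": {
                "id": {"identifier": True, "range": "string"},
                "name": {"slot_uri": "schema:name", "range": "string"},
            },
        },
        "Person": {
            "is_a": "NamedThing",
            "class_uri": "schema:Person",
            "attributes": {
                "birth_date": {"slot_uri": "schema:birthDate", "range": "date"},
            },
        },
    },
}


def expand_curie(schema, curie):
    """Expand a prefixed name such as 'schema:Person' into a full URI."""
    prefix, _, local = curie.partition(":")
    return schema["prefixes"][prefix] + local


def inherited_attributes(schema, class_name):
    """Collect the attributes of a class plus those of its is_a ancestors."""
    attrs = {}
    while class_name is not None:
        cls = schema["classes"][class_name]
        # Attributes closer to the starting class take precedence.
        for name, definition in cls.get("attributes", {}).items():
            attrs.setdefault(name, definition)
        class_name = cls.get("is_a")
    return attrs


print(expand_curie(SCHEMA, SCHEMA["classes"]["Person"]["class_uri"]))
print(sorted(inherited_attributes(SCHEMA, "Person")))
```

In this sketch, `Person` ends up with `id` and `name` from its parent in addition to its own `birth_date`, and each element resolves to a URI through the prefix map.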

The LinkML schema tries to anchor the meaning of free-text strings by establishing identity via resolvable URIs. Technically, the framework allows models to use both open-world and closed-world assumptions; when operating in a closed world, it provides ways to validate schema instances and their relations using modeling paradigms like JSON Schema and SQL DDL. The language itself reuses terms from other vocabularies, such as the Simple Knowledge Organization System (SKOS). The formalism allows users to extend or reuse existing object definitions while at the same time mapping data to existing standards where appropriate (for example, a "gene" object in one LinkML schema can be mapped directly to another LinkML schema’s representation of a "gene" via the "skos:exactMatch" predicate).[1]
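As a rough illustration of closed-world validation, the following sketch checks an instance against a class definition and rejects anything the schema does not declare. It is a toy under stated assumptions, not LinkML's validator: `validate_closed_world`, the `Gene` class, its fields, and the `example:` and `otherschema:` prefixes are all hypothetical.

```python
# Toy class definition; the Gene class, its fields and prefixes are hypothetical.
GENE_CLASS = {
    "class_uri": "example:Gene",
    # Mapping to another schema's representation of a gene (made-up CURIE).
    "exact_mappings": ["otherschema:Gene"],
    "attributes": {
        "id": {"required": True, "range": "string"},
        "symbol": {"required": True, "range": "string"},
        "chromosome": {"range": "integer"},
    },
}

# Python types standing in for the schema's range names.
RANGE_TYPES = {"string": str, "integer": int}


def validate_closed_world(class_def, instance):
    """Return a list of problems; an empty list means the instance conforms."""
    problems = []
    attributes = class_def["attributes"]
    for name, definition in attributes.items():
        if name not in instance:
            if definition.get("required"):
                problems.append(f"missing required field: {name}")
            continue
        expected = RANGE_TYPES[definition["range"]]
        if not isinstance(instance[name], expected):
            problems.append(f"wrong type for field: {name}")
    # Closed world: anything the schema does not declare is an error.
    for name in instance:
        if name not in attributes:
            problems.append(f"undeclared field: {name}")
    return problems


good = {"id": "g1", "symbol": "BRCA2", "chromosome": 13}
bad = {"id": "g2", "chromosome": "13", "alias": "FANCD1"}
print(validate_closed_world(GENE_CLASS, good))
print(validate_closed_world(GENE_CLASS, bad))
```

Under an open-world reading, the final loop would be dropped, since undeclared fields would simply be unknown rather than invalid.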

Generators

Tooling referred to as LinkML generators can translate the schema YAML into a growing number of other formats, including JSON Schema, JSON-LD, SQL DDL, ShEx, GraphQL, Python data classes, Markdown and UML diagrams.[1] The LinkML runtime provides loaders and dumpers to convert instances of the schema between these formats.
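To make the idea of a generator concrete, here is a minimal sketch that translates one class definition into a JSON Schema document. It is not the LinkML generator and does not use its API: `to_json_schema`, the `Sample` class, its fields, and the range-to-type table are assumptions made for illustration, and only required fields and simple ranges are handled.

```python
import json

# Hypothetical class definition shaped like a LinkML class.
SAMPLE_CLASS = {
    "description": "A biological sample",
    "attributes": {
        "id": {"required": True, "range": "string"},
        "depth_m": {"range": "float"},
        "replicates": {"range": "integer"},
    },
}

# Assumed mapping from range names to JSON Schema type keywords.
JSON_TYPES = {"string": "string", "integer": "integer", "float": "number"}


def to_json_schema(class_name, class_def):
    """Translate a toy class definition into a JSON Schema object."""
    properties = {}
    required = []
    for name, definition in class_def["attributes"].items():
        properties[name] = {"type": JSON_TYPES[definition.get("range", "string")]}
        if definition.get("required"):
            required.append(name)
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "title": class_name,
        "description": class_def.get("description", ""),
        "type": "object",
        "properties": properties,
        "required": required,
        # Closed-world behaviour: reject fields the class does not declare.
        "additionalProperties": False,
    }


print(json.dumps(to_json_schema("Sample", SAMPLE_CLASS), indent=2))
```

A generator for another target, such as SQL DDL, would walk the same class definition and emit a different output format.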

Use cases

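LinkML is used in a range of projects in biomedical research, including the National Microbiome Data Collaborative,[2] the Alliance of Genome Resources[3] and the Monarch Initiative.[4]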

LinkML is also being discussed as a way to model and store microscopy research data in schemas that allow export to diverse data formats, including JSON-LD and SQL, particularly in the context of the German NFDI4BIOIMAGE consortium.[5][6][7]

Its usage includes basic research on biocuration and the semantic web, e.g. in the context of data integration using Shape Expressions,[8] in the development of standards for mapping ontologies,[9] and in the integration of LLMs into biocuration practice.[10]

The Biolink Model, a schema for knowledge graphs in the clinical and biomedical sciences, is built upon LinkML.[11]

Notes

This page contains text in CC-BY 4.0 from Sierra Moxon and colleagues, from the publication The Linked Data Modeling Language (LinkML): A General-Purpose Data Modeling Framework Grounded in Machine-Readable Semantics, in the proceedings of the International Conference on Biomedical Ontologies in 2021.[1]

References

  1. ^ Moxon, S.; Solbrig, H.; Unni, Deepak R.; Jiao, Dazhi; Bruskiewich, R.; Balhoff, J.; Vaidya, Gaurav; Duncan, William D.; Hegde, Harshad B.; Miller, Mark; Brush, Matthew H.; Harris, N.; Haendel, M.; Mungall, C. (2021). "The Linked Data Modeling Language (LinkML): A General-Purpose Data Modeling Framework Grounded in Machine-Readable Semantics".
  2. ^ Eloe-Fadrosh, E. A.; et al. (2022-10-30). "The National Microbiome Data Collaborative Data Portal: an integrated multi-omics microbiome data resource". Nucleic Acids Research. 50 (D1): D828–D836. doi:10.1093/nar/gkab990. ISSN 1362-4962. PMC 8958897.
  3. ^ The Alliance of Genome Resources Consortium; Aleksander, Suzanne A; Anagnostopoulos, Anna V; Antonazzo, Giulia; Arnaboldi, Valerio; Attrill, Helen; Becerra, Andrés; Bello, Susan M; Blodgett, Olin; Bradford, Yvonne M; Bult, Carol J; Cain, Scott; Calvi, Brian R; Carbon, Seth; Chan, Juancarlos (2024-05-07). Wood, V (ed.). "Updates to the Alliance of Genome Resources central infrastructure". Genetics. 227 (1) iyae049. doi:10.1093/genetics/iyae049. ISSN 1943-2631. PMC 11075569. PMID 38552170.
  4. ^ Putman, Tim E; Schaper, Kevin; Matentzoglu, Nicolas; Rubinetti, Vincent P; Alquaddoomi, Faisal S; Cox, Corey; Caufield, J Harry; Elsarboukh, Glass; Gehrke, Sarah; Hegde, Harshad; Reese, Justin T; Braun, Ian; Bruskiewich, Richard M; Cappelletti, Luca; Carbon, Seth (2024-01-05). "The Monarch Initiative in 2024: an analytic platform integrating phenotypes, genes and diseases across species". Nucleic Acids Research. 52 (D1): D938–D949. doi:10.1093/nar/gkad1082. ISSN 0305-1048. PMC 10767791. PMID 38000386.
  5. ^ Vierdag, Wouter-Michiel A. M.; Saka, Sinem K. (2024-02-09). "A perspective on FAIR quality control in multiplexed imaging data processing". Frontiers in Bioinformatics. 4 1336257. doi:10.3389/fbinf.2024.1336257. ISSN 2673-7647. PMC 10885342. PMID 38405548.
  6. ^ Moore, Josh; Kunis, Susanne. "NFDI4BIOIMAGE: Perspective for a national bioimaging standard - 14th International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences 2023" (PDF). ceur-ws.org. Retrieved 2025-09-25.
  7. ^ Moore, Joshua Allenm; Kunis, Susanne (2023-05-12). "[SWAT4HCLS 2023] NFDI4BIOIMAGE: Perspective for a national bioimage standard". doi:10.5281/zenodo.7928333.
  8. ^ "OSF". osf.io. Retrieved 2025-09-25.
  9. ^ Matentzoglu, Nicolas; Balhoff, James P; Bello, Susan M; Bizon, Chris; Brush, Matthew; Callahan, Tiffany J; Chute, Christopher G; Duncan, William D; Evelo, Chris T; Gabriel, Davera; Graybeal, John; Gray, Alasdair; Gyori, Benjamin M; Haendel, Melissa; Harmse, Henriette (2022-05-25). "A Simple Standard for Sharing Ontological Mappings (SSSOM)". Database. 2022 baac035. doi:10.1093/database/baac035. ISSN 1758-0463. PMC 9216545. PMID 35616100.
  10. ^ Caufield, Harry; Kroll, Carlo; O'Neil, Shawn T.; Reese, Justin T.; Joachimiak, Marcin P.; Hegde, Harshad; Harris, Nomi L.; Krishnamurthy, Madan; McLaughlin, James A. (2024-10-29). "CurateGPT: A flexible language-model assisted biocuration tool". arXiv:2411.00046 [cs.CL].
  11. ^ Unni, Deepak R.; Moxon, Sierra A. T.; Bada, Michael; Brush, Matthew; Bruskiewich, Richard; Caufield, J. Harry; Clemons, Paul A.; Dancik, Vlado; Dumontier, Michel; Fecho, Karamarie; Glusman, Gustavo; Hadlock, Jennifer J.; Harris, Nomi L.; Joshi, Arpita; Putman, Tim (2022). "Biolink Model: A universal schema for knowledge graphs in clinical, biomedical, and translational science". Clinical and Translational Science. 15 (8): 1848–1855. doi:10.1111/cts.13302. ISSN 1752-8062. PMC 9372416. PMID 36125173.