Harmonization of vocabularies for water data

  Presented at Hydroinformatics 2014, 2014-08-17, CCNY, New York
  • 1. Harmonization of vocabularies for water data Jonathan Yu | Research engineer HIC 2014, 17 August 2014 LAND AND WATER FLAGSHIP | OCEANS AND ATMOSPHERE FLAGSHIP
  • 2. Outline • Context and problem space – need formal mechanisms for publishing vocabularies • Use of semantic web tech to publish and harmonise vocabularies • Challenges still exist • conceptualisation as both classes and individuals – pragmatic but problematic • URI patterns • Versioning and keeping track • Suggested paths forward?
  • 3. Issues • Formalization • RDF SKOS OWL • Collections • Re-use/clone/leave alone • URI Patterns • Distribution • UIs/APIs • Versioning • Mappings • Search and discovery
  • 4. Formalization: classic glossary – term + definition (source: CABI - http://www.cabi.org/ashc/uploads/file/ASHC/8_Glossary__acronyms__index_revised.pdf)
  • 5. Formalization: table – structure + mappings. Healthy Headwater - NGIS Terms. (AGU Fall 2013 | IN52B-08 | Cox, Simons, Yu | Vocabulary re-use)
      Columns: cas_rn number | ANGDTS Code | ANGDTS Description | Units_used | WDTF Parameter | chemical name | ADWG name | IUPAC name | Group | Ion
      Rows: cas_rn number | ANGDTS Code | ANGDTS Description | Units_used | remaining non-empty cells, in column order
      EC | EC | ease at which conduction current can be caused to flow through material in microSiemens/centimetre | us/cm ms/cm mg/L | ElectricalConductivityAt25C_uScm; Electrical Conductivity; Conductivity
      PH | pH | negative logarithm of hydrogen ion concentration in pH units | pH units | WaterpH_pH; pH; pH; pH, alkalinity, acidity
      16887-00-6 | 16887-00-6 | concentration of chloride as Cl in milligrams/litre | mg/L mg/kg | Chloride; Chloride; Chloride; Anion
      TDS | TDS | the portion of total solids that passes through filter and deemed to have been dissolved in sample in milligrams/litre | mg/L | Total Dissolved Solids; Total Dissolved Solids; Salinity
      TOTALALKALINITY | ALKT | concentration in milligrams/litre CaCO3 of titratable bases using a methyl-orange endpoint of about pH 4.3 | mg/L | Total Alkalinity (as CaCO3); pH, alkalinity, acidity
      HARDNESS_CACO3 | HARD | the ability of water to precipitate soap and is sum of calcium and magnesium concentrations as milligrams/litre CaCO3 | mg/L | Hardness (as CaCO3); Hardness (as calcium carbonate); Hardness (as calcium carbonate)
      SAR | SAR | ratio of sodium to magnesium and calcium and used to assess risk of excess sodium in irrigation water | Ratio | Sodium Adsorption Ratio; Salinity
      3812-32-6 | ALKC | alkalinity ascribed to carbonate in milligrams/litre CO3 | mg/L %MOL | Carbonate Alkalinity (as CaCO3); Carbonate; pH, alkalinity, acidity
      NITRATE | 14797-55-8 | concentration of nitrate as N in milligrams/litre | mg/L mg/kg | Nitrate; Nitrate and Nitrite; Nitrate and Nitrite; Anion
      7439-89-6 | 7439-89-6 | concentration of iron as Fe in milligrams/litre | mg/L mg/kg ug/L | Iron; Iron; Metal; Cation
  • 6. Formalization: RDF – SKOS for basic vocabularies (Linked Vocabularies | Simon Cox)
      chem:sodium a skos:Concept ;
          rdfs:label "sodium"^^xsd:string ;
          skos:broader chem:alkali ;
          skos:exactMatch <http://dbpedia.org/resource/Sodium> ;
          skos:inScheme skos:chemicals ;
          skos:prefLabel "nátrium"@hu , "sodio"@it , "sodium"@fr , "sodium"@en .
  • 7. Formalization: RDFS/OWL add rich predicates • Water Quality Vocabulary
  • 8. Formalization: alignment with existing vocabularies (Water Quality extension to QUDT) → QUDT → OP
  • 9. Formalization: link detailed model to SKOS → access using SKOS API
  • 10. Other approaches: OWL Class per concept • deep subsumption hierarchy: SWEET, OBO • intersecting constraints: CGI Lithology
  • 11. Formalization challenge • Sometimes formalized as OWL - usually as SKOS (example? SWEET / GEMET?) • Class vs individuals (Example from QUDT?) • Hybrid approaches exist – vocabulary as individuals of classes from an ontology but aligned with SKOS (Example from OP?) • https://www.seegrid.csiro.au/wiki/Siss/VocabularyFormalizationInSKOS
  • 12. Collections • skos:Collection – skos:member skos:Concept|skos:Collection: a new collection can claim existing concepts as members; nested collections are allowed • skos:Concept – skos:inScheme skos:ConceptScheme: concepts assert their own membership; no nesting • owl:Ontology: no membership predicate – rdfs:member? dct:hasPart? • Other containers: void:Dataset, ldp:Container, reg:Register
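The two membership directions on the Collections slide can be sketched with plain tuples standing in for RDF triples. This is only an illustration: every `ex:` name below is hypothetical, and no RDF library is assumed.

```python
# Triples as (subject, predicate, object) tuples; all ex: names are hypothetical.
TRIPLES = [
    # skos:member: the collection claims its members, and collections may nest.
    ("ex:nutrients", "skos:member", "ex:nitrate"),
    ("ex:waterQuality", "skos:member", "ex:nutrients"),
    ("ex:waterQuality", "skos:member", "ex:chloride"),
    # skos:inScheme: each concept asserts its own membership of a scheme.
    ("ex:nitrate", "skos:inScheme", "ex:chemicalsScheme"),
    ("ex:chloride", "skos:inScheme", "ex:chemicalsScheme"),
]


def collection_members(collection, triples, _seen=None):
    """Return everything reachable via skos:member, following nested collections."""
    seen = set() if _seen is None else _seen
    members = set()
    for s, p, o in triples:
        if s == collection and p == "skos:member" and o not in seen:
            seen.add(o)
            members.add(o)
            # A member may itself be a collection, so recurse into it.
            members |= collection_members(o, triples, seen)
    return members


def scheme_concepts(scheme, triples):
    """Return the concepts that declare themselves to be in the given scheme."""
    return {s for s, p, o in triples if p == "skos:inScheme" and o == scheme}
```

Note the asymmetry: a third party can add a new `skos:member` statement without touching the concept, whereas `skos:inScheme` is a statement about the concept itself.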
  • 13. Re-use: new collections from old – clone, or leave alone • eReefs WQ vocabulary includes a subset of 330+ chemicals from 36000+ in ChEBI • New resources in local namespace • SKOS *Match predicate gives provenance, link to more detail
  • 14. Clone or leave alone? • Question of caching content vs federating queries/discovery of content • Consider ChEBI – big • Cache or just link to its definitions? • Tradeoff between performance and convenience vs updating and synchronizing • LDR allows registration of external resources • New register = subset or combination of terms already published elsewhere?
  • 15. URI Patterns – opaque? What does the URL path imply? • http://vocab.nerc.ac.uk/collection/G04/current/008/ : G04 ISO RoleCode, 008 → Principal Investigator • http://resource.geosciml.org/classifier/ics/ischart/Pliocene = Pliocene, URI supplied by GeoSciML, definition sourced from International Commission for Stratigraphy (ics), in the collection known as ‘International Stratigraphic Chart’ (ischart) • Semantics? Management? Set-membership?
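A minimal sketch of what "reading" a URI path amounts to. The function names are hypothetical, and the positional layout `/collection/<id>/<version>/<item>/` is an assumption inferred from the one NERC example on the slide, not a documented contract.

```python
from urllib.parse import urlparse


def split_path(uri):
    """Return the non-empty path segments of a URI."""
    return [seg for seg in urlparse(uri).path.split("/") if seg]


def read_nerc_style(uri):
    """Guess meaning from segment position (assumed layout, see lead-in).

    Returns None when the path does not fit the assumed layout; the URI itself
    carries no machine-readable statement of what each segment means.
    """
    segs = split_path(uri)
    if len(segs) == 4 and segs[0] == "collection":
        return {"collection": segs[1], "version": segs[2], "item": segs[3]}
    return None
```

The NERC example fits the assumed layout, while the GeoSciML URI returns `None`: its segments (`classifier`, `ics`, `ischart`, `Pliocene`) follow a different convention that a client can only learn out of band.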
  • 16. Versioning • Individual items, set-as-a-whole
  • 17. Versioning - 2 • Are these the same thing? How can we tell? How can a machine tell?
      http://sweet.jpl.nasa.gov/1.1/time.owl#PLEISTOCENE
      http://sweet.jpl.nasa.gov/2.0/timeGeologic.owl#Pleistocene
      http://sweet.jpl.nasa.gov/2.2/stateTimeGeologic.owl#Pleistocene
      http://sweet.jpl.nasa.gov/2.3/stateTimeGeologic.owl#Pleistocene
    • Compare with
      http://resource.geosciml.org/classifier/ics/ischart/Pliocene – URI for the concept
      http://def.seegrid.csiro.au/sissvoc/isc2014/resource.html?uri=http://resource.geosciml.org/classifier/ics/ischart/Pliocene – URI for a description of the concept (i.e. record), according to the 2014 version of the service
    • Care with version number in URI!
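One way a machine might guess that the four SWEET URIs name the same thing is to ignore version-like path segments and compare the fragment case-insensitively. The sketch below is a heuristic with hypothetical function names, not an established identity test: matching keys suggest, but do not prove, that the concept is unchanged between releases.

```python
import re
from urllib.parse import urlparse

# Path segments such as "1.1" or "2.3" are treated as version numbers (an assumption).
VERSION_SEGMENT = re.compile(r"^\d+(\.\d+)*$")


def version_of(uri):
    """Return the first version-like path segment, or None if there is none."""
    for seg in urlparse(uri).path.split("/"):
        if VERSION_SEGMENT.match(seg):
            return seg
    return None


def version_blind_key(uri):
    """Return (host, lower-cased local name), ignoring version-like segments."""
    parts = urlparse(uri)
    segs = [s for s in parts.path.split("/") if s and not VERSION_SEGMENT.match(s)]
    local = parts.fragment or (segs[-1] if segs else "")
    return (parts.netloc, local.lower())


SWEET_URIS = [
    "http://sweet.jpl.nasa.gov/1.1/time.owl#PLEISTOCENE",
    "http://sweet.jpl.nasa.gov/2.0/timeGeologic.owl#Pleistocene",
    "http://sweet.jpl.nasa.gov/2.2/stateTimeGeologic.owl#Pleistocene",
    "http://sweet.jpl.nasa.gov/2.3/stateTimeGeologic.owl#Pleistocene",
]
```

All four SWEET URIs collapse to one key, yet the file name changes from `time.owl` to `timeGeologic.owl` to `stateTimeGeologic.owl`, which is exactly the information the heuristic throws away. An explicit statement linking the versions would be more reliable than string matching.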
  • 18. Versioning - 3 • Version info in item?
      <http://vocab.nerc.ac.uk/collection/G04/current/008/> a skos:Concept ;
          skos:prefLabel "principalInvestigator" ;
          owl:versionInfo "1" ;
          dc:date "2012-07-04 10:56:53.0" .
    • Version info in registration record?
  • 19. Versioning • How do we manage versions of definitions? • Do we version a definition of an abstract concept? • Does the definition of the concept change or does our understanding change? • Version the set or individual items?
  • 20. Distribution • Vocabulary packaged in a file or page http://resource.geosciml.org/vocabulary/timescale/isc2014.ttl http://resource.geosciml.org/vocabulary/timescale/isc2014.html • Dereference the URI for a resource in the vocabulary http://resource.geosciml.org/classifier/ics/ischart/ (all) http://resource.geosciml.org/classifier/ics/ischart/Cambrian • SPARQL endpoint http://resource.geosciml.org/sparql/isc2014 • Vocabulary service http://def.seegrid.csiro.au/sissvoc/isc2014/collection
  • 21. Semantic web tech to publish vocabularies • SISSVoc
  • 22. Mappings • Embed in vocabulary vs. store separately?
  • 23. Mapping challenge • Linking between ontologies – which to use? All or some? • SKOS relations - exactMatch, closeMatch, narrowMatch, broadMatch • OWL predicates - sameAs for individuals, equivalentClass for classes and equivalentProperty for properties • Dublin Core • PROV-O • VoID • VOAF • Linking between classes and individuals in OWL – logic-based reasoning support
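The choice between SKOS and OWL mapping predicates listed above can be sketched as a small lookup. The function `mapping_triple` and the `ex:` names in the usage note are hypothetical; only the predicate names come from the slide.

```python
# OWL equivalence predicates depend on what kind of resource is being mapped.
OWL_EQUIVALENCE = {
    "individual": "owl:sameAs",
    "class": "owl:equivalentClass",
    "property": "owl:equivalentProperty",
}

# SKOS mapping relations named on the slide.
SKOS_MAPPINGS = ("skos:exactMatch", "skos:closeMatch", "skos:narrowMatch", "skos:broadMatch")


def mapping_triple(source, target, kind=None, skos_relation="skos:closeMatch"):
    """Build a (subject, predicate, object) mapping statement.

    If `kind` is given, use the OWL predicate for that kind of resource;
    otherwise fall back to the requested SKOS mapping relation.
    """
    if kind is not None:
        return (source, OWL_EQUIVALENCE[kind], target)
    if skos_relation not in SKOS_MAPPINGS:
        raise ValueError(f"not a SKOS mapping relation from the slide: {skos_relation}")
    return (source, skos_relation, target)
```

For example, `mapping_triple("ex:A", "ex:B", kind="class")` yields an `owl:equivalentClass` statement, which commits to logical equivalence, whereas the SKOS relations make a weaker, thesaurus-style claim.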
  • 24. Search and discovery
  • 25. Observable property ontology (Cox, Simons, Yu)
  • 26. Standards… • The standard ISO 8601 concerns dates, a common type of information used for data and documentation. • March 5, 2014 • 2014-03-05 • 3/5/14 • 05/03/2014 • 5 Mar 2014 • Multiple representations but essentially one meaning Source: http://dataabinitio.com/?p=449
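The "multiple representations, one meaning" point can be made concrete by normalising each date string on the slide to ISO 8601. This sketch assumes an English locale for month names, and it assumes that `3/5/14` is month/day/year while `05/03/2014` is day/month/year; the strings themselves do not say which convention applies.

```python
from datetime import datetime

# Each sample is paired with the format it is ASSUMED to follow.
SAMPLES = [
    ("March 5, 2014", "%B %d, %Y"),
    ("2014-03-05", "%Y-%m-%d"),   # already ISO 8601
    ("3/5/14", "%m/%d/%y"),       # assumed month/day/year
    ("05/03/2014", "%d/%m/%Y"),   # assumed day/month/year
    ("5 Mar 2014", "%d %b %Y"),
]


def to_iso(text, fmt):
    """Parse `text` with an explicit format and return the ISO 8601 date string."""
    return datetime.strptime(text, fmt).date().isoformat()


if __name__ == "__main__":
    for text, fmt in SAMPLES:
        print(f"{text!r:>18} -> {to_iso(text, fmt)}")
```

The format has to be supplied alongside each string, which is the same problem vocabularies face: without agreed, published conventions, the intended meaning cannot be recovered from the surface form alone.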
  • 27. Challenges still exist • Variation of formalisation and publication • conceptualisation as both classes and individuals – pragmatic but problematic • URI patterns • Versioning and keeping track
  • 28. Suggested paths forward?
  • 29. Thank you • Jonathan Yu, Research Software Engineer, Jonathan.Yu@csiro.au • Bruce Simons, SDI Modeller, Bruce.Simons@csiro.au • Simon Cox, Research Scientist, Simon.Cox@csiro.au • http://ereefs.org.au/ • Terms of use: Image sources from Wikipedia under CC2.0 licence http://en.wikipedia.org/wiki/File:Amazing_Great_Barrier_Reef_1.jpg
