Semantic Mapping Format for CSV-to-JSON Transformation

This document defines a declarative mapping format that enables semantically rich transformation of flat CSV data into structured JSON representations aligned with the goals of the OMG Semantic Augmentation Challenge.

Rather than relying on procedural logic, this format allows you to express conceptual model alignment, identifier binding, and value coercion in a compact, portable way that supports reuse and automation.


Why this Format?

This mapping syntax is designed to:


Mapping Format Overview

The mapping is a JSON object where:

Each mapping entry looks like this:

"path.to.target": {
  "column": "CSV_COLUMN_NAME",
  "type": "string" // or number, boolean, date
}

Or for constant values:

"path.to.constant": {
  "constant": "Institution"
}

Path Syntax

The path syntax supports:

Pattern Purpose
object.property Set a nested property
array[false] Always append to the array
array[$.key == @COLUMN] Find and reuse existing array items, or create if no match
map[{@COLUMN}] Use a row value as a key in an object map. Create or match

Value Resolution

Each mapping value supports:

Field Meaning
column The CSV column name
type "string", "number", "boolean", "date" — coercion
constant A literal value

Data Type Coercion

The mapping format supports coercing CSV strings into strongly typed JSON values using a "type" field. This ensures compatibility with schema validation, semantic tooling, and downstream processing.

Each type is interpreted as follows:

"string"

Default behavior — no coercion is applied:

{ "column": "NAME", "type": "string" }

"number"


"boolean"


"date"

Supports two formats:

  1. Excel-style serial numbers (e.g. 45401"2024-05-04")
  2. Plain date strings (e.g. "01/01/1890")

Output is always ISO-8601: "YYYY-MM-DD"

{ "column": "ESTYMD", "type": "date" }

"constant"

You can skip column entirely and use a constant:

{ "constant": "Branch" }

This is useful for setting fixed "type" values or flags within the output.


Undefined and Null Handling

If a value is empty ('') or the string "NULL", it is treated as undefined and excluded from the output entirely.


Strategic benefits

This format has a number of strategic benefits:

By using declarative mappings and semantic paths, this approach turns traditional tabular data into structured, interoperable knowledge assets.


Example

This input CSV

ID,NAME,AGE,ACTIVE,START
001,Alice,30,TRUE,44561
002,Bob,25,false,01/01/2020
003,Carol,NULL,,NULL

When converted with this mapping:

{
    "type[]": { "constant": "Document" },
    "people[{@ID}].id": { "column": "ID", "type": "string" },
    "people[{@ID}].name": { "column": "NAME", "type": "string" },
    "people[{@ID}].age": { "column": "AGE", "type": "number" },
    "people[{@ID}].active": { "column": "ACTIVE", "type": "boolean" },
    "people[{@ID}].startDate": { "column": "START", "type": "date" }
}

Produces this JSON:

{
  "type": [
    "Document"
  ],
  "people": {
    "001": {
      "id": "001",
      "name": "Alice",
      "age": 30,
      "active": true,
      "startDate": "2021-12-31"
    },
    "002": {
      "id": "002",
      "name": "Bob",
      "age": 25,
      "active": false,
      "startDate": "2019-12-31"
    },
    "003": {
      "id": "003",
      "name": "Carol"
    }
  }
}

This mapping:

Observations: