Semantic Mapping Format for CSV-to-JSON Transformation

This document defines a declarative mapping format that enables semantically rich transformation of flat CSV data into structured JSON representations aligned with the goals of the OMG Semantic Augmentation Challenge.

Rather than relying on procedural logic, this format allows you to express conceptual model alignment, identifier binding, and value coercion in a compact, portable way that supports reuse and automation.

Why this Format?

This mapping syntax is designed to:

Link flat tabular data to nested semantic structures
Support identifier-based merging and grouping
Define schema-aware coercion (string, number, boolean, date)
Enable downstream use in JSON-LD, JSON Schema, and linked data
Be interpretable by both humans and machines
Allow alignment with ontology-based models (e.g. FIBO)

Mapping Format Overview

The mapping is a JSON object where:

Keys are semantic paths describing the structure of the output.
Values describe how to extract or compute values from CSV rows.

Each mapping entry looks like this:

"path.to.target": {
  "column": "CSV_COLUMN_NAME",
  "type": "string" // or number, boolean, date
}

Or for constant values:

"path.to.constant": {
  "constant": "Institution"
}

Path Syntax

The path syntax supports:

Pattern	Purpose
`object.property`	Set a nested property
`array[false]`	Always append to the array
`array[$.key == @COLUMN]`	Find and reuse existing array items, or create if no match
`map[{@COLUMN}]`	Use a row value as a key in an object map. Create or match

Value Resolution

Each mapping value supports:

Field	Meaning
`column`	The CSV column name
`type`	`"string"`, `"number"`, `"boolean"`, `"date"` — coercion
`constant`	A literal value

Data Type Coercion

The mapping format supports coercing CSV strings into strongly typed JSON values using a "type" field. This ensures compatibility with schema validation, semantic tooling, and downstream processing.

Each type is interpreted as follows:

`"string"`

Default behavior — no coercion is applied:

{ "column": "NAME", "type": "string" }

`"number"`

Trims leading zeros (e.g. "007" → 7)
Parses as a floating-point number
Returns undefined for empty strings or "NULL"

{ "column": "AGE", "type": "number" }

`"boolean"`

Accepts the following truthy values: "TRUE", "1"
Accepts the following falsy values: "FALSE", "0"
Comparison is case-insensitive and trims whitespace
Invalid or empty values return undefined

{ "column": "ENABLED", "type": "boolean" }

`"date"`

Supports two formats:

Excel-style serial numbers (e.g. 45401 → "2024-05-04")
Plain date strings (e.g. "01/01/1890")

Output is always ISO-8601: "YYYY-MM-DD"

{ "column": "ESTYMD", "type": "date" }

`"constant"`

You can skip column entirely and use a constant:

{ "constant": "Branch" }

This is useful for setting fixed "type" values or flags within the output.

Undefined and Null Handling

If a value is empty ('') or the string "NULL", it is treated as undefined and excluded from the output entirely.

Strategic benefits

This format has a number of strategic benefits:

Captures semantic structure and identity
Aligns flat data to knowledge graph–ready shapes
Integrates cleanly with JSON-LD contexts and vocabularies
Enables validation via JSON Schema
Facilitates automation in ETL, modeling, and governance pipelines

By using declarative mappings and semantic paths, this approach turns traditional tabular data into structured, interoperable knowledge assets.

Example

This input CSV

ID,NAME,AGE,ACTIVE,START
001,Alice,30,TRUE,44561
002,Bob,25,false,01/01/2020
003,Carol,NULL,,NULL

When converted with this mapping:

{
    "type[]": { "constant": "Document" },
    "people[{@ID}].id": { "column": "ID", "type": "string" },
    "people[{@ID}].name": { "column": "NAME", "type": "string" },
    "people[{@ID}].age": { "column": "AGE", "type": "number" },
    "people[{@ID}].active": { "column": "ACTIVE", "type": "boolean" },
    "people[{@ID}].startDate": { "column": "START", "type": "date" }
}

Produces this JSON:

{
  "type": [
    "Document"
  ],
  "people": {
    "001": {
      "id": "001",
      "name": "Alice",
      "age": 30,
      "active": true,
      "startDate": "2021-12-31"
    },
    "002": {
      "id": "002",
      "name": "Bob",
      "age": 25,
      "active": false,
      "startDate": "2019-12-31"
    },
    "003": {
      "id": "003",
      "name": "Carol"
    }
  }
}

This mapping:

Uses the syntax people[{@ID}] to produce a map where each key is the row’s ID, and the value is the corresponding person object.
Assigns structured fields such as id, name, age, active, and startDate using the corresponding CSV columns
Applies type coercion:
- "age" is parsed as a number
- "active" is interpreted as a boolean (e.g. "TRUE" → true, "false" → false)
- "startDate" is parsed as a date, supporting both Excel serial numbers and human-readable formats
Skips any fields with "NULL" or empty values to avoid populating invalid or meaningless data
Annotates the document with a top-level type of "Document" via a constant assignment

Observations:

"type" is a constant applied at the root.
"people" is keyed by the value of ID.
"age", "active", and "startDate" are omitted for Carol because they are "NULL" or blank and thus coerced to undefined.
Excel serial 44561 (row 1) becomes "2021-12-31".
"01/01/2020" string is parsed as "2020-01-01" in ISO format.
"false" (case-insensitive) becomes boolean false.