Semantic Mapping Format for CSV-to-JSON Transformation
This document defines a declarative mapping format that enables semantically rich transformation of flat CSV data into structured JSON representations aligned with the goals of the OMG Semantic Augmentation Challenge.
Rather than relying on procedural logic, this format allows you to express conceptual model alignment, identifier binding, and value coercion in a compact, portable way that supports reuse and automation.
Why this Format?
This mapping syntax is designed to:
- Link flat tabular data to nested semantic structures
- Support identifier-based merging and grouping
- Define schema-aware coercion (string, number, boolean, date)
- Enable downstream use in JSON-LD, JSON Schema, and linked data
- Be interpretable by both humans and machines
- Allow alignment with ontology-based models (e.g. FIBO)
Mapping Format Overview
The mapping is a JSON object where:
- Keys are semantic paths describing the structure of the output.
- Values describe how to extract or compute values from CSV rows.
Each mapping entry looks like this:
"path.to.target": {
"column": "CSV_COLUMN_NAME",
"type": "string" // or number, boolean, date
}
Or for constant values:
"path.to.constant": {
"constant": "Institution"
}
Path Syntax
The path syntax supports:
Pattern | Purpose |
---|---|
object.property |
Set a nested property |
array[false] |
Always append to the array |
array[$.key == @COLUMN] |
Find and reuse existing array items, or create if no match |
map[{@COLUMN}] |
Use a row value as a key in an object map. Create or match |
Value Resolution
Each mapping value supports:
Field | Meaning |
---|---|
column |
The CSV column name |
type |
"string" , "number" , "boolean" , "date" — coercion |
constant |
A literal value |
Data Type Coercion
The mapping format supports coercing CSV strings into strongly typed JSON values using a "type"
field. This ensures compatibility with schema validation, semantic tooling, and downstream processing.
Each type is interpreted as follows:
"string"
Default behavior — no coercion is applied:
{ "column": "NAME", "type": "string" }
"number"
-
Trims leading zeros (e.g.
"007"
→7
) -
Parses as a floating-point number
-
Returns
undefined
for empty strings or"NULL"
{ "column": "AGE", "type": "number" }
"boolean"
-
Accepts the following truthy values:
"TRUE"
,"1"
-
Accepts the following falsy values:
"FALSE"
,"0"
-
Comparison is case-insensitive and trims whitespace
-
Invalid or empty values return
undefined
{ "column": "ENABLED", "type": "boolean" }
"date"
Supports two formats:
- Excel-style serial numbers (e.g.
45401
→"2024-05-04"
) - Plain date strings (e.g.
"01/01/1890"
)
Output is always ISO-8601: "YYYY-MM-DD"
{ "column": "ESTYMD", "type": "date" }
"constant"
You can skip column
entirely and use a constant:
{ "constant": "Branch" }
This is useful for setting fixed "type"
values or flags within the output.
Undefined and Null Handling
If a value is empty (''
) or the string "NULL"
, it is treated as undefined
and excluded from the output entirely.
Strategic benefits
This format has a number of strategic benefits:
- Captures semantic structure and identity
- Aligns flat data to knowledge graph–ready shapes
- Integrates cleanly with JSON-LD contexts and vocabularies
- Enables validation via JSON Schema
- Facilitates automation in ETL, modeling, and governance pipelines
By using declarative mappings and semantic paths, this approach turns traditional tabular data into structured, interoperable knowledge assets.
Example
This input CSV
ID,NAME,AGE,ACTIVE,START
001,Alice,30,TRUE,44561
002,Bob,25,false,01/01/2020
003,Carol,NULL,,NULL
When converted with this mapping:
{
"type[]": { "constant": "Document" },
"people[{@ID}].id": { "column": "ID", "type": "string" },
"people[{@ID}].name": { "column": "NAME", "type": "string" },
"people[{@ID}].age": { "column": "AGE", "type": "number" },
"people[{@ID}].active": { "column": "ACTIVE", "type": "boolean" },
"people[{@ID}].startDate": { "column": "START", "type": "date" }
}
Produces this JSON:
{
"type": [
"Document"
],
"people": {
"001": {
"id": "001",
"name": "Alice",
"age": 30,
"active": true,
"startDate": "2021-12-31"
},
"002": {
"id": "002",
"name": "Bob",
"age": 25,
"active": false,
"startDate": "2019-12-31"
},
"003": {
"id": "003",
"name": "Carol"
}
}
}
This mapping:
- Uses the syntax people[{@ID}] to produce a map where each key is the row’s ID, and the value is the corresponding person object.
- Assigns structured fields such as
id
,name
,age
,active
, andstartDate
using the corresponding CSV columns - Applies type coercion:
"age"
is parsed as a number"active"
is interpreted as a boolean (e.g."TRUE"
→true
,"false"
→false
)"startDate"
is parsed as a date, supporting both Excel serial numbers and human-readable formats
- Skips any fields with
"NULL"
or empty values to avoid populating invalid or meaningless data - Annotates the document with a top-level
type
of"Document"
via a constant assignment
Observations:
- "type" is a constant applied at the root.
- "people" is keyed by the value of ID.
- "age", "active", and "startDate" are omitted for Carol because they are "NULL" or blank and thus coerced to undefined.
- Excel serial 44561 (row 1) becomes "2021-12-31".
- "01/01/2020" string is parsed as "2020-01-01" in ISO format.
- "false" (case-insensitive) becomes boolean false.