Direct mapping vs. CSV2RDF, simple case

Gregg Kellogg edited this page Apr 17, 2015 · 6 revisions

Consider a simple table, called People:

ID  fname  addr
7   Bob    18

RDB case

A relational table always has a schema, which the RDF Direct Mapping makes use of. In this example, the schema may define that:

  • ID is a primary key in the table, and the cells are integers
  • The fname column contains strings
  • The addr column contains integers

Using this information, the result of the Direct Mapping is something like:

<http://foo.example/DB/People/ID=7> rdf:type <http://foo.example/DB/People>;
    <http://foo.example/DB/People#ID> 7;
    <http://foo.example/DB/People#fname> "Bob";
    <http://foo.example/DB/People#addr> 18
    .

Here, http://foo.example/DB/ is the URL of the database that contains the People table.
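
The row-to-triples step described above can be sketched in a few lines of Python. This is only an illustration, not an implementation of the full Direct Mapping: the function name `direct_map_row` is hypothetical, the row is passed in as an already-typed dict, and column IRIs are assumed to follow the `<table>#<column>` form used by the W3C Direct Mapping.

```python
# Hypothetical helper for illustration; not part of any Direct Mapping library.
def direct_map_row(base, table, primary_key, row):
    """Return (subject, predicate, object) triples for one typed row."""
    table_iri = f"{base}{table}"
    # The primary key value identifies the row, so it becomes part of the subject IRI.
    subject = f"<{table_iri}/{primary_key}={row[primary_key]}>"
    # The table IRI is used as the class of the row.
    triples = [(subject, "rdf:type", f"<{table_iri}>")]
    for column, value in row.items():
        # Integers become bare numeric literals, everything else a quoted string.
        obj = str(value) if isinstance(value, int) else f'"{value}"'
        triples.append((subject, f"<{table_iri}#{column}>", obj))
    return triples


triples = direct_map_row(
    "http://foo.example/DB/", "People", "ID",
    {"ID": 7, "fname": "Bob", "addr": 18},
)
for s, p, o in triples:
    print(s, p, o, ".")
```

The schema contributes two things here: the primary key, which yields a stable subject IRI, and the column types, which decide whether a value is emitted as a number or as a string.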

Simple CSV case

If the only information available to the CSV2RDF processor is the CSV file itself (i.e., the only metadata is the first row, which provides the column names), the output of the conversion is as follows. For the sake of comparison, we take http://foo.example/DB/People to be the URL of the file:

[]
	<http://foo.example/DB/People#ID> "7";
	<http://foo.example/DB/People#fname> "Bob";
	<http://foo.example/DB/People#addr> "18"
    .

Comparing the two conversion results:

  • CSV2RDF does not have the information that the first column provides unique identifiers for the rows (i.e., that it is a primary key); consequently, a blank node must be used for the common row subject
  • CSV2RDF does not have information on the data types and, therefore, cannot presume that the first and the third columns contain integers
  • In general, there is no reason to use the URL of the table as a class for typing the rows; doing so could lead to semantically incorrect statements
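
The metadata-free conversion compared above can be approximated with the Python standard library. This is a rough sketch under stated assumptions: the People table is assumed to be stored as comma-separated text, the function name `simple_csv_to_turtle` is hypothetical, and literal escaping is ignored for brevity.

```python
import csv
import io

# Assumed comma-separated form of the People table.
CSV_TEXT = "ID,fname,addr\n7,Bob,18\n"
FILE_URL = "http://foo.example/DB/People"


def simple_csv_to_turtle(text, file_url):
    """Emit one blank-node description per row, with every cell as a plain string."""
    blocks = []
    for row in csv.DictReader(io.StringIO(text)):
        # Without a primary key, the row subject can only be a blank node ([]).
        # Without datatypes, every value stays a quoted string literal.
        pairs = [f'\t<{file_url}#{name}> "{value}"' for name, value in row.items()]
        blocks.append("[]\n" + ";\n".join(pairs) + "\n    .")
    return "\n".join(blocks)


turtle = simple_csv_to_turtle(CSV_TEXT, FILE_URL)
print(turtle)
```

The header row is the only metadata used: it supplies the fragment identifiers for the predicates and nothing else.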

CSV case with simple metadata

To provide the necessary information, the following simple metadata could be made available to the CSV2RDF processor (essentially playing the role of the RDB schema for the conversion):

{
    "@context": "http://www.w3.org/ns/csvw",
    "url": "http://foo.example/DB/People",
    "null": "",
    "tableSchema": {
        "aboutUrl": "http://foo.example/DB/People/ID={ID}",
        "columns": [{
            "name": "ID",
            "datatype": "integer"
        }, {
            "name": "fname"
        }, {
            "name": "addr",
            "datatype": "integer"
        }, {
            "name": "type",
            "virtual": true,
            "propertyUrl": "rdf:type",
            "valueUrl": "http://foo.example/DB/People"
        }]
    }
}

Using this metadata, the output of the CSV2RDF processor will be identical to the one produced by the RDF Direct Mapping.
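
To make the role of each metadata property concrete, the following Python sketch applies a simplified form of the metadata to the single row of the table. It is an illustration under several assumptions, not a conforming CSV2RDF processor: `METADATA` is a flattened Python dict that mirrors the properties above, `convert_row` is a hypothetical name, the CSV is assumed to be comma-separated, and URI template expansion is reduced to plain string substitution.

```python
import csv
import io

CSV_TEXT = "ID,fname,addr\n7,Bob,18\n"  # assumed comma-separated form of the table

# Simplified, flattened mirror of the metadata (an assumption for this sketch).
METADATA = {
    "url": "http://foo.example/DB/People",
    "aboutUrl": "http://foo.example/DB/People/ID={ID}",
    "columns": [
        {"name": "ID", "datatype": "integer"},
        {"name": "fname"},
        {"name": "addr", "datatype": "integer"},
        {"name": "type", "virtual": True, "propertyUrl": "rdf:type",
         "valueUrl": "http://foo.example/DB/People"},
    ],
}


def convert_row(row, meta):
    """Return (subject, predicate, object) triples for one row of string cells."""
    # aboutUrl: substitute cell values into the template to get the row subject.
    subject = meta["aboutUrl"]
    for name, value in row.items():
        subject = subject.replace("{" + name + "}", value)
    triples = []
    for col in meta["columns"]:
        # Default predicate is the file URL plus the column name as a fragment.
        predicate = col.get("propertyUrl") or f'<{meta["url"]}#{col["name"]}>'
        if col.get("virtual"):
            # A virtual column has no cell; its value comes from valueUrl.
            obj = f'<{col["valueUrl"]}>'
        elif col.get("datatype") == "integer":
            obj = str(int(row[col["name"]]))
        else:
            obj = f'"{row[col["name"]]}"'
        triples.append((f"<{subject}>", predicate, obj))
    return triples


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
triples = convert_row(rows[0], METADATA)
for s, p, o in triples:
    print(s, p, o, ".")
```

In this sketch, `aboutUrl` stands in for the primary key, `datatype` for the column types, and the virtual column for the class assertion, which are the three pieces of schema information the plain CSV file lacks.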