I'm wondering how to import a flat-file database (http://en.wikipedia.org/wiki/Flat_file_database) as a graph in which each unique field value becomes a node (vertex), connected to other nodes according to all of its "horizontal" (same-row) relationships. For example, the following file:
id name team
1 Amy Blues
2 Bob Reds
3 Chuck Blues
4 Dick Blues
5 Ethel Reds
6 Fred Blues
7 Gilly Blues
8 Hank Reds
would become a graph with the following nodes:
- "Blues" and "Reds", of type "team"
- 1, 2, 3, 4, 5, 6, 7, 8, of type "id"
- "Amy", "Bob", "Chuck", "Dick", "Ethel", "Fred", "Gilly", "Hank", of type "name"
and edges such as:
- node "Bob" is connected to the nodes "Reds" and 2;
- node "Reds" is connected to the nodes "Hank", 8, "Ethel", 5, "Bob", and 2.
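A minimal Python sketch of the transformation described above, assuming a whitespace-separated file with a header row. It does not use NodeXL and only illustrates the intended node/edge structure; `flat_file_to_graph` is a hypothetical helper name. Nodes are keyed by (column, value), so the column name serves as the node type, and every pair of values in the same row is linked, which reproduces the edges listed for "Bob" and "Reds".

```python
from collections import defaultdict
from itertools import combinations

# Sample data from the question: whitespace-separated, first line is the header.
FLAT_FILE = """\
id name team
1 Amy Blues
2 Bob Reds
3 Chuck Blues
4 Dick Blues
5 Ethel Reds
6 Fred Blues
7 Gilly Blues
8 Hank Reds
"""

def flat_file_to_graph(text):
    """Build an undirected graph as an adjacency dict.

    Each node is a (column_name, value) pair: the column name acts as the node
    type, and repeated values in the same column collapse into one node.
    Every pair of values that share a row gets an edge.
    """
    lines = text.strip().splitlines()
    header = lines[0].split()
    adjacency = defaultdict(set)
    for line in lines[1:]:
        row_nodes = list(zip(header, line.split()))
        for a, b in combinations(row_nodes, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

graph = flat_file_to_graph(FLAT_FILE)
print(sorted(v for _, v in graph[("name", "Bob")]))   # ['2', 'Reds']
print(sorted(v for _, v in graph[("team", "Reds")]))  # ['2', '5', '8', 'Bob', 'Ethel', 'Hank']
```

Keying nodes by (column, value) keeps id 2 apart from a hypothetical name "2"; dropping the column from the key would instead merge equal values across columns.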
See if this will do the trick: "How to Create an Edge List From a List of Items, Format 3", at
Jan 10, 2013 at 3:58 PM
"How to Create an Edge List From a List of Items, Format 3" is definitely a step in that direction.
To give some context: I'm exploring a data-first approach that uses graphs to find patterns and clusters in data, and to infer potential data models and reference data from data sets (typically OLAP-style data serialized into denormalized flat files).
Our starting point is usually business data organized in flat files, mostly coming from managers' spreadsheets and/or database extracts (ETL), with no schema at all.
I'm trying to visualize the data first, to get a cognitive view of it that differs from the tabular view, and then to move forward in the graph view (transformation, clustering, ...) to extract:
- lookup/reference data, and
- a data schema.
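One way to make "extract lookup/reference data and a data schema" concrete is a pair of simple heuristics, sketched below in Python. They are illustrative assumptions, not something this thread or NodeXL specifies: a column with few distinct values relative to the row count (here `team`, whose values are the high-degree hub nodes of the graph) is a candidate lookup table, and a column A that always maps to a single value of column B (a functional dependency A → B) hints at a key/attribute relationship. The 0.5 threshold and all function names are hypothetical.

```python
# Sample rows from the question, already split into fields.
HEADER = ["id", "name", "team"]
ROWS = [
    ["1", "Amy", "Blues"], ["2", "Bob", "Reds"], ["3", "Chuck", "Blues"],
    ["4", "Dick", "Blues"], ["5", "Ethel", "Reds"], ["6", "Fred", "Blues"],
    ["7", "Gilly", "Blues"], ["8", "Hank", "Reds"],
]

def determines(rows, header, a, b):
    """True if every value of column a maps to exactly one value of column b."""
    i, j = header.index(a), header.index(b)
    seen = {}
    for row in rows:
        # setdefault returns the value already recorded for row[i], if any.
        if seen.setdefault(row[i], row[j]) != row[j]:
            return False
    return True

def functional_dependencies(rows, header):
    """All (a, b) pairs such that column a determines column b in this sample."""
    return [(a, b) for a in header for b in header
            if a != b and determines(rows, header, a, b)]

def lookup_candidates(rows, header, max_ratio=0.5):
    """Columns whose distinct-value count is at most max_ratio * row count."""
    result = {}
    for j, col in enumerate(header):
        distinct = {row[j] for row in rows}
        if len(distinct) <= max_ratio * len(rows):
            result[col] = sorted(distinct)
    return result

print(functional_dependencies(ROWS, HEADER))
# [('id', 'name'), ('id', 'team'), ('name', 'id'), ('name', 'team')]
print(lookup_candidates(ROWS, HEADER))  # {'team': ['Blues', 'Reds']}
```

On a small sample, any column whose values are all unique "determines" every other column (here both `id` and `name`), so the results are hints for a human to review, not proof of a schema.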
In my view, NodeXL needs a "table data" import mode alongside its existing "node/edge" import mode.
Here is the feature set I suggest for that table data mode:
- The importer would be able to select node sources.
- A node source could be a column to import or (perhaps in a later release) an expression over one or more columns, e.g. [Person] = [First Name] + " " + [Last Name].
- Node sources would be merged as a "union distinct" by default.
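The node-source idea can be sketched in Python as a list of (node type, function of a row) pairs, where a plain column and an expression such as [Person] are just different functions. This is a hypothetical model, not NodeXL's API; the rows, last names, and helper names below are made up for illustration. Collecting the results into a set gives the "union distinct" behaviour: "Blues" appears in two rows but yields one node.

```python
# Hypothetical rows with separate first/last name columns, to illustrate
# the [Person] = [First Name] + " " + [Last Name] expression.
ROWS = [
    {"First Name": "Amy", "Last Name": "Adams", "Team": "Blues"},
    {"First Name": "Bob", "Last Name": "Brown", "Team": "Reds"},
    {"First Name": "Chuck", "Last Name": "Clark", "Team": "Blues"},
]

def column(name):
    """Node source that takes a column's value as-is."""
    return lambda row: row[name]

# Each node source pairs a node type with a function computing the node value.
NODE_SOURCES = [
    ("Person", lambda row: row["First Name"] + " " + row["Last Name"]),  # expression
    ("Team", column("Team")),                                             # plain column
]

def collect_nodes(rows, sources):
    """Union distinct of all node sources: each (type, value) pair appears once."""
    return {(node_type, expr(row)) for row in rows for node_type, expr in sources}

nodes = collect_nodes(ROWS, NODE_SOURCES)
for node in sorted(nodes):
    print(node)
# ('Person', 'Amy Adams')
# ('Person', 'Bob Brown')
# ('Person', 'Chuck Clark')
# ('Team', 'Blues')
# ('Team', 'Reds')
```

Because the set is keyed by (type, value), two sources declared with the same type (say, hypothetical "Home Team" and "Away Team" columns both typed "Team") would also merge automatically, which is one reading of "union distinct".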
With the following options:
- Header names could be used to define node types; if the file has no header row, the user could fill them in.
- It should be possible to define a hierarchy between column names (node types), e.g. country/state/town, so that it is reflected in the graph.
- A column could be assigned not as a node source but as an attribute of a node type (e.g. an age column becomes an attribute of person nodes).
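The three options could combine roughly as in the Python sketch below. The edge rule is an assumption, not part of the proposal: row values are linked pairwise as in the earlier example, except that non-adjacent hierarchy levels (country and town) are not linked directly, because the country → state → town path already expresses that relationship. The headerless rows, ages, and place names are hypothetical sample data, and `build_graph` is a hypothetical name.

```python
from itertools import combinations

def build_graph(rows, header, hierarchy=(), attributes=None):
    """Sketch of the proposed table-data import options.

    header:     node type per column (from the file, or typed in by the user
                when the file has no header row)
    hierarchy:  ordered columns such as ("country", "state", "town"); within it,
                only adjacent levels are linked
    attributes: {attribute_column: owner_column}; these columns become node
                attributes instead of nodes (the last row seen wins)
    """
    attributes = attributes or {}
    node_cols = [c for c in header if c not in attributes]
    level = {col: i for i, col in enumerate(hierarchy)}
    nodes, edges = {}, set()
    for row in rows:
        record = dict(zip(header, row))
        for col in node_cols:
            nodes.setdefault((col, record[col]), {})
        for attr, owner in attributes.items():
            nodes[(owner, record[owner])][attr] = record[attr]
        for a, b in combinations(node_cols, 2):
            # Skip non-adjacent hierarchy levels, e.g. country-town.
            if a in level and b in level and abs(level[a] - level[b]) != 1:
                continue
            edges.add(frozenset({(a, record[a]), (b, record[b])}))
    return nodes, edges

# Hypothetical headerless file; the user supplies the column names.
ROWS = [
    ["Amy", "34", "USA", "California", "San Diego"],
    ["Bob", "41", "USA", "California", "Fresno"],
    ["Chuck", "29", "USA", "Oregon", "Salem"],
]
USER_HEADER = ["person", "age", "country", "state", "town"]

nodes, edges = build_graph(ROWS, USER_HEADER,
                           hierarchy=("country", "state", "town"),
                           attributes={"age": "person"})
print(len(nodes), len(edges))    # 9 14
print(nodes[("person", "Amy")])  # {'age': '34'}
```

Edges are stored as unordered pairs here; a real importer might prefer directed parent → child edges for the hierarchy levels.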