Coming soon from NodeXL: v.86 preview and beyond

Coordinator
Jun 8, 2009 at 9:48 PM

http://www.connectedaction.net/2009/05/28/coming-soon-from-nodexl-v86-preview-and-beyond/#more-1220


NodeXL ++

Last week’s NodeXL meeting focused on the coming work items for the next release.  Four major items are lined up for the next few weeks.  We may publish a release once a few of these items are complete.

> Legends and axes with scale marks for the chart canvas.

This is part of our larger work item called “Make the charts ready to publish”.  A related work item to address this goal is export to a vector file format.  A scalable vector format will allow people to create graphs of arbitrary resolutions to fit their presentation needs.

> Export to XPS (which should make it possible to get our files to PDF with some 3rd party help!)

Since NodeXL is intended to support non-programmer network analysts, we plan to add support for other network analysis file formats.  A leading example is the UCINet format, which has the added virtue of holding many sample data sets that are widely used in classes and network course work.  If you have a network data file format you would like NodeXL to support, feel free to comment here or on the Codeplex NodeXL discussion.

> UCINet file format compatibility

Improving the layout of complex graphs is a deep area of research.  A recent member of the NodeXL team, Janez Brank, has proposed and prototyped an alternative mechanism for node layout, namely the Fast Multi-Scale Method from Harel and Koren. This method initially selects a small subset of nodes and lays them out; this initial layout is then refined in several iterations, with more and more nodes added to the layout in each iteration, until a layout of the entire graph is created. The cost function used to optimize the layout at each step is designed to reward layouts in which the Euclidean distance between nodes corresponds approximately to the length of the shortest path between them. Our developer, Tony Capone, plans to implement the design in the coming weeks.  We have not added a new layout for a while, and the most recent additions have been deterministic geometric layouts like grid, circle, and sine wave.  We have never added an alternative to the Fruchterman-Reingold force-directed layout with which we started the project.  This is a big feature addition that should have a big impact in making NodeXL layouts more visually appealing and informative.
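To make the cost-function idea concrete, here is a minimal Python sketch of a "stress" score that compares the drawn distance between each pair of nodes with their shortest-path length. It is an illustration only, not the Harel and Koren algorithm and not NodeXL code: the function names, the squared relative error form, and the tiny example graph are all assumptions made for this sketch.

```python
from collections import deque
from itertools import combinations
import math


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def layout_stress(adjacency, positions):
    """Sum of squared relative gaps between drawn distance and graph distance.

    Lower is better: 0.0 means every drawn distance equals the hop count.
    (Hypothetical cost function chosen for illustration.)
    """
    hops = {node: shortest_path_lengths(adjacency, node) for node in adjacency}
    total = 0.0
    for a, b in combinations(adjacency, 2):
        if b not in hops[a]:
            continue  # pair lies in different components; no target distance
        drawn = math.dist(positions[a], positions[b])
        total += ((drawn - hops[a][b]) / hops[a][b]) ** 2
    return total


# Path graph a - b - c, drawn two ways.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
straight = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.0)}
folded = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 0.0)}

print(layout_stress(path, straight))  # the straight drawing matches hop counts
print(layout_stress(path, folded))    # the folded drawing puts a and c on top of each other
```

A layout optimizer would move nodes to push this score down; the multi-scale part of the method is about doing that on a small subset of nodes first and adding the rest gradually.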

> Updated layout #1

We hope these features deliver a lot of value to our users and address the big themes the team is working towards: Scale, Clarity, and Connection.  We seek to make NodeXL perform against reasonably sized data sets, to improve the exploration and discovery of structures in graphs, and to simplify the import and export of data between NodeXL and social media network data sources beyond the email and Twitter support present in the application today.

Some other features that we have been considering include:

> allow image paths to point at web URLs instead of exclusively local files.  NodeXL allows pictures to replace node shapes so long as those files are stored locally.  Allowing for URLs pointing to Internet-stored images will make NodeXL better at browsing social media sites, at the cost of potential performance issues as files get fetched over possibly slow networks.

> clustering support that allows metrics to be calculated for a cluster and for the cluster to have a constrained region on the chart.

By allowing users to calculate metrics within the bounds of a cluster instead of globally, as we do today, NodeXL will take a step forward in its support for time series and multi-modal network analysis.  Time slices of a network can each be placed in their own cluster.  Each time slice would then be treated as a separate network while being stored along with all of the other time slices representing the network.  Multi-modal networks could also be segregated into clusters and have metrics and layout rules applied to each separately.
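As a rough illustration of what "metrics within the bounds of a cluster" could mean, the Python sketch below computes node degree using only the edges that stay inside a cluster. The function name, the edge list, and the cluster labels "t1" and "t2" are all made up for this example; it is not how NodeXL stores clusters.

```python
from collections import defaultdict


def degree_by_cluster(edges, cluster_of):
    """Count each node's degree using only edges whose endpoints share a cluster."""
    degree = defaultdict(int)
    for u, v in edges:
        if cluster_of[u] == cluster_of[v]:
            degree[u] += 1
            degree[v] += 1
    # Report every node, including those with no within-cluster edges.
    return {node: degree[node] for node in cluster_of}


# Hypothetical data: two clusters (for example two time slices) joined by one edge.
sample_edges = [("a", "b"), ("b", "c"), ("c", "d")]
sample_clusters = {"a": "t1", "b": "t1", "c": "t2", "d": "t2"}

print(degree_by_cluster(sample_edges, sample_clusters))
```

Globally, nodes b and c each have degree 2, but within their own clusters each has degree 1, because the b to c edge crosses the cluster boundary.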

Spatial constraints on clusters would lead to features that allow users to see and manipulate the layout regions for each cluster.  A usage scenario for this might be a network made up of teachers and students where the two types of nodes are placed in separate regions of the canvas.  This feature would allow NodeXL to take a step towards what Ben Shneiderman calls “NetViz Nirvana” by enabling a “semantic substrate” for networks.  The idea of a “substrate” is simpler than its name: imagine the way an airline route map uses the world map as a substrate for a network map of cities (nodes) connected by edges, lines, arcs, or flights.  The same map of nodes and edges might be much less informative if all the nodes (cities) could pull each other into a big clump the way many network maps do.  Plotting the nodes on the locations of their cities on the globe adds a dimension to the meaning of the network.  When networks are not geographic in nature the challenge is greater.  Still, simple constraints or substrates may be useful.  To find out, NodeXL will have work items that allow clusters to have defined regions that constrain the layout of the nodes within them.

Clusters imply a way to create and destroy a cluster and to add and remove nodes from it.  There is some complexity here: networks are not as simple as tree structures like file systems, and for networks the familiar [+] [-] metaphor for opening and closing a level of a tree breaks down.  If every node in a graph had just [+] [-] controls, does [-] mean “collapse all my neighbors into me” and does [+] mean “explode all the nodes contained in this node”?  How will users do a task like spotting two nodes that they know to be the same entity (Bob and Robert, for example) and mushing the two nodes together into a metanode?  Drag and drop may lead to errors, so we need an easy way to pull a node out of a cluster.  Limitations in Excel 2007 constrain the ways NodeXL can talk to the Excel undo stack, so this can be problematic.
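The Bob and Robert case can be sketched on a plain edge list. The Python below is a hypothetical illustration, not a NodeXL feature: the function name, the "Bob/Robert" metanode label, and the choice to drop self-loops and duplicate edges are assumptions made for the example.

```python
def merge_nodes(edges, members, metanode):
    """Return undirected edges with every node in `members` replaced by `metanode`.

    Edges that end up joining the metanode to itself are dropped, and
    duplicate edges created by the merge are collapsed into one.
    """
    merged = set()
    for u, v in edges:
        u = metanode if u in members else u
        v = metanode if v in members else v
        if u != v:
            merged.add(tuple(sorted((u, v))))
    return sorted(merged)


# Hypothetical edge list in which Bob and Robert are the same person.
contacts = [
    ("Bob", "Alice"),
    ("Robert", "Alice"),
    ("Robert", "Carol"),
    ("Bob", "Robert"),
]
merged_contacts = merge_nodes(contacts, {"Bob", "Robert"}, "Bob/Robert")
print(merged_contacts)
```

Pulling a node back out of a metanode is the harder direction: it needs the original edge list to be kept around, which is part of why undo support matters here.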

> Spigots to connect to a range of social media platforms would allow NodeXL to serve as a dashboard for many forms of social networks on the web.  From your desktop or laptop you will be able to pull network data from a number of sources, perhaps integrating them to provide a richer map of your personal or professional social media landscape.  NodeXL now connects to personal email through the Windows Search client and to Twitter to get the follows network for a given user.  Support for more silos of social media is on the agenda: Facebook is an obvious priority (and there is a useful tool from Bernie Hogan to pull an edge list into several network analysis file formats).  Other sites of interest include LinkedIn, YouTube, Wikipedia, and enterprise social media platforms.

> More metrics: we plan to implement a short list of additional network metrics, including hub and authority scores (the HITS algorithm).  Additional global metrics will be implemented, including:

  • Geodesics: graph-level metrics such as mean geodesic distance and diameter.
  • Diameter: largest distance between connected nodes
  • Connected components
  • Count of components of each size
  • Dyad census: counts of each of 3 types of dyads, placed on the graph-level worksheet
  • Triad census
  • Network transitivity: proportion of 2-stars that close
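A few of the metrics in this list can be sketched in plain Python for an undirected, unweighted graph. This is only an illustration of the definitions above, not the planned NodeXL implementation; the function names and the sample edge list are assumptions, and the dyad and triad censuses are left out.

```python
from collections import Counter, deque
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def bfs_hops(adjacency, source):
    """Hop counts from source to every node in its component."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def component_size_counts(adjacency):
    """Map component size -> number of components of that size."""
    seen = set()
    sizes = Counter()
    for node in adjacency:
        if node not in seen:
            component = bfs_hops(adjacency, node)
            seen.update(component)
            sizes[len(component)] += 1
    return dict(sizes)


def diameter(adjacency):
    """Largest shortest-path length between any two connected nodes."""
    return max(max(bfs_hops(adjacency, node).values()) for node in adjacency)


def transitivity(adjacency):
    """Fraction of 2-stars (paths a - center - b) whose end nodes are also linked."""
    two_stars = closed = 0
    for neighbors in adjacency.values():
        for a, b in combinations(neighbors, 2):
            two_stars += 1
            if b in adjacency[a]:
                closed += 1
    return closed / two_stars if two_stars else 0.0


# Hypothetical graph: a triangle a-b-c with a tail c-d, plus a separate pair e-f.
graph = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("e", "f")])
print(component_size_counts(graph), diameter(graph), transitivity(graph))
```

In the sample graph there are five 2-stars and three of them close (one per corner of the triangle), so the transitivity is 3/5.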

> Tools for multi-modal networks: easy ways to transform bi-modal networks into single modes, for example taking the person to document network common in many social media repositories and transforming it either into a person to person network or a document to document network.  This is a chore now for NodeXL users and should be much easier!
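The person-to-document transformation can be sketched as a weighted projection: two people are linked once for every document they share. The Python below is a minimal illustration; the function name and the sample data are invented for this example and do not reflect an existing NodeXL feature.

```python
from collections import defaultdict
from itertools import combinations


def project_to_people(person_document_edges):
    """Collapse a person-document edge list into weighted person-person edges.

    The weight of (a, b) is the number of documents both a and b are linked to.
    """
    people_by_document = defaultdict(set)
    for person, document in person_document_edges:
        people_by_document[document].add(person)

    weights = defaultdict(int)
    for people in people_by_document.values():
        for a, b in combinations(sorted(people), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical bi-modal data: who edited which document.
authorship = [
    ("ann", "doc1"),
    ("bob", "doc1"),
    ("ann", "doc2"),
    ("bob", "doc2"),
    ("cat", "doc2"),
]
person_network = project_to_people(authorship)
print(person_network)
```

Swapping the two columns of the input gives the document-to-document network instead, where two documents are linked by the people they have in common.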

> We are exploring ways to move parts of NodeXL into a web browser and web server architecture.  NodeXL in the cloud is a topic of interest to the team.

> We have avoided putting any effort into 3D representations of graphs, although the advancing power of graphics cards starts to make this option more attractive.

> Location and distance support: as more network datasets appear with location attributes, through the proliferation of sensors like GPS, mobile phone systems using triangulation, or manual identification of location, there is a growing need to support geographic calculations like distance, speed, and trail length.  Integration with web mapping tools could be interesting!
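For a sense of what the distance and trail length calculations involve, here is a short Python sketch using the haversine great-circle formula. It assumes a spherical Earth with a mean radius of 6371 km and points given as (latitude, longitude) in degrees; the function names are hypothetical and nothing here is part of NodeXL.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trail_length_km(points):
    """Total length of a trail given as a list of (lat, lon) points in order."""
    return sum(haversine_km(*start, *end) for start, end in zip(points, points[1:]))


# Hypothetical trail along the equator: a quarter turn, then another quarter turn.
trail = [(0.0, 0.0), (0.0, 90.0), (0.0, 180.0)]
print(haversine_km(0.0, 0.0, 0.0, 90.0))
print(trail_length_km(trail))
```

Average speed along a trail would follow by dividing the trail length by the elapsed time between the first and last timestamps, assuming the dataset carries timestamps.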