Network Data
  1. Methods
    1. Surveys and questionnaires
    2. Ethnography, observation, fieldwork
    3. Unobtrusive observation, trace studies
    4. Data mining
  2. Issues
    1. Ethics, human subjects, informed consent
    2. Validity, reliability, accuracy, precision
    3. Scaling and calibrating data (what is a "good friend")
  3. Representations & data structures
  4. Details
    1. Collecting ego network data
    2. Sampling
    3. Snowball methods
    4. Event sampling
    5. Web crawling, scraping, databases, APIs
    6. Wearable computers
    7. "Data exhaust"
    8. Cognitive social structure interviews
    9. Bounding and sampling issues
  5. Research on methods
  6. General considerations and best practices


Generic Data Formats

There are many ways to "write down" network data. We describe three simple versions here: node lists, edge lists, and full matrices.

In a node list we list each node (vertex) and then all the nodes (vertices) to which it is connected. For example:

A B C
B C
C A

This means that A is tied to B and C, B is tied to C, and C is tied to A.
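
To make the node list concrete, here is a minimal Python sketch that reads node-list text into a dictionary mapping each vertex to the vertices it is tied to. The whitespace-separated layout and the names NODE_LIST and parse_node_list are assumptions made for illustration, not a format required by any particular package.

```python
# Node list for the example network: the first token on each line is a
# vertex, the remaining tokens are the vertices it is tied to.
# (The whitespace-separated layout is an assumption for this sketch.)
NODE_LIST = """\
A B C
B C
C A
"""


def parse_node_list(text):
    """Return {vertex: [vertices it is tied to]} from node-list text."""
    adjacency = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        vertex, neighbors = tokens[0], tokens[1:]
        adjacency.setdefault(vertex, []).extend(neighbors)
        # Vertices that only ever appear as targets still get an entry.
        for neighbor in neighbors:
            adjacency.setdefault(neighbor, [])
    return adjacency


adjacency = parse_node_list(NODE_LIST)
print(adjacency)
```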


In an edge list, each edge (that is, each pair of vertices connected by an edge) is listed separately. If the network is directed, there is a separate entry for each direction of the edge. Treating our network from above as directed, it would look like this:

A B
A C
B C
C A
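
As a rough sketch of how the two formats relate, the following Python snippet flattens the example node list (held here as a dictionary) into a directed edge list with one (source, target) pair per edge. The function name to_edge_list is hypothetical.

```python
# Example network as a node list: vertex -> vertices it is tied to.
adjacency = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}


def to_edge_list(adjacency):
    """Flatten a node list into directed (source, target) pairs."""
    return [
        (source, target)
        for source, targets in adjacency.items()
        for target in targets
    ]


edges = to_edge_list(adjacency)
for source, target in edges:
    print(source, target)
```

For an undirected network one would either list each pair once or list both directions, depending on what the receiving software expects.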


A third alternative is the full matrix. Here we make a table with a row for each vertex and a column for each vertex. An element of the table is 1 if there is an edge going from the row vertex to the column vertex and 0 otherwise (note that this orientation is an arbitrary convention).

  A B C
A 0 1 1
B 0 0 1
C 1 0 0
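
The matrix can be built mechanically from an edge list. The sketch below assumes rows are the sending vertex and columns the receiving vertex, which is the orientation that reproduces the table for the example network; transpose the result for the opposite convention. The function name to_matrix is hypothetical.

```python
# Directed edge list for the example network.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]


def to_matrix(edges, vertices=None):
    """Build a 0/1 matrix with rows as sources and columns as targets."""
    if vertices is None:
        # Default to alphabetical order of every vertex seen in the edges.
        vertices = sorted({v for edge in edges for v in edge})
    index = {v: i for i, v in enumerate(vertices)}
    matrix = [[0] * len(vertices) for _ in vertices]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return vertices, matrix


vertices, matrix = to_matrix(edges)
for vertex, row in zip(vertices, matrix):
    print(vertex, *row)
```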

Each software package has its own preferred format, though more and more of them can import and export one another's formats. The formats tend to be variations on these three generic formats.

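These formats can be extended to include vertex properties and edge weights, for example by adding a weight column to an edge list or by replacing the 1s in a matrix with weights.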
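
One way such an extension might look, sketched in Python: each edge carries a third value for its weight, vertex properties live in a separate table keyed by vertex, and the matrix stores the weight instead of 1. The weights and the "role" property below are made-up illustration data, and the function name is hypothetical.

```python
# Hypothetical illustration data: the weights and roles are invented.
weighted_edges = [
    ("A", "B", 3),
    ("A", "C", 1),
    ("B", "C", 2),
    ("C", "A", 5),
]
vertex_properties = {
    "A": {"role": "manager"},
    "B": {"role": "staff"},
    "C": {"role": "staff"},
}


def to_weighted_matrix(edges, vertices):
    """Rows are sources, columns are targets, entries are edge weights."""
    index = {v: i for i, v in enumerate(vertices)}
    matrix = [[0] * len(vertices) for _ in vertices]
    for source, target, weight in edges:
        matrix[index[source]][index[target]] = weight
    return matrix


vertices = sorted(vertex_properties)
weighted_matrix = to_weighted_matrix(weighted_edges, vertices)
for vertex, row in zip(vertices, weighted_matrix):
    print(vertex, vertex_properties[vertex]["role"], *row)
```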

Some formats (e.g., VNA) separate out vertex data, vertex properties, edge data, and edge properties.

In NodeXL and Gephi we typically enter edges in an edgelist format if we are not collecting the data automatically (e.g., from Twitter). In NodeXL we can add extra columns to either the edge worksheet or the vertices worksheet to record edge and vertex properties.
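
When the edges come from a script rather than manual entry, a spreadsheet-style edge list can be written with Python's standard csv module. This is only a sketch: the column names Source, Target, and Weight are an assumption about what an importer expects, so check the documentation of the tool you are using. Extra columns for edge properties can be appended in the same way.

```python
import csv
import io

# Example edges with a made-up weight column (illustration data only).
edges = [("A", "B", 3), ("A", "C", 1), ("B", "C", 2), ("C", "A", 5)]


def edges_to_csv(edges, header=("Source", "Target", "Weight")):
    """Return the edge list as CSV text; the header names are assumed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(edges)
    return buffer.getvalue()


csv_text = edges_to_csv(edges)
print(csv_text)
```

To save the result to disk, write csv_text to a file opened in text mode.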


Augmented Ethnography: Processing Qualitative Data From Massive Conversations
Network Analysis and Ethnographic Problems

  • Network Analysis and Ethnographic Problems: Process Models of a Turkish Nomad Clan[1] (a Google book) is an anthropological and complexity science book by social anthropologist Douglas R. White, University of California, Irvine, and Ulla Johansen of the University of Cologne.

Web Sources to Explore

Article Search (NYTimes API)
Using the Wikipedia page-to-page link database