All SOC180 Problems
 Last problem was 0480 (HELP)

Q68. STUDENTS AND CLASSES

Q67. Work through the first half of chapter 7 of the NodeXL book using the senate data.

Let's see if we can find some structure within either or both of the parties. The last exercise the textbook suggests is changing the edge filtering threshold: we eliminate edges below some threshold so that a pair counts as a similarity edge only if, say, the two senators vote together 75% of the time. This lets us see some variation.

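But what if we look only at Republican-Republican edges? Add a column to the Edges worksheet that labels each edge with the parties of its two endpoints (the final FALSE forces an exact match, so the Vertices table does not need to be sorted):

=VLOOKUP([@[Vertex 1]],Vertices[[Vertex]:[Party]],28,FALSE)&"-"&VLOOKUP([@[Vertex 2]],Vertices[[Vertex]:[Party]],28,FALSE)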
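
The same labeling idea can be sketched outside Excel. The Python below is only an illustration, not the NodeXL workflow: the senator names, parties, and agreement scores are made up, and `party_pair` is a hypothetical helper that mirrors what the formula above builds.

```python
# Hypothetical vertex table: senator -> party (made-up names).
party = {"Adams": "R", "Baker": "R", "Clark": "D", "Davis": "R"}

# Hypothetical similarity edges: (vertex 1, vertex 2, share of votes in agreement).
edges = [
    ("Adams", "Baker", 0.91),
    ("Adams", "Clark", 0.42),
    ("Baker", "Davis", 0.78),
    ("Clark", "Davis", 0.55),
]

def party_pair(v1, v2):
    """Build an "R-R" / "R-D" style label from the parties of the two endpoints."""
    return party[v1] + "-" + party[v2]

# Keep only Republican-Republican edges at or above the threshold.
threshold = 0.75
rr_edges = [(a, b, w) for a, b, w in edges
            if party_pair(a, b) == "R-R" and w >= threshold]
print(rr_edges)
```

In NodeXL itself, the equivalent step would be filtering the Edges worksheet on the new label column.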

Q64. We'll use data on early 20th century Scottish industries to investigate interlocking directorates.

(From Pajek data online) This dataset contains the corporate interlocks in Scotland in the beginning of the twentieth century (1904-5). In the nineteenth century, the industrial revolution brought Scotland railways and industrialization, especially heavy industry and textile industry. The amount of capital needed for these large scale undertakings exceeded the means of private families, so joint stock companies were established, which could raise the required capital. Joint stock companies are owned by the shareholders, who are represented by a board of directors. This opens up the possibility of interlocking directorates. By the end of the nineteenth century, joint stock companies had become the predominant form of business enterprise at the expense of private family businesses. Families, however, still exercised control through ownership and directorships.

The data are taken from the book The Anatomy of Scottish Capital by John Scott and Michael Hughes. It lists the (136) multiple directors of the 108 largest joint stock companies in Scotland in 1904-5: 64 non-financial firms, 8 banks, 14 insurance companies, and 22 investment and property companies (Scotland.net). In this dataset, which was compiled from the Appendix of Scott & Hughes' book, note that two multiple directors (W.S. Fraser and C.D. Menzies) are affiliated with just one board so they are not multiple directors in the strict sense.
The companies are classified according to industry type: 1 - oil & mining, 2 - railway, 3 - engineering & steel, 4 - electricity & chemicals, 5 - domestic products, 6 - banks, 7 - insurance, and 8 - investment. In addition, there is a vector specifying the total capital or deposits of the firms in thousands of pounds sterling.

References

John Scott & Michael Hughes, The anatomy of Scottish capital: Scottish companies and Scottish capital, 1900-1979 (London: Croom Helm, 1980).
W. de Nooy, A. Mrvar, & V. Batagelj, Exploratory Social Network Analysis with Pajek (Cambridge: Cambridge University Press, 2004), Chapter 5.

History

Original authors are John Paul Scott (1949) (scottj@essex.ac.uk, University of Essex) & Michael Hughes (1947, University of Lancaster in 1980, not listed now).
Data compiled into Pajek data files by W. de Nooy, 2001

Use NodeXL to visualize this data. The data is in three network datasets: a bipartite network of people and companies (edges represent a person being a director of a company); a network of people (the edges are co-membership in companies); and a network of companies (edges are sharing a director).

Task 1: Create a preliminary two mode visualization that shows people as small circles and companies as larger squares. Try different layouts (including manually assisted) and produce the best visualization you can (in a reasonable amount of time). Can you color the companies by industry? Are there individuals who appear to be bridges between industries? Or who appear to be kingpins in a particular industry?

Task 2: Do a quick exploration of the people by people network. Try different visualizations. Calculate graph metrics. It might clarify the visualization if you use dynamic filtering to discard barely connected individuals. Change node size by graph metric. Can you identify a class of apparently important people? Try clustering.

Task 3: Now look at the company by company network. Cluster, color, explore. How much do network clusters follow industry? Are there cluster-bridging companies? Are you surprised at what they are?

Turn in a short paper that shows your explorations.

The data is in the following Excel files.

Q63. Imagine a 10 x 10 square grid. Each cell can be empty…

Q60. Use NodeXL to compute betweenness, closeness, and Eigenvector centralities of the network shown below (Excel file here). Label the vertices with these.

1. Rearrange the edges so as to get the largest value for betweenness centrality for vertex A
2. Rearrange the edges so as to get the largest value for closeness centrality for vertex A
3. Rearrange to get largest Eigenvector centrality

Q59. Consider the following things that can flow or move on a network:

Used books
Money
Smiles
Gossip
Taught knowledge
Mooching friends/relatives
Email
Attitudes
Workers (flowing among jobs)
Infection
Packages
Greetings
Tips/how-to-info
Help/favors/acts of kindness

Categorize these in terms of four characteristics:

1. The mechanics of diffusion: does diffusion occur via replication (copy mechanism) or transfer (move mechanism)?
2. (Applicable only to replication-based flows.) Is the duplication one at a time (serial), like giving a paperback to a friend, or simultaneous (parallel), like a radio broadcast?
3. Does the traffic flow deliberately or blindly/randomly?
4. Does the traffic revisit places it's already been? That is, is the flow on paths (no node repeats), trails (no edge repeats), or walks (visiting nodes and edges perhaps repeatedly).

Q57. Facebook Ego Network with Gephi

Q56.

• Go to NameGenWeb and select output in the form of a GraphML file. You may have to sign in to Facebook to make this work. When the program comes back and says "Thank you for waiting. Your network is now available" you can click and save, or right-click the link and save the file (to a directory where you will be doing your work). If you click the button you will see a .graphml.XML file, which you can Save as…, that looks like this:

<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="uid" for="node" attr.name="uid" attr.type="double"/>
<key id="sex" for="node" attr.name="sex" attr.type="string"/>
<key id="pic" for="node" attr.name="pic" attr.type="string"/>
<graph id="G" edgedefault="undirected">
<node id="name of a friend">
<data key="uid">219655</data>
<data key="sex">female</data>
<data key="pic">http://profile.ak.fbcdn.net/hprofile-ak-snc4/174207_219655_1563807427_s.jpg</data>
</node>
etc.

• If you right click you will save a .graphml file. Either file can be opened by NodeXL using File>Import GraphML file…
• In the Vertices worksheet, set the shape to 3 (disk). Double-click lower right corner of cell to auto-fill down.
• Use Graph Metrics to compute all metrics except In- and Out-degree.
• Use Autofill Columns to set vertex size to degree. Adjust sizes in Autofill Columns > Vertex Size Options as you like.
• Use Groups>Find Clusters (Girvan-Newman) to create groups of vertices.
• Under Layout Options (in Graph window at bottom of layout selection pull-down) select "Layout each of the graph's groups in its own box and …"
• For layout type try Harel-Koren
• Use Dynamic filters to drop "leaves" or "pendants" by setting degree to greater than or equal to 2
• To figure out what your clusters are, click on sample nodes. NodeXL will highlight rows in the vertices worksheet.
• Click on Graph Metrics>Average Overall Metrics. Look on the Overall Metrics worksheet to find
• Number of connected components
• Number of single vertex components
• Maximum geodesic distance
• Average geodesic distance (the "degrees of separation" in your ego network)
• How many nodes does your network have? How many edges? What is the number of possible edges? What is the ratio of actual edges to possible edges? How does this compare to the graph density calculated by NodeXL?
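
The last bullet can be checked by hand or with a few lines of code. The sketch below uses only the Python standard library and a tiny made-up GraphML document; it assumes ties are stored as standard GraphML `<edge source="…" target="…"/>` elements and that the network is undirected, so the number of possible edges is n(n-1)/2. The function name `density_from_graphml` is hypothetical.

```python
import xml.etree.ElementTree as ET

# GraphML elements live in this XML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# A tiny made-up GraphML document: three friends, two ties.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="friend one"/>
    <node id="friend two"/>
    <node id="friend three"/>
    <edge source="friend one" target="friend two"/>
    <edge source="friend two" target="friend three"/>
  </graph>
</graphml>"""

def density_from_graphml(text):
    """Return (nodes, edges, possible edges, density) for an undirected graph."""
    root = ET.fromstring(text)
    n = len(root.findall(".//g:node", NS))
    m = len(root.findall(".//g:edge", NS))
    possible = n * (n - 1) // 2          # every unordered pair of nodes
    density = m / possible if possible else 0.0
    return n, m, possible, density

n, m, possible, density = density_from_graphml(SAMPLE)
print(n, m, possible, density)
```

To try it on your own file, read the saved .graphml file into a string and pass it to the function, then compare the result with the graph density NodeXL reports.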

Q55. Consider this adjacency matrix. Following standard conventions, calculate in- and out-degree for each vertex.

  A B C D E
A - 0 1 0 1
B 0 - 0 1 1
C 0 1 - 1 1
D 0 0 1 - 0
E 0 1 1 1 -
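
As a reminder of the usual convention (rows are senders, columns are receivers), here is a small Python sketch on a made-up three-vertex matrix, not the one in the question. Out-degree is a row sum and in-degree is a column sum; the diagonal is treated as 0.

```python
# Hypothetical directed network on vertices X, Y, Z.
# matrix[i][j] == 1 means a tie from labels[i] to labels[j].
labels = ["X", "Y", "Z"]
matrix = [
    [0, 1, 1],  # X -> Y, X -> Z
    [0, 0, 1],  # Y -> Z
    [1, 0, 0],  # Z -> X
]

# Out-degree: sum across each row (ties sent).
out_degree = {labels[i]: sum(row) for i, row in enumerate(matrix)}

# In-degree: sum down each column (ties received).
in_degree = {labels[j]: sum(row[j] for row in matrix)
             for j in range(len(labels))}

print(out_degree)
print(in_degree)
```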

Q54. Use the data you generated in Q53 to produce simple visualizations of these networks in NodeXL and import these into a Word document to report your results.

Q53. Practice constructing a network survey instrument ("by hand"), administering it, and recording data.

1. Sketch out (paper and pencil mode) a brief network questionnaire that includes
   1. Respondent name and a few demographics (e.g., sex and age)
   2. List of R's "confidantes"
2. For each confidante we want to collect a bit of demographic info (e.g., sex and age).
3. For each confidante we want to know whether R has a particular activity tie (e.g., have you had dinner together in the last week?).
4. Construct a grid that will let you record R's answers as to whether, for each pair of confidantes, she has done the activity with the two.
5. Construct a grid that will let you record R's assessment of whether a particular relationship exists between each pair of confidantes.

Q52. If necessary, review the Wikipedia page "matrix multiplication." Then practice with the following

A =
a b
c d
e f

B =
e f g
h i j

C =
1 1 1 0
0 1 0 1

D =
1 1
1 1
1 0
1 0

What is (1) AxB (2) CxD (3) AxC (4) DxB (5) BxC
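
If you want to check your hand calculations, a minimal Python sketch of the row-by-column rule is below. The matrices `P` and `Q` are made up for illustration (they are not the ones in the question), and `matmul` is a hypothetical helper. Note that the product is only defined when the number of columns of the first matrix equals the number of rows of the second.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("columns of the first must equal rows of the second")
    # Entry (i, j) is the sum over k of a[i][k] * b[k][j].
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Made-up example: a 2x3 matrix times a 3x2 matrix gives a 2x2 matrix.
P = [[1, 2, 0],
     [0, 1, 1]]
Q = [[1, 0],
     [2, 1],
     [0, 3]]
print(matmul(P, Q))  # [[5, 2], [2, 4]]
```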

Q51. To transpose a matrix we simply swap its rows and columns:

A B C         A D G
D E F   ->    B E H
G H I         C F I

or

A B C   ->    A D
D E F         B E
              C F

If the matrix is called $M$ then we write the transpose of $M$ as $tr(M)$.
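
For checking your work, swapping rows and columns is a one-liner in Python. This is just a sketch; `transpose` is a hypothetical helper name, and the matrix is a 2-by-3 grid of letters stored as a list of rows.

```python
def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

M = [["A", "B", "C"],
     ["D", "E", "F"]]

print(transpose(M))  # [['A', 'D'], ['B', 'E'], ['C', 'F']]
```

Transposing twice returns the original matrix, which is a handy sanity check.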

Transpose these three matrices

 A B C A B C D D E F G 0 1 0 2 1 0 3 1 0 3 0 4 2 1 4 0 1 1 0 0 1 1 1 0 1

Q50. What kind of network data might emerge from tweets, retweets, and hashtags? Assume we have powerful access to the Twitter stream (meaning we can grab all the tweets in a given time frame, all the tweets by a set of users, all the tweets that mention a hashtag or a user, etc.). Also assume we have access to the API and so can take a user name and get a list of who she follows or who follows her.

Describe five different networks we might construct from this data.

Q49. Convert this 2-mode data into 1-mode data

Flavor Who Likes It
Chocolate Abe, Bertha, Cyndra, Dalia
Vanilla Abe, Cyndra, Eve, Faisel, Gerd
Strawberry Bertha, Cyndra, Hettie, Iolantha
Pistachio Cyndra, Dalia, Gerd, Hettie, Iolantha
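
One way to think about the conversion: two people are tied in the 1-mode network if they share at least one item in the 2-mode data, and the tie weight can count how many items they share. The Python sketch below shows this on made-up event attendance data (not the flavor table above); `project_people` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical two-mode data: event -> people who attended.
attendance = {
    "picnic": ["Ana", "Ben", "Cy"],
    "concert": ["Ana", "Cy"],
    "lecture": ["Ben", "Dee"],
}

def project_people(two_mode):
    """Count, for each pair of people, how many events they share."""
    weights = Counter()
    for people in two_mode.values():
        # Every unordered pair attending the same event gains one shared tie.
        for a, b in combinations(sorted(people), 2):
            weights[(a, b)] += 1
    return dict(weights)

print(project_people(attendance))
```

The same data could instead be projected onto events (two events are tied if they share an attendee) by inverting the dictionary first.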

Q48. Convert the information below - data on four organizations, listing the members of their boards of directors - into a network data format and then use a network visualization program to show the 2 mode network laid out nicely with nodes labeled.

Acme Association Boothwyn Foundation The Cannalo Organization Dynamic Educational Consulting
Allen Allen Barb Chris
Barb Ethan Chris Dante
Chris Fran Ethan Ethan
Dante Gent Fran Kelly
Harri Ishtar Lori
Jack Miguel

Q47. Use NodeXL to visualize the following networks

Node List
A, B, C, E
B, A, C, F
C, A, D
D, B, E, F
E, B, F
F, A, D, E

Edge List

A, B
A, C
A, E
B, A
C, A
C, B
C, D
D, B
D, E
D, F
E, F
F, B
F, C
F, D

Full Matrix

  A B C D E F
A - 1 0 0 0 1
B   - 0 0 1 1
C     - 1 1 1
D       - 1 0
E         - 0
F           -
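
NodeXL wants an edge list (Vertex 1, Vertex 2), so the node list format has to be expanded first. The Python sketch below shows one way to do that on a made-up example; it assumes each line of a node list is a vertex followed by the vertices it is tied to, and `node_list_to_edges` is a hypothetical helper.

```python
# Hypothetical node list: each line is a vertex followed by its neighbors.
node_list_text = """P, Q, R
Q, R
R, P"""

def node_list_to_edges(text):
    """Expand "vertex, neighbor, neighbor, ..." lines into (vertex, neighbor) pairs."""
    edges = []
    for line in text.splitlines():
        # Split on commas and drop empty pieces (e.g., from a trailing comma).
        parts = [p.strip() for p in line.split(",") if p.strip()]
        if not parts:
            continue
        source, neighbors = parts[0], parts[1:]
        edges.extend((source, n) for n in neighbors)
    return edges

for v1, v2 in node_list_to_edges(node_list_text):
    print(v1, v2)
```

The printed pairs can be pasted into the Vertex 1 and Vertex 2 columns.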

Q46. Play with Yasiv's Amazon Recommendation Visualizer. After you have played with it for a bit, carry out some specific "analyses" (we use the term loosely here).

Select a book — my example will be The Fountainhead by Ayn Rand, a favorite of libertarians and right wingers. Jostle the network around until you get a nice layout that has settled down. You might want to zoom in or out (mouse wheel if you have one).

Copy screen image to clipboard with alt-F14 or alt-PrtScn, open a Word document and paste. Crop the picture (Picture Tools>Format>Crop), unselect it, reselect it and stretch to fill page.

Examine clusters. Now select the picture and Cut it (control-X) and then Insert>Shapes>New Drawing Canvas, expand the canvas to be page-sized, and paste the picture. Then use the circle tool to draw circles around the identifiable clusters.

Now let's create a new drawing canvas and sketch this network of clusters.

And then we will examine the books qualitatively — back on the web interface — to come up with coarse descriptions of the clusters.

Finally, we inquire as to what connections the inter-cluster edges represent, like this:

And ueber-finally, let's prepare this network for data entry in NodeXL
Vertex1 Vertex2
creativity innovation
innovation Pirsig-education
innovation Rand-novels
Rand-novels Rand-videos-etc
Rand-novels Rand-philosophy

Enter Data into NodeXL

1. Start on the Edges worksheet and enter vertex 1 in column A, vertex 2 in column B.
2. Click on refresh graph.
3. Moving nodes and graph.
4. Set layout to none.
5. Vertex labels, vertex size. Refresh.
6. Add edge labels for an edge or two. Refresh.
7. Vertex label position.
8. Copy to clipboard. Paste into writeup document.

Suggestions

The Real Mitt Romney
Steve Jobs' biography
Moneyball by Michael Lewis
Easy Riders, Raging Bulls: How the Sex-Drugs-and-Rock 'N' Roll Generation Saved Hollywood
Caro Emerald's album Scenes from the Cutting Room Floor
The Omnivore's Dilemma
Fast Food Nation
City: Urbanism and Its End
President Obama's Audacity of Hope
The Art of Computer Programming by Donald Knuth
C Programming
The Future of the Internet and How to Stop It

Q42. Data Collection Problem

• Seeing two mode networks (when do we encounter these in the reading?)
• Grab some data from an online source.
• Use NodeXL to grab some automatically.
• Class exercise on CSS.

Q41. TEXT
