Before I start
This post is one of the longest I have written so far (it took me about a week to finish), and I still have some open questions:
- […] what determines the centrality of single nodes or clusters [when applying layouts in Gephi]?
- […] what [does] the modularity class tell me about the community?
- Does this exemplify the importance of additional measures besides “degree”? As the number of connections does not necessarily correspond to the ability to serve as a bridge?
- Isn’t it logical that, when one is part of a smaller sub-cluster, closeness centrality and eccentricity decline, as smaller groups are per se connected to fewer nodes in the network?
These questions can be found in the text, marked in bold, where they are embedded in their context to offer more information. I would be very happy if people interested in Social Network Analysis (and Gephi) could help me find the answers. Thank you in advance 🙂
Social Network Analysis (SNA) – that has a familiar ring!
When hearing about SNA in week 3, I remembered reading this term in some of the previous weeks’ resources. Baker & Siemens (2014) frame SNA as one of four approaches to structure discovery in EDM/LA, which can be seen as the opposite of prediction (there is no a priori idea of a predicted variable). SNA reveals the structure of interaction by analyzing the relationships between individual actors. Shum & Ferguson introduce SNA as a “possibility” offered by Social Learning Analytics (SLA). This possibility is social in itself – as is, for example, discourse analysis – as opposed to social learning disposition analytics or social learning content analytics, which need to be “socialized” first. So again, this sounds very social, and I wonder whether Social Learning Analytics and Social Network Analysis can combine three of my areas of interest: (Social) Psychology, Learning and Analytics.
SNA – What are we talking about?
Social Psychology – a friend I believed to be lost – is striking back, as powerful and interesting as ever. Burt, Kilduff & Tasselli (2013) mention two facts established by Social Psychology on which network models of advantage (“as a function of breadth, timing and arbitrage”) build. These are:
- people form groups based on where they meet
- within a group, communication is more frequent and more influential than between different groups (so similar views develop).
As a source they cite Festinger et al. (1950), who developed the idea of group cohesiveness; on page 151 of their book it says:
The gist of these conclusions may be summarized as follows: In a community of people who are homogeneous with respect to many of the factors [that influence friendship formation], the physical factors arising from the arrangement of houses are major determinants of what friendships will develop and what social groupings will be formed. These social groupings create channels of communication for the flow of information and opinions. Standards for attitudes and behavior relevant to the functioning of the social group develop, with resulting uniformity among the members of the group. Other people deviate because they were never in communication with the group.
This is a concept we can observe in our daily life. Imagine a university course starting and some students joining a little later than the rest. A loose group has already formed; the information flow is established. A new member joins this network and wants to put herself on an equal footing with the others. Group cohesiveness represents “the property of a group that effectively binds people, as group members, to one another and to the group as a whole, giving the group a sense of solidarity and oneness” (Hogg & Vaughan, 2014, p. 288). In this respect one has to bear in mind that many consequences result from cohesiveness, and positive and negative ones lie close to each other. In-groups and out-groups form, which can be positive for the in-group members. On the other hand, out-group members are excluded – the basis for the emergence of prejudice. Many more social phenomena could be named here – something SNA must not overlook.
Haythornthwaite (1996) combines the social and the analytics perspectives on Social Network Analysis in an interesting way. She notes that, compared to other analysis techniques, SNA focuses on relationships and their patterns and contents. Thus it “strives to derive social structure empirically, based on observed relationships between actors, rather than on a priori classifications” (p. 325). The world is hence explained by networks, not by groups. Relationships and ties are described in terms of content, direction and strength (I will focus on the latter two).
Direction is asymmetrical when the information flow is one-way only; a tie is classified as undirected when the direction of flow is either not measured or not considered relevant. In terms of strength (the intensity of a relationship, beyond its mere existence), either the number of ties and/or their strength can be examined. Haythornthwaite introduces five network principles:
- cohesion (grouping nodes by strong common relationships, e.g. via density and centralization; clusters and cliques)
  - density (degree to which members are connected to all other members)
  - centralization (extent to which a set of actors is organized around a central point)
  - clusters (subgroups of highly interconnected actors)
  - cliques (fully connected clusters)
- structural equivalence (grouping nodes by their similarity)
- prominence (the node in charge)
  - centrality (differs from centralization in that it measures a node’s connections in the network rather than the configuration of the network)
  - global centrality or closeness (shortest paths between an actor and every other actor in the network)
- range (a node’s network extent)
- brokerage (bridging connections to other networks)
  - betweenness (extent to which an actor sits between others in the network, playing a role as an intermediary)
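To make the density and clique ideas from this list concrete, here is a minimal sketch in plain Python (the toy friendship network is invented for illustration):

```python
# Toy undirected friendship network as an adjacency mapping (invented for illustration).
network = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

n = len(network)
ties = sum(len(friends) for friends in network.values()) // 2  # each tie is counted twice

# Density: the proportion of possible ties that are actually present.
density = ties / (n * (n - 1) / 2)
print(ties, round(density, 3))  # → 4 0.667

# {A, B, C} are fully interconnected, so they form a clique (a cluster with density 1).
```

Note how the sparsely connected node D alone pulls the overall density well below 1, even though a clique hides inside the network.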
So if we see SNA as a source of tools for the analysis of relational data (Grunspan, Wiggins & Goodreau, 2014), we can distinguish two classes of hypotheses: why are relations formed, and what are the outcomes of these relations? The authors also argue that these questions are important, as a learner’s position in the network appears to be correlated with her performance. So in order to understand networks, we need to understand the determinants, structure and consequences of relationships between actors – especially in situations where social support or connections are assumed to influence the outcomes of interest.
With this in mind, the main methods for SNA are modularity, density and centrality.
Modularity is a way of quantifying the concept of community structure. In brief, it is calculated by taking the fraction of ties falling within groups and subtracting the fraction expected in a similar network with ties placed at random (Newman, 2006). This incorporates the idea that the raw number of ties within a group is not meaningful on its own; it becomes informative only when the actual number is compared with the number expected at random.
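This comparison can be sketched in a few lines (assuming the networkx library is available; the two-triangle graph is invented for illustration). Modularity is high here because almost all ties fall within the two groups:

```python
import networkx as nx
from networkx.algorithms import community

# Two invented friend triangles joined by a single bridging tie.
G = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"),
              ("x", "y"), ("y", "z"), ("x", "z"),
              ("c", "x")])

# Modularity: fraction of ties within groups minus the fraction
# expected if the same ties were placed at random.
partition = [{"a", "b", "c"}, {"x", "y", "z"}]
q = community.modularity(G, partition)
print(round(q, 3))  # → 0.357
```

Splitting the same graph into random groups instead would drive the value towards zero, which is exactly the point of the measure.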
Density, in the words of Hanneman & Riddle (2005), is “the proportion of all possible ties that are actually present.” It is calculated by dividing the number of ties present by the number of possible ties. A fully connected network, or a fully connected subgroup of a network (a clique), has a density of 1.
Network types can be described as unipartite (one type of actors) vs. bipartite (actors linked to the groups to which they belong), undirected vs. directed (see the Facebook network example below), and binary ties (mere existence) vs. valued ties (additional quantitative data). At the actor level, several measurements of centrality have been proposed: degree centrality (the total number of connections a node has; in directed networks split into in- and out-degree), betweenness centrality (actors serving as bridges on the shortest paths between two actors), closeness centrality (how close one actor is to all other actors on average) and eigenvector centrality (being connected to other well-connected nodes) (based on Grunspan, Wiggins & Goodreau, 2014).
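A sketch of these four actor-level measures (again assuming networkx; the broker network is invented): node h has only two ties, yet tops betweenness, because every shortest path between the two groups runs through it.

```python
import networkx as nx

# Invented network: node h bridges two otherwise separate triangles.
G = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"),   # group 1
              ("x", "y"), ("y", "z"), ("x", "z"),   # group 2
              ("c", "h"), ("h", "x")])              # h is the bridge

degree = nx.degree_centrality(G)        # share of other nodes an actor is tied to
between = nx.betweenness_centrality(G)  # how often an actor sits on shortest paths
close = nx.closeness_centrality(G)      # inverse average distance to all others
eigen = nx.eigenvector_centrality(G)    # ties to other well-connected actors

# h has a low degree (only 2 ties) but the highest betweenness of all nodes.
print(max(between, key=between.get))  # → h
```

This is the quantitative version of the “gossip friend” effect discussed further below: few connections, but in exactly the right place.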
SNA in action – My Facebook network analysis in Gephi
I was surprised how easy it actually was to gather information about my Facebook network. At the same time, it felt weird. After obtaining the data by running the Facebook app Netvizz (which creates a file that Gephi can use to analyse the network), I decided not to mention individual personal data, to avoid detailed conclusions being drawn from the graphs below. The graphs are used to visualize the main methods introduced above and to briefly discuss some striking results of the analysis.
Gephi can be used both to visualize a network as a sociograph and to identify individual actors (nodes) and clusters in the dataset. The first step after running Netvizz is to import the dataset as either a directed or an undirected network. As I am dealing with Facebook, a connection between two actors (a tie between two nodes) means that both agreed to become Facebook friends. Thus, direction is not relevant for the analysis. Hirst (2010, April 16) describes this in his blog, comparing Facebook to Twitter, where direction would matter. As a result, I chose an undirected network for my analysis. From the import report we can see that there are 320 actors (friends) in my network, connected by 3182 ties (connections).
After pressing the OK button, the network looks similar to the first graph of this section: an “imbroglio” of nodes and ties, not really interpretable. So it is useful to apply some of the methods introduced above, starting with density.
The density was calculated as 0.062. Strictly speaking, my Facebook network does not exploit its potential, as it does not use all the possible connections. While this corresponds to the basic interpretation of density, it is only applicable in a limited way here: there are many subgroups in my network that may well have a higher density. In addition, one has to interpret the word “potential”: yes, there are potential connections that could be made, but would they change the quality, or the information flow, of the network?
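The reported value can be checked against the density formula for an undirected network, d = 2m / (n(n − 1)), using the actor and tie counts from the import report:

```python
n, m = 320, 3182  # actors and ties from the Gephi import report
density = 2 * m / (n * (n - 1))  # possible ties in an undirected network: n(n-1)/2
print(round(density, 3))  # → 0.062
```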
For a more interpretable visualization, I applied measures of centrality. The first one is degree centrality, the total number of connections a node has (as we are dealing with an undirected network, there is no in- and out-degree). In my network, there is one actor with the highest degree centrality, followed by three friends with quite similar degree centralities – all of them, in my eyes, well-connected in Facebook terms. But one has to be careful about making qualitative judgements: degree centrality captures the quantity of connections, not their quality.
Gephi offers many layouts for better visualization, but I have no idea how they work in detail. In my case, I applied Fruchterman Reingold (“a classical layout algorithm, since 1984; rated with 2/5 stars in quality, 3/5 in speed”). The result looks like this:
From my understanding, the layout visualizes sub-clusters of the network and arranges them depending on how close they are to each other within a given circular area. However, I am not quite sure what determines the centrality (here meant literally, as being arranged more centrally in the area of the circle) of single nodes or clusters. What is striking in my Facebook network is that the previously identified high-degree actors all belong to one cluster (the upper right one); in addition we can find two more clusters. In general, it seems reasonable to label these the “home cluster”, the “work cluster” and the “university cluster”.
Let’s turn to modularity now to identify more relation patterns in this network. The results are as follows: there are 23 communities. In the size distribution chart, the number of nodes (size) is plotted against the modularity class. In this context, I am not sure what the modularity class tells me about the community. The biggest community has about 90 nodes and a modularity class of 19, whereas some small communities with only one node have higher modularity class numbers.
To get rid of overly small communities, my next idea was to filter out groups smaller than 3 nodes. (This can be done by setting the degree range filter in the topology folder to the range 3–70.) Running the density and modularity statistics again resulted in a density of 0.08 (formerly 0.062), 9 communities instead of 23, and a slightly lower modularity of 0.686.
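Outside Gephi, the same degree-range filter can be reproduced in a few lines (a sketch assuming networkx; the five-node toy graph merely stands in for the exported network):

```python
import networkx as nx

# Toy stand-in for the network: a triangle plus two loosely attached actors.
G = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("e", "a")])

# Keep only actors whose degree falls in the range 3-70, as in the topology filter.
keep = [node for node, deg in G.degree() if 3 <= deg <= 70]
H = G.subgraph(keep)

print(sorted(H.nodes()))  # → ['a', 'c']
```

The subgraph keeps the well-connected core plus the ties among those actors, which is why both the node and tie counts drop after filtering.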
Some additional measures and information: by filtering the nodes, the overall numbers were reduced to 281 actors and 3142 connections. The average degree of an actor is 22.363 and the network diameter is 6 (meaning that the longest shortest path between any two actors spans 6 steps).
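The average degree is consistent with the filtered counts, since each tie contributes to two actors’ degrees:

```python
n, m = 281, 3142  # actors and ties after filtering
avg_degree = 2 * m / n  # each undirected tie adds 1 to the degree of two actors
print(round(avg_degree, 3))  # → 22.363
```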
The partition function in Gephi is useful to visualize the 9 identified communities (I had to re-apply the layout, so the positions of the communities no longer correspond completely to the former graphs; the layout is turned by about 45 degrees clockwise). In addition to this partition by modularity class, I applied the betweenness centrality measure (actors serving as bridges on the shortest paths between two actors). I am focusing on two actors here (the two biggest nodes, one light green, one light blue). Whereas the light blue one corresponds to one of the actors identified by the degree measure, the light green actor is a “new” one, not identified by the degree measure before. Does this exemplify the importance of additional measures besides degree, since the number of connections does not necessarily correspond to the ability to serve as a bridge? In more quantified terms, betweenness centrality measures how often a node appears on the shortest paths between nodes in the network. Actors identified by this measure could thus be valuable for transferring information from one group to another. And believe it or not, the actor identified by Gephi is a friend of mine whom I always ask for “the gossip”, and he/she knows almost everything about the different groups – without being connected to most of the people in my network.
Two additional measures I applied can be seen in the graphs above: closeness centrality (how close an actor is to the other actors on average, i.e. the average distance from a given starting node to all other nodes in the network) and eccentricity (the distance from a given starting node to the node farthest from it in the network). The most striking fact is that no actor in the network explicitly stands out from the rest. However, in smaller communities both measures seem to be lower. Should this be surprising? Isn’t it logical that, when one is part of a smaller sub-cluster, closeness centrality and eccentricity decline, as smaller groups are per se connected to fewer nodes in the network?
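Both measures can be illustrated on a small invented graph (again a sketch assuming networkx): the node joining the triangle to the tail is closest to everyone on average, while the tail end is farthest from the rest.

```python
import networkx as nx

# Invented graph: a triangle (a, b, c) with a two-node tail (c-d-e).
G = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")])

closeness = nx.closeness_centrality(G)  # inverse average distance to all others
eccentricity = nx.eccentricity(G)       # distance to the farthest node

print(max(closeness, key=closeness.get))     # → c
print(eccentricity["c"], eccentricity["e"])  # → 2 3
```

Note the two measures point in opposite directions: a well-placed node has high closeness but low eccentricity.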
Baker, R., & Siemens, G. (2014). Educational data mining and learning analytics. Cambridge Handbook of the Learning Sciences
Burt, R. S., Kilduff, M., & Tasselli, S. (2013). Social network analysis: foundations and frontiers on advantage. Annual review of psychology, 64, 527-547. doi: 10.1146/annurev-psych-113011-143828 (full text)
Festinger, L., Schachter, S., & Back, K. W. (1950). Social Pressures in Informal Groups. Stanford, CA: Stanford University Press.
Grunspan, D. Z., Wiggins, B. L., & Goodreau, S. M. (2014). Understanding Classrooms through Social Network Analysis: A Primer for Social Network Analysis in Education Research. CBE-Life Sciences Education, 13(2), 167–178. doi:10.1187/cbe.13-08-0162 (full text)
Hanneman, R. A. & Riddle, M. (2005). Introduction to social network methods. Riverside, CA: University of California, Riverside (full text).
Hirst, T. (2010, April 16). Getting Started With The Gephi Network Visualisation App – My Facebook Network, Part I, Retrieved November 11, 2014, from http://blog.ouseful.info/2010/04/16/getting-started-with-gephi-network-visualisation-app-my-facebook-network-part-i/
Hogg, M. A. & Vaughan, G. M. (2014), Social Psychology, 7th Edition, Pearson Education
Newman, M. E. J. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the United States of America, 103(23), 8577–8582.
What is it that keeps us going? Is it our self-motivation or is it peer pressure? Is it both? Is it something completely different?
High aspirations can push you forward or hold you back. They can make you do better than ever and enjoy what you do. They can also make you feel deeply desperate about a world that does not share your ambitions and work pace. Both perceptions notably occur when working in groups.
How do you know that you are doing well? Take individual grading based on scales from 1–6 or 0–100%. The basis for continuous competition is provided. Whether or not this grade says anything about how much you know or how well you are doing – it is a measure, and it is recorded. For group work, one could offer the option of combining individual and group grading (meaning that individual contributions are made visible within the group project).
Switching to a Fail/Pass/Pass-with-Distinction system has been thought-provoking for me lately – not regarding grading systems as such, but regarding what and who defines others as doing well, and how this influences individual and group behavior. Given the pass-fail approach, I had an interesting discussion with two of my peers. How can we distinguish transparently between those who do well and those who do better? Is it simply enough to know it ourselves (and how transparent is that)? Or is it enough that the lecturer knows it and will keep it in mind for possible future PhD applications (and if so, what is the benefit for someone who does not want to stay in the research environment)? This might work if everything goes the way we expect. But what if there is doubt, and for a pass-fail decision there is no objective comparison between “competing” students at hand? Reflecting on these questions, it seems to me that – independently of the grading system – we focus so much on the output of a learning process that we forget the process itself.
In the business environment there seems to be a common agreement: you can only judge what you can measure. In the end, you are responsible for proving what you did – and what you did not do. I tried to imagine a project evaluation that simply lets the project pass or fail. What would that change? Imagine clear expectations are set, but you can only pass or fail. There would be a hard separator between 49% and 51% (given that these numbers would not be mentioned in the evaluation). So how would you prefer one project over another? By knowing it for yourself, or by trusting the project steering to remember which project did better? By comparing the project records in detail? Again: this may work out well if we assume that people are (always)
- good and they want you to do well,
- objective and fair,
- able to memorize how the work was done and to compare different projects unbiasedly.
But to be honest, this is not how real life always treats you – and, as mentioned before, we are again outcome-oriented. What really happens in group work is more complex than what can be measured by comparing final results. Take the social loafing phenomenon (the loss of motivation when working in groups, compared to individual work where results are not merged): “hiding” in a group and not participating (because “the rest” will do the work) is best prevented by setting an attractive common goal, increasing group identification, and making each individual visible by avoiding anonymity (we loaf because we expect others to do the same and try to preserve equity, because we feel anonymous and unidentifiable, and because there is a lack of performance standards). I am not talking about output; I am talking about the group working process. On the other hand, there is the phenomenon of social compensation, basically the opposite of social loafing: people engage even more in group work because they know/assume/fear that co-workers or peers won’t or can’t work hard enough to reach the set goals (Hogg, 2011). Once more, the root cause is the process of work, not the result. In addition, a very simplified second conclusion is that it is individual recognition that counts towards connecting aspirations with our motivation to stay focused and on track.
To support the individual learner and the group outcome, one solution would be to detect issues in the working process before the result is presented. We could certainly rely on the group members to be objective and report such issues, or on the lecturer or department leader to identify and sense tensions. But is this realistic, given the vast amount of technology used and the amount of project communication that has moved to the digital environment? You cannot just trust your eyes and ears anymore; you must be able to analyze the patterns group members follow while using technological support.
To be clear, I am not talking about storing (intermediate) results (as I do in this blog, for example). I am talking about the factual analysis of user data to identify patterns, workloads and routines. In anticipation of the critique: yes, one could argue that this is the control, observation or tracing of people via information technology and data-analysis features. I want to argue that, because people use information technology, we need data analysis to acknowledge their work in the digital space.
Which leads back to the high aspirations mentioned earlier: just because I try to be objective and unbiased about others and their work – may I assume that others try as well? Should I set a good example and trust others to evaluate the work done as a whole? If I answer yes to all of these questions, does that mean I should not try to evaluate work more objectively? And when work – especially the process until we reach a certain outcome – becomes more complex, should I not use information technology (for a good cause) to improve my objectivity?