UTArlingtonX: LINK5.10x Data, Analytics, and Learning or #DALMOOC (Week 7) – Part I

If I were a Text Mining Expert …

I would work on a model that could not only predict spelling and grammar mistakes but also help investigate why these mistakes were made. And I would ban the words „mistake“ & „error“ from this model, because defining something as such has drastic consequences: the assumption that there is only (a predefined) right or wrong. (Though I have to admit that I am still looking for an adequate substitute.) Questioning why is made so much harder, because simply choosing the right answer with a right-click in the spell-check software obviously saves a lot of time.

While this might appear to be an unsolvable quest at first sight, I think there are numerous patterns that could help to develop such a model. This text is written in English, but I am not a native speaker. I speak 3.5 languages. Starting a new one is always a special challenge, because the more languages you learn, the more context exists to frame the new language. This can be good. You know which words you have to learn first for daily conversations and which grammar structures are particularly useful. You know where to start.

But one thing always stays the same: I am trying to find out WHY I make certain mistakes. This goes beyond simply knowing THAT I made a mistake. It is to find out WHERE I got the structure or word that I am not applying adequately. It is to understand that the mistake could possibly be correct in another language, make the connection to the current language and store this connection.

(This is how it works for me, and I think that others can benefit from this as well. I am not a professional linguist, so this is not grounded in any well-established theory. I guess there might be research that either undermines or supports this idea.)

Bist du das? ≠ = Are you it?

Literal interpretation is one example. A native English speaker would at least understand what „Are you it?“ is supposed to mean („Is it you?“), so does this mean that the translation is wrong? Yes – says the English teacher. No – I say. I speak both languages and thus can make sense of it and know what the other person wanted to ask. Isn’t that weird? At first it seemed so plain that the question „Are you it?“ is wrong – but why would some people then understand it? Because of the similar background/context (of language) they have. Because they understand WHY someone translated the question this way.

We have so many data sources to choose from for translations. Why can’t we make the spell-check process more individual? One could implement settings to choose the native language and the other languages learned so far. When a mistake is detected, a model could be applied to detect whether this is a simple typo or a systematic error that can or cannot be connected to another language. Imagine the potential of evaluating writing patterns (e.g. as already available for messaging in Android systems, where the system guesses your next word): you could get a summary of frequently occurring mistakes and work on these in the future.

Until my breakthrough with my „why-you-made-this-mistake-model“

Could I ask you for a favor? If you know someone in your social environment who is learning a new language and he/she is asking for a word: Please don’t just translate the word into a language that is easier for him/her! Try to explain the word in the language he/she is learning and give as much context as possible. It helps a lot. I know, it’s not always easy, because the faster way is a simple translation (like the right-click in your spell-check). But in the long run, putting language in context is worth the additional time.


UTArlingtonX: LINK5.10x Data, Analytics, and Learning or #DALMOOC (Week 4)

Social Network Analysis: caught between being analytic & considering educational research

Discussing assets and drawbacks of learning analytics (LA), and of social network analysis (SNA) in particular, leads to discussing the objectivity of data collection. Applying these methods enables not only live-data tracking and analysis (as opposed to self-reported pre-/post-course surveys) but also the objective analysis of learners’ traces in the learning environment (though this does not give a complete picture of the learning which takes place). However, the potential stays untapped if underlying pedagogical and epistemological assumptions are not taken into account.

Gašević, D., Dawson, S., Siemens, G. (2015) point this out as they claim that LA connects existing research on learning, teaching, educational research and practice. They lament the lack of studies which evaluate the concept of the established lead indicators of LA. Three major themes in their paper are the detailed description of how tools are used, looking beyond frequency of activity and time spent on task by analyzing individual learning strategies and products, and the connection of internal and external conditions for data. Within this context the basis for effective LA visualizations lies in the consideration of instructional, learning and sense-making benefits, with foundations in distributed cognition and self-regulated learning being a potential direction of future research.

„The Epistemology–Assessment–Pedagogy triad“

Knight, S. et al. (2014) identify LA as implicitly or explicitly promoting particular assessment regimes in the epistemology, assessment and pedagogy triad. A wonderful, more detailed reflection on this triad in connection to learning analytics can be found in Classroomaid (2014). Suthers, D. D. & Verbert, K. (2013) define LA as the „middle space“ between learning and analytics. In their paper they elaborate three main themes for future research in the field of LA: „the middle space“ (focus on the intersection between learning and analytics, avoiding the emphasis of one of these themes), „productive multivocality“ (facing the challenge of unifying a multifaceted research field by focusing on analyzing a common data ground) and „the old and the new“ (enhancing learning as a century-old idea that is continuously accompanied by new tools) (p1). Given the rich online learning landscape, clustering learning environments can be the first step towards detecting characteristics and underlying epistemology-assessment-pedagogy beliefs, and thus towards identifying the appropriate measures of learning analytics.

„The Epistemology–Assessment–Pedagogy triad“, adapted from Knight, S. et al. (2014), p4

For example, Rodriguez (2012) classified MOOCs as either c-MOOCs (following the connectivist tradition) or x-MOOCs („AI-Stanford like courses“, following the cognitive-behaviorist tradition). It is important to note here that the term „x-MOOCs“ was not coined by Rodriguez; rather, Liyanagunawardena, T., et al. (2013) establish ties to Daniel (2012), where they detected similar definitions, and thus combined both papers. For his classification Rodriguez used Anderson & Dron’s (2011) paper on „Three Generations of Distance Education Pedagogy“, where they coin three DE pedagogy concepts. Bringing the triad back into focus: it’s the pedagogy concept that supports his classification, differentiating between connectivist and cognitive-behaviorist pedagogy.

The implication of this classification is a different view on teaching, social and cognitive presence in the online learning environment. This view needs to be considered when analyzing the underlying epistemological concept and the assessment formats. Besides common features, this relates especially to the role of course instructors, the definition of openness (access vs. openness to personalized learning), connectedness and guidance. Knowledge is either generative (c-MOOC) or declarative (x-MOOC). That said, without a coherent triad even the best assessment strategy does not capture the learning that is really happening. Furthermore, the triad can be used to continuously challenge the assumptions of each corner.

Active Learning – an example for effective college classroom practice

One example of this triangular interplay is active learning. Grunspan et al. (2014) mention the effectiveness of active learning in college classrooms and, as a result, explore this practice with the support of Social Network Analysis. However, the triad does not implicitly determine direction and interdependencies. Critical reflection on all intersections is necessary. This exemplifies the importance of meta studies, e.g. „What makes great teaching? Review of the underpinning research“ by Robert Coe, Cesare Aloisi, Steve Higgins and Lee Elliot Major (2014). A call to constantly challenge learning assumptions, to participate in the ongoing research process and to relate and integrate one’s own research efforts goes hand in hand with the importance of comparing different underlying concepts and critically questioning whether researchers are actually considering the same concepts. As one example from this meta study, active learning and its outcomes for the learning progress seem to contradict Grunspan et al. Referring to a learning pyramid, Coe et al. argue that there is no evidence that remembering is better when learners participate actively rather than passively („Ensure learners are always active, rather than listening passively, if you want them to remember“, p24). The different levels of complexity of the two concepts (active learning and active/passive listening in relation to memorizing) already reveal that we are not dealing with the same ideas here. However, these approaches seem to be connected and point in opposing directions, each with sound research evidence behind it. When validating and examining existing research, it becomes more and more important to pin down the underlying assumptions and research questions to create reliable conclusions for further research.

Six different case studies: Learning design can influence learners’ decisions

The course presents six different case studies which emphasize the claim that learning design can influence learners’ decisions. By doing so, the above triad is underpinned again, not only by focusing on practical applications but also by giving a direction for future research approaches that critically question underlying concepts of epistemology, pedagogy and assessment.

1. Instructor-centered network: Bringing together SNA and learning design [Lockyer, L., Heathcote, E., & Dawson, S. (2013)]

Lockyer et al. explore how the framework for the interpretation of LA might lie in the learning design. By using case-based learning they examine a concept of checkpoint and process analytics to analyse learning design embedded in a context, in real-time and behavior-based (a more narrowed-down application of LA). Hence, learning design and analytic design are connected to support learning and teaching decisions. They also propose different directions for future research including „engaging teachers and students in understanding and using visual patterns of interaction as a means to encourage learning activity; scaling up to larger numbers of classes, providing a base for comparing statistically the observed to expected analytics of behaviors and interactions; and using results to provide meaningful feedback to teachers on how their learning design is meeting their pedagogical goal and to assist them in decisions around design and pedagogical change in real time.“ (p1455)

2. Sense of community [Dawson, S. (2008)]

The research question in this study is „Is the composition of social networks evolving from a unit discussion forum related to the sense of community experienced among the student cohort?“ (p226). In general, this deals with the question of belonging vs. isolation, or better: with the extent to which learners benefit educationally from belonging and how this can operate as a predictor for student success (as social integration is the strongest predictor for retention and completing a university degree). The underlying educational concept is community-centered teaching practice, based on the social-constructivist ideas of Dewey and Vygotsky. The novelty of this study is that self-reported surveys are no longer the only data source. It uses a mixed-method approach which combines quantitative measures (Classroom Community Scale, SNA centrality measures) and qualitative measures (discussion forum content, student interviews). As a result, Dawson found an association between network position and sense of community: in detail, a positive association with closeness and degree centrality and a negative one with betweenness centrality (dilemma of brokerage). In addition, pre-existing external social networks influence the type of support/information required. Concerning future research, Dawson points towards investigating the relation between social networks and other measures that influence the learning environment, such as pedagogy, practitioner personality and cohort demographic profiles.

3. Network brokers associated with achievement and creativity [Dawson, S., Tan, J. P. L., & McWilliam, E. (2011)]

Dawson, S. et al. discuss the correlation between cognitive playfulness and network position, where degree and betweenness centrality, in contrast to closeness, are positive indicators of a learner’s creative capacity. By answering their research questions „What is the relationship between a student’s social network position and perceived creative capacity? To what extent do discussion forum mediated social networks allow for the identification and development of student creativity?“ they claim that SNA can provide insight into the creativity of students as well as a tool for instructors to monitor the learner’s creative capacity level. The individual’s self-reported creativity score thus corresponds with the overall social network position. Creativity is perceived as a highly valued graduate asset.

4. SNA for understanding and predicting academic performance [Gašević, D., Zouaq, A., Janzen, R. (2013)]

Gašević, D. et al. studied cross-class networks and the importance of weak ties by considering the relationship between academic performance and social ties. The basis for this study is social capital and network learning research. Two hypotheses were investigated: (1) „students’ social capital accumulated through their course progression is positively associated with their academic performance“; and (2) „students with more social capital have a significantly higher academic performance.“ Based on the ideas of Vygotsky, one practical implication is the formation of new social ties in each course during degree programs.

5. SNA and social presence (What is the association between network position and social presence?) [Kovanović, V., Joksimović, S., Gašević, D., Hatala, M.]

This study focuses on the Community of Inquiry model, specifically on social presence as one contributor to the educational experience. Social presence consists of three parts, namely affectivity and expression, interactivity and open communication, and cohesiveness. By analyzing the underlying social processes that contribute to the development of social capital, Kovanović, V. et al. give an insight into how affective, cohesive and interactive facets of social presence significantly predict the network centrality measures commonly used for measurement of social capital. Social constructivist pedagogies and the shift towards collaborative learning can be seen as underlying educational concepts. The research question „What is the relationship between the students’ social capital, as captured by social network centrality measures, and students’ social presence, as defined by the three categories in the Community of Inquiry model?“ (p3) leads to the result that interactive social presence is „most strongly associated with all of the network centrality measures, indicating a significant relation with the development of the students’ social capital.“ In conclusion, in-degree and out-degree centrality measures were predicted by all categories of social presence, whereas betweenness centrality was predicted by the interactive and affective categories.

6. SNA and understanding of MOOCs [Skrypnyk, O., Joksimović, S., Kovanović, V., Gašević, D., Dawson, S. (2014)]

Skrypnyk, O. et al. explore the learning environment of a cMOOC to identify and understand important key actors. Although in this study the facilitators continued to occupy a central role, other actors emerged and complemented this picture. This was based on the two research questions „What is the influence of original course facilitators, course participants (i.e., learners) and technological affordances on information flows in different stages of a cMOOC?“ and „What are the major factors that influence the formation of communities of learners within the social network developed around a cMOOC?“. As a result, types of authorities can be classified as „hyperactive aggregators“ and „less visible yet influential authorities“. For the former, there might be an existing connection to natural personality traits. Another outcome is the importance of hashtags for information flow and community construction within a more learner-centered environment supported by software.

Application in Gephi and Tableau

By combining Gephi and Tableau I created a dashboard which shows different ways of visualizing the same data source. Please click the image to enlarge it and see descriptions for the single sheets on the dashboard. As my first attempt, this dashboard is meant to illustrate how conclusions could be drawn at a single glance.

For example, the network analysis with Gephi detects the network structure and reveals node 3 as a central key actor when it comes to the measure of degree (top right). Nodes 9, 10 and 11 follow at some distance. The same pattern can be detected in the top left visualization, where size represents degree as well, but the color reveals betweenness. We can conclude that degree and betweenness are correlated in this network, as a decreasing number of connected nodes goes hand in hand with a decreasing betweenness (not surprising, but the image makes it more visible). The Degree sheet at the bottom left specifies in- and out-degree values. Again we see the network key players 3, 9, 10 and 11, but here we can specify their communication patterns: whereas 3 has the highest out-degree but the smallest in-degree, actor 11 has the highest in-degree and the smallest out-degree. Depending on the underlying question, we could draw better conclusions at a glance from this data.

First dashboard trial
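The in- and out-degree values discussed for the Degree sheet can also be counted directly from a list of directed ties. The following is only a minimal sketch: the edge list is invented for illustration and is not the data behind the dashboard, and the function name `in_out_degree` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical directed ties (sender, receiver); NOT the dashboard's data.
edges = [(3, 9), (3, 10), (3, 11), (9, 11), (10, 11), (9, 3)]

def in_out_degree(edges):
    """Count incoming and outgoing ties per node in a directed network."""
    out_deg = Counter(sender for sender, _ in edges)
    in_deg = Counter(receiver for _, receiver in edges)
    return in_deg, out_deg

in_deg, out_deg = in_out_degree(edges)
for node in sorted(set(in_deg) | set(out_deg)):
    # A Counter returns 0 for nodes that never send (or never receive).
    print(node, "in:", in_deg[node], "out:", out_deg[node])
```

In this toy network node 3 mostly sends and node 11 only receives, which mirrors the kind of communication pattern described above.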

Still, there are some open questions regarding Gephi from my last post but I found a nice point of departure from @Edu_k ’s blog post on Gephi Layouts.

Resources

Anderson, T. and Dron, J. (2011). Three Generations of Distance Education Pedagogy, International Review of Research in Open and Distance Learning, Volume 12, Number 3. Retrieved from http://www.irrodl.org/index.php/irrodl/article/view/890/1663

Classroomaid (2014-11-14). Our Learning Analytics are Our Pedagogy, Are They? (#xAPI, #dalmooc), retrieved on 2014/10/23 from http://classroom-aid.com/2014/11/14/our-learning-analytics-are-our-pedagogy-are-they-xapi-dalmooc/

Daniel, J. (2012). Making Sense of MOOCs: Musings in a Maze of Myth, Paradox and Possibility. Journal of Interactive Media in Education 2012(3):18, DOI: http://dx.doi.org/10.5334/2012-18

Dawson, S. (2008). A study of the relationship between student social networks and sense of community. Educational Technology & Society, 11(3), 224–238 (full text).

Dawson, S., Tan, J. P. L., & McWilliam, E. (2011). Measuring creative potential: Using social network analysis to monitor a learners’ creative capacity. Australasian Journal of Educational Technology, 27(6), 924-942 (full text).

Edu_k (2014/11/14). Social capital in SNA for LA – too much focus on individuals at a cost of the group, retrieved on 2014/11/24 from http://nauczanki.wordpress.com/2014/11/14/social-capital-in-sna-for-la-too-much-focus-on-individuals-at-a-cost-of-the-group/

Gašević, D., Dawson, S., Siemens, G. (2015). Let’s not forget: Learning analytics are about learning. TechTrends (in press), http://www.sfu.ca/~dgasevic/papers_shared/techtrends2015.pdf

Gašević, D., Zouaq, A., Janzen, R. (2013). ‘Choose your Classmates, your GPA is at Stake!’ The Association of Cross-Class Social Ties and Academic Performance. American Behavioral Scientist, 57(10), 1459-1478. doi: 10.1177/0002764213479362 (full text).

Grunspan, D. Z., Wiggins, B. L., & Goodreau, S. M. (2014). Understanding Classrooms through Social Network Analysis: A Primer for Social Network Analysis in Education Research. CBE-Life Sciences Education, 13(2), 167–178. doi:10.1187/cbe.13-08-0162 (full text)

Knight, Simon; Buckingham Shum, Simon and Littleton, Karen (2014). Epistemology, assessment, pedagogy: where learning meets analytics in the middle space. Journal of Learning Analytics (In press).

Kovanović, V., Joksimović, S., Gašević, D., Hatala, M., “What is the source of social capital? The association between social network position and social presence in communities of inquiry,” In Proceedings of 7th International Conference on Educational Data Mining – Workshops, London, UK, 2014 (full text).

Liyanagunawardena, T., Adams, A., & Williams, S. (2013). MOOCs: A systematic study of the published literature 2008-2012. The International Review Of Research In Open And Distance Learning, 14(3), 202-227. Retrieved from http://www.irrodl.org/index.php/irrodl/article/view/1455/2531

Lockyer, L., Heathcote, E., & Dawson, S. (2013). Informing pedagogical action: Aligning learning analytics with learning design. American Behavioral Scientist, 57(10), 1439-1459, doi:10.1177/0002764213479367 (full text).

Rodriguez, C. O. (2012). MOOCs and the AI-Stanford like courses: Two successful and distinct course formats for massive open online courses. European Journal of Open, Distance and E-Learning. Retrieved from http://www.eurodl.org/?p=Special&sp=init2&article=516

Suthers, D. D., & Verbert, K. (2013). Learning analytics as a “middle space.” In Proceedings of the Third International Conference on Learning Analytics and Knowledge (pp. 1–4). New York, NY, USA: ACM. doi:10.1145/2460296.2460298

Skrypnyk, O., Joksimović, S., Kovanović, V., Gašević, D., Dawson, S. (2014). Roles of course facilitators, learners, and technology in the flow of information of a cMOOC. British Journal of Educational Technology (submitted) (full text).

UTArlingtonX: LINK5.10x Data, Analytics, and Learning or #DALMOOC (Week 3)

Before I start

This post is one of the longest I have written so far (it took me about a week to finish it), and I still have some open questions:

  1. […] what determines the centrality of single nodes or clusters [when applying layouts in Gephi]? 
  2. […] what [does] the modularity class tell me about the community?
  3. Does this exemplify the importance of additional measures besides „degree“? As the number of connections does not necessarily correspond to the ability to serve as a bridge?
  4. Isn’t it logical when being part of a smaller sub cluster that closeness centrality and eccentricity decline as smaller groups are per se connected to fewer nodes in the network? 

These questions can be found in the text marked in bold, where they are also embedded in the context to offer more information. I would be very happy if people interested in Social Network Analysis (and Gephi) could help me find the answers. Thank you in advance 🙂

Social Network Analysis (SNA) – that has a familiar ring!

When hearing about SNA in week 3, I remembered reading this term in some of the last weeks‘ resources. Baker, R., & Siemens, G. (2014) frame SNA as one of four approaches to structure discovery in EDM/LA data, which can be seen as the opposite of prediction (there is no a priori idea of a predicted variable). SNA reveals the structure of interaction by analyzing the relationships between individual actors. Shum & Ferguson introduce SNA as a „possibility“ offered by Social Learning Analytics (SLA). This possibility is inherently social, as is for example discourse analysis, in contrast to social learning disposition analytics or social learning content analytics, which need to be „socialized“ first. So again, this sounds very social, and I am wondering if Social Learning Analytics and Social Network Analysis can combine three of my areas of interest: (Social) Psychology, Learning and Analytics.

SNA – What are we talking about?

Social Psychology – a friend I believed to be lost – is striking back, as powerful and interesting as ever. Burt, Kilduff & Tasselli (2013) mention two facts established by Social Psychology that network models of advantage („as a function of breadth, timing and arbitrage“) build upon. These are

  1. people form groups based on where they meet
  2. within a group, communication is more influential and frequent than between different groups (as similar views develop).

As potential sources they mention Festinger et al. (1950), who developed the idea of group cohesiveness. On page 151 of their book it says:

The gist of these conclusions may be summarized as follows: In a community of people who are homogeneous with respect to many of the factors arising from the arrangement of houses are major determinants of what friendship will develop and what social groupings will be formed. These social groupings create channels of communication for the flow of information and opinions. Standards for attitudes and behavior relevant to the functioning of the social group develop, with resulting uniformity among the members of the group. Other people deviate because they were never in communication with the group.

This is a concept we can observe in our daily life. Imagine a university course starting and some students joining a little later than the rest. A loose group has already formed and the information flow is established. A new member joins this network and wants to put herself on an equal footing with the others. Group cohesiveness represents „the property of a group that effectively binds people, as group members, to one another and to the group as a whole, giving the group a sense of solidarity and oneness.“ (Hogg, M. A. & Vaughan, G. M. (2014), p288). In this respect one has to bear in mind that a lot of consequences result from cohesiveness, where positive and negative lie close to each other. In- and out-groups form, which can be positive for the in-group members. On the other hand, out-group members are excluded, which is the basis for the emergence of prejudice. Many more social phenomena could be named here – something SNA must not overlook.

Haythornthwaite (1996) combines the social and analytics perspectives on Social Network Analysis in an interesting way. She mentions that, compared to other analysis techniques, SNA focuses on relationships and their patterns and contents. Thus it „strives to derive social structure empirically, based on observed relationships between actors, rather than on a priori classifications“ (p325). The world is hence explained by networks, not by groups. Relationships and ties are described in relation to content, direction and strength (I will focus on the latter two).

Direction is asymmetrical when the information flow is one-way only; a tie is classified as undirected when the direction of flow is either not measured or not considered relevant. In terms of strength (the intensity of a relationship, in addition to its mere existence), the number of ties and/or their strength can be examined. Haythornthwaite introduces five network principles:

  • cohesion (grouping nodes regarding strong common relationships by e.g. density and centralization; clusters and cliques)
    • density (degree to which members are connected to all other members)
    • centralization (extent to which a set of actors are organized around a central point)
    • clusters (subgroup of highly interconnected actors)
    • cliques (fully connected clusters)
  • structural equivalence (grouping nodes regarding their similarity)
  • prominence (the node in charge)
    • centrality (differs from centralization as it measures the node’s connections in the network rather than measuring the configuration of the network)
    • global centrality or closeness (shortest path between an actor and every other actor in the network)
  • range (a node’s network extent) and
  • brokerage (bridging connections to other networks)
    • betweenness (extent to which an actor sits between others in the network, playing a role as an intermediary)

So if we see SNA as the source of tools for the analysis of relational data (Grunspan, D. Z., Wiggins, B. L., & Goodreau, S. M. (2014)), we can detect two classes of hypotheses: why are relations formed and what are the outcomes of these relations? They also argue that these questions are important, as the learner’s position seems to be correlated with her performance. So in order to understand networks, we need to understand the determinants, structure and consequences of relationships between actors. This needs to be considered in situations where social support / connections are expected to influence the outcomes of interest.

With this in mind, the main methods for SNA are modularity, density and centrality.

Modularity describes a way of quantifying the concept of community structure. In brief, it is calculated as the number of ties falling within groups minus the number of ties expected within those groups in a similar network with ties placed at random. [Newman, M. E. J. (2006)] This concept incorporates the idea that the raw number of ties between two groups is not meaningful on its own; it only becomes meaningful when the number actually present is compared with the number one would expect at random.
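To make the „observed minus expected“ idea more tangible, here is a minimal sketch of the modularity calculation for an undirected, unweighted network. It assumes the common normalized form of Newman’s measure (for each group: fraction of ties inside the group minus the squared fraction of tie ends attached to the group). The toy network and the function name `modularity` are my own illustration, not Gephi’s implementation.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity Q of a partition (sketch; undirected, unweighted, no self-loops).

    edges: list of (u, v) ties, each tie listed once
    community: dict mapping node -> group label
    """
    m = len(edges)
    degree = defaultdict(int)
    within = defaultdict(int)       # ties with both ends in the same group
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            within[community[u]] += 1
    degree_sum = defaultdict(int)   # total degree of each group
    for node, k in degree.items():
        degree_sum[community[node]] += k
    # Per group: observed fraction of ties minus the fraction expected at random.
    return sum(within[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Toy network (illustrative only): two triangles joined by a single tie.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {node: "A" for node in two_groups}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(q_two, q_one)
```

Splitting the toy network into its two triangles gives a clearly positive value, while putting every node into one group gives zero, since then there is nothing more than chance to explain.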

Density, in the words of Hanneman & Riddle (2005), is „the proportion of all possible ties that are actually present.“ It is calculated by dividing the number of ties present by the number of possible ties. A fully connected network, or a fully connected subgroup of a network (a clique), would have a density of 1.

Network types can be described as unipartite (one type of actor) vs. bipartite (actors linked with the group to which they belong); undirected vs. directed (please see the Facebook network example below); as well as binary (simple existence) vs. valued ties (additional quantitative data). When talking about actor-level variables, there are several proposed measures of centrality: degree centrality (total number of connections a node has; in directed networks split into in- and out-degree), betweenness centrality (actors serving as bridges on the shortest paths between two other actors), closeness (how close one actor is to the other actors on average) and eigenvector centrality (being connected to other well-connected nodes) (based on Grunspan, D. Z., Wiggins, B. L., & Goodreau, S. M. (2014)).
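Two of these centrality measures are simple enough to sketch by hand. The example below computes degree and closeness for a small undirected toy network. It assumes a connected network and normalizes closeness as (number of other nodes) / (sum of shortest-path distances); other tools may report a different variant, for example the average distance itself. The network and the helper names `neighbours` and `closeness` are invented for illustration.

```python
from collections import deque

def neighbours(edges):
    """Build an adjacency map for an undirected network."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def closeness(adj, start):
    """Closeness of `start`: (n - 1) / sum of shortest-path distances.

    Assumes the network is connected; distances come from a breadth-first search.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return (len(adj) - 1) / sum(dist.values())

# Toy network (illustrative only): "b" is a hub, "e" sits at the end of a tail.
edges = [("a", "b"), ("b", "c"), ("b", "d"), ("d", "e")]
adj = neighbours(edges)
degree = {node: len(adj[node]) for node in adj}
close = {node: closeness(adj, node) for node in adj}

for node in sorted(adj):
    print(node, "degree:", degree[node], "closeness:", round(close[node], 3))
```

Here the hub „b“ has both the highest degree and the highest closeness, while the peripheral node „e“ has the lowest closeness because every other node is several steps away.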

SNA in action – My Facebook network analysis in Gephi

I was surprised how easy it actually was to gather information around my Facebook network. And in addition, it felt weird. After receiving the data by running the Facebook app Netvizz (which creates a file that can be used by Gephi to analyse the network) I decided not to mention individual personal data to avoid drawing detailed conclusions from the below graphs. They are used to visualize the main methods introduced above and to briefly discuss some striking results of the analysis.

My Facebook network visualized in Gephi – First attempt

Gephi can be used both for visualizing a network in the format of sociographs and for identifying datasets of individual actors (nodes) and clusters in this dataset. The first step after running Netvizz is to integrate the dataset as a directed or an undirected network. As I am dealing with Facebook, the idea of a connection between two actors (a tie between two nodes) is that both agree to become Facebook friends. Thus, the direction is not relevant for the analysis. Hirst, T. (2010, April 16) describes this in his blog, comparing Facebook to Twitter, where direction would matter. As a result, I chose an undirected network for my analysis. From the import report we can see that there are 320 actors (friends) in my network, connected by 3182 ties (connections).

Gephi import report for my Facebook network analysis

After pressing the OK button, the network looks similar to the first graph of this section. It is an „imbroglio“ of nodes and ties, not really interpretable. So it would be useful to apply some methods introduced above, starting with density.

Gephi calculated a density of 0.062, which means that only about 6% of all possible connections in the network actually exist. Strictly speaking, my Facebook network does not take advantage of its potential, as it is not using all the possible connections. While this corresponds to the basic interpretation of density, it applies only in a limited way here: there are many subgroups in my network that might have a higher density. In addition, one has to interpret the word „potential“ here: yes, there are potential connections that could be made, but does that mean they would change the quality or the information flow of the network?
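The density value can be checked by hand. The following minimal sketch assumes the usual definition for an undirected network (existing ties divided by all possible pairs of actors) and plugs in the 320 actors and 3182 ties from the import report; the function name is my own.

```python
def undirected_density(n_nodes, n_edges):
    """Share of all possible undirected ties that actually exist."""
    possible_ties = n_nodes * (n_nodes - 1) / 2
    return n_edges / possible_ties


# Numbers taken from the Gephi import report above.
density = undirected_density(320, 3182)
print(round(density, 3))
```

With 320 actors there are 51040 possible pairs, so 3182 ties give a density of roughly 0.062, which matches the reported value.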

Applying nodes ranking degree (color) in Gephi
Applying nodes ranking degree (size) in Gephi

For a more interpretable visualization, I applied measures of centrality. The first one is degree centrality, which gives the total number of connections a node has (as we are dealing with an undirected network, there is no in- and out-degree). In my network, there is one actor with the highest degree centrality, followed by three friends with quite similar degree centrality. All of them are, in my eyes, well connected in Facebook terms. But one has to be careful about making any qualitative judgements: degree centrality gives an overview of the quantity of connections, not their quality.

Gephi offers many layouts for a better visualization, but I do not know how each of them works in detail. In my case, I applied Fruchterman Reingold („a classical layout algorithm, since 1984; rated with 2/5 stars in quality, 3/5 in speed“). The result looks like this:

Applying the Fruchterman Reingold layout in Gephi

From my understanding, the layout visualizes sub-clusters of the network and arranges them, depending on how close they are to each other, within a given circular area. [1] However, I am not quite sure what determines the centrality (meant literally here, as being placed closer to the centre of the circle) of single nodes or clusters. What is striking in my Facebook network is that the high-degree actors identified above all belong to one cluster (the upper right one), and in addition we can find two more clusters. In general, it seems reasonable to label these clusters the „home cluster“, the „work cluster“ and the „university cluster“.
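To get a feeling for what a force-directed layout does, here is a rough Python sketch of the general Fruchterman-Reingold idea: all nodes repel each other, connected nodes attract each other, and the movement per round shrinks over time. It is a simplified illustration under my own assumptions (a unit square as frame, made-up parameter values, a hypothetical toy graph). Gephi's implementation has its own settings and will differ in detail.

```python
import math
import random


def fruchterman_reingold(nodes, edges, width=1.0, height=1.0,
                         iterations=50, seed=0):
    """Simplified force-directed layout inside a width x height frame."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, width), rng.uniform(0, height)] for v in nodes}
    k = math.sqrt(width * height / len(nodes))  # ideal distance between nodes
    temperature = width / 10                    # maximum step per round
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion: every pair of nodes pushes each other apart.
        for i, v in enumerate(nodes):
            for u in nodes[i + 1:]:
                dx = pos[v][0] - pos[u][0]
                dy = pos[v][1] - pos[u][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[v][0] += dx / dist * force
                disp[v][1] += dy / dist * force
                disp[u][0] -= dx / dist * force
                disp[u][1] -= dy / dist * force
        # Attraction: connected nodes pull each other together.
        for u, v in edges:
            dx = pos[v][0] - pos[u][0]
            dy = pos[v][1] - pos[u][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[v][0] -= dx / dist * force
            disp[v][1] -= dy / dist * force
            disp[u][0] += dx / dist * force
            disp[u][1] += dy / dist * force
        # Move each node, limited by the temperature and the frame.
        for v in nodes:
            length = math.hypot(disp[v][0], disp[v][1]) or 1e-9
            step = min(length, temperature)
            pos[v][0] = min(width, max(0.0, pos[v][0] + disp[v][0] / length * step))
            pos[v][1] = min(height, max(0.0, pos[v][1] + disp[v][1] / length * step))
        temperature *= 0.95  # cool down so the layout settles
    return pos


# Hypothetical toy graph: two triangles joined by a single tie.
toy_nodes = ["a", "b", "c", "d", "e", "f"]
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"),
             ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
layout = fruchterman_reingold(toy_nodes, toy_edges)
```

In a sketch like this, a node's position is only the result of the balance between pushes and pulls, which would suggest that being placed near the centre is not a centrality measure in itself.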

Let’s turn to modularity now to identify more relation patterns in this network. The results are as follows: there are 23 communities. In the size distribution chart, the number of nodes (size) is plotted against the modularity class. [2] In this context, I am not sure what the modularity class tells me about the community. The biggest community has about 90 nodes and a modularity class of 19, whereas some small communities with only 1 node have a higher modularity class.
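The modularity score itself can be illustrated with a small sketch. It assumes the standard definition from Newman (2006) for an undirected, unweighted network: for each community, the share of ties inside it minus the share expected by chance. The toy network and the hand-made partition are hypothetical, and the sketch only scores a given partition; it does not detect communities the way Gephi does.

```python
def modularity(edge_list, community_of):
    """Modularity Q of a given partition (undirected, unweighted network)."""
    m = len(edge_list)
    internal_ties = {}
    degree_sum = {}
    for u, v in edge_list:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal_ties[cu] = internal_ties.get(cu, 0) + 1
    return sum(internal_ties.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Hypothetical toy network: two triangles joined by a single tie.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"),
             ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]

two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}

q_two_groups = modularity(toy_edges, two_groups)
q_one_group = modularity(toy_edges, one_group)
```

Splitting the toy network into its two triangles gives a clearly positive score, while putting everyone into one community gives zero. The community labels (0 and 1 here) are arbitrary names and carry no meaning of their own in this sketch.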

Gephi modularity report for my Facebook network

To get rid of very small communities, my next idea was to filter out groups that are smaller than 3 nodes. (I did this by setting the degree range filter in the topology folder to the range 3-70. Strictly speaking, this filter removes nodes with fewer than 3 connections, which is how the tiny groups disappear.) Running the density and modularity statistics again resulted in a density of 0.08 (formerly 0.062), 9 communities instead of 23 and a slightly lower modularity of 0.686.

Modularity report after filtering the degree range 3-70

Some additional measures and information: by filtering the nodes, the network was reduced to 281 actors and 3142 connections. The average degree of an actor is 22.363 and the network diameter is 6 (meaning that the two actors farthest apart are separated by a shortest path of 6 ties).
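The numbers above can be reproduced with a few lines of Python. This is a minimal sketch with my own function names: the average degree uses the 281 actors and 3142 ties reported above, while the degree filter and the diameter are shown on a hypothetical five-node chain, since the real network data is not included here.

```python
from collections import deque


def average_degree(n_nodes, n_edges):
    """Every tie adds one to the degree of two actors."""
    return 2 * n_edges / n_nodes


def filter_by_degree(adj, low, high):
    """Keep only nodes whose degree lies in [low, high]."""
    keep = {v for v, neighbours in adj.items() if low <= len(neighbours) <= high}
    return {v: adj[v] & keep for v in keep}


def diameter(adj):
    """Longest shortest path, in ties, between any two reachable nodes."""
    longest = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        longest = max(longest, max(dist.values()))
    return longest


reported_average = average_degree(281, 3142)

# Hypothetical toy network: a chain a-b-c-d-e.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
         "d": {"c", "e"}, "e": {"d"}}
core = filter_by_degree(chain, 2, 70)
```

In the chain, the two end nodes have only one connection each, so a lower bound of 2 removes them and the diameter shrinks from 4 to 2 ties.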

My Facebook network with the node partition modularity class and the nodes ranking (size) of betweenness centrality

The partition function in Gephi is useful to visualize the 9 identified communities (I had to re-apply the layout, so the position of the communities no longer corresponds completely to the former graphs; it is rotated by 45 degrees clockwise). In addition to this partition by modularity class, I applied the betweenness centrality measure (actors serving as bridges on the shortest paths between two other actors). I am focusing on two actors here (the two biggest nodes, one light green, one light blue). Whereas the light blue one corresponds to one of the actors identified by the degree measure, the light green actor is a „new“ one, not identified by the degree measure before. [3] Does this exemplify the importance of additional measures besides degree, given that the number of connections does not necessarily correspond to the ability to serve as a bridge? In more quantified terms, betweenness centrality measures how often a node appears on shortest paths between other nodes in the network. So actors identified by this measure could be valuable for transferring information from one group to another. And believe it or not, the actor identified by Gephi is a friend of mine whom I always ask for „the gossip“ and who knows almost everything about the different groups, without being connected to most of the people in my network.

My Facebook network with the node partition modularity class and the nodes ranking (size) of closeness centrality
My Facebook network with the node partition modularity class and the nodes ranking (size) of eccentricity

Two additional measures I applied can be seen in the graphs above: closeness centrality (how close one actor is to the other actors on average, i.e. the average distance from a given starting node to all other nodes in the network) and eccentricity (the distance from a given starting node to the node farthest from it in the network). The most striking fact is that no actor in the network explicitly stands out from the rest. However, in smaller communities the centrality seems to be lower with both measures. Should this be surprising? [4] Isn’t it logical that, for members of a smaller sub-cluster, closeness centrality and eccentricity decline, as smaller groups are per se connected to fewer nodes in the network?
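Both measures follow directly from the shortest-path distances of a node. The sketch below uses the definitions as given above (closeness as the average distance to all other nodes, eccentricity as the distance to the farthest node) on a hypothetical five-node chain and assumes a connected network. Note that some tools report closeness as the inverse of this average, so whether a low or a high value means „central“ depends on the convention in use.

```python
from collections import deque


def distances_from(adj, start):
    """Shortest-path distance (in ties) from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def average_distance(adj, node):
    """Closeness in the 'average distance' sense (connected network assumed)."""
    dist = distances_from(adj, node)
    return sum(dist.values()) / (len(dist) - 1)


def eccentricity(adj, node):
    """Distance from node to the node farthest away from it."""
    return max(distances_from(adj, node).values())


# Hypothetical toy network: a chain a-b-c-d-e.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
         "d": {"c", "e"}, "e": {"d"}}

middle = (average_distance(chain, "c"), eccentricity(chain, "c"))
end = (average_distance(chain, "a"), eccentricity(chain, "a"))
```

In the chain, the middle node "c" has the smallest average distance and the smallest eccentricity, while the end node "a" has the largest values for both.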

Resources

Baker, R., & Siemens, G. (2014). Educational data mining and learning analytics. Cambridge Handbook of the Learning Sciences

Burt, R. S., Kilduff, M., & Tasselli, S. (2013). Social network analysis: foundations and frontiers on advantage. Annual review of psychology, 64, 527-547. doi: 10.1146/annurev-psych-113011-143828 (full text)

Festinger, L., Schachter, S., & Back, K. W. (1950). Social Pressures in Informal Groups. Stanford, CA: Stanford University Press.

Grunspan, D. Z., Wiggins, B. L., & Goodreau, S. M. (2014). Understanding Classrooms through Social Network Analysis: A Primer for Social Network Analysis in Education Research. CBE-Life Sciences Education, 13(2), 167–178. doi:10.1187/cbe.13-08-0162 (full text)

Hanneman, R. A. & Riddle, M.  (2005). Introduction to social network methods.  Riverside, CA:  University of California, Riverside (full text).

Hirst, T. (2010, April 16). Getting Started With The Gephi Network Visualisation App – My Facebook Network, Part I, Retrieved November 11, 2014, from http://blog.ouseful.info/2010/04/16/getting-started-with-gephi-network-visualisation-app-my-facebook-network-part-i/

Hogg, M. A., & Vaughan, G. M. (2014). Social Psychology (7th ed.). Pearson Education.

Newman, M. E. J. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the United States of America, 103(23), 8577–8582.

Shum, S. B., & Ferguson, R. (2012). Social Learning Analytics. Educational Technology & Society, 15(3), 3-26.

UTArlingtonX: LINK5.10x Data, Analytics, and Learning or #DALMOOC (Week 2)

Interweaving resources and competencies

In my last post about the #DALMOOC I described the course structure and its challenges and reflected on the content. It is a nice structure but it does not contribute to readability. This time I will try a different approach. By pointing towards the three exclamation marks (in the figure below) as main themes of the post, I will interweave different resources, competencies and assignments.

The data cycle in data analytics

The data cycle adapted from Siemens, G. (2013). Learning analytics: The emergence of a discipline. American Behavioral Scientist.

The data cycle is the main theme of week 2’s content in #DALMOOC. This data cycle (or data loop) consists of seven successive steps which are necessary to make learning data meaningful. Data cleaning and integration can only take place after a potential data source has been identified and the data has been stored. Cleaning and integration are especially important when combining different data sets, as the sets need to be able to „communicate“ with each other. The figure shows that the actual analysis of data is only the 5th step in the cycle, which highlights the importance of planning the analytics process. The action taken based on data relies mainly on its representation and visualization. By representing and visualizing the data (or better: our analysis results), we „socialize“ them to make the results easier to understand. [related to Competency 2.1: Describe the learning analytics data cycle]

No word on learning so far, but before diving deeper into the data cycle I want to come back to my definition of learning analytics: „The field of Learning Analytics offers methods to analyse learners’ behavior in a learning environment and by this providing groundwork and significant potential for the improvement of learning environments and individual learner’s feedback and learning outcomes.“ The parts on „significant potential“ and „learning outcomes“ (bold in the original formatting) have been added with regard to Shum, S. B., & Ferguson, R. (2012) [Social Learning Analytics. Educational Technology & Society, 15(3), 3-26]. I adapted my definition because the term „potential“ points towards the possible future development of (S)LA, and because „learning outcomes“ adds a significant part to what needs to be improved: the result of a learning process.

Shum & Ferguson also introduce the interesting concept of Social Learning Analytics (SLA). It is based on learning theories and pinpoints learning elements that are significant in a participatory online culture. Primarily, they acknowledge that learners are not learning alone but engaging in a social environment, where they can interact directly or their actions can be traced by others.

While this is a new context for thinking about analytics, we need to understand the possibilities offered by SLA, which are either social in themselves (e.g. social network analysis, discourse analytics) or can be socialized (e.g. social learning disposition analytics, social learning content analytics). Moreover, the challenge of implementing these analytics is still present. Shum & Ferguson emphasize the impact of authentic learning from real-world contexts through the use of practical tools. Important features of a participatory online culture are the need for a complementary open learning platform („digital infrastructure“), the understanding of how open source tools can enable people to use the potential of these tools („free and open“), the importance of SLA as a part of individual identity and credibility of skills („aspirations across cultures have been shifting“), SLA as an integral part of an employee’s toolkit („innovation in complex, turbulent environments“) and analytics as a new form of trusted evidence („role of educational institutions is changing“). Social learning adds a particular interest in the non-academic context.

What is left are still the challenges of powerful analytics: which measures do we use? Who is defining them? What do we want to measure? How do we define access rights? Are we focusing too much on by-products of online activity? How do we balance power?

Data cleaning

Earlier I claimed that usability is not always what we need. Shum & Ferguson state: „User-centered is not the same as learner-centered: what I want is not necessarily what I need, because my grasp of the material, and of myself as a learner, is incomplete“. When starting to work with a data set, it is highly important that it is as complex and complete as possible. If the data set itself does not embody the complexity of the problem, no other element in the cycle can. It is the visualization that simplifies and „socialises“ the data, not the data cleaning, the integration or the analysis.

That automatically calls for „quality data“ [Siemens, G. (2013). Learning analytics: The emergence of a discipline. American Behavioral Scientist.]. Siemens introduces this idea (among others) with a quote from P. W. Anderson: „more is different“. In my opinion, this is one of the key elements of learning analytics and/or analytics in general: we are afraid of complex data because beyond a certain degree we are no longer able to process them. Instead of refusing data sets based on complexity, good analytics can help us (as a „cognitive aid“ according to Siemens) to process them and make sense of them. Data will not stop existing: by refusing to handle them we make our lives easier, but we might ignore potential sources for understanding our lives better. We failed with classification systems, so now the time has come for big data and algorithms.

In addition to techniques of LA (as in Baker and Yacef 2009, adapted in my last post), Siemens mentions categories of LA applications. These are used for modeling user knowledge, behavior and experience. Thus, they can create profiles of users, model knowledge domains, and perform trend analysis, personalization and adaptation. Siemens illustrates his point with the quote „A statistician may be more interested in creating probability models to identify student performance (technique), whereas a sociologist may be more interested in evaluating how social networks form based on technologies used in a course application.“

Data representation & visualization

There was a time when I had to work with SAP, data sets and visualizations of these sets every day. And I am thankful for one lesson I learned during this time: never change your data set to make it more illustrative. Keep the data set and base your visualization on it. That’s the reason I like pivot tables so much. They have the potential to illustrate your analysis results, other users can adapt them and the data set will stay the same.
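The pivot table idea can be sketched in a few lines of Python: the summary is built as a new object, and the underlying records are never touched. The records and the function name are hypothetical and only serve as an illustration.

```python
from collections import defaultdict

# Hypothetical records: (course, week, number of forum posts).
records = [
    ("DALMOOC", 1, 12),
    ("DALMOOC", 2, 7),
    ("Other course", 1, 3),
    ("DALMOOC", 1, 5),
]


def pivot_sum(rows, row_index, column_index, value_index):
    """Sum the values per (row key, column key) without changing rows."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_index]][row[column_index]] += row[value_index]
    return {key: dict(columns) for key, columns in table.items()}


snapshot = list(records)  # copy, only used to show the data stays the same
posts_by_course_and_week = pivot_sum(records, 0, 1, 2)
```

A different view (for example posts per week across all courses) would be another call on the same unchanged records, which is exactly what makes this way of working attractive.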

However, one has to keep in mind that analytic tools are programmed by others, and to understand the way they work it is important to be familiar with the methods in use and how they are applied within such tools. During the DALMOOC we will work with different data analysis tools. One of them is Tableau.

Tableau

[related to Competency 2.2: Download, install, and conduct basic analytics using Tableau software]


Tableau is software that helps you visualize your data (based on business intelligence approaches). It has an impressive range of options for this purpose, which are (from my point of view) easy to understand and to apply. Data can be imported in different formats, and besides common bar charts one is able to create „interactive“ dashboards: arrangements of visualizations that other users can adapt via filters and that can show more details of the data set on demand.

However, making data fit into Tableau can be challenging. The data sets have to be formatted in a certain way so the software can work with them. That was what I faced when going a little beyond #Assignment56 „Finding Data“ from the DALMOOC assignment bank. [As the title says, find data sources that can be used in Tableau (you will need to watch the Tableau videos for this week in order to become familiar with this). Look for educational data on government, university, or NGO sites. Create a blog post (or post to edX forum) on these data sources.]
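One typical reformatting step, in my experience, is turning a „wide“ table (one column per year) into a „long“ one (one row per observation). The sketch below assumes that this is the shape the tool prefers; the table content, column names and numbers are made up for illustration.

```python
import csv
import io

# Hypothetical wide table: one column per year (numbers are invented).
wide_csv = """state,2012,2013
Berlin,10,12
Hessen,7,9
"""


def wide_to_long(text, id_column, variable_name, value_name):
    """Turn every non-id column into its own row."""
    reader = csv.DictReader(io.StringIO(text))
    long_rows = []
    for record in reader:
        for column, value in record.items():
            if column == id_column:
                continue
            long_rows.append({
                id_column: record[id_column],
                variable_name: column,
                value_name: value,
            })
    return long_rows


long_rows = wide_to_long(wide_csv, "state", "year", "students")
```

The two wide rows become four long rows, each with a state, a year and a value, and the original text is left as it was.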

Coming from Germany, I did some research on the Open Data movement there. The Bundeszentrale für politische Bildung BPB (Federal Agency for Civic Education) offers an Open Data Dossier, where they describe the idea of Open Data, case studies and currently running projects. In their words, the idea of Open Data is to make data available and usable for free. In this respect it is important to know the potential of data and data journalism for sustainable democratic development. Yet, this platform does not offer data sets itself but refers to a general framework for such a platform and to local pilot projects.

In this context they refer to the term „Hyperlokaljournalismus“ (hyperlocal journalism), which can be seen as the opposite of the classic idea of „Lokaljournalismus“ (local journalism): it offers very specific and detailed news in a dynamic way. The news can be adapted to the location of the user and thus concentrate on the immediate surroundings.

Three examples of Open Data platforms are daten.berlin.de, offenedaten.de and the „Statistisches Landesamt Rheinland-Pfalz“ (the statistical office of Rhineland-Palatinate). Formats and range of data differ on each platform, but the idea matches the statements of the BPB: offer data for free and make them available to everyone. Nevertheless, when browsing bigger institutions for data sets, it was mostly the visualization and not the data set that was available. For the data set, you sometimes had to download a form, sign it, describe the purpose you want to use the data for and then send it via telefax or e-mail. Why should I do this when I am just looking for a data set for an individual analysis? I see the point that data collection can be very demanding and a researcher wants to protect his/her work. But when will we finally accept that Open Data can contribute to a collective research community that works on data together? How do you enable easy data check-ups if I have to send you an e-mail beforehand? And regarding the different data set formats and integrating them in a certain tool: how long will we still need to clean data so they fit into our analysis tool? Will there be a future without the need to edit data formats? I do hope so.

Action

Although action was not a major theme in the course, I find it a very important part of the data cycle. In particular, the action depends on data visualization, so it is crucial to know for whom we are visualizing our analysis results and what their needs are. This applies to organizational levels as well („micro-, meso- and macroanalytics layers“ according to Siemens), where we can have top-down and bottom-up approaches and apply different questions and viewpoints. This emphasizes the directions LA can take and how they need to adapt to the interest group they are serving. The main interest group, the learner and his/her needs, must however not be forgotten in this context. [related to Competency 2.3: Evaluate the impact of policy and strategic planning on systems-level deployment of learning]

Coming back to Siemens, he describes the challenges of LA (besides data quality) as privacy and keeping LA centered on human and social processes. I had the chance to touch upon the privacy topic during the Bazaar assignment, where I was in a group with him by chance. This again shows me the value of interactive tools when they are used in a suitable way.

Privacy is also a major topic in Duval, E. (2011, February). Attention please!: learning analytics for visualization and recommendation. In Proceedings of the 1st International Conference on Learning Analytics and Knowledge (pp. 9-17). ACM. He describes machine-readable traces of attention and raises the question of what should be measured to understand how learning takes place. With regard to privacy, he touches upon which elements should be tracked and whether the user should be aware of this. He refers to the „Attention-Trust“ principles of property, mobility, economy and transparency. The term „filter-bubble“ is introduced, connected to the idea that filtering per se can be a threat and can anticipate user choices and ideas. This is somehow related to Shum & Ferguson’s user-/learner-centeredness, as it is always a question of who sets the filter and how the filter works.

What is missing?

I would love to spend more time working with Tableau and other tools in this course. But I fear I cannot cover this within the given timeframe. So I will focus on the tool matrix and complete it while using the tools for basic explorations, doing the readings and dealing with other course resources.

UTArlingtonX: LINK5.10x Data, Analytics, and Learning or #DALMOOC (Week 1)

So far, the #DALMOOC is one of the most complex online courses I have enrolled in. Content-wise it covers „an introduction to the logic and methods of analysis of data to improve teaching and learning“. What is especially challenging is the course structure and the social tools involved.

In this post, I will first describe the course structure and state key messages from week 1’s content (Assignment: „share your reflections on week one in terms of a) content presented, and b) course design“). After this, I will present my bullet points for the four readings to complete the key messages (Assignment: „review the additional readings available for week 1 of the course and share your reflections about them“). Finally, I will attach my edited Learning Analytics Tool Matrix (Assignment: Learning Analytics: Tool Matrix), for which I conducted some research on learning analytics tools.

Course structure and key messages

What makes the course so complex is the large number of social tools and pathways to choose from. The basic idea is that there is a Guided Learner Path („blue pill“) and a Social Learning Path („red pill“) available. One can either choose one of them or get involved in both. To keep it simple, the blue pill is the structure learners are most familiar with: course content is provided as in a typical classroom environment where the teacher provides the knowledge. The red pill, however, is a social approach where learners interact via social media (e.g. Prosolo, Twitter) and share their artifacts.

Based on this structure, a range of tools is used to track the learning progress. Generally speaking, edX provides solely the platform for the course content. Interaction is recorded via Prosolo (a platform connected to edX that is used to show learning goals and competencies, share thoughts, form groups and fulfill assignments). For example, this blog post will be recorded (or tracked) in Prosolo and thus can be made available for peer assessment. In addition, there are features which enable the user to track #dalmooc hashtags on Twitter or RSS feeds.

When talking about Learning Analytics, there usually are tools involved that apply the theoretical knowledge. In this course, we will deal with Tableau, Gephi, RapidMiner and LightSide. An additional problem bank is provided for advanced assignments to work with these tools.

The social learning aspect is supported by a tool called Bazaar (Bazaar assignment: Discuss Week 1). Bazaar is a platform (basically a chat system) that connects learners on demand to discuss course-related topics and content. In my case I was connected on Saturday evening to a very helpful person from India. A programmed digital instructor guided us through the discussion. After an introduction we were to discuss why we take this course, how we define learning analytics, how useful we found the cluster used for learning analytics tools and how it could have been improved. We had a very constructive discussion that benefited from the fact that we had different backgrounds and levels of expertise.

My key messages for this week are

  • Users always expect usability, especially in online courses. But talking about Learning Analytics means talking about a broad range of data that demands skills to grasp these data and make sense of them. For me, the course itself offers an opportunity to find one’s individual way through the vast amount of learning opportunities to engage with the topic of data analytics.
  • The field of Learning Analytics offers methods to analyse learners‘ behavior in a learning environment and by this providing groundwork for the improvement of learning environments and individual learner’s feedback.
  • Analytic tools are programmed by others, and to understand the way they work it is important to be familiar with the methods in use and how they are applied within such tools.

Key messages enriched by reading contents

Usability and complexity of data

[Halevy, A., Norvig, P., & Pereira, F. (2009). The unreasonable effectiveness of data. Intelligent Systems, IEEE, 24(2), 8-12.]

  • Don’t wait for (impossible) data collections but combine the already existing data more effectively
  • A small set of general rules per se is not better than a large set of applicable data (e.g. for learning a language, it can be easier to have a number of examples memorized than knowing the general rule)
  • The use of n-gram models and the „false dichotomy“ of natural language processing: deep (hand-coded) approach & statistical approach („learning n-gram statistics from large corpora“)
  • Semantic Web (machines understand semantic documents not human speech/writing) vs. Semantic Interpretation
  • The tasks that are left are not indexing but interpreting data/information/language -> using the vast amount of information on the internet to support the interpretation problem -> don’t try to make language „easier“ by forming general rules but by making use of the language in use that is available
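The n-gram idea from this reading can be sketched in a few lines of Python: instead of writing down a rule, we simply count which word follows which in the text we have. The tiny „corpus“ and the function names are hypothetical; real models are learned from very large corpora.

```python
from collections import Counter


def ngrams(tokens, n):
    """All runs of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical mini corpus.
corpus = "is it you is it me".split()

bigram_counts = Counter(ngrams(corpus, 2))
first_word_counts = Counter(first for first, _ in bigram_counts.elements())


def next_word_probability(previous, word):
    """Relative frequency of word directly after previous in the corpus."""
    return bigram_counts[(previous, word)] / first_word_counts[previous]


p_it_after_is = next_word_probability("is", "it")
p_you_after_it = next_word_probability("it", "you")
```

In this mini corpus „it“ always follows „is“, while „you“ follows „it“ in half of the cases. With more data, counts like these start to capture the language in use without anyone stating a general rule.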

[Tansley, S., & Tolle, K. M. (Eds.). (2009). The fourth paradigm: data-intensive scientific discovery.]

  • Opportunities and challenges of the fourth paradigm of science based on data-intensive computing
  • Accessibility and „the cloud“ as a base for data-intensive science; three basic activities: capture, curation and analysis
  • Permanent archiving of data as the main goal to improve scientific research
  • eScience as where IT meets science, a new paradigm of science, need for improving the tools for data capturing, analysis and visualization, science happens online (Jim Gray on eScience: a transformed scientific method)
  • Four areas of application: (1) Earth and Environment, (2) Health and Wellbeing, (3) Scientific Infrastructure, (4) Scholarly Communication, followed by Final Thoughts

The field of Learning Analytics

[Baker, R. S., & Yacef, K. (2009). The state of educational data mining in 2009: A review and future visions. JEDM-Journal of Educational Data Mining, 1(1), 3-17.]

  • Different terms, e.g. Data Analytics for Learning, Learning Analytics, Educational Data Mining (EDM) & Knowledge Discovery in Databases (KDD)
  • Making sense of data in learning environments by discovering effective methods to interpret them, thus providing immediate feedback to improve students’ performance and course quality
  • Methods and Key Applications
    • Improvement of student models (how students act within a learning environment and how this environment can respond)
    • Discovering/ improvement of domain’s knowledge structure (data can be used by automated approaches to discover accurate domain structure models)
    • Studying pedagogical support and determine relative effectiveness
    • Supporting research on educational theories / phenomena by delivering empirical evidence
  • Important trends
    • application for online-courses, sensitive and effective e-learning, new areas of study: gaming the system, tools for datamining, student modeling, from relationship mining to prediction, discovery with models
  • Provided data become more public: e.g. through online course environments, broader application possible and check easier

[Baker, R., & Siemens, G. (2014). Educational data mining and learning analytics. Cambridge Handbook of the Learning Sciences.]

  • Learning Analytics (LA) & Educational Data Mining (EDM) to conduct research that benefits the learner and the research community, guided by theories from learning science and education, data mining and analytics &  psychometrics and educational measurement as main sources
  • EDM: (1) automated methods (prediction), (2) specific constructs and their relationship, theoretical approaches, (3) application in automated adaption
  • LA: (1) human-led methods (understanding), (2) understanding the system of the constructs, theories to understand systems as a whole / that take situationalist approaches, (3) inform and empower learner & instructor
  • Growing field because of
    • increasing data quantity (public archives and open online courses), improved data formats (standardized formats for data logging), advances in computing, increased sophistication in tools available (MapReduce, Apache Hadoop)
  • Methods
    • prediction methods: as in Baker/Yacef still most prominent, to infer predicted variable from predictor variables, three types
      • classifiers (predicted variable binary or categorical)
      • regressors (predicted variable continuous)
      • latent knowledge estimation (as a special type of classifier)
    • structure discovery: as opposite to prediction, because there is no a priori idea of a predicted variable; 4 common approaches
      • clustering: find data points that naturally group together, most useful when the categories are not known in advance
      • factor analysis: closely related to clustering, find clusters and split variable set in latent factor set (not directly observable)
      • social network analysis: reveal structure of interaction by analysing the relationship between individual actors
      • domain structure discovery: finding knowledge structure in educational environment
    • relationship mining: discover unexpected but meaningful relationships between items of a large data set
      • association rule mining (find if-then rules for a data set)
      • correlation mining (find positive/negative correlations between variables)
      • sequential pattern mining (find temporal associations between events)
      • causal data mining (find cause for event or observed construct)
    • distillation of data for human judgment: analyse data for immediate feedback to researchers/practitioners (e.g. through heat maps, learning curves and learnograms)
    • discovery with models: use the results of one data analysis within another data analysis (but also cluster analysis or knowledge engineering as input approaches)
  • Tools
    • General purpose (e.g. RapidMiner, R, Weka, KEEL, SNAPP) vs. special purposes tools (e.g. DataShop)
    • Open source (e.g. R, Weka) vs. commercial tools (e.g. IBM Cognos, SAS, analytics offerings by Blackboard, Ellucian)
  • Impact on Learning Sciences
    • Research on disengagement
    • Student learning in various collaborative settings
  • Impact on Practice
    • impact of social dimensions of learning and the impact of learning environment design on subsequent learning success
    • networked learning systems vs. more centralized platforms (e.g. LMS)
  • Outlook
    • Growing data sources
    • Expanding range of application: computer games, argumentation, computer-supported collaborative learning, learning in virtual worlds & teacher learning
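As a small illustration of the relationship mining methods listed above, here is a minimal sketch of how an if-then rule from association rule mining could be scored with support and confidence. The learner activity data and the rule („if a learner watched the video, then they took the quiz“) are hypothetical.

```python
# Hypothetical activity sets, one per learner.
transactions = [
    {"video", "quiz"},
    {"video", "forum", "quiz"},
    {"video"},
    {"forum", "quiz"},
]


def support(itemset, data):
    """Share of learners whose activities contain the whole itemset."""
    return sum(1 for activities in data if itemset <= activities) / len(data)


def confidence(antecedent, consequent, data):
    """How often the 'then' part holds among learners matching the 'if' part."""
    return support(antecedent | consequent, data) / support(antecedent, data)


rule_support = support({"video", "quiz"}, transactions)
rule_confidence = confidence({"video"}, {"quiz"}, transactions)
```

Here two of four learners did both (support 0.5), and two of the three video watchers took the quiz (confidence of about 0.67). A mining algorithm would search many such rules and keep those above chosen thresholds.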

Learning Analytics Tool Matrix

My adapted tool matrix can be accessed by clicking on the above headline. It is my point of departure, as I am still working on it. I want to specify the tools I added (printed in italics), visualize the different phases the tools belong to and work on a better layout. Furthermore, I want to add content from the course weeks still to come and some experiences from using the tools.