Social Network Analysis – AnalyticsWeek

During AnalyticsWeek in Boston, March 24 – 28, 2014, Klurig Analytics, as a Silver sponsor, handled social media and social media analytics for the event. This primarily involved maintaining a live tweet stream with both text and pictures of speakers, moderators and panelists. In two previous posts, we first discussed the event on a day-by-day basis and then applied Twitter analysis and text mining to the event's Twitter feed.

In this post, we use Social Network Analysis to dive deep into the Twitter followers and friends of 17 of the speakers and panelists of the week-long event. The reason for 17 is simply that, out of 30 people, only 17 supplied a Twitter handle. We also added the handles of some of the organizers, including myself (@dagh), for a total of 22 Twitter handles. The graph is shown below, followed by the details.

[Figure: AnalyticsWeek network graph]

Some quick observations:

  1. @kdnuggets, with 18,400 followers, rocks!
  2. The main clusters form around the 22 initial seed users, which makes sense because that is where we started.
  3. @VishalTx has a large following, especially considering that Vishal Kumar is the CEO of Cognizeus and is therefore running three Twitter handles: @VishalTx, @Cognizeus and @AnalyticsWeek.
  4. Other prominent users are @gretaroberts, @paulsonderegger and @DeborahMCooper.

The included Twitter handles and their respective names are:

  1. PaulSonderegger – Paul Sonderegger – Oracle
  2. kdnuggets – Gregory Piatetsky – KDnuggets
  3. chazard – Chip Hazard – Flybridge Capital
  4. molecularist – Charlie Schick – IBM
  5. greels1 – Michael Greely – Foundation Medical Partners
  6. Lynch_BigData – Christopher Lynch – Atlas Ventures
  7. DeborahMCooper – Deborah Cooper – DeborahmCooper.com
  8. eureqa – Michael Schmidt – Nutonian
  9. judah – Judah Phillips – Smart Current
  10. clarkjacker – Ben Clark – Wayfair
  11. analyticsraj – Raj Aggarwal – Localytics
  12. rama100 – Rama Ramakrishnan – CQuotient
  13. cesarbrea – Cesar Brea – Force Five Partners
  14. dxsimmons – Bill Simmons – DataXu
  15. suthoff – Brian Suthoff – Localytics
  16. gretaroberts – Greta Roberts – Talent Analytics
  17. imdaviddietrich – David Dietrich – EMC
  18. AnalyticsWeek – run by Cognizeus – tweets by @dagh during AnalyticsWeek
  19. VishalTx – Vishal Kumar – Cognizeus
  20. skbhate – Sachin Kumar Bhate – Cognizeus
  21. cognizeus – Twitter handle for Cognizeus
  22. dagh – Dag Holmboe (myself) – Klurig Analytics

Our goal in this analysis was to combine the 22 Twitter handles into a single social network graph and see whether any individual users or clusters of users seem more prominent and perhaps have a bigger influence on the Boston, national and international analytics communities.

There are different ways to measure influence, both with Twitter's own metrics and with standard social network analytics. Within Twitter, we often look at five measurements:

  1. Followers – sometimes mostly a popularity contest, but still an indication of influence. A person with a million followers probably wields more influence than a person with 10 followers.
  2. Mentions – the more you are mentioned, the more influence you have? Not necessarily, but it is an indication.
  3. Retweets – probably the most reliable measure of influence. Because a retweet is a reflection of ourselves, we usually only retweet things that we think are good. So someone who is often retweeted tends to tweet good content and can therefore be considered an influencer.
  4. Recency – is the person still tweeting regularly? A person who stops tweeting loses influence quickly.
  5. Topic – a person might be an influencer in one topic but not in the topic I am interested in, in which case, for me, that person is not an influencer.
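As an illustration of the five Twitter signals above, here is a minimal Python sketch that gathers them for one user from a batch of already-downloaded tweets. The `Tweet` record, its fields, the `influence_signals` function and the keyword test for topic are all hypothetical; the post does not say how (or whether) the signals were combined, so the sketch only returns the raw values.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Tweet:
    # Hypothetical minimal tweet record; real Twitter API payloads carry far more fields.
    author: str
    created_at: datetime
    text: str
    mentions: List[str] = field(default_factory=list)
    retweet_of: Optional[str] = None  # original author if this tweet is a retweet


def influence_signals(user, followers, bio, tweets, topic_keywords, now):
    """Return the five raw influence signals for one user (no weighting applied)."""
    # Mentions: tweets by *other* users that mention this user.
    mentions = sum(1 for t in tweets if t.author != user and user in t.mentions)
    # Retweets: tweets that are retweets of this user's content.
    retweets = sum(1 for t in tweets if t.retweet_of == user)
    # Recency: days since this user last tweeted or retweeted anything.
    last = max((t.created_at for t in tweets if t.author == user), default=None)
    days_since = (now - last).days if last is not None else None
    # Topic: crude keyword match against the profile description.
    bio_words = set(re.findall(r"[a-z]+", bio.lower()))
    on_topic = bool(bio_words & {k.lower() for k in topic_keywords})
    return {
        "followers": followers,
        "mentions": mentions,
        "retweets": retweets,
        "days_since_last_tweet": days_since,
        "on_topic": on_topic,
    }
```

Combining these into one score (for example, a weighted sum) is a judgment call that depends on what you are looking for, which is why the sketch keeps them separate.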

We can also measure influence using social network centrality measures such as closeness, betweenness and eigenvector centrality.

  1. Closeness measures how close you are to all other nodes in a network (typically, the inverse of your average shortest-path distance to every other node). The person who is closest to everyone else could be called an influencer: if I want to distribute a message to the whole network, I can go to this person, who can, in theory, do it most quickly and cheaply.
  2. Betweenness measures how many shortest paths between other pairs of nodes pass through you. The person who is on the most shortest paths could be considered an influencer because, for any person to reach any other person in the network, they have to go through this person more often than through anyone else.
  3. Eigenvector centrality is similar to Google's PageRank in that your importance is partly determined by the importance of your network neighbors.
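To make the three definitions concrete, here is a small pure-Python sketch for an unweighted, undirected graph stored as a dict of neighbor sets. The post did not compute these measures for this graph, and igraph and Gephi ship their own implementations with their own normalizations; the functions below (closeness over reachable nodes, Brandes' algorithm for betweenness, power iteration for eigenvector centrality) are illustrative only.

```python
import math
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(graph):
    """(reachable nodes - 1) / sum of distances; isolated nodes score 0."""
    result = {}
    for v in graph:
        dist = bfs_distances(graph, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def betweenness(graph):
    """Brandes' algorithm, unnormalized, for unweighted undirected graphs."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(graph, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


def eigenvector(graph, max_iter=200, tol=1e-9):
    """Power iteration on (A + I): same leading eigenvector as A, but it avoids
    the oscillation plain iteration shows on bipartite graphs such as stars."""
    x = dict.fromkeys(graph, 1.0)
    for _ in range(max_iter):
        new = {v: x[v] + sum(x[w] for w in graph[v]) for v in graph}
        norm = math.sqrt(sum(val * val for val in new.values())) or 1.0
        new = {v: val / norm for v, val in new.items()}
        if sum(abs(new[v] - x[v]) for v in graph) < tol:
            return new
        x = new
    return x
```

On a star graph (one hub linked to three leaves), the hub scores highest on all three measures. Real follower networks are directed, and these definitions would then need to be adapted.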

Although we have used these techniques many times in the past, we did not use them in this case. Instead, we were simply interested in creating an exploratory social network graph. To do that, using R and the twitteR library, we downloaded the profile of each of our 22 users, as well as the profiles of all of their followers and friends. We then ranked the profiles by topic (analytics) and removed those that did not seem to be analytics-inclined. We started out with 49,136 users and whittled them down to 3,639 users connected via 5,468 edges.
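The post does not spell out how profiles were ranked by topic. As one plausible reading, here is a minimal Python sketch (not the original R/twitteR code) that scores each profile description by counting analytics keywords, keeps on-topic profiles plus the seed users, and retains only the follower/friend edges whose two endpoints both survive. The keyword list, the `min_score` threshold and the function names are hypothetical.

```python
import re

# Hypothetical keyword list; the post does not say which terms were used to rank profiles.
ANALYTICS_TERMS = {"analytics", "data", "bigdata", "datascience", "statistics", "machinelearning"}


def topic_score(bio, terms=ANALYTICS_TERMS):
    """Count how many words in a profile description are analytics keywords."""
    words = re.findall(r"[a-z]+", bio.lower())  # '#BigData' -> 'bigdata'
    return sum(1 for w in words if w in terms)


def prune_network(profiles, edges, min_score=1, seeds=()):
    """Keep on-topic profiles (plus seed users) and the edges between them.

    profiles: dict of handle -> profile description
    edges: iterable of (follower, followed) handle pairs
    """
    keep = {h for h, bio in profiles.items() if topic_score(bio) >= min_score}
    keep |= set(seeds)  # never drop the seed users the crawl started from
    kept_edges = [(a, b) for a, b in edges if a in keep and b in keep]
    # Handles that lost all their edges would float unconnected, so drop them too.
    nodes = {h for edge in kept_edges for h in edge}
    return nodes, kept_edges
```

With `min_score=1`, a single keyword hit is enough; raising it would trade coverage for a tighter, more clearly analytics-focused graph.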

Using the igraph library, we exported the network data to Gephi, where we did some additional pruning and created the AnalyticsWeek network graph.
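In R, igraph can write a graph in formats Gephi reads, for example with `write_graph(g, file, format = "graphml")`. To show what such an export contains, here is a minimal Python sketch that uses only the standard library to write nodes and directed follow edges as GraphML. The `to_graphml` function and the `label` node attribute are assumptions for illustration, not the exact file the post produced.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(nodes, edges, directed=True):
    """Serialize a node set and (source, target) edge list as a GraphML string."""
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # Declare a string 'label' attribute so each node carries its Twitter handle.
    ET.SubElement(root, "key", {"id": "label", "for": "node",
                                "attr.name": "label", "attr.type": "string"})
    graph = ET.SubElement(root, "graph", {
        "id": "G", "edgedefault": "directed" if directed else "undirected"})
    for handle in sorted(nodes):
        node = ET.SubElement(graph, "node", {"id": handle})
        ET.SubElement(node, "data", {"key": "label"}).text = handle
    # Follow relationships point from follower to followed.
    for i, (source, target) in enumerate(edges):
        ET.SubElement(graph, "edge", {"id": f"e{i}", "source": source, "target": target})
    return ET.tostring(root, encoding="unicode")
```

The resulting string can be saved to a `.graphml` file and opened in Gephi for layout and further pruning.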
