Monday, 29 June 2009

Getting data for my project

As mentioned in my last post, the first step in starting a social network analysis project is to get hold of some data so that I can generate diagrams and so on.

The good news is that all the person description data for ECS members is already in readily usable RDF form, so I can easily use a SPARQL frontend (at http://southampton.rkbexplorer.com/sparql/) to query and retrieve the description of a person. These descriptions include the person's name, the group they belong to and, for some people, their interests.
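The query step might look something like the Python below. It is a minimal sketch under several assumptions: only the endpoint URL comes from this post; the AKT portal ontology prefix and the `akt:full-name` / `akt:has-affiliation` predicates are my guesses at the vocabulary the endpoint uses and need checking against real data; the person URI in any call is a placeholder; and the response is assumed to follow the standard SPARQL JSON results format.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint from the post; it may no longer be online.
ENDPOINT = "http://southampton.rkbexplorer.com/sparql/"

# ASSUMPTION: the AKT portal prefix and predicate names are guesses at the
# endpoint's vocabulary; check them against real data before relying on them.
QUERY_TEMPLATE = """PREFIX akt: <http://www.aktors.org/ontology/portal#>
SELECT ?name ?group WHERE {{
  <{person}> akt:full-name ?name .
  OPTIONAL {{ <{person}> akt:has-affiliation ?group . }}
}}"""


def build_query(person_uri: str) -> str:
    """Fill the template with one person's URI."""
    return QUERY_TEMPLATE.format(person=person_uri)


def build_query_url(person_uri: str, endpoint: str = ENDPOINT) -> str:
    """Encode the query as a GET URL using the SPARQL protocol 'query' parameter."""
    return endpoint + "?" + urlencode({"query": build_query(person_uri)})


def fetch_person(person_uri: str) -> str:
    """Send the query over the network and return the raw JSON text."""
    request = Request(
        build_query_url(person_uri),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


def parse_bindings(results_json: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON results document into one plain dict per row."""
    data = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]
```

With a real person URI, `parse_bindings(fetch_person(uri))` would give one dict per result row, keyed by the variable names `name` and `group`. Keeping the network call separate from the parsing means the parsing can be tried on a saved response without touching the endpoint.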

The bad news is that none of the other schools has this information readily available.

Next up, I need to think about:
* What programming language to use for processing and visualising the data
* What form I want the data in for:
-generating graphs using Graphviz
-doing some processing on the data (e.g. how many hops one author is from another)
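One form that would serve both purposes is a plain list of author pairs (edges). The sketch below, assuming such an edge list already exists and using made-up names, turns it into Graphviz DOT text and counts hops between two authors with a breadth-first search.

```python
from collections import deque
from typing import Optional

Edge = tuple[str, str]


def _quote(name: str) -> str:
    """Quote a node name for DOT; inside quoted IDs only '"' needs escaping."""
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(edges: list[Edge], graph_name: str = "coauthors") -> str:
    """Render an undirected edge list as Graphviz DOT text."""
    lines = [f"graph {graph_name} {{"]
    for a, b in edges:
        lines.append(f"  {_quote(a)} -- {_quote(b)};")
    lines.append("}")
    return "\n".join(lines)


def hops(edges: list[Edge], source: str, target: str) -> Optional[int]:
    """Breadth-first search: fewest links from source to target, or None."""
    adjacency: dict[str, set[str]] = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    if source not in adjacency or target not in adjacency:
        return None
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distance[node]
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return None
```

For the hypothetical edges `[("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Dave")]`, `hops(edges, "Alice", "Dave")` returns 3. Saving the `to_dot` output to `coauthors.dot` and running `dot -Tpng coauthors.dot -o coauthors.png` would draw the graph.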

Question to ask Dr Carr:
* How can I get the author cloud and author graph working for University ECS?

Thursday, 25 June 2009

University, school, people, data

After a bit of reading around, I realise that to carry out any research and analysis I need a set of data to work on. In this case, that means the names, roles and interests of the university staff.

So today, I surveyed the university's structure, schools and staff, found out where that information is located, and started thinking about ways to extract it automatically.

The University of Southampton is split into three faculties (Engineering, Science and Mathematics; Law, Arts and Social Science; and Medicine, Health and Life Science), each containing a few schools. Each school has its own web site, on which it tends to publish a list of its staff and what they do. Since these web pages are developed independently, the layout and the information they contain vary greatly. At this point, I have to say that semantic web technology, i.e. linked and annotated data, is very useful: I immediately knew how to extract this information from our ECS site, which is fully annotated.

From what I explored today, I think that, apart from the ECS site, the fastest way to extract the people data from each web site is to type it out by hand, which is rather sad. But I'll keep an open mind about how to obtain this data.
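Before resorting to typing, a semi-automatic route might be worth a try for the schools whose staff pages follow some pattern. The sketch below uses Python's standard `html.parser` and assumes, hypothetically, that a staff-listing page links each person with an href containing `/people/`. Because every school's layout differs, the marker would need adjusting per site, and pages with no usable pattern would still have to be typed out.

```python
from html.parser import HTMLParser
from typing import Optional


class StaffLinkParser(HTMLParser):
    """Collect (name, href) pairs for links whose href contains a marker."""

    def __init__(self, marker: str = "/people/") -> None:
        super().__init__()
        self.marker = marker  # HYPOTHETICAL: URL fragment marking a staff page
        self.staff: list[tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if self.marker in href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace and line breaks inside the link text.
            name = " ".join("".join(self._text).split())
            if name:
                self.staff.append((name, self._href))
            self._href = None


def extract_staff(html: str, marker: str = "/people/") -> list[tuple[str, str]]:
    """Return (name, href) pairs for every matching link in the page."""
    parser = StaffLinkParser(marker)
    parser.feed(html)
    parser.close()
    return parser.staff
```

Text inside nested tags such as `<b>` is kept as part of the name, so a link like `<a href="/people/mjones"><b>Mary</b> Jones</a>` yields `("Mary Jones", "/people/mjones")`.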

Tuesday, 23 June 2009

Project direction amendment

From the moment I was given this project's subject, I did not fully comprehend its purpose and objectives.

After meeting my supervisor, Dr Carr, I think I am beginning to see where this project could go and which areas I should spend my time on.

First of all, forget about most of my eprints efforts and my programming plan: this is not a programming project, and there is no big software package to deliver at the end. The focus should be on understanding the subject.

There are two possible threads I can follow and develop for this project:
  1. Apply general social network analysis (e.g. the small world experiment and its follow-ups) to a university's academic network. Do the "hub" and "small world" properties still hold? How do links form across the schools?
  2. The weak tie argument: it is the weak ties that compress the Internet into such a small network, and it is the weak ties that spread information. What role do weak ties (e.g. cross-discipline relationships between lecturers) play in supporting research in a university? How important is it to encourage interdisciplinary research, which allows these cross-discipline relationships to form?
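Both threads need measurable quantities. One common reading of "small world" is short average paths combined with high clustering, so the sketch below computes the average local clustering coefficient (counting people with fewer than two contacts as zero, the same convention as the networkx `average_clustering` default) and, as a crude proxy for thread 2, the share of ties that cross school boundaries. Treating a cross-school tie as a weak tie is my own simplifying assumption, and the names and school labels in any example are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency sets, ignoring self-loops."""
    adjacency: dict[str, set[str]] = {}
    for a, b in edges:
        if a == b:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def average_clustering(adjacency: dict[str, set[str]]) -> float:
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    if not adjacency:
        return 0.0
    total = 0.0
    for node, neighbours in adjacency.items():
        k = len(neighbours)
        if k < 2:
            continue
        # Count links among this node's neighbours.
        links = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
        total += links / (k * (k - 1) / 2)
    return total / len(adjacency)


def cross_school_fraction(edges: list[tuple[str, str]], school_of: dict[str, str]) -> float:
    """Share of distinct ties whose two ends belong to different schools."""
    ties = {frozenset((a, b)) for a, b in edges if a != b}
    if not ties:
        return 0.0
    cross = sum(1 for tie in ties if len({school_of[p] for p in tie}) == 2)
    return cross / len(ties)
```

For a triangle A-B-C plus an extra tie C-D, the average clustering is 7/12; if A and B are in one school and C and D in another, half of the four ties cross schools. On real data, these numbers only mean something when compared with a random graph of the same size and number of ties.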
I will tackle the first thread first, so my tasks before the next meeting are:
  • Find out about and read the research and experiments carried out in the area of social network analysis: what were their data sources? How did they conduct the experiments? What did they find? Shortlist several activities that I could potentially do in this project, along with the problems and challenges I need my supervisor's help with.
  • Update my project plan to reflect this change.

Monday, 22 June 2009

Installing the eprints software

Last Friday, I tried to install the eprints software on one of my computers.

My first attempt was to use the live CD provided by the official website, which the developers suggest as the fastest way to get started. I created a virtual machine (using VirtualBox) and booted from the live CD. According to the guide, there should have been a shortcut on the desktop to start the installation, but there was not, nor was there one anywhere else. A web page opened after boot, saying that I had successfully installed eprints, but I am sure nothing had been installed onto the hard drive yet. After exploring the live system for a while, with limited documentation, I decided to install the eprints software inside a running Linux system instead.

I have a virtual machine running Kubuntu 8.04, so I downloaded the eprints3.31 deb package from the eprints site and executed it. It started to download dependency packages from the web, 68 in total. Unfortunately, it reported that it could not download one of the packages. I retried many times with no luck, so I decided to install using apt-get instead.

Following the guide, I added the eprints package source and started to download the packages. This time, the installation finished with no errors. But just as I was following the next part of the guide to get started with eprints, the guide became ambiguous and inaccurate again: the eprint user was not found.

I will need to spend more time on this later.

Thursday, 18 June 2009

Some investigation into the subject

Yesterday, I had a chance to look at current developments in the area of social networking. I concluded the following:

One way of doing a social networking project would be to build software that extracts the network of relationships between people in a repository or database, and then uses that extracted information to support expert finding and to visualise the repository and the academic network.

There are many challenges in doing so: finding and extracting a researcher's profile from a web page, disambiguating researcher names, and topic modelling.
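The extraction step, and why name disambiguation is a challenge, can be illustrated with a small sketch. It assumes, hypothetically, that each repository record has already been reduced to a list of author name strings. It builds co-author ties weighted by the number of shared records, using a deliberately crude name key (surname plus first initial, accents stripped). Such a key merges "Smith, John A." with "J. A. Smith", but it would also wrongly merge two different people called J. Smith, which is exactly the disambiguation problem.

```python
import re
import unicodedata
from collections import Counter
from itertools import combinations


def normalise_name(raw: str) -> str:
    """Crude identity key: lower-case surname plus first initial."""
    text = unicodedata.normalize("NFKD", raw)
    # Drop accents (combining marks) and lower-case everything.
    text = "".join(c for c in text if not unicodedata.combining(c)).lower()
    if not text.strip():
        return ""
    if "," in text:  # "Surname, Given names"
        surname, given = text.split(",", 1)
    else:  # "Given names Surname"
        parts = text.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    surname = re.sub(r"[^a-z\- ]", "", surname).strip()
    given_parts = re.sub(r"[^a-z ]", " ", given).split()
    initial = given_parts[0][0] if given_parts else ""
    return f"{surname} {initial}".strip()


def coauthor_edges(records: list[list[str]]) -> Counter:
    """Count how many records each pair of normalised authors shares."""
    weights: Counter = Counter()
    for authors in records:
        keys = sorted({normalise_name(a) for a in authors} - {""})
        for pair in combinations(keys, 2):
            weights[pair] += 1
    return weights
```

For the hypothetical records `[["Smith, John A.", "Jones, Mary"], ["J. A. Smith", "M. Jones", "Brown, Li"]]`, the pair `("jones m", "smith j")` gets weight 2 because both spellings collapse to the same keys.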

There are some one-off developments for extracting social networks from a particular database; ArnetMiner (www.arnetminer.com) is an example.

Within the scope of this project, I could develop an extension to the eprints repository that provides functionality similar to ArnetMiner's.
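As a much simpler stand-in for expert finding (I am not claiming this is how ArnetMiner works), authors could be ranked by how many of their repository records have a title sharing a word with the query. The `(title, authors)` record shape is an assumption about what an eprints export might be reduced to.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lower-case alphanumeric words in a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_experts(
    records: list[tuple[str, list[str]]], query: str, top_n: int = 5
) -> list[tuple[str, int]]:
    """Score each author by the number of their titles sharing a query word."""
    terms = tokenize(query)
    scores: dict[str, int] = defaultdict(int)
    for title, authors in records:
        if terms & tokenize(title):
            for author in set(authors):  # count each author once per record
                scores[author] += 1
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]
```

Plain word overlap misses synonyms and related topics, which is where the topic modelling mentioned above would come in.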

Wednesday, 17 June 2009

Master Project topic decided

The project officially started this Monday.

On Tuesday afternoon, we gathered with our supervisor and discussed potential projects for those of us interested in web science.

The 10 of us were given 9 topics to choose from; here are the ones I was interested in:

Twitter investigation
University social networking
Life Guide data visualisation
Google Wave understanding

Since everyone wanted to do the interesting ones, there was a lot of overlap in our choices. I had to give up my top choice and go for the social networking one. Popular projects were split into smaller ones to accommodate the demand.

My choice, university social networking, was split into three projects: one building an interface for different age groups, one using external social network sources to form a university social network, and mine, which uses internal resources, including the university's eprints repository, to form the social network.

A project description and project plan are due this Friday.