
Visualising Software Development

February 10, 2010

I’ve just been passed a couple of links to the code-swarm system. It’s a method of visualising the development process of a software project – and it’s pretty amazing looking! What this system does is use the commit process (aka check-in) of software development to track the additions to a software project. This is where a developer takes a copy of one of the source files from the central repository, adds to it, then places it back into the repository. This is part of a system of revision control that most software projects have today – it allows the developers to review changes and roll back to an earlier version if the newer code has broken the current build of the project. In this instance it has also proved to be a great way of tracking the workflow of a project:

This visualization, called code_swarm, shows the history of commits in a software project. A commit happens when a developer makes changes to the code or documents and transfers them into the central project repository. Both developers and files are represented as moving elements. When a developer commits a file, it lights up and flies towards that developer. Files are colored according to their purpose, such as whether they are source code or a document. If files or developers have not been active for a while, they will fade away. A histogram at the bottom keeps a reminder of what has come before.
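As I understand it, the visualisation is driven by nothing more than that commit history. Here’s a toy Python sketch of the kind of event stream such a tool consumes – the log format and field names are my own illustration, not code_swarm’s actual input format:

```python
import csv
from io import StringIO

# One line per committed file: unix timestamp, developer, filename.
log = StringIO("""\
1265760000,alice,src/core.py
1265760300,bob,docs/readme.txt
1265760600,alice,src/core.py
""")

events = [{"time": int(t), "developer": dev, "file": f}
          for t, dev, f in csv.reader(log)]

# Each commit event is what makes a file light up and fly towards its
# developer; a long gap with no events is what makes either fade away.
for e in events:
    print(e["developer"], "touched", e["file"], "at", e["time"])
```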

Here’s an example of the system being used to visualise the development of Eclipse:

code_swarm of Eclipse

At first glance it looks like a galaxy swirling in the ether – but step past this and it seems to me more like the activity of an ant swarm or cells interacting within a body. It’s very biological. It also shows a key facet of what we observe in evolution: the trend towards increasing complexity.

Also worth seeing: the system charting the development of Twitter, and its application to poetry with T. S. Eliot’s The Love Song of J. Alfred Prufrock.

(Hat-tip to Ben for the link!)

When is a Network not a Network?

February 2, 2010

When it is a real network… I have been using Actor Network Theory (aka ANT, not to be confused with ANTS, the very interesting p2p project) in my research (I’d recommend this and this if you are interested), and yet the ‘networks’ of ANT are not necessarily networks at all – not networks in the literal sense of the word – but intangible associations of human (and non-human) activity. Here’s a brief discussion of the difference:

One of the weaknesses of a Latourian sense of the world as Networking is that though such sociological analytic takes its traction from the fashion in which actual networks have been coming to dominate our communications and industry, it is not really the Internet or other literal networks Latour and ANTS are talking about. Indeed, everything is to be explained by the transformations of networks, and it may be that literal networks are some of the things that are least explainable in such terms.

Looking at Peer-to-Peer Optimization Methods (an update)

January 29, 2010

One of the authors of the p2p paper I looked at in my last posting emailed me with an update on their work worth sharing with you…

… Please note that in the meantime we actually looked at a more sophisticated scenario in the context of hyperheuristics:

http://portal.acm.org/citation.cfm?doid=1569901.1570081

where, quite surprisingly (or maybe not…), what we found is again if you have lots of nodes, then the best is to use them pretty much independently and start several kinds of optimization algorithms on subsets of these  nodes. Although it is of course non-trivial as always what is “best”, etc.

Looking at Peer-to-Peer Optimization Methods

January 27, 2010

P2P algorithms can offer robustness and communication efficiency over more centralised GRID methods. The authors of this paper compared the performance of two p2p optimization algorithms searching in large-scale and unreliable networks: a distributed particle swarm optimization algorithm (PSO, a class of direct search methods used to find an optimal solution to a function that measures how good a solution is) and a novel P2P branch-and-bound (B&B) algorithm.
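For readers who haven’t met PSO before, here is a minimal single-machine sketch of the basic algorithm – not the distributed variant the paper studies; the function names and parameter values (inertia w, pull coefficients c1/c2) are my own illustrative choices:

```python
# A minimal particle swarm optimization sketch. Each particle is pulled
# towards its own best-seen position and the swarm's best-seen position.
import random

def pso(fitness, dim=2, n_particles=20, iters=100,
        lo=-5.0, hi=5.0, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]             # each particle's best position so far
    gbest = min(pbest, key=fitness)[:]      # the swarm-wide best position so far
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # Velocity = inertia + pull to personal best + pull to global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * random.random() * (pbest[i][d] - pos[i][d])
                             + c2 * random.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if fitness(pos[i]) < fitness(pbest[i]):
                pbest[i] = pos[i][:]
                if fitness(pbest[i]) < fitness(gbest):
                    gbest = pbest[i][:]
    return gbest

# Example: minimise the sphere function; the optimum is at the origin.
print(pso(lambda x: sum(v * v for v in x)))
```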

The B&B works by basically following this flow:

  • Find a promising interval (an interval is a set of real numbers with the property that any number lying between two numbers in the set is also in the set), then cut this set of numbers into two.
  • It then takes 8 or more random samples from it and calculates a fixed set of items.
  • The queue of items is ordered based on the lower set; it then culls sets that cut across a pre-set minimum.
  • Then it uses gossip-based load balancing.

It seems to me that this process acts as a kind of slice-and-dice to zoom in on promising sets of data; only when it has the lowest set possible does the sharing of the data over the peers happen. There is also the use of gossip to load-balance rather than the more traditional (and centralised) method of shared memory.
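To make that flow concrete, here is a rough single-machine sketch under my own simplifying assumptions: I cull on a minimum interval width rather than the paper’s exact pruning rule, and where the real algorithm spreads the interval queue across peers with gossip-based load balancing, this version keeps one local priority queue:

```python
# A rough sketch of the interval slice-and-dice loop described above.
import heapq, math, random

def interval_bnb(f, lo, hi, n_samples=8, min_width=1e-3, max_steps=2000):
    best = min(f(lo), f(hi))                   # best function value seen so far
    queue = [(best, lo, hi)]                   # min-heap ordered by each interval's estimate
    for _ in range(max_steps):
        if not queue:
            break
        _, a, b = heapq.heappop(queue)
        if b - a < min_width:                  # cull intervals below a pre-set width
            continue
        mid = (a + b) / 2.0                    # cut the promising interval into two
        for left, right in ((a, mid), (mid, b)):
            # Estimate how promising each half is from random samples.
            est = min(f(random.uniform(left, right)) for _ in range(n_samples))
            best = min(best, est)
            heapq.heappush(queue, (est, left, right))
    return best

# Example: minimise a bumpy 1-D function on [-10, 10].
print(interval_bnb(lambda x: (x - 2) ** 2 + math.sin(5 * x), -10.0, 10.0))
```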

So what did they find? First off, they found an interesting property of the B&B algorithm: because of the culling behaviour described above, it does not need to expand into the whole network to solve a given problem; “The interesting effect we can discover is that the B&B approach ‘refuses’ to utilize the entire network, because it cannot generate enough promising intervals (pruning is ‘too’ efficient) and therefore it can deliver optimal solutions irrespective of network size, but at the cost of longer running times … Depending on the context, this effect can be very advantageous but harmful as well.”

Some counter-intuitive findings are also presented: for example, failures in the network can in fact significantly improve the performance of p2p PSO under some conditions; “For P2P PSO, increasing the network size is equivalent to increasing the population size. Interestingly, a non-zero churn rate introduces a restarting operator for PSO, that can in fact increase performance on at least some types of problems.”

Bánhelyi, B., Biazzini, M., Montresor, A., & Jelasity, M. (2009). Peer-to-Peer Optimization in Large Unreliable Networks with Branch-and-Bound and Particle Swarms. Lecture Notes in Computer Science, 5484, 87-92.

A History of the World in 100 Objects

January 25, 2010

The BBC’s new series, A History of the World in 100 Objects, is very cool.  It’s especially good for somebody like me who has been studying the evolution of technology.  Well worth a listen – I will be very interested to see whether the later programmes take on the issue of virtual objects – such as software – still things that we have made, but now digital rather than physical…

Human P2P Networks

January 16, 2010

Defining what a network is, is a huge topic.  It is one I engage with to some extent in my research, and you can boil a network down to two components – links and nodes.  The beginnings and ends of the network are a more complex matter.  For example, the Internet is less one big network and more a series of networks united by common protocols.  (There is a good discussion of mapping networks using Actor-Network Theory in chapter 4 of Murdoch’s book Post-Structuralist Geography.)  But networks are also more than just wires or WiFi linkages; Bannister (2000) coined a term I like to use for the people in a network:

“Networked media are like a coral reef in that there is no centre or core.  If one cell dies, it doesn’t affect the whole.  Individual humanodes within the telacorpus act of their own volition, or at least they think they do.  The collective of humanodes forms the ‘reef’ of the telacorpus, but unlike natural corals, each humanode can connect to any other.” (Bannister 2000:115)

This term – humanode – came to mind recently while reading an article on the next battleground over piracy, off-line file sharing:

The Strategic Advisory Board for Intellectual Property (Sabip), a body set up to advise the government, has been looking into “offline” copyright infringement after its research last year into online piracy threw up questions about how consumers get films, music and games for free. “There’s a whole big question here around what is happening offline digitally, the swapping of discs and data in that world. There’s a lot of it going on,” said Sabip board member Dame Lynne Brindley. Brindley, chief executive of the British Library, said existing research did not give a clear picture of consumer behaviour. While there was some data on the proportion of people buying counterfeit CDs, DVDs and video games – estimated at between 7% and 16% of the population – Sabip was concerned that more needed to be known about other copyright breaches, such as hard-drive swapping and files being shared by wireless Bluetooth connections.

Again, technology is enabling this process to happen more easily and quickly – you can get 300+GB portable hard drives that don’t need an external power supply and are about the size of an iPhone.  Easy to carry, easy to connect, and given that most films on p2p networks seem to be under 1GB, there is plenty of space.  What this means is that the person becomes both a node and a connection – more than a humanode.  While this is less distributed than online p2p, as geographic boundaries are once more an issue, it shows that there are multiple paths a p2p network can use – which again raises the technical difficulties of stopping copying (see here and here).

Bias in Measuring p2p Networks

January 11, 2010


In a past post I looked at a recent report about the supposed decline in p2p traffic and also talked about the difficulties in measuring p2p networks.  There is an interesting paper by Stutzbach et al entitled ‘On Unbiased Sampling for Unstructured Peer-to-Peer Networks‘ – the paper has some fairly technical bits in it, but you can still get a good guide as to how difficult this whole area is.  For starters, the authors dip into sociology to look at the issues of getting to grips with ‘hidden’ populations.  Hidden populations are ones whose boundaries and size we don’t really know, and whose members prefer to remain anonymous.  A classic example is the population of drug users.  The authors pick one method to draw from – respondent-driven sampling – which is interesting because the method is itself p2p-like in structure: you start with a small seed of respondents, then you get those respondents to identify more respondents, and so on (see the sketch below).
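A quick sketch of that p2p-like recruitment, with a hypothetical `contacts(person)` function standing in for asking a respondent who else they know – every name here is illustrative:

```python
import random

def snowball_sample(seeds, contacts, rounds=3, referrals=2):
    sampled = set(seeds)
    frontier = list(seeds)
    for _ in range(rounds):
        next_frontier = []
        for person in frontier:
            known = contacts(person)
            # Each respondent refers a couple of further respondents...
            for referral in random.sample(known, min(referrals, len(known))):
                if referral not in sampled:      # ...who join the sample if new
                    sampled.add(referral)
                    next_frontier.append(referral)
        frontier = next_frontier                 # the new respondents recruit next
    return sampled
```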

Gathering unbiased sample data on a p2p network is hard.  The current methods have bias built into them.  The authors take sampling tools that are based around a static system and adapt them to work with a dynamic network.  As p2p networks are very dynamic (a few peers tend to have long sessions while the majority have very short ones), there is a problem sampling.  The authors illustrate this with an example:

“Suppose we wish to observe the number of files shared by peers. In this example system, half the peers are up all the time and have many files, while the other peers remain for around 1 minute and are immediately replaced by new short-lived peers who have few files. The technique used by most studies would observe the system for a long time and incorrectly conclude that most of the peers in the system have very few files.”

What they are saying is that if you discount the long-standing peers from being re-sampled, as they have already been covered once, then each snapshot of the system you sample will contain more and more short-lived peers, and so skew the results.  (Though it seems to me all this assumes a large population relative to the sample size, which most p2p networks have.)  Thus the problem also presents the solution: the authors used a method that allows the same peer to be sampled at different points in time – they decoupled the sample from the session lengths:

“[O]ur approach will correctly select long-lived peers half the time and short-lived peers half the time. When the samples are examined, they will show that half of the peers in the system at any given moment have many files while half of the peers have few files, which is exactly correct.”
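Here’s a toy simulation of the scenario in the quotes – my own set-up with illustrative numbers, not the authors’ code – contrasting a long census-style observation with a single-instant snapshot that allows re-sampling:

```python
# Half the peers are long-lived with many files; the rest churn every
# step and have few files.
import random

LONG_LIVED = [("long", 1000) for _ in range(50)]   # (kind, files), always online

def short_lived_peer():
    return ("short", random.randint(1, 5))         # freshly joined peer, few files

# Biased census: watch for 100 steps, record every distinct peer seen once.
census = list(LONG_LIVED)
for _ in range(100):
    census.extend(short_lived_peer() for _ in range(50))   # 50 new peers per step
print("census: fraction with many files =",
      sum(1 for _, f in census if f > 100) / len(census))  # ~1% -- misleading

# Unbiased snapshot: sample the peers online at one instant, re-samples allowed.
snapshot = LONG_LIVED + [short_lived_peer() for _ in range(50)]
sample = [random.choice(snapshot) for _ in range(1000)]
print("snapshot: fraction with many files =",
      sum(1 for _, f in sample if f > 100) / len(sample))  # ~50% -- correct
```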

That addresses one of the issues with sampling – but how did they adapt the static methods to a dynamic environment?  A clever adaptation whereby they introduce backtracking into the existing methodology of the Metropolized Random Walk – and, not surprisingly, call the result the Metropolized Random Walk with Backtracking.  Here’s how it works:

“We make an adaptation by maintaining a stack of visited peers. When the walk chooses a new peer to query, we push the peer’s address on the stack. If the query times out, we pop the address off the stack, and choose a new neighbour of the peer that is now on top of the stack. If all of a peer’s neighbours time out, we re-query that peer to get a fresh list of its neighbours. If the re-query also times out, we pop that peer from the stack as well, and so on. If the stack underflows, we consider the walk a failure. We do not count timed-out peers as a hop for the purposes of measuring the length of the walk.”
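Here is a sketch of that stack logic in Python. The `query(peer)` interface is assumed (it returns a neighbour list, or None on timeout), and I have left out the Metropolis acceptance step that corrects the walk for node degree, to keep the backtracking visible:

```python
import random

def mrwb_walk(start, query, walk_length):
    stack = [start]
    retried = set()      # peers whose neighbour list we already refreshed once
    hops = 0
    while hops < walk_length:
        if not stack:
            return None                     # stack underflow: the walk fails
        top = stack[-1]
        neighbours = query(top)             # query (or re-query) the top peer
        if neighbours is None:
            stack.pop()                     # the peer itself timed out: backtrack
            continue
        random.shuffle(neighbours)          # try its neighbours in random order
        for nxt in neighbours:
            stack.append(nxt)               # push the chosen peer, then query it
            if query(nxt) is not None:
                hops += 1                   # timed-out peers do not count as hops
                break
            stack.pop()                     # timeout: pop it, try another neighbour
        else:
            # All neighbours timed out: the next loop pass re-queries for a
            # fresh list; if we already retried this peer once, pop it too.
            if top in retried:
                stack.pop()
            retried.add(top)
    return stack[-1]                        # the walk ends on the sampled peer
```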

The authors then go on to present a pretty robust analysis of their method in action.  In all, the paper is an interesting account of the difficulties of getting good data on p2p networks, coupled with some inventive solutions to the problems therein.

Disclosure note: the authors note that their research was supported by the National Science Foundation and Cisco Systems.

Stutzbach, D., Rejaie, R., Duffield, N., Sen, S., & Willinger, W. (2009). On Unbiased Sampling for Unstructured Peer-to-Peer Networks. IEEE/ACM Transactions on Networking, 17(2), 377-390. DOI: 10.1109/TNET.2008.2001730

Why are game sequels often good and film sequels often bad? Iteration!

January 5, 2010

It is an oft-quoted truism that the original of a film is the best and sequels often fail to capture the magic of the original. Examples like Blues Brothers 🙂 then Blues Brothers 2000 😦, or The Matrix 🙂 then Reloaded 😦, spring to mind. It is easy to think of films whose sequels were worse than the original and quite a challenge to think of films whose sequel is equal to or indeed better than the original.

Following this thought experiment on – it is the opposite with games. It is easy to think of games whose sequel is equal to or indeed better than the original – Fallout to Fallout 2, GTA to Vice City and beyond, Call of Duty to Modern Warfare 2, and so on. I think this has to do with the fact that a game sequel is much more of a total iteration than a film sequel is. The next film, with a few rare exceptions (talkies, colour, 3D?), is going to be in the same technical format as the last one – sure, the special effects will have improved, but the main thing the makers have to iterate on is the narrative.

With a game it is a bit different – the whole experience can be iterated: improving the controls, bettering the graphics, tightening the gameplay, doing more of what worked and less of what didn’t. This is a building block in the evolution of the idea/software as a whole.  In short, the second time you make a game there are lots of opportunities to really improve, whereas a film is not so lucky.

Discuss.

Torrents into 2010

January 4, 2010

The blog TorrentFreak has a few interesting stories up, both looking back at the last decade and forward into 2010.  A couple of things caught my eye…

One was the article on the ways users in France may avoid the new ‘3 strikes’ law:

French senator Michel Thiolliere has told the BBC that the so-called Hadopi legislation will have the desired effect, with nearly everyone warned a second time abandoning illegal file-sharing for good.

“What we think is that after the first message… about two-thirds of the people (will) stop their illegal usages of the internet,” he explained.

“After the second message more than 95% will finish with that bad usage.”

It is, however, much more likely that after getting a first warning, or even before, French Internet users will try to find a way round this system. They will discover that it’s surprisingly easy.

The other article is about predictions for p2p in the coming year.  There are a number of predictions in the article (so it is worth reading the whole thing) but I’m going to comment on only a couple…

Prediction 1: The Pirate Bay will cease to offer torrent links

After closing its tracker in 2009, The Pirate Bay will further evolve by removing all torrents from its index in the new year. The site will be reduced to a BitTorrent platform that no longer stores torrent files. Users will still be able to submit torrents through a third party service such as Torrage, but instead of linking to these torrent files, The Pirate Bay will list only Magnet links.

During the second half of 2010, The Pirate Bay four will appear before the Appeal Court. They will be found ‘not guilty’ and walk away free. Shortly after this victory in court, Pirate Bay’s YouTube killer The Video Bay will be released to the public.

This case will be, assuming the music industry wins, a Pyrrhic victory, because the technology behind p2p is rapidly moving away from a website that hosts the torrents towards more de-centralised systems.
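It is easy to see why magnet links remove the need to host .torrent files: the link is just a query string carrying the content’s info-hash, from which peers can fetch the full metadata. A small sketch using Python’s standard library (the link and hash below are made up):

```python
from urllib.parse import urlparse, parse_qs

# A made-up example link; the info-hash here does not point at a real file.
magnet = ("magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567"
          "&dn=example-file&tr=udp://tracker.example.org:80")

params = parse_qs(urlparse(magnet).query)
print(params["xt"])   # the SHA-1 info-hash identifying the content
print(params["dn"])   # a human-readable display name
print(params["tr"])   # tracker hint; optional, since peers can be found via DHT
```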

Next is…

Prediction 4: BitTorrent (live) streaming will take off

Advances in technology and growing broadband penetration have brought us to a point where BitTorrent-powered streaming solutions have become reality. BitTorrent inventor Bram Cohen is working on a streaming implementation and experiments have shown that it is possible to stream high definition content.

In the second half of 2010, the first BitTorrent-powered YouTube competitors will be launched. These new BitTorrent sites will mainly offer streams of pirated movies and TV-shows. Live BitTorrent streaming will gain worldwide traction during the 2010 soccer world cup in South Africa. In the second half of the year, commercial implementations will follow, allowing broadcasters to stream live content at zero cost.

I agree with this – to some extent (e.g. with iPlayer) it has already been happening, but we will see an acceleration in content bandwidth management using p2p.  As content size increases (e.g. HD), so do the demands on the pipes that send it around the Internet, and new software means of sending it are the easiest point of rapid improvement – thus p2p is a natural way of doing this.

Talk at Virt3c@Hull 2010

January 3, 2010

I’m happy to say that I am going to be talking at the 2010 Virt3c@Hull, at Hull University.  Keynote speakers include Gabriella Coleman on ‘Cabals, Crisis, and Conflict on the Virtual Frontier’ (Friday) and Mathieu O’Neil on ‘Theory and Practice of Online Research: Power, Expertise, Critique’ (Saturday).  My talk is part of the session entitled ‘Conflicts in Open & Free Software Communities’ on Sat 20th March, 12.00-1.45:

Conflicts in Open & Free Software Communities

  • Merten, Stefan: ‘Conflicts and the governance model of Free Software’, www.oekonux.de
  • Rawlings, Thomas: ‘Evolutionary p2p Systems’, www.fluffylogic.net
  • Fernando, Suresh: ‘OpenKollab: Inherent Conflicts Arising within Generative Collaboration Spaces’, http://openkollab.com
  • Dafermos, George: ‘An empirical study of division of labour in free & open source software development: the case of FreeBSD’, Delft University of Technology, Holland.

The poster for the event can be found here and there is much more info on the official website: virt3c.wordpress.com.

PS – Here’s a great summary of the event.