Friday, June 5, 2015


Using Clusters to Understand Web Traffic

How Data is influenced
There are plenty of things that influence the amount of traffic a website receives, as well as how engaged its users are. In this post we’ll take a look at one of the many techniques we use to help understand a site’s traffic: clustering.

What is clustering?

At a high level, clustering is a machine learning technique that puts similar things into the same bucket. This can be done in a supervised or unsupervised fashion. Supervised clustering is like sorting coins based on denomination; you already know exactly what your clusters are. In practice, you’re often dealing with dirty or damaged coins, so it’s not immediately obvious what the denomination is, which is why you need some machine learning. Unsupervised clustering is a type of clustering where things are lumped together automatically based on how similar they are. Usually you have to specify how many clusters you want your algorithm to spit out at the end, and there’s always a chance that these clusters won’t be particularly obvious (for example, your algorithm might say, “Hey, I found a bunch of coins covered in green mud!”).
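
To make the unsupervised case a little more concrete, here is a minimal sketch of a plain k-means loop applied to the coin analogy. It is only an illustration, not the method we run in production: the function name `kmeans_1d`, the made-up diameter measurements, and the choice of k-means itself are all assumptions for the example. Note that the number of clusters, `k`, has to be supplied up front.

```python
def kmeans_1d(values, k, iterations=20):
    """Group 1-D values into k clusters with a plain k-means loop."""
    # Deterministic start: spread the initial centers across the sorted values.
    ordered = sorted(values)
    step = max(k - 1, 1)
    centers = [ordered[i * (len(ordered) - 1) // step] for i in range(k)]

    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical coin diameters in millimetres (invented for illustration).
diameters = [17.8, 18.0, 17.9, 21.1, 21.3, 21.2, 24.2, 24.3, 24.4]

# We have to tell the algorithm how many clusters we want.
centers, clusters = kmeans_1d(diameters, k=3)

for center, members in zip(centers, clusters):
    print(round(center, 2), sorted(members))
```

The algorithm never learns what a “denomination” is; it only finds groups of similar measurements, and it is up to us to interpret what each group means.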

We use unsupervised clustering to help us understand what the topic of a website is, and how that topic influences its traffic.

Identifying Clusters

Now let’s circle back to how we actually identify these topic clusters. We’re not necessarily interested in how you’d cluster sites based only on browsing their content. Instead, we’re interested in sites that have similar traffic patterns, which also gives us information about what sites are about.


Let’s take a random website that we know nothing about, foobar.com, as an example. From my panel I might notice that people who visit foobar.com are much more likely to visit foo.com and bar.com than people who never go to foobar.com. This tells me two things: 1) foobar.com, foo.com and bar.com are probably about something similar, and 2) these sites probably receive comparable amounts and types of traffic. That second piece of information is really important. If I knew how much traffic foobar.com actually receives, I could leverage that information to give you an accurate estimate of how much traffic foo.com and bar.com receive. A similar statement can be made about links between sites.
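
The “much more likely to visit” comparison can be sketched as a simple ratio, often called lift: the share of foobar.com visitors who also visit another site, divided by the share of non-visitors who do. The snippet below is a rough sketch under stated assumptions, not a description of our actual pipeline. The `visit_lift` function, the toy panel, and the example.org and example.net sites are all hypothetical.

```python
def visit_lift(panel, anchor, other):
    """How many times likelier visitors of `anchor` are to visit `other`
    than people who never visit `anchor`. Returns None if undefined."""
    visitors = [sites for sites in panel.values() if anchor in sites]
    non_visitors = [sites for sites in panel.values() if anchor not in sites]
    if not visitors or not non_visitors:
        return None  # nothing to compare against

    rate_visitors = sum(other in s for s in visitors) / len(visitors)
    rate_non_visitors = sum(other in s for s in non_visitors) / len(non_visitors)
    if rate_non_visitors == 0:
        # Only anchor visitors ever reach `other` (or nobody does).
        return float("inf") if rate_visitors > 0 else None
    return rate_visitors / rate_non_visitors


# Hypothetical panel: each panelist maps to the set of sites they visited.
panel = {
    "u1": {"foobar.com", "foo.com", "bar.com"},
    "u2": {"foobar.com", "foo.com"},
    "u3": {"foobar.com", "bar.com"},
    "u4": {"foo.com", "example.org"},
    "u5": {"example.org"},
    "u6": {"example.org", "bar.com"},
    "u7": {"example.net"},
    "u8": {"example.org"},
}

lift_foo = visit_lift(panel, "foobar.com", "foo.com")
lift_bar = visit_lift(panel, "foobar.com", "bar.com")
lift_other = visit_lift(panel, "foobar.com", "example.org")

print(lift_foo, lift_bar, lift_other)
```

A lift well above 1 for foo.com and bar.com is the kind of signal that would put them in the same cluster as foobar.com, while a lift near or below 1 suggests unrelated audiences. With a real panel you would also want enough visitors per site for these ratios to be stable.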