
WebWordz Stats Vs Log File Analysis

WebWordz's audience-analysis technology and traditional log-file analysis tools may produce greatly different traffic statistics for the same site. This study shows that the differences are to a great extent the result of differences in data-collection methods, and that inaccuracies inherent in log-file analysis can produce significant errors even for typical sites.

The difference in page-view counts may be considerable: log-file analysis tools often report 400% more page views than browser-based analysis tools such as WebWordz.

Influences                     Source of error      Effect       Significance
Page views & visitors count    WebWordz             Undercount   +
Page views                     Log file analysis    Overcount    +++
Page views                     Log file analysis    Undercount   ++
Visitors count                 Log file analysis    Overcount    ++
Page views                     Log file analysis    Overcount    ++
Page views & visitors count    Log file analysis    Overcount    ++
Page views & visitors count    WebWordz             Undercount   +
Page views                     WebWordz             Undercount   +


THE TRACKING CODE

To implement WebWordz on a site, the site owner inserts a section of HTML/JavaScript code in the HTML for each page to be monitored. When a page containing this code is displayed in a user's browser, the code collects page-view and other traffic data.
Two issues regarding the code are relevant: pages that do not contain code, and the location of the code within each page.

Pages without code
When implementing WebWordz on a site, the site owner may choose to include the code on certain pages and omit it from others. WebWordz collects statistics only for those pages that include the code.
In contrast, log-file analysis tools usually provide statistics for all pages unless configured otherwise. This can be a significant - and often overlooked - source of differences between statistics from WebWordz and log-file analysis.

Location of code
To improve accuracy for slow-loading pages, it is desirable to insert the WebWordz code near the beginning of the HTML for the page. This ensures that the code is executed whenever the page is displayed. If the code is located later in the HTML - especially after a large graphic or other slow-loading element - a user might enter and leave the page before the code is executed. In this event, the page view would not be recorded, resulting in an undercount.

HTML FRAMES

Frames are independently controllable areas on a web page, used to provide added flexibility in display and functionality. Typically, there is a separate HTML file for the page itself and for each frame on the page.
Frames can be a problem for log-file analysis. When a user requests a page containing one or more frames, the typical server log records one request for the page itself, plus one additional request for each frame. Log-file analysis tools generally count each request as a separate page view even though only one page is displayed to the user, resulting in a significant overcount.

WebWordz does not have this problem. When implemented correctly, the code appears just once in the HTML for the page and all its frames, so it is executed only once for the entire page. As a result, WebWordz records the entire page as a single page view regardless of the number of frames it contains.

Studies show that this effect is the largest source of overcounting by log-file analysis tools.
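
The frame overcount described above can be illustrated with a minimal sketch. The request paths, the frame-to-page mapping, and the function names are all hypothetical; they are not taken from any real log-file analysis tool or from WebWordz. The sketch only contrasts counting every HTML request with counting one view per displayed page.

```python
# Hypothetical server-log requests for ONE visit to a framed page:
# the frameset document plus one request per frame.
requests = [
    "/index.html",          # the page (frameset) itself
    "/frames/menu.html",    # frame 1
    "/frames/main.html",    # frame 2
    "/frames/footer.html",  # frame 3
]

# Assumed mapping from each frame document to the page that embeds it.
FRAME_PARENT = {
    "/frames/menu.html": "/index.html",
    "/frames/main.html": "/index.html",
    "/frames/footer.html": "/index.html",
}


def naive_page_views(paths):
    """Count every HTML request as a page view, as a typical log tool would."""
    return sum(1 for p in paths if p.endswith(".html"))


def displayed_page_views(paths):
    """Count one view per displayed page by ignoring requests for frame documents."""
    return sum(1 for p in paths if p not in FRAME_PARENT)


print(naive_page_views(requests))      # 4
print(displayed_page_views(requests))  # 1
```

With three frames, the single displayed page is reported as four page views by the naive count.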

CACHED PAGES

Many ISPs maintain proxy servers that store millions of pages copied from the Web. When a user requests a page stored on a proxy, the ISP delivers the page quickly from the proxy rather than retrieving it from the site's Web server, which can take much longer. Surveys indicate that proxies serve 15 to 20 percent of the page views for a typical site. Because such requests need not reach the site's server, log-file analysis cannot detect all page views served by proxies, resulting in significant undercounting of page views. WebWordz detects all displayed pages regardless of the source (including pages served from the browser's cache), giving accurate page-view counts.
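
As a rough worked example of the proxy effect, the sketch below computes what a server log would show in the worst case, assuming that none of the proxy-served views reach the log. The figure of 10,000 true page views and the function name are hypothetical; only the 15 to 20 percent range comes from the text above. Since some proxy-served views may still be logged, the result is an upper bound on the undercount.

```python
def logged_page_views(true_views, proxy_share):
    """Views visible in the server log if no proxy-served view is logged (worst case)."""
    return round(true_views * (1 - proxy_share))


true_views = 10_000  # hypothetical number of pages actually displayed to users

for share in (0.15, 0.20):
    logged = logged_page_views(true_views, share)
    missed = true_views - logged
    print(f"proxy share {share:.0%}: log shows {logged}, misses up to {missed}")
```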

IP ADDRESS POOLS

Many ISPs have a pool of IP addresses that are dynamically assigned to individual users. In this situation, a single user may use multiple IP addresses over time - even during a single visit to a site. Since log-file analysis identifies individual users by their IP addresses, it cannot track a user whose IP address changes.

As a result, counts of unique users and measurements of how long users spend on a site and on individual pages may be grossly inaccurate. In contrast, WebWordz utilizes an internal session cache and does not depend solely on the IP address to identify individual users, so it provides correct values for these statistics.
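
The effect of changing IP addresses on visitor counts can be sketched as follows. The hit records, the `session_id` field, and the function names are assumptions made for illustration; the session identifier merely stands in for any identifier that does not change with the IP address and is not a description of how WebWordz's session cache actually works.

```python
# Hypothetical hits: (ip_address, session_id, path).
# IP addresses come from the 192.0.2.0/24 documentation range.
hits = [
    ("192.0.2.1", "session-a", "/home"),
    ("192.0.2.7", "session-a", "/products"),  # same user, new IP from the ISP pool
    ("192.0.2.3", "session-a", "/contact"),   # same user, yet another IP
    ("192.0.2.9", "session-b", "/home"),
]


def visitors_by_ip(hits):
    """Identify visitors by IP address only, as log-file analysis does."""
    return len({ip for ip, _, _ in hits})


def visitors_by_session(hits):
    """Identify visitors by an identifier that survives IP changes."""
    return len({session for _, session, _ in hits})


print(visitors_by_ip(hits))       # 4
print(visitors_by_session(hits))  # 2
```

Two real visitors appear as four when the IP address is the only key.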

FALSE PAGE VIEWS

Web surfers following familiar paths often jump between pages very quickly without viewing their content. Log-file analysis cannot detect this situation, and consequently counts each jump as a page view even though the user does not actually view the page, potentially resulting in significant overcounting of page views.

WebWordz provides an alternative solution to this problem: the site owner inserts the tracking code at the end of the HTML for a page, or following key content. If a user leaves the page too quickly, the code will not be loaded, in which case WebWordz will not record a page view. This technique makes it possible to obtain more realistic page-view counts in this problematic situation.
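
The end-of-page placement technique can be modelled with a small simulation. The dwell times, the two-second threshold, and the function names are hypothetical values chosen for illustration; in practice the time needed to reach the tracking code depends on the page and the connection.

```python
# Hypothetical dwell times (in seconds) for five requests of the same page.
dwell_seconds = [0.4, 12.0, 0.8, 45.0, 3.5]

# Assumed time until tracking code placed at the end of the HTML is loaded.
CODE_REACHED_AFTER = 2.0


def log_file_views(dwells):
    """Every request is logged, however briefly the page was open."""
    return len(dwells)


def end_of_page_tag_views(dwells, code_reached_after):
    """A view is recorded only if the user stayed until the code was loaded."""
    return sum(1 for d in dwells if d >= code_reached_after)


print(log_file_views(dwell_seconds))                             # 5
print(end_of_page_tag_views(dwell_seconds, CODE_REACHED_AFTER))  # 3
```

The two sub-second jumps are counted by the log file but not by the tag at the end of the page.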

ARTIFICIAL TRAFFIC

Another common source of excess page-view counts by log-file analysis is artificial traffic - automated programs that request Web pages from servers but do not display those pages to users. Log-file analysis tools generally count such requests as page views even though they are not true views by users, resulting in an overcount. We examined the effects of two types of artificial traffic: monitoring tools and robots.

Monitoring tools
Many sites use proprietary or commercial tools to monitor various aspects of site performance. Such tools may request pages from the Web server. Log-file analysis tools incorrectly count these requests as page views.

WebWordz does not count such requests as page views. This is because the code is technically an image. Since monitoring tools typically do not load images, WebWordz correctly disregards the page views for the pages they request.

Robots
Robots (also called "spiders" or "crawlers") are programs that surf the Web automatically, following hypertext links and scanning site content. Since robots are not actual users, their activities need to be excluded from traffic statistics.

This is difficult with log-file analysis: in order to identify the activity of a robot, a log-file analysis tool needs to know about the robot, much as anti-virus software needs to know about a virus in order to detect it. Since there are thousands of robots - and new ones appear every day - log-file analysis tools cannot identify every one. In fact, recent studies have identified hundreds of robots not detected by popular log-file analysis tools.

WebWordz does not have this problem. Like the monitoring tools described above, robots do not execute the WebWordz code. As a result, WebWordz automatically excludes robots' activities from its traffic statistics without the need to identify specific robots.
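
The limitation of identifying robots by a list of known signatures, as described for log-file analysis above, can be sketched as follows. The signature list, the user-agent strings, and the function names are illustrative assumptions; `ExampleCrawler` is a made-up robot, and the list is deliberately incomplete.

```python
# Illustrative, deliberately incomplete list of known robot signatures.
KNOWN_ROBOT_SIGNATURES = ("googlebot", "slurp")

# Hypothetical log entries: (path, user_agent).
log_entries = [
    ("/home", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"),  # real user
    ("/home", "Googlebot/2.1 (+http://www.google.com/bot.html)"),     # known robot
    ("/home", "ExampleCrawler/0.1"),                                  # unknown robot
]


def is_known_robot(user_agent):
    """Return True if the user agent matches a signature on the known list."""
    ua = user_agent.lower()
    return any(signature in ua for signature in KNOWN_ROBOT_SIGNATURES)


def page_views_excluding_known_robots(entries):
    """Count page views after filtering out robots that the list recognises."""
    return sum(1 for _, ua in entries if not is_known_robot(ua))


print(page_views_excluding_known_robots(log_entries))  # 2
```

Only one of the three requests comes from a real user, yet the filtered count is two because the unknown robot passes the signature check.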

LIMITED-DISPLAY DEVICES

PDAs and other limited-display devices such as text browsers are often configured not to display images. This does not affect log-file page-view counts - the log file records each page request by such devices just as it would for any device, and the log-file analysis tool counts each request as a page view. In contrast, since the WebWordz code is technically an image, it is not executed when the page is displayed without images. As a result, WebWordz does not count page views for such devices.

This difference cannot be unequivocally identified as an overcount by log-file analysis or an undercount by WebWordz; this depends on the requirements of the site owner. If the site owner's intent is to display advertising (usually in image form) to users - which is typical for commercial sites - then WebWordz is correct to omit these non-image page views from the count.

INTERNET CONNECTIVITY

Although many users enjoy high-speed access to the Web, others contend with slow access due to low-speed connections, congested networks and other impediments.
In extreme cases, conditions like these may prevent WebWordz from recording traffic statistics for these unfortunate users. Although we did not observe clear evidence of this effect in our study, it may have contributed to the difference between the two techniques.

CONCLUSION

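Our comparison of WebWordz's browser-based tracking and a popular log-file analysis tool underscores two important points: the differences in traffic statistics are to a great extent the result of differences in data-collection methods, and inaccuracies inherent in log-file analysis can produce significant errors even for typical sites.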

This study is by no means an exhaustive comparison of the two techniques. There are many other differences, including accuracy of other statistics, level of detail, speed, accessibility, reliability, ease of operation - and ultimately, value to the site owner. Future studies will compare WebWordz and log-file analysis in some of these other important areas.

Copyright © 2000-2006 webwordz.co.uk, All rights reserved.