CNStats - the best solution for site statistics problems
This article covers the ways and means of collecting site statistics.
Site statistics, as used here, means the accumulation of visitor data together with a tool for analyzing attendance.
Your web-site's visitors fall into two broad categories: users and search bots.
|We'd like many interested people to visit our web-site.|
Users are people who visit your site with a browser. As a rule, users download whole web pages, view pictures and run JavaScript. They are your most profitable customers, so you should have detailed information about them.
|We'd like our site to be found easily through search engines (by key words, among the first results). It is therefore very important for us to track robot activity on our web-site. We want to apply SEO (Search Engine Optimization) in order to optimize the site for search bots.|
Search bots (robots, or crawlers) are programs that do the work of search engines and directories. Bots visit sites in order to update the search index: they download the pages of your web-site, and thanks to them your site can be found in Google, for instance.
The main peculiarity of bots is that, as a rule, they don't download pictures, since pictures are not required for the search index.
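The distinction above can be sketched in a few lines of Python. This is only a rough heuristic, not how CNStats or any particular product works: the function name, the list of User-Agent markers and the list of picture extensions are all assumptions made for illustration.

```python
# Hypothetical heuristic; the markers and extensions below are
# illustrative assumptions, not an authoritative list.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


def looks_like_bot(user_agent, requested_paths):
    """Guess whether a visitor is a search bot.

    A visitor is treated as a bot if its User-Agent contains a known
    marker, or if it fetched pages but never requested a picture.
    """
    agent = user_agent.lower()
    if any(marker in agent for marker in BOT_MARKERS):
        return True
    fetched_image = any(
        path.lower().split("?")[0].endswith(IMAGE_EXTENSIONS)
        for path in requested_paths
    )
    # No requests at all tells us nothing, so do not call it a bot.
    return bool(requested_paths) and not fetched_image
```

A browser with pictures switched off would be misclassified by the second rule, so a real system would probably combine several signals.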
There are three ways to accumulate attendance statistics:
- Web-server log-files;
- Data accumulation in the local database (CNStats);
- Data accumulation in an off-site statistics server.
Data accumulation in an off-site statistics server
There are two key words here: "counter" and "off-site". "Off-site" means that all the information is stored on a remote server, which raises security concerns, and that the precision of the statistics depends on the reliability of the communication channels and of the server software. "Counter" means that you must place on your pages HTML code that displays a picture hosted on the remote server. Since robots don't download pictures, they are automatically excluded from the list of visitors.
Thus, a remote statistics server can be useful only in the following situations:
- You take part in a rating of sites on similar subjects (to attract visitors who browse the ratings);
- You cannot install your own statistics accumulation and analysis system.
Note: Some statistics servers try to replace the picture with other kinds of inclusions (for instance, PHP code added to your own code). This is a good trend, but keep in mind that the server is still an off-site one: the slightest failure of the remote server can disable your own server.
Web-server log-files
The key idea is that log files are never a waste. Generally speaking, they are the only correct way to store statistics data over a long period of time (a year or more). However, a log file is not site statistics in itself; it is just source data, and an application is needed to process it. There are two types of such applications:
- Programs that run on the web server where the site is located;
- Programs that require the log file to be downloaded to a Windows computer for local analysis.
The general drawback of these applications is that they cannot monitor the site on-line.
Downloading the files is also rather complicated and inconvenient.
A good alternative is to set up log rotation on the server according to the desired storage period and the current volume of the logs, and then to use a free log analyzer on the server side. It is quite enough if the free analyzer works efficiently, lets you set the time period for analysis and supports search by condition.
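The two capabilities asked of an analyzer, a time period and a search by condition, can be illustrated with a minimal Python sketch. It assumes the server writes the Common Log Format with English month names; the function names and the example bounds are hypothetical, and a real analyzer would handle many more cases.

```python
import re
from datetime import datetime, timezone

# Common Log Format: host ident user [time] "request" status size
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Turn one log line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry


def search_log(lines, start, end, condition=lambda entry: True):
    """Yield entries with start <= time < end that satisfy the condition."""
    for line in lines:
        entry = parse_line(line)
        if entry and start <= entry["time"] < end and condition(entry):
            yield entry


# Example bounds (an assumption for illustration): October 2023, UTC.
OCTOBER_START = datetime(2023, 10, 1, tzinfo=timezone.utc)
OCTOBER_END = datetime(2023, 11, 1, tzinfo=timezone.utc)
```

For instance, `search_log(lines, OCTOBER_START, OCTOBER_END, lambda e: e["status"] == 404)` would list the October requests that ended in "not found". The bounds must be timezone-aware, because the parsed log times are.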
Note: you shouldn't store all the data for the whole period. In practice, logs just take up space on your disks. They contain a lot of useless information, such as records of picture downloads. A storage period of 30-60 days is quite enough in nearly all cases.
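As a rough illustration of this note, the rule "drop picture requests and anything older than the storage period" might look like the following in Python. The function name, the extension list and the 60-day default are assumptions chosen to match the figures above.

```python
from datetime import datetime, timedelta, timezone

# Illustrative assumption: which extensions count as pictures.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".ico")


def keep_entry(path, logged_at, now, retention_days=60):
    """Return True if a log entry is worth keeping.

    Picture requests are dropped outright; everything else is kept
    only while it is no older than the retention period.
    """
    if path.lower().split("?")[0].endswith(IMAGE_EXTENSIONS):
        return False
    return now - logged_at <= timedelta(days=retention_days)
```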
Thus, log files should be used when it is necessary to store data on every request for the entire time the site has been running.
Data accumulation in the local database
This is the only way that lets you count both robots and users, and monitor and analyze their actions at the very moment they take place. Any attendance data stored in the database is available instantly.
There seems to be a subtle point here: database performance and the added complexity of the setup. However, the system needs to be configured only once. As for database performance: if your web-site already works with this database, the statistics will simply function as a part of the site.
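To make the idea concrete, here is a minimal Python sketch of recording visits in a local database and querying them at once. It uses SQLite purely for illustration; the table layout and function names are assumptions and say nothing about how CNStats itself stores data.

```python
import sqlite3


def open_stats(path=":memory:"):
    """Open (or create) a hypothetical statistics database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS hits ("
        " logged_at TEXT DEFAULT CURRENT_TIMESTAMP,"
        " host TEXT NOT NULL,"
        " path TEXT NOT NULL,"
        " is_bot INTEGER NOT NULL)"
    )
    return db


def record_hit(db, host, path, is_bot):
    """Store one request; robots and users go into the same table."""
    db.execute(
        "INSERT INTO hits (host, path, is_bot) VALUES (?, ?, ?)",
        (host, path, int(is_bot)),
    )
    db.commit()


def unique_hosts(db, is_bot):
    """Count distinct hosts among robots (True) or users (False)."""
    row = db.execute(
        "SELECT COUNT(DISTINCT host) FROM hits WHERE is_bot = ?",
        (int(is_bot),),
    ).fetchone()
    return row[0]
```

Because every hit is written as it happens, a query such as `unique_hosts(db, is_bot=False)` reflects the current state of the site without any log download or batch processing.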
Thus, data accumulation in the local database is a very convenient approach for the following sites:
- Commercial sites, where on-line monitoring of visitors is important;
- Newly created sites;
- Small and medium-sized sites (about 10 000 unique hosts per day) that use a database for their core function.
We have considered here only the means of accumulating statistical data; the functions of statistics applications will be examined in the next article.
Finally, we'd like to mention the commercial side of your site. In any case, you spend money on the site, and the site statistics should earn you a profit. Here are some questions that are more useful than any explanations:
External counters, off-site statistics servers: Who profits when you use external counters? Whom does the picture advertise? Whose citation index do you raise? Whom do you pay, and what do you get in return?
Server log-files: Why do you store gigabytes of logs? What do you gain by taking up server space? How often do you have to search logs more than a month old? Is it convenient? Are the log processing applications efficient enough? Do you always respond to visitors' actions immediately?
Data accumulation in the local database: Is your database critically overloaded, or does it stay idle? Do you need on-line monitoring of visitors? Do you need to analyze robot activity on your site?