Webalizerを使うための設定例(apache,logrotate,cron,webalizer)

2011年11月18日 2017年8月10日

ご利用のブラウザは、JavaScript が無効となっていませんか？
このサイトでは、コンテンツの一部が非表示、あるいは、コメント、お問い合わせの投稿ができない、検索ができないことがあります。

Webalizerの日本語化(UTF8化)する , Webalizerの最新版を日本語化(UTF8化)にするでは、Webalizerの日本語化と簡単な設定ファイル( webalizer.conf )の設定例について解説しました。

ここでは、Webalizerを利用するためのApacheの設定、ログローテーションの設定、Webalizerの設定、自動化のためのcronの設定までを、一つの設定例として簡単に解説してみます。

以降の設定例では、
www.exmaple.com というサイトについて Webalizer を毎日使うための設定を行ってみました。

目次
履歴

2011年11月18日初版

Apacheの設定ファイル( /etc/httpd/conf/httpd.conf )

Webalizerを利用するために設定ファイル( httpd.conf )へログファイルの出力を設定する設定例を簡単に解説してみます。

...
# 以下にロギングしない条件を設定します。
SetEnvIf Request_URI "default\.ida" nolog
SetEnvIf Request_URI "cmd\.exe" nolog
SetEnvIf Request_URI "root\.exe" nolog
SetEnvIf Request_URI "Admin\.dll" nolog
SetEnvIf Request_URI "NULL\.IDA" nolog
SetEnvIf Request_URI "\.(gif)|(jpg)|(png)|(ico)|(css)$" nolog
# WordPress用の設定（管理画面へのアクセスは全てログしない）
SetEnvIf Request_URI "^/wp-admin(.*)" nolog
SetEnvIf Request_URI "^/wp-content(.*)" nolog
SetEnvIf Request_URI "^/wp-includes(.*)" nolog
SetEnvIf Request_URI "^/wp-cron.php(.*)" nolog
SetEnvIf Request_URI "^/wp-login.php(.*)" nolog
SetEnvIf Request_URI "^/wp-comments(.*)" nolog
...
<VirtualHost *:80>
    ServerName wwww.example.com
    DocumentRoot "/var/www/html"
    ServerAlias wwww.example.com
    ErrorLog /var/log/httpd/example-error.log
    # ログの出力先、ファイル名、出力フォーマット (combined) を指定します。
    CustomLog /var/log/httpd/example-access.log combined env=!nolog
</VirtualHost>
...

Apacheの設定ファイル( /etc/httpd/conf.d/webalizer.conf )

Webalizerに出力ディレクトリを公開するための設定ファイル( webalizer.conf )を作成・設定する設定例を簡単に解説してみます。

#
# This configuration file maps the webalizer log analysis
# results (generated daily) into the URL space.  By default
# these results are only accessible from the local host.
#
Alias /usage /var/www/usage

<Location /usage>
    Order deny,allow
    Deny from all
    
    # サーバー自身からのアクセスを許可します
    Allow from 127.0.0.1
    Allow from ::1
    # ここで許可するクライアントのIPアドレスを設定します
    Allow from xxx.xxx.xxx.xxx
    # Allow from .example.com
</Location>

誰でもアクセスできようにしたなら、
Allow from all
を指定すればOKです。
ただ、一般的に、誰でもアクセスできてしまうのは、問題も多いので、ここでは、IPアドレスによる制限を加えています。

この設定の後、必ず、出力先ディレクトリを作成しておきます。所有権もここではapacheに変更しておきます。

$ mkdir /var/www/usage
$ mkdir /var/www/usage/example
$ chown -fR apache. /var/www/usage

/var/www/usage/example は、
以降のwebalizerの設定で出てきますが、サイト毎にディレクトリを分けておきたいので、 wwww.exmaple.com 用のwebalizerの出力先ディレクトリとして/var/www/usage/exampleを作成しておきます。

これで全ての設定を終えたので、apacheの再起動しておきます。

$ /etc/init.d/httpd restart

ログローテーションの設定ファイル( /etc/logrotate.d/httpd )

Webalizerを利用するログファイルは、今ロギングしているファイルでなく、ログローテーションした最新ファイルをつかうのが一般的です。

毎日、ログローテーションし、
ロゴローテーションの実施後、
webalizerで前日のログファイルからアクセスを解析する

・・・という流れになります。

以下は、その設定例です。

/var/log/httpd/*.log {
    # 毎日ログローテーションを実施します。
    daily
    missingok
    rotate 90
    # ログローテーションしたファイルはgzipで圧縮します。
    compress
    # 空でもログローテーションを実施します。
    ifempty
    create 660 root wheel
    sharedscripts
    postrotate
        /sbin/service httpd graceful
    endscript
}

Webalizerの設定ファイル( /etc/webalizer.conf )

Webalizerの設定ファイル( webalizer.conf )の設定例を簡単に解説してみます。

#
# Sample Webalizer configuration file
# Copyright 1997-2009 by Bradford L. Barrett
#
# Distributed under the GNU General Public License.  See the
# files "Copyright" and "COPYING" provided with the webalizer
# distribution for additional information.
#
# This is a sample configuration file for the Webalizer (ver 2.21)
# Lines starting with pound signs '#' are comment lines and are
# ignored.  Blank lines are skipped as well.  Other lines are considered
# as configuration lines, and have the form "ConfigOption  Value" where
# ConfigOption is a valid configuration keyword, and Value is the value
# to assign that configuration option.  Invalid keyword/values are
# ignored, with appropriate warnings being displayed.  There must be
# at least one space or tab between the keyword and its value.
#
# As of version 0.98, The Webalizer will look for a 'default' configuration
# file named "webalizer.conf" in the current directory, and if not found
# there, will look for "/etc/webalizer.conf".


# LogFile defines the web server log file to use.  If not specified
# here or on on the command line, input will default to STDIN.  If
# the log filename ends in '.gz' (a gzip compressed file), or '.bz2'
# (bzip2 compressed file), it will be decompressed on the fly as it
# is being read.

# httpdで指定したログファイル名の最新のログローテーションファイル名を指定します。
# -- ログローテーションしたファイルはgzipで圧縮されるので拡張子に注意します。
#LogFile        /var/lib/httpd/logs/access_log
LogFile        /var/log/httpd/example-access.log.1.gz

# LogType defines the log type being processed.  Normally, the Webalizer
# expects a CLF or Combined web server log as input.  Using this option,
# you can process ftp logs (xferlog as produced by wu-ftp and others),
# Squid native logs or W3C extended format web logs. Values can be 'clf',
# 'ftp', 'squid' or 'w3c'.  The default is 'clf'.

# httpdでcombinedを指定したので、ここは、そのままでOK
#LogType	clf

# OutputDir is where you want to put the output files.  This should
# should be a full path name, however relative ones might work as well.
# If no output directory is specified, the current directory will be used.

# 先に作成した出力先ディレクトリを指定します。
#OutputDir      /var/lib/httpd/htdocs/usage
OutputDir      /var/www/usage/example

# HistoryName allows you to specify the name of the history file produced
# by the Webalizer.  The history file keeps the data for previous months,
# and is used for generating the main HTML page (index.html). The default
# is a file named "webalizer.hist", stored in the output directory being
# used.  The name can include a path, which will be relative to the output
# directory unless absolute (starts with a leading '/').

# 履歴ファイル名を指定します。
# -- 出力先ディレクトリをサイト毎に指定しているのでデフォルトでも可。
#HistoryName	webalizer.hist
HistoryName	example_webalizer.hist

# Incremental processing allows multiple partial log files to be used
# instead of one huge one.  Useful for large sites that have to rotate
# their log files more than once a month.  The Webalizer will save its
# internal state before exiting, and restore it the next time run, in
# order to continue processing where it left off.  This mode also causes
# The Webalizer to scan for and ignore duplicate records (records already
# processed by a previous run).  See the README file for additional
# information.  The value may be 'yes' or 'no', with a default of 'no'.
# The file 'webalizer.current' is used to store the current state data,
# and is located in the output directory of the program (unless changed
# with the IncrementalName option below).  Please read at least the section
# on Incremental processing in the README file before you enable this option.

# 積算ファイルを元に、前日のデータに積算します。
#Incremental	no
Incremental	yes

# IncrementalName allows you to specify the filename for saving the
# incremental data in.  It is similar to the HistoryName option where the
# name is relative to the specified output directory, unless an absolute
# filename is specified.  The default is a file named "webalizer.current"
# kept in the normal output directory.  If you don't specify "Incremental"
# as 'yes' then this option has no meaning.

# 積算ファイルを指定します。
# -- 出力先ディレクトリをサイト毎に指定しているのでデフォルトでも可。
#IncrementalName	webalizer.current
IncrementalName	example_webalizer.current

# ReportTitle is the text to display as the title.  The hostname
# (unless blank) is appended to the end of this string (seperated with
# a space) to generate the final full title string.
# Default is (for english) "Usage Statistics for".

#ReportTitle    Usage Statistics for

# HostName defines the hostname for the report.  This is used in
# the title, and is prepended to the URL table items.  This allows
# clicking on URLs in the report to go to the proper location in
# the event you are running the report on a 'virtual' web server,
# or for a server different than the one the report resides on.
# If not specified here, or on the command line, webalizer will
# try to get the hostname via a uname system call.  If that fails,
# it will default to "localhost".

# ホスト名（サイト名）を指定します。
#HostName	www.webalizer.org
HostName	www.example.com

# HTMLExtension allows you to specify the filename extension to use
# for generated HTML pages.  Normally, this defaults to "html", but
# can be changed for sites who need it (like for PHP embeded pages).

#HTMLExtension  html

# PageType lets you tell the Webalizer what types of URLs you
# consider a 'page'.  Most people consider html and cgi documents
# as pages, while not images and audio files.  If no types are
# specified, defaults will be used ('htm*', 'cgi' and HTMLExtension
# if different for web logs, 'txt' for ftp logs).

PageType	htm*
PageType	cgi
#PageType	phtml
#PageType	php3
#PageType	pl
# ページとして認識する拡張子 ( .shtml , .php )を追加します。
# -- セキュアなページとしてshtmlの拡張子を使用している場合のみ設定します。
PageType	shtml
# -- 動的ページとしてphpの拡張子を使用している場合のみ設定します。
PageType	php

# PagePrefix allows all requests with a specified prefix to be
# considered as 'pages'. If you want everything under /documents
# to be treated as pages no matter what their extension is. Also
# useful if you have cgi-scripts with PATH_INFO.

#PagePrefix	/documents
#PagePrefix	/mycgi/parameters

# OmitPage lets you tell the Webalizer that certain URLs do not
# contain any pages.  No URL matching an OmitPage value will be
# counted as a page, even if it matches a PageType above or has
# no extension (e.g., a directory).  They will still be counted
# as a hit.

#OmitPage	/render

# UseHTTPS should be used if the analysis is being run on a
# secure server, and links to urls should use 'https://' instead
# of the default 'http://'.  If you need this, set it to 'yes'.
# Default is 'no'.  This only changes the behaviour of the 'Top
# URLs' table.

#UseHTTPS       no

# HTAccess allows the generation of a default .htaccess file in the
# output directory.  If enabled, a default .htaccess file will be
# created (with a single "DirectoryIndex" directive), unless one
# already exists.  Values may be 'yes' or 'no', with 'no'
# being the default (don't write .htaccess files).

#HTAccess	no

# StripCGI determines if URL CGI variables should be striped or not.
# Historically, the Webalizer stripped all CGI variables from the end
# of URLs to improve accuracy.  Some sites may prefer to keep the CGI
# variables in place, particularly those with highly dynamic pages.
# Values may be 'yes' or 'no', with the default being 'yes'.

#StripCGI	yes

# The TrimSquidURL option only has effect on squid type log files.
# When analyzing a squid log, it is usually desirable to have less
# granularity on the URLs.  TrimSquidURL = n where n is a number > 0
# causes all URLs to be truncated after the nth '/' after the http://
# portion.  Setting TrimSquidURL to one (1) will cause all URLs to be
# summarized by domain only.  The default is zero (0), which disables
# any such truncation and preserve the URLs as they are in the log.

# TrimSquidURL	0

# DNSCache specifies the DNS cache filename to use for reverse DNS lookups.
# This file must be specified if you wish to perform name lookups on any IP
# addresses found in the log file.  If an absolute path is not given as
# part of the filename (ie: starts with a leading '/'), then the name is
# relative to the default output directory.  See the DNS.README file for
# additional information.

# DNSキャッシュファイルを指定します。
# -- 出力先ディレクトリをサイト毎に指定しているのでデフォルトでも可。
#DNSCache	dns_cache.db
DNSCache	exmaple_dns_cache.db

# DNSChildren allows you to specify how many "children" processes are
# run to perform DNS lookups to create or update the DNS cache file.
# If a number is specified, the DNS cache file will be created/updated
# each time the Webalizer is run, immediately prior to normal processing,
# by running the specified number of "children" processes to perform
# DNS lookups.  If used, the DNS cache filename MUST be specified as
# well.  The default value is zero (0), which disables DNS cache file
# creation/updates at run time.  The number of children processes to
# run may be anywhere from 1 to 100, however a large number may effect
# normal system operations.  Reasonable values should be between 5 and
# 20.  See the DNS.README file for additional information.

# DNS子プロセス数を指定します。
#DNSChildren	0
DNSChildren	10

# CacheIPs allows unresolved IP addresses to be cached in the DNS
# database.  Normally, only resolved addresses are saved.  At some
# sites, particularly those with a large number of unresolvable IP
# addresses visiting, it may be useful to enable this feature so
# those addresses are not constantly looked up each time the program
# is run.  Values can be 'yes' or 'no', with 'no' being the default.

#CacheIPs	no

# CacheTTL specifies the time to live (TTL) value for cached DNS
# entries, in days.  This value may be anywhere between 1 and 100
# with the default being 7 days (1 week).

#CacheTTL	7

# The GeoDB option enables or disabled the use of the native
# Webalizer GeoDB geolocation services.  This is the preferred
# geolocation option.  Values may be 'yes' or 'no', with 'no'
# being the default.

#GeoDB		no

# GeoDBDatabase specifies an alternate database to use.  The
# default database is /usr/share/GeoDB/GeoDB.dat (however the
# path may be changed at compile time; use the -vV command
# line option to determine where).  If a different database is
# to be used, it may be specified here.  The name is relative
# to the output directory being used unless an absolute name
# (ie: starts with a leading '/') is specified.

#GeoDBDatabase	/usr/share/GeoDB/GeoDB.dat

# The GeoIP option enables or disables the use of geolocation
# services provided by the GeoIP library (http://www.maxmind.com),
# if available.  Values may be 'yes' or 'no, with 'no' being the
# default.  Note: if GeoDB is enabled, then this option will have
# no effect (GeoDB will be used regardless of this setting).

#GeoIP		no

# GeoIPDatabase specifies an alternate database filename to use by the
# GeoIP library.  If an absolute path is not given as part of the name
# (ie: starts with a leading '/'), then the name is relative to the
# default output directory. This option should not normally be needed.

#GeoIPDatabase	/usr/share/GeoIP/GeoIP.dat

# HTMLPre defines HTML code to insert at the very beginning of the
# file.  Default is the DOCTYPE line shown below.  Max line length
# is 80 characters, so use multiple HTMLPre lines if you need more.

#HTMLPre <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

# HTMLHead defines HTML code to insert within the <HEAD></HEAD>
# block, immediately after the <TITLE> line.  Maximum line length
# is 80 characters, so use multiple lines if needed.

# 出力するHTMLファイルのヘッダ情報を指定します。
#HTMLHead <META NAME="author" CONTENT="The Webalizer">
#HTMLHead <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
# -- ここでは、文字コード UT-8 としています。webalizerがデフォルトで出力する文字コードを指定してください。
HTMLHead <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">

# HTMLBody defined the HTML code to be inserted, starting with the
# <BODY> tag.  If not specified, the default is shown below.  If
# used, you MUST include your own <BODY> tag as the first line.
# Maximum line length is 80 char, use multiple lines if needed.

#HTMLBody <BODY BGCOLOR="#E8E8E8" TEXT="#000000" LINK="#0000FF" VLINK="#FF0000">

# HTMLPost defines the HTML code to insert immediately before the
# first <HR> on the document, which is just after the title and
# "summary period"-"Generated on:" lines.  If anything, this should
# be used to clean up in case an image was inserted with HTMLBody.
# As with HTMLHead, you can define as many of these as you want and
# they will be inserted in the output stream in order of apperance.
# Max string size is 80 characters.  Use multiple lines if you need to.

#HTMLPost 	<BR CLEAR="all">

# HTMLTail defines the HTML code to insert at the bottom of each
# HTML document, usually to include a link back to your home
# page or insert a small graphic.  It is inserted as a table
# data element (ie: <TD> your code here </TD>) and is right
# alligned with the page.  Max string size is 80 characters.

#HTMLTail <IMG SRC="msfree.png" ALT="100% Micro$oft free!">

# HTMLEnd defines the HTML code to add at the very end of the
# generated files.  It defaults to what is shown below.  If
# used, you MUST specify the </BODY> and </HTML> closing tags
# as the last lines.  Max string length is 80 characters.

#HTMLEnd </BODY></HTML>

# The LinkReferrer option determines if entries in the referrer table
# should be plain text or a HTML link to the referrer.  Values can be
# either 'yes' or 'no', with 'no' being the default.

#LinkReferrer	no

# The Quiet option suppresses output messages... Useful when run
# as a cron job to prevent bogus e-mails.  Values can be either
# "yes" or "no".  Default is "no".  Note: this does not suppress
# warnings and errors (which are printed to stderr).

# cronで実行するため出力する情報は極力抑えたいので、エラー、警告のみとします。
#Quiet		no
Quiet		yes

# ReallyQuiet will supress all messages including errors and
# warnings.  Values can be 'yes' or 'no' with 'no' being the
# default.  If 'yes' is used here, it cannot be overriden from
# the command line, so use with caution.  A value of 'no' has
# no effect.

#ReallyQuiet	no

# TimeMe allows you to force the display of timing information
# at the end of processing.  A value of 'yes' will force the
# timing information to be displayed.  A value of 'no' has no
# effect.

#TimeMe		no

# GMTTime allows reports to show GMT (UTC) time instead of local
# time.  Default is to display the time the report was generated
# in the timezone of the local machine, such as EDT or PST.  This
# keyword allows you to have times displayed in UTC instead.  Use
# only if you really have a good reason, since it will probably
# screw up the reporting periods by however many hours your local
# time zone is off of GMT.

#GMTTime	no

# Debug prints additional information for error messages.  This
# will cause webalizer to dump bad records/fields instead of just
# telling you it found a bad one.   As usual, the value can be
# either "yes" or "no".  The default is "no".  It shouldn't be
# needed unless you start getting a lot of Warning or Error
# messages and want to see why.  (Note: warning and error messages
# are printed to stderr, not stdout like normal messages).

#Debug		no

# FoldSeqErr forces the Webalizer to ignore sequence errors.
# This is useful for Netscape and other web servers that cache
# the writing of log records and do not guarentee that they
# will be in chronological order.  The use of the FoldSeqErr
# option will cause out of sequence log records to be treated
# as if they had the same time stamp as the last valid record.
# Default is to ignore out of sequence log records.  The use
# of this feature is strongly discouraged and rarely needed.
# (the webalizer already compensates for up to 60 minutes of
# difference between records).

#FoldSeqErr	no

# VisitTimeout allows you to set the default timeout for a visit
# (sometimes called a 'session').  The default is 30 minutes,
# which should be fine for most sites.
# Visits are determined by looking at the time of the current
# request, and the time of the last request from the site.  If
# the time difference is greater than the VisitTimeout value, it
# is considered a new visit, and visit totals are incremented.
# Value is the number of seconds to timeout (default=1800=30min)

#VisitTimeout	1800

# IgnoreHist shouldn't be used in a config file, but it is here
# just because it might be usefull in certain situations.  If the
# history file is ignored, the main "index.html" file will only
# report on the current log files contents.  Usefull only when you
# want to reproduce the reports from scratch.  USE WITH CAUTION!
# Valid values are "yes" or "no".  Default is "no".

#IgnoreHist	no

# IgnoreState also shouldn't be used, but is here anyway.  It is
# similar to the IgnoreHist option, but for the incremental data
# file.  If this is set to 'yes', any existing incrememtal data
# will be ignored and a new data file will be written at the end
# of processing.  USE WITH CAUTION.  By ignoring an existing
# incremental data file, all previous processing for the current
# month will be lost, and those logs must be re-processed.
# Valid values are "yes" or "no".  Default is "no".

#IgnoreState	no

# CountryGraph allows the usage by country graph to be disabled.
# Values can be 'yes' or 'no', default is 'yes'.

#CountryGraph	yes

# CountryFlags allows flags to be displayed in the top country
# table in monthly reports.  Values can be 'yes' or 'no', with
# the default being 'no'.

#CountryFlags	no

# FlagDir specifies the location of flag graphics which will be
# used in the top country table.  If not specified, the default
# is to look in the 'flags' directory directly under the output
# directory being used for the reports.  If this option is used,
# the display of flag graphics will be enabled by default.

#FlagDir	flags

# DailyGraph and DailyStats allows the daily statistics graph
# and statistics table to be disabled (not displayed).  Values
# may be "yes" or "no". Default is "yes".

#DailyGraph	yes
#DailyStats	yes

# HourlyGraph and HourlyStats allows the hourly statistics graph
# and statistics table to be disabled (not displayed).  Values
# may be "yes" or "no". Default is "yes".

#HourlyGraph	yes
#HourlyStats	yes

# GraphLegend allows the color coded legends to be turned on or off
# in the graphs.  The default is for them to be displayed.  This only
# toggles the color coded legends, the other legends are not changed.
# If you think they are hideous and ugly, say 'no' here

#GraphLegend	yes

# GraphLines allows you to have index lines drawn behind the graphs.
# I personally am not crazy about them, but a lot of people requested
# them and they weren't a big deal to add.  The number represents the
# number of lines you want displayed.  Default is 2, you can disable
# the lines by using a value of zero ('0').  [max is 20]
# Note, due to rounding errors, some values don't work quite right.
# The lower the better, with 1,2,3,4,6 and 10 producing nice results.

#GraphLines	2

# IndexMonths defines the number of months to display in the main index
# (yearly summary) table.  Value can be between 12 and 120, with the
# default being 12 months (1 year).

#IndexMonths	12

# YearHeaders enables/disables the display of year headers in the main
# index (yearly summary) table.  If enabled, year headers will be shown
# when the table is displaying more than 16 months worth of data. Values
# can be 'yes' or 'no', with 'yes' being the default.

#YearHeaders	yes

# YearTotals enables/disables the display of yearly totals in the main
# index (yearly summary) table.  If enabled, year totals will be shown
# when the table is displaying more than 16 months worth of data.  Values
# can be 'yes' or 'no', with 'yes' being the default.

#YearTotals	yes

# GraphMonths defines the number of months to display in the main index
# (yearly summary) graph.  Value can be between 12 and 72 months, with
# the default being 12 months.

#GraphMonths	12

# The "Top" options below define the number of entries for each table.
# Defaults are Sites=30, URLs=30, Referrers=30 and Agents=15, and
# Countries=30. TopKSites and TopKURLs (by KByte tables) both default
# to 10, as do the top entry/exit tables (TopEntry/TopExit).  The top
# search strings and usernames default to 20.  Tables may be disabled
# by using zero (0) for the value.

#TopSites        30
#TopKSites       10
#TopURLs         30
#TopKURLs        10
#TopReferrers    30
#TopAgents       15
#TopCountries    30
#TopEntry        10
#TopExit         10
#TopSearch       20
#TopUsers        20

# The All* keywords allow the display of all URLs, Sites, Referrers
# User Agents, Search Strings and Usernames.  If enabled, a seperate
# HTML page will be created, and a link will be added to the bottom
# of the appropriate "Top" table.  There are a couple of conditions
# for this to occur..  First, there must be more items than will fit
# in the "Top" table (otherwise it would just be duplicating what is
# already displayed).  Second, the listing will only show those items
# that are normally visable, which means it will not show any hidden
# items.  Grouped entries will be listed first, followed by individual
# items.  The value for these keywords can be either 'yes' or 'no',
# with the default being 'no'.  Please be aware that these pages can
# be quite large in size, particularly the sites page,  and seperate
# pages are generated for each month, which can consume quite a lot
# of disk space depending on the traffic to your site.

# 全情報をいつでも確認できるようにAll* を全て有効に指定します。
#AllSites	no
#AllURLs	no
#AllReferrers	no
#AllAgents	no
#AllSearchStr	no
#AllUsers       no
AllSites      yes
AllURLs	      yes
AllReferrers  yes
AllAgents     yes
AllSearchStr  yes
AllUsers      yes

# The Webalizer normally strips the string 'index.' off the end of
# URLs in order to consolidate URL totals.  For example, the URL
# /somedir/index.html is turned into /somedir/ which is really the
# same URL.  This option allows you to specify additional strings
# to treat in the same way.  You don't need to specify 'index.' as
# it is always scanned for by The Webalizer, this option is just to
# specify _additional_ strings if needed.  If you don't need any,
# don't specify any as each string will be scanned for in EVERY
# log record... A bunch of them will degrade performance.  Also,
# the string is scanned for anywhere in the URL, so a string of
# 'home' would turn the URL /somedir/homepages/brad/home.html into
# just /somedir/ which is probably not what was intended.

#IndexAlias     home.htm
#IndexAlias	homepage.htm

# The DefaultIndex option is used to enable/disable the use of
# "index." as the default index name to be stripped off the end of
# a URL (as described above).  Most sites will not need to use this
# option, but some may, such as those whose default index file name
# is different, or those that use "index.php" or similar URLs in a
# dynamic environment.  Values can be 'yes' or 'no', with the default
# being 'yes'.  This option does not effect any names added using the
# IndexAlias option, and those names will still function as described
# regardless of this setting.

#DefaultIndex	yes

# The Hide*, Group* and Ignore* and Include* keywords allow you to
# change the way Sites, URLs, Referrers, User Agents and Usernames
# are manipulated.  The Ignore* keywords will cause The Webalizer to
# completely ignore records as if they didn't exist (and thus not
# counted in the main site totals).  The Hide* keywords will prevent
# things from being displayed in the 'Top' tables, but will still be
# counted in the main totals.  The Group* keywords allow grouping
# similar objects as if they were one.  Grouped records are displayed
# in the 'Top' tables and can optionally be displayed in BOLD and/or
# shaded. Groups cannot be hidden, and are not counted in the main
# totals. The Group* options do not, by default, hide all the items
# that it matches.  If you want to hide the records that match (so just
# the grouping record is displayed), follow with an identical Hide*
# keyword with the same value.  (see example below)  In addition,
# Group* keywords may have an optional label which will be displayed
# instead of the keywords value.  The label should be seperated from
# the value by at least one 'white-space' character, such as a space
# or tab.  If the match string contains whitespace (spaces or tabs),
# the string should be quoted with either single or double quotes.
#
# The value can have either a leading or trailing '*' wildcard
# character.  If no wildcard is found, a match can occur anywhere
# in the string. Given a string "www.yourmama.com", the values "your",
# "*mama.com" and "www.your*" will all match.

# Your own site should be hidden
#HideSite	*webalizer.org
#HideSite	localhost

# Your own site gives most referrals
#HideReferrer	webalizer.org/

# This one hides non-referrers ("-" Direct requests)
#HideReferrer	Direct Request

# Usually you want to hide these
HideURL		*.gif
HideURL		*.GIF
HideURL		*.jpg
HideURL		*.JPG
HideURL		*.png
HideURL		*.PNG
HideURL		*.ra
# URLリストから隠す拡張子の、css , ico を追加します。
HideURL		*.css
HideURL		*.CSS
HideURL		*.ico
HideURL		*.ICO

# Hiding agents is kind of futile
#HideAgent	RealPlayer

# You can also hide based on authenticated username
#HideUser	root
#HideUser	admin

# Grouping options
#GroupURL	/cgi-bin/*	CGI Scripts
#GroupURL	/images/*	Images

#GroupSite	*.aol.com
#GroupSite	*.compuserve.com

#GroupReferrer	yahoo.com/	Yahoo!
#GroupReferrer	excite.com/     Excite
#GroupReferrer	infoseek.com/   InfoSeek
#GroupReferrer	webcrawler.com/ WebCrawler
# リファラーのグループ定義を指定します。
# -- 検索エンジンからの参照が多いか簡単に確認することができます。
GroupReferrer	yahoo.co.jp/	Yahoo!Japan
GroupReferrer	google.co.jp/	GoogleJapan
GroupReferrer	rakuten.co.jp/	InfoSeek(Rakuten)Japan
GroupReferrer	goo.ne.jp/	Goo
GroupReferrer	bing.com/	MSNJapan

#GroupUser      root            Admin users
#GroupUser      admin           Admin users
#GroupUser      wheel           Admin users

# The following is a great way to get an overall total
# for browsers, and not display all the detail records.
# (You should use MangleAgent to refine further...)

#GroupAgent	Opera/		Opera
#HideAgent	Opera/
#GroupAgent	"MSIE 7"	Microsoft Internet Exploder 7
#HideAgent	MSIE 7
#GroupAgent	"MSIE 6"	Microsoft Internet Exploder 6
#HideAgent	MSIE 6
#GroupAgent	"MSIE "		Older Microsoft Exploders
#HideAgent	MSIE 
#GroupAgent	Firefox/2.	Firefox 2
#HideAgent	Firefox/2.
#GroupAgent	Firefox/1.	Firefox 1.x
#HideAgent	Firefox/1.
#GroupAgent	Konqueror	Konqueror
#HideAgent	Konqueror
#GroupAgent	Safari		Safari
#HideAgent	Safari
#GroupAgent	Lynx*		Lynx
#HideAgent	Lynx*
#GroupAgent	Wget/		WGet
#HideAgent	Wget/
#GroupAgent	(compatible;	Other Mozilla Compatibles
#HideAgent	(compatible;
#GroupAgent	Mozilla*	Mozilla/Netscape
#HideAgent	Mozilla*

# HideAllSites allows forcing individual sites to be hidden in the
# report.  This is particularly useful when used in conjunction
# with the "GroupDomain" feature, but could be useful in other
# situations as well, such as when you only want to display grouped
# sites (with the GroupSite keywords...).  The value for this
# keyword can be either 'yes' or 'no', with 'no' the default,
# allowing individual sites to be displayed.

#HideAllSites	no

# The GroupDomains keyword allows you to group individual hostnames
# into their respective domains.  The value specifies the level of
# grouping to perform, and can be thought of as 'the number of dots'
# that will be displayed.  For example, if a visiting host is named
# cust1.tnt.mia.uu.net, a domain grouping of 1 will result in just
# "uu.net" being displayed, while a 2 will result in "mia.uu.net".
# The default value of zero disable this feature.  Domains will only
# be grouped if they do not match any existing "GroupSite" records,
# which allows overriding this feature with your own if desired.

#GroupDomains	0

# The GroupShading allows grouped rows to be shaded in the report.
# Useful if you have lots of groups and individual records that
# intermingle in the report, and you want to diferentiate the group
# records a little more.  Value can be 'yes' or 'no', with 'yes'
# being the default.

#GroupShading	yes

# GroupHighlight allows the group record to be displayed in BOLD.
# Can be either 'yes' or 'no' with the default 'yes'.

#GroupHighlight	yes

# The Ignore* keywords allow you to completely ignore log records based
# on hostname, URL, user agent, referrer or username.  I hesitated in
# adding these, since the Webalizer was designed to generate _accurate_
# statistics about a web servers performance.  By choosing to ignore
# records, the accuracy of reports become skewed, negating why I wrote
# this program in the first place.  However, due to popular demand, here
# they are.  Use the same as the Hide* keywords, where the value can have
# a leading or trailing wildcard '*'.  Use at your own risk Please
# remember, the use of these will MAKE YOUR STATS INACCURATE and you
# should consider using an equivalent 'Hide*' keyword instead.

#IgnoreSite	bad.site.net
#IgnoreURL	/test*
#IgnoreReferrer	file:/*
#IgnoreAgent	RealPlayer
#IgnoreUser     root

# The Include* keywords allow you to force the inclusion of log records
# based on hostname, URL, user agent, referrer or username.  They take
# precidence over the Ignore* keywords.  Note: Using Ignore/Include
# combinations to selectivly process parts of a web site is _extremely
# inefficent_!!! Avoid doing so if possible (ie: grep the records to a
# seperate file if you really want that kind of report).

# Example: Only show stats on Joe User's pages...
#IgnoreURL	*
#IncludeURL	~joeuser*

# Or based on an authenticated username
#IgnoreUser     *
#IncludeUser    someuser

# The MangleAgents allows you to specify how much, if any, The Webalizer
# should mangle user agent names.  This allows several levels of detail
# to be produced when reporting user agent statistics.  There are six
# levels that can be specified, which define different levels of detail
# supression.  Level 5 shows only the browser name (MSIE or Mozilla)
# and the major version number.  Level 4 adds the minor version number
# (single decimal place).  Level 3 displays the minor version to two
# decimal places.  Level 2 will add any sub-level designation (such
# as Mozilla/3.01Gold or MSIE 3.0b).  Level 1 will attempt to also add
# the system type if it is specified.  The default Level 0 displays the
# full user agent field without modification and produces the greatest
# amount of detail.  User agent names that can't be mangled will be
# left unmodified.

#MangleAgents    0

# The SearchEngine keywords allow specification of search engines and
# their query strings on the URL.  These are used to locate and report
# what search strings are used to find your site.  The first word is
# a substring to match in the referrer field that identifies the search
# engine, and the second is the URL variable used by that search engine
# to define it's search terms.

#SearchEngine	.google.   	q=
#SearchEngine	yahoo.com	p=
#SearchEngine	altavista.com	q=
#SearchEngine   aolsearch.      query=
#SearchEngine   ask.co          q=
#SearchEngine	eureka.com	q=
#SearchEngine	lycos.com	query=
#SearchEngine	hotbot.com	MT=
#SearchEngine	msn.com		q=
#SearchEngine	infoseek.com	qt=
#SearchEngine	excite		search=
#SearchEngine	netscape.com	query=
#SearchEngine	mamma.com	query=
#SearchEngine	alltheweb.com	q=
#SearchEngine	northernlight.com  qr=

# 検索エンジンのクエリ定義を指定します。
# -- おおよそ以下の検索エンジンの定義で日本国内であれば95%を超えていると思います。
SearchEngine    .yahoo.         p=
SearchEngine    .google.        q=
SearchEngine    infoseek.co.jp  qt=
SearchEngine    rakuten.co.jp   qt=
SearchEngine    bing.com        q=
SearchEngine    goo.ne.jp       MT=
SearchEngine    biglobe.ne.jp   q=
SearchEngine    nifty.com       q=
SearchEngine    excite.co.jp    search=
SearchEngine    excite.com      q=
SearchEngine    livedoor.com    q=
SearchEngine    naver.jp        q=
SearchEngine    aol.jp          q=


# Normally, search strings are converted to lower case in order to
# increase accuracy.  The SearchCaseI option allows them to maintain
# case sensitivity, useful for some sites.  The value can be 'yes'
# or 'no', with 'yes' (case insensitive) being the default.

#SearchCaseI	yes

# The Dump* keywords allow the dumping of Sites, URLs, Referrers
# User Agents, Usernames and Search strings to seperate tab delimited
# text files, suitable for import into most database or spreadsheet
# programs.

# DumpPath specifies the path to dump the files.  If not specified,
# it will default to the current output directory.  Do not use a
# trailing slash ('/').

#DumpPath	/var/lib/httpd/logs

# The DumpHeader keyword specifies if a header record should be
# written to the file.  A header record is the first record of the
# file, and contains the labels for each field written.  Normally,
# files that are intended to be imported into a database system
# will not need a header record, while spreadsheets usually do.
# Value can be either 'yes' or 'no', with 'no' being the default.

#DumpHeader	no

# DumpExtension allow you to specify the dump filename extension
# to use.  The default is "tab", but some programs are pickey about
# the filenames they use, so you may change it here (for example,
# some people may prefer to use "csv").

#DumpExtension	tab

# These control the dumping of each individual table.  The value
# can be either 'yes' or 'no'.. the default is 'no'.

#DumpSites	no
#DumpURLs	no
#DumpReferrers	no
#DumpAgents	no
#DumpUsers	no
#DumpSearchStr  no

# The custom graph colors are defined here. Declare them 
# in the standard hexadecimal way (as HTML, without the '#') 
# If none are given, you will get the standard default colors.

#ColorHit	00805c
#ColorFile	0040ff
#ColorSite	ff8000
#ColorKbyte	ff0000
#ColorPage	00e0ff
#ColorVisit	ffff00
#ColorMisc      00e0ff

#PieColor1	800080
#PieColor2	80ffc0
#PieColor3	ff00ff
#PieColor4	ffc080

# End of configuration file...  Have a nice day!

cronの設定ファイル( /etc/crontab )

Webalizerを毎日実行するようにcronに登録します。

まず、dailyの実施時刻を/etc/crontabで確認しておきます。

...
# run-parts
01 * * * * root run-parts /etc/cron.hourly
02 4 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly
...

上記の設定であれば、毎日4:02に実施されることになっています。
これによりログローテーションも毎日4:02に実施されることになります。

ログローテーション実施後にwebalizerを実行したので、少し余裕をもって、30分後にwebalizerを実行するように /etc/crontab に登録してみます。

...
# webalizer
32 4 * * * root /usr/bin/webalizer -c /etc/webalizer.conf
...

ここまでできたら、1日後に /usage/exmaple/ へアクセスしてみてください。
webalizerで出力されたグラフ、表が見えたらOKです。

もし、うまくいかなかったら、ここで記載している内容を見直ししてみてください。

(amazon )	インフラエンジニア教本2――システム管理・構築技術解説 (SoftwareDesign別冊)に寄稿しました。「ログを読む技術」の再掲載になります。 8月号を見逃された方は、是非、ご一読くださいませ。
(amazon )	Software Design 8月号に寄稿しました。「ログを読む技術」について寄稿しました。興味のある方は、是非、ご一読くださいませ。
また、執筆や当サイトにおける広告のご依頼などございましたら、お問い合わせページよりご一報ください。

Webalizerを使うための設定例(apache,logrotate,cron,webalizer)

Apacheの設定ファイル( /etc/httpd/conf/httpd.conf )

Apacheの設定ファイル( /etc/httpd/conf.d/webalizer.conf )

ログローテーションの設定ファイル( /etc/logrotate.d/httpd )

Webalizerの設定ファイル( /etc/webalizer.conf )

cronの設定ファイル( /etc/crontab )

最近投稿の記事

さくらのVPS 全プランリニューアルです。（石狩(北海道)も選択可）

カテゴリ

Serverman@VPS 完全1ヶ月無料キャンペーン実施中です。

リンク

KVM採用 ConoHa VPSは、時間単位で借りれる便利なVPSです。

KVM採用お名前.com VPS(KVM) 2G プラン初期設定費無料キャンペーン実施です。

Webalizerを使うための設定例(apache,logrotate,cron,webalizer)

Apacheの設定ファイル( /etc/httpd/conf/httpd.conf )

Apacheの設定ファイル( /etc/httpd/conf.d/webalizer.conf )

ログローテーションの設定ファイル( /etc/logrotate.d/httpd )

Webalizerの設定ファイル( /etc/webalizer.conf )

cronの設定ファイル( /etc/crontab )

最近投稿の記事

さくらのVPS 全プラン リニューアルです。（石狩(北海道)も選択可）

カテゴリ

Serverman@VPS 完全1ヶ月無料 キャンペーン実施中です。

リンク

KVM採用 ConoHa VPSは、時間単位で借りれる便利なVPSです。

KVM採用 お名前.com VPS(KVM) 2G プラン 初期設定費無料 キャンペーン 実施です。

さくらのVPS 全プランリニューアルです。（石狩(北海道)も選択可）

Serverman@VPS 完全1ヶ月無料キャンペーン実施中です。

KVM採用お名前.com VPS(KVM) 2G プラン初期設定費無料キャンペーン実施です。