About this article
This article is meant to be an introductory guide to Algorithm::LibLinear, a Perl binding to the well-known LIBLINEAR machine learning toolkit.
I once wrote an article titled "Algorithm::LibLinear の紹介" ("Introduction to Algorithm::LibLinear") in Japanese. Although parts of that article are now outdated, Blogger's access analytics show that it is still popular, and a fairly large number of its visitors come from English-speaking countries. So I decided to prepare an updated tutorial in English.
Note that this article describes how to use the library, not machine learning methodology. If you are new to machine learning, I recommend first reading the practical guide by Chih-Wei Hsu et al. and trying LIBSVM/LIBLINEAR with their CLI commands.
As you may notice, my English is not great. Please don't hesitate to point out or correct unclear parts of this article and help me fix them.
Installation
Algorithm::LibLinear is an XS library, so a compiler is needed to build its C/C++ dependencies.
As of Nov 2, 2015, the latest version of Algorithm::LibLinear is v0.16 (based on LIBLINEAR 2.1), which is available on CPAN. You can install it with the cpan or cpanm command (the dependencies to be compiled are bundled in the distribution, so no additional steps should be required):
cpanm Algorithm::LibLinear
Class overview
There are only 4 main classes you need to know:
Algorithm::LibLinear - Trainer class. Holds training settings and generates a trained model.
Algorithm::LibLinear::DataSet - Dataset.
Algorithm::LibLinear::FeatureScaling - Utility class for scaling feature ranges.
Algorithm::LibLinear::Model - Trained classifier (classification) / estimated function (regression).
Note that all of these classes are immutable: once an instance is created, there is no method to modify it.
Executing training
For training, first prepare a training dataset as an Algorithm::LibLinear::DataSet object and normalize it with an Algorithm::LibLinear::FeatureScaling object:
use Algorithm::LibLinear; # This also loads Algorithm::LibLinear::{DataSet,Model} for convenience.
use Algorithm::LibLinear::FeatureScaling; # FeatureScaling is not always needed, so load it explicitly when you use it.
# |A::LL::DataSet#load| loads LIBSVM format data from string/file.
my $data_set = Algorithm::LibLinear::DataSet->load(string => <<EOS);
+1 1:0.708333 2:1 3:1 4:-0.320755 5:-0.105023 6:-1 7:1 8:-0.419847 9:-1 10:-0.225806 12:1 13:-1
-1 1:0.583333 2:-1 3:0.333333 4:-0.603774 5:1 6:-1 7:1 8:0.358779 9:-1 10:-0.483871 12:-1 13:1
+1 1:0.166667 2:1 3:-0.333333 4:-0.433962 5:-0.383562 6:-1 7:-1 8:0.0687023 9:-1 10:-0.903226 11:-1 12:-1 13:1
-1 1:0.458333 2:1 3:1 4:-0.358491 5:-0.374429 6:-1 7:-1 8:-0.480916 9:1 10:-0.935484 12:-0.333333 13:1
...
EOS
# Scale all the data so that each value lies within [-1, +1].
my $scaler = Algorithm::LibLinear::FeatureScaling->new(
data_set => $data_set,
lower_bound => -1,
upper_bound => +1,
);
# Save the scaling parameters so that test data can be scaled the same way later.
$scaler->save(filename => '/path/to/scaling_parameter_file');
# Since A::LL::DataSet is immutable, the |scale| method returns a new, scaled instance.
$data_set = $scaler->scale(data_set => $data_set);
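To make the data format and the scaling step more concrete, here is a pure-Perl sketch that does not use Algorithm::LibLinear at all. It parses LIBSVM-format lines ("label index:value ...") into a label and a feature HashRef, then maps each feature linearly from its observed [min, max] range to [lower, upper], in the style of LIBSVM's svm-scale. The helper names (parse_libsvm_line, feature_ranges, scale_feature) are hypothetical, and the sketch only looks at explicitly listed values (implicit zeros in sparse data are ignored), so please don't read it as a description of how FeatureScaling is implemented.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: split one LIBSVM-format line into (label, {index => value}).
sub parse_libsvm_line {
    my ($line) = @_;
    my ($label, @pairs) = split ' ', $line;
    my %feature = map { my ($i, $v) = split /:/, $_, 2; ($i => $v + 0) } @pairs;
    return ($label + 0, \%feature);
}

# Hypothetical helper: per-feature [min, max] over explicitly present values.
sub feature_ranges {
    my (@features) = @_;
    my %range;
    for my $feature (@features) {
        for my $index (keys %$feature) {
            my $value = $feature->{$index};
            if (exists $range{$index}) {
                $range{$index}[0] = min($range{$index}[0], $value);
                $range{$index}[1] = max($range{$index}[1], $value);
            } else {
                $range{$index} = [$value, $value];
            }
        }
    }
    return \%range;
}

# Hypothetical helper: linearly map each value from [min, max] to [lower, upper].
sub scale_feature {
    my ($feature, $range, $lower, $upper) = @_;
    my %scaled;
    for my $index (keys %$feature) {
        my $value = $feature->{$index};
        my ($min, $max) = @{ $range->{$index} // [$value, $value] };
        # In this sketch, constant or unseen features are passed through unchanged.
        $scaled{$index} = $max == $min
            ? $value
            : $lower + ($upper - $lower) * ($value - $min) / ($max - $min);
    }
    return \%scaled;
}

my @lines = ('+1 1:2 2:10', '-1 1:4 2:30', '+1 1:6 2:20');
my @features = map { (parse_libsvm_line($_))[1] } @lines;
my $range  = feature_ranges(@features);
my $scaled = scale_feature({ 1 => 3, 2 => 30 }, $range, -1, +1);
print join(' ', map { "$_:$scaled->{$_}" } sort { $a <=> $b } keys %$scaled), "\n";
# prints: 1:-0.5 2:1
```

The unknown feature must be scaled with the ranges computed from the training data, which is why the scaling parameters are saved to a file above.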
Historical note: Up to v0.08, Algorithm::LibLinear::ScalingParameter was provided instead of the Algorithm::LibLinear::FeatureScaling class. It was removed in v0.09 due to its complex interface.
Then set up an Algorithm::LibLinear instance with training parameters:
my $learner = Algorithm::LibLinear->new(
# |solver| determines the learning algorithm and the type of the trained model ("SVC" stands for SVM classification).
solver => 'L2R_L2LOSS_SVC_DUAL',
# Training parameters are problem-dependent.
cost => 1,
epsilon => 0.01,
);
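For some intuition about cost: according to LIBLINEAR's README, the L2-regularized L2-loss SVC solvers minimize 1/2 w^T w + C * sum_i max(0, 1 - y_i w^T x_i)^2 (the _DUAL variant solves the dual of this problem), and epsilon is the tolerance of the termination criterion. The pure-Perl sketch below only evaluates that primal objective for a fixed weight vector, showing that a larger cost weights training errors more heavily relative to the regularization term. The helpers dot and primal_objective are hypothetical, not part of Algorithm::LibLinear, and the bias term is omitted.

```perl
use strict;
use warnings;
use List::Util qw(max sum);

# Hypothetical helper: sparse dot product of two {index => value} HashRefs.
sub dot {
    my ($u, $v) = @_;
    return sum(0, map { $u->{$_} * ($v->{$_} // 0) } keys %$u);
}

# Hypothetical helper: 1/2 w.w + C * sum_i max(0, 1 - y_i * w.x_i)^2
# (L2 regularization + squared hinge loss, no bias term).
sub primal_objective {
    my ($w, $cost, @examples) = @_;
    my $loss = sum(0, map {
        my $margin = $_->{label} * dot($w, $_->{feature});
        max(0, 1 - $margin) ** 2;
    } @examples);
    return 0.5 * dot($w, $w) + $cost * $loss;
}

my @examples = (
    { label => +1, feature => { 1 => 2 } },    # correctly classified, margin 2: no loss
    { label => -1, feature => { 1 => 0.5 } },  # misclassified, margin -0.5: loss 1.5^2
);
my $w = { 1 => 1 };
printf "C=1: %g, C=10: %g\n",
    primal_objective($w, 1, @examples), primal_objective($w, 10, @examples);
# prints: C=1: 2.75, C=10: 23
```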
Finally, pass the dataset to the trainer to get a trained Algorithm::LibLinear::Model object:
# Training may take several minutes, depending on the dataset size.
my $model = $learner->train(data_set => $data_set);
# Save the model for later use.
$model->save(filename => '/path/to/model_file');
After that, the trainer and the dataset are no longer needed, so you can undef them to free memory.
Using the trained model
Now you have a trained classifier model. You can predict the class label to which a given feature vector belongs:
my %unknown_feature = (
1 => 0.875,
2 => -1,
3 => -0.333333,
4 => -0.509434,
5 => -0.347032,
6 => -1,
7 => 1,
8 => -0.236641,
9 => 1,
10 => -0.935484,
11 => -1,
12 => -0.333333,
13 => -1,
);
my $scaled_feature = $scaler->scale(feature => \%unknown_feature);
my $class_label = $model->predict(feature => $scaled_feature);
Features are represented as HashRefs with positive integer (> 0) keys, just like in the training dataset. Note that it is important to scale features with the same parameters that were used for the training data.
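Under the hood, a two-class linear model predicts from a decision value w·x. In LIBLINEAR's C implementation, a positive decision value yields the first label stored in the model and anything else yields the second, and feature indices beyond those seen during training are ignored. The pure-Perl sketch below mimics that rule with made-up weights. linear_predict and its arguments are hypothetical and the bias term is omitted; in practice you simply call $model->predict as shown above.

```perl
use strict;
use warnings;

# Hypothetical helper mimicking a two-class linear decision rule (no bias term):
# decision value = sum_j w_j * x_j over indices the model knows (1..n);
# label = first label if the decision value is positive, else the second label.
sub linear_predict {
    my ($weights, $labels, $feature) = @_;
    my $n = @$weights;
    my $decision = 0;
    for my $index (keys %$feature) {
        next if $index < 1 || $index > $n;   # skip indices unseen during training
        $decision += $weights->[$index - 1] * $feature->{$index};
    }
    return ($decision > 0 ? $labels->[0] : $labels->[1], $decision);
}

# Made-up weights for a 3-feature model whose stored label order is (+1, -1).
my @weights = (0.5, -1, 0.25);
my @labels  = (+1, -1);

my ($label, $value) = linear_predict(\@weights, \@labels, { 1 => 2, 2 => 0.5, 3 => 4, 7 => 9 });
print "decision value = $value, label = $label\n";
# prints: decision value = 1.5, label = 1
```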
Before you go
The Git repository is on GitHub. Please report issues and send patches there, not to CPAN RT (I rarely check it).
For more details on the API, see perldoc Algorithm::LibLinear. LIBLINEAR's README file, which describes the equivalent C API, may also help.