MeCab

Install

Basic install

pip install -U mecab-python3

Download dictionary

https://pypi.org/project/mecab-python3/

These wheels include an internal (statically linked) copy of the MeCab library, and a copy of the mecab-ipadic dictionary (using UTF-8 text encoding), which is automatically used by default. If you wish to use a different dictionary, you will need to install it yourself, write a mecabrc file directing MeCab to use it, and set the environment variable MECABRC to point to this file.
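The mecabrc step described above can be sketched in Python as follows. This is only a minimal sketch: the helper name `write_mecabrc`, the file location, and the dictionary path are assumptions for illustration. The MeCab-specific parts are the `dicdir` setting in the file and the `MECABRC` environment variable, which has to be set before a `MeCab.Tagger` is created.

```python
import os
import tempfile
from pathlib import Path


def write_mecabrc(dicdir, rc_path):
    """Write a minimal mecabrc that points MeCab at `dicdir`, then export MECABRC."""
    rc_path = Path(rc_path)
    # `dicdir` is the mecabrc key that selects the system dictionary directory.
    rc_path.write_text(f"dicdir = {dicdir}\n", encoding="utf-8")
    # MeCab reads this variable to locate its configuration file.
    os.environ["MECABRC"] = str(rc_path)
    return rc_path


# Hypothetical paths: replace them with the dictionary you actually installed.
rc_file = write_mecabrc(
    "/usr/local/lib/mecab/dic/mecab-ipadic-neologd",
    Path(tempfile.gettempdir()) / "mecabrc",
)
```

Setting the variable from Python keeps the configuration next to the code that depends on it; exporting `MECABRC` in the shell profile works just as well.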

For more sophisticated tokenization, I add an additional dictionary, NEologd. An installation manual is available on its GitHub page.

brew install mecab mecab-ipadic xz git curl
git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git
cd mecab-ipadic-neologd
./bin/install-mecab-ipadic-neologd -n
...
[install-mecab-ipadic-NEologd] : mecab-ipadic-NEologd will be install to /usr/local/lib/mecab/dic/mecab-ipadic-neologd

Use MeCab in Python

import MeCab
mecab = MeCab.Tagger("-Ochasen -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd/")
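Once a tagger exists, its `parse` method returns the analysis as text. The sketch below assumes the ChaSen-style output selected by `-Ochasen`: one tab-separated line per token (surface form first), with an `EOS` line closing each sentence. The helper names `parse_chasen` and `tokenize` are illustrative and not part of MeCab, and the sample string is hand-written rather than captured from a real run.

```python
def parse_chasen(output):
    """Split ChaSen-style MeCab output into one list of fields per token."""
    tokens = []
    for line in output.splitlines():
        # "EOS" marks the end of a sentence; blank lines carry no token.
        if not line or line == "EOS":
            continue
        tokens.append(line.split("\t"))
    return tokens


def tokenize(tagger, text):
    """Call a MeCab-like `parse(text)` method and return the surface forms."""
    return [fields[0] for fields in parse_chasen(tagger.parse(text))]


# Hand-written sample in the assumed layout (not real MeCab output).
sample = "すもも\tスモモ\tすもも\t名詞-一般\t\t\nも\tモ\tも\t助詞-係助詞\t\t\nEOS\n"
surfaces = [fields[0] for fields in parse_chasen(sample)]
```

With the tagger created above, `tokenize(mecab, text)` would return the surface forms for `text`. The exact columns depend on the dictionary and output format in use.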