MeCab

Install
Basics: install the package with pip.
pip install -U mecab-python3
Download dictionary
From https://pypi.org/project/mecab-python3/: “These wheels include an internal (statically linked) copy of the MeCab library, and a copy of the mecab-ipadic dictionary (using UTF-8 text encoding), which is automatically used by default. If you wish to use a different dictionary, you will need to install it yourself, write a mecabrc file directing MeCab to use it, and set the environment variable MECABRC to point to this file.”
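mecab-python3 exposes MeCab.Tagger, whose parse() method returns one line per token in the form surface<TAB>comma-separated features, followed by an EOS line. Below is a minimal sketch that turns that text into Python tuples. The helper name parse_mecab_output is hypothetical, and the sample string is hand-written in the ipadic output format rather than captured from a real run. Note that newer mecab-python3 releases no longer bundle a dictionary, so you may need to install one separately (for example pip install unidic-lite).

```python
# Parse MeCab's default output into (surface, features) pairs.
# Assumption: each token line is "surface<TAB>feature1,feature2,..."
# and the output ends with an "EOS" line, as with the ipadic dictionary.


def parse_mecab_output(output):
    """Return a list of (surface, [features...]) tuples."""
    tokens = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue  # skip the end-of-sentence marker and blank lines
        surface, _, feature_str = line.partition("\t")
        tokens.append((surface, feature_str.split(",")))
    return tokens


# Hand-written sample in the ipadic output format (illustrative only).
SAMPLE = (
    "すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ\n"
    "も\t助詞,係助詞,*,*,*,*,も,モ,モ\n"
    "EOS\n"
)

for surface, features in parse_mecab_output(SAMPLE):
    print(surface, features[0])  # surface form and coarse part of speech
```

With MeCab and a dictionary installed, the same helper can be fed real output, e.g. parse_mecab_output(MeCab.Tagger().parse("すもももももももものうち")).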

Word2vec

This is a draft.
Most important in NLP (I think)
In 2020, NLP is the study of trying to understand human languages with computers. By common sense, it is practically impossible for humans to “understand” our languages in numbers. Yes, of course, people who subscribe to scientism could say, “Our world consists of quantum-mechanical particles, and if we have enough computing power we can simulate a human. So language is understandable with numbers.”

Install GiNZA on macOS (April 2020)

Install on 3.8.1 - failed
As of April 2020, I tried to install GiNZA on my macOS laptop with Python 3.8.1 (pyenv-virtualenv):
pip install -U ginza
but it failed and returned an error like this:
...
Collecting ja_ginza_dict<3.2.0,>=3.1.0
  Using cached ja_ginza_dict-3.1.0-1.tar.gz (44.8 MB)
  ERROR: Command errored out with exit status 1:
   command: /Users/atlex/.pyenv/versions/3.8.1/envs/nlp/bin/python3.8 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/hf/kz2h5f215jx73h3vl_d423pr0000gn/T/pip-install-_gi9fc1z/ja-ginza-dict/setup.py'"'"'; __file__='"'"'/private/var/folders/hf/kz2h5f215jx73h3vl_d423pr0000gn/T/pip-install-_gi9fc1z/ja-ginza-dict/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/hf/kz2h5f215jx73h3vl_d423pr0000gn/T/pip-install-_gi9fc1z/ja-ginza-dict/pip-egg-info
       cwd: /private/var/folders/hf/kz2h5f215jx73h3vl_d423pr0000gn/T/pip-install-_gi9fc1z/ja-ginza-dict/
  Complete output (19 lines):
  Traceback (most recent call last):
    File "/Users/atlex/.
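The Collecting ja_ginza_dict<3.2.0,>=3.1.0 line in the log means pip was looking for any ja_ginza_dict release at or above 3.1.0 and below 3.2.0. pip itself implements the full PEP 440 rules (through the packaging library); the sketch below is a simplified, hypothetical re-implementation (satisfies and parse_version are illustrative names) that only understands purely numeric versions with the same number of components.

```python
import operator

# Longer operators come first so ">=" is not mistaken for ">".
OPERATORS = (
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
)


def parse_version(text):
    """Turn "3.1.0" into (3, 1, 0); assumes purely numeric parts."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, spec):
    """Return True if version meets every comma-separated clause in spec."""
    current = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in OPERATORS:
            if clause.startswith(symbol):
                if not compare(current, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


spec = "<3.2.0,>=3.1.0"  # the range pip was collecting in the log
for candidate in ("3.0.9", "3.1.0", "3.1.5", "3.2.0"):
    print(candidate, satisfies(candidate, spec))
```

Only 3.1.0 and 3.1.5 fall inside the range; 3.0.9 is too old and 3.2.0 is excluded by the strict upper bound.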

Manage AWS Route53 with Ansible

I needed to automate AWS Route 53 operations with Ansible, and here are my notes. (As always with Ansible, the most useful information is in the official documentation.)
Set up environment
Install boto
According to the official Ansible documentation, we need to install boto (the AWS SDK for Python).
pip install -U boto
Get AWS API keys and export them
Under the hood, boto uses two keys (an access key ID and a secret access key) to call the AWS API.
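boto can pick up those two keys from the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, which you would typically set with export in the shell before running the playbook. A minimal sketch, assuming you want to fail early when they are missing; the helper name missing_aws_keys is hypothetical.

```python
import os

# Environment variable names boto reads for its two credentials.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_keys(env=None):
    """Return the required AWS credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


missing = missing_aws_keys()
if missing:
    print("Export these before running the playbook:", ", ".join(missing))
else:
    print("AWS keys found in the environment.")
```

Passing a plain dict as env makes the check easy to try without touching the real environment.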

Loop in Ansible

Loop
https://docs.ansible.com/ansible/latest/user_guide/playbooks_loops.html
If you are familiar with any programming language, you can think of a loop in Ansible as iteration, like for, while, and so on.
Sample
Loop over the lines in a file
Cristian's answer on Stack Overflow solved this:
https://stackoverflow.com/questions/33541870/how-do-i-loop-over-each-line-inside-a-file-with-ansible/33544101
Directory structure:
├── files
│   └── list.txt
└── tasks
    └── main.yml
In tasks/main.yml:
---
- debug:
    msg: "{{ item }}"
  loop: "{{ lookup('file', 'files/list.txt').splitlines() }}"
In files/list.txt:
This is the first line.
I'm second line.
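For readers coming from Python, the task above behaves like an ordinary for loop over the file's lines. A rough Python equivalent, assuming a UTF-8 text file; print_each_line is a hypothetical helper, and print stands in for the debug module.

```python
from pathlib import Path


def print_each_line(path):
    """Print and return every line of a file, mirroring the Ansible loop."""
    # splitlines() drops the newline characters, like .splitlines() in the Jinja2 expression.
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for item in lines:  # "item" matches Ansible's default loop variable name
        print(item)
    return lines
```

Calling print_each_line("files/list.txt") with the sample file prints its two lines, just as the debug task prints one message per item.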

Snippets

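Open a file with the with statement
with open("path/to/file", "r") as in_file:
    content = in_file.read()  # work with the file contents here
Loop with index
words = ["You", "are", "a", "genius", "!"]
for i, word in enumerate(words):
    print(f"{i:10}{word:12}")  # index right-aligned in 10 columns, word left-aligned in 12
Exit a loop
break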
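A small runnable sketch that combines enumerate with break: it collects words until a stop word appears and then leaves the loop early. The helper words_before and its stop_word parameter are hypothetical names for illustration.

```python
def words_before(words, stop_word):
    """Collect words until stop_word appears, then exit the loop with break."""
    collected = []
    for i, word in enumerate(words):
        if word == stop_word:
            print(f"Found {stop_word!r} at index {i}")
            break  # leave the for loop immediately
        collected.append(word)
    return collected


words = ["You", "are", "a", "genius", "!"]
print(words_before(words, "a"))  # Found 'a' at index 2, then ['You', 'are']
```

If the stop word never appears, the loop simply runs to the end and every word is returned.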

PyPDF2 - read text from PDF file

Concept
There are many useful PDF documents on the internet, and they are a valuable source of training data. PyPDF2 is a library for manipulating PDF files from Python.
PyPDF2 Official Documentation
Install
You can install PyPDF2 via pip.
pip install PyPDF2
How to use - read a PDF file
PdfFileReader Class - Official PyPDF2 document
Open the file in binary mode (rb), then read it with PyPDF2.PdfFileReader(file_object).
import PyPDF2
with open("sample.
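A hedged sketch of reading every page of a PDF. It assumes PyPDF2 2.x or later, where the reader class is PdfReader, pages are available as reader.pages, and text comes from page.extract_text(); in older releases the equivalents are PdfFileReader, getPage() and extractText(). The helper names pdf_to_text and pages_to_text are hypothetical.

```python
def pages_to_text(pages):
    """Join the extracted text of each page, separated by blank lines."""
    # "or ''" guards against a page that yields no text (e.g. a scanned image).
    return "\n\n".join(page.extract_text() or "" for page in pages)


def pdf_to_text(path):
    """Read every page of a PDF file and return its text."""
    from PyPDF2 import PdfReader  # assumption: PyPDF2 2.x or later

    with open(path, "rb") as file_object:  # PDFs must be opened in binary mode
        reader = PdfReader(file_object)
        # Extract inside the with block: pages are read lazily from the open file.
        return pages_to_text(reader.pages)
```

Usage: text = pdf_to_text("path/to/document.pdf"). Keeping the joining logic in pages_to_text makes it testable without a real PDF.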

spaCy

What is spaCy
The README of the GitHub project describes what spaCy is:
“spaCy: Industrial-strength NLP
spaCy is a library for advanced Natural Language Processing in Python and Cython. It’s built on the very latest research, and was designed from day one to be used in real products. spaCy comes with pretrained statistical models and word vectors, and currently supports tokenization for 50+ languages. It features state-of-the-art speed, convolutional neural network models for tagging, parsing and named entity recognition and easy deep learning integration.”

Reference your own contents in Hugo

https://gohugo.io/content-management/cross-references/#use-ref-and-relref
Refer to another post
Please delete the backslash (\) in the following snippet. I added it to work around a Hugo rendering issue.
[text to be hyperlinked]({\{< ref "hugo/article.md" >}})
You don’t need to include the content/ directory in the path. When you view this page on the local Hugo server, the link resolves to http://localhost:1313/hugo/article. When you view it on the hosted server, the link resolves to http://{{ your_base_URL_in_config.

String manipulation with Python (for NLP)

f-strings
Concept
The official name of f-strings is “formatted string literal.” An f-string is the “modern” way to put the values of variables into strings (as of Feb. 2020). Before f-strings appeared, we used the format method. For me, f-strings are much more intuitive than the format method.
Formatted string literals - Python official document
format - Python official document
Simple example
Both print lines in the following code print the string “It’s me, Mario”.
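The post's own example code is not part of this excerpt, so here is a minimal sketch of the comparison it describes, assuming a variable name (a hypothetical choice) that holds "Mario". Both approaches build the same string; the f-string keeps the variable right where it appears in the text.

```python
name = "Mario"

with_fstring = f"It's me, {name}"         # formatted string literal
with_format = "It's me, {}".format(name)  # str.format method

print(with_fstring)  # It's me, Mario
print(with_format)   # It's me, Mario

# f-strings also accept format specs, e.g. two decimal places:
pi = 3.14159
print(f"{pi:.2f}")   # 3.14
```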