Python

PyPDF2 - read text from a PDF file

Concept There are many useful PDF documents on the internet, and they are a handy source of training data. PyPDF2 is a library for manipulating PDF files in Python. PyPDF2 Official Documentation Install You can install PyPDF2 via pip. pip install PyPDF2 How to use - read a PDF file PdfFileReader Class - Official PyPDF2 document Open the file in rb (binary read) mode, then read it with PyPDF2.PdfFileReader(file_object). import PyPDF2 with open("sample.

SpaCy

What is spaCy The README of the GitHub project has a description of what spaCy is. spaCy: Industrial-strength NLP spaCy is a library for advanced Natural Language Processing in Python and Cython. It’s built on the very latest research, and was designed from day one to be used in real products. spaCy comes with pretrained statistical models and word vectors, and currently supports tokenization for 50+ languages. It features state-of-the-art speed, convolutional neural network models for tagging, parsing and named entity recognition and easy deep learning integration.

String manipulation with Python (for NLP)

f-strings Concept The official name of f-strings is “formatted string literal.” An f-string is a “modern” way to put the values of variables into strings (so far, as of Feb. 2020). Before f-strings appeared, we used the format method. For me, f-strings are much more intuitive than the format method. Formatted string literals - Python official document format - Python official document Simple example Both print lines in the following code print the string “It’s me, Mario”.
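A minimal sketch of such a comparison, assuming a variable called name (the variable names here are only for illustration). The first string uses an f-string, the second uses the format method, and both come out the same.

```python
name = "Mario"

# f-string: the expression inside the braces is evaluated in place.
with_fstring = f"It's me, {name}"

# format method: the value is passed in as an argument.
with_format = "It's me, {}".format(name)

print(with_fstring)
print(with_format)
```

With the f-string, the variable sits right where its value will appear, which is why I find it easier to read.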

NLP tools in Python

Libraries I need a few libraries for NLP, and each of them is very powerful. I installed all of them via pip, like pip install -U {package}. In the last section, I summarized the libraries so that I can install them all at once later. spaCy: Open source NLP library. NLTK: Natural Language ToolKit. It is older than spaCy (spaCy 2015~, NLTK 2001~). gensim: NLP tools. I installed it for Doc2Vec. TensorFlow: For custom machine learning models, including Keras.
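As a rough sketch, the check below reports which of the four libraries above are still missing from the current environment. The helper name missing_packages and the list NLP_PACKAGES are my own for illustration, and I am assuming the import names match the pip package names, which they do for these four.

```python
from importlib.util import find_spec

# Import names of the libraries listed above (assumed equal to the pip names).
NLP_PACKAGES = ["spacy", "nltk", "gensim", "tensorflow"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(NLP_PACKAGES)
if missing:
    # Print a pip command covering everything that is not installed yet.
    print("Not installed yet:", ", ".join(missing))
    print("Try: pip install -U " + " ".join(missing))
else:
    print("All NLP libraries are available.")
```

find_spec only looks the package up and does not import it, so the check stays fast even for a heavy library like TensorFlow.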

pyenv and pyenv-virtualenv - Intro

Installing stacks You may be confused at first by the differences between pyenv, virtualenv, and pyenv-virtualenv. In particular, virtualenv sounds like a Linux virtual environment, but it isn’t at all. Here is a good answer about that. Conclusion: pyenv-virtualenv is the best choice. pyenv-virtualenv official: https://github.com/pyenv/pyenv-virtualenv Install pyenv Install pyenv on macOS (or Linux). # Download source under ~/.pyenv git clone https://github.com/yyuu/pyenv.git ~/.pyenv # Set PATH and another variable echo -e '\n export PYENV_ROOT=$HOME/.