Concept There are many useful PDF documents on the internet. They are a valuable source of data for training.
PyPDF2 is a library for manipulating PDF files via Python.
PyPDF2 Official Documentation
Install You can install PyPDF2 via pip.
pip install PyPDF2
How to use - read a PDF file
PdfFileReader Class - Official PyPDF2 document
We should open the file with mode rb, then read it with PyPDF2.PdfFileReader(file_object).
import PyPDF2
with open("sample.
What is spaCy The README of the GitHub project describes what spaCy is.
spaCy: Industrial-strength NLP
spaCy is a library for advanced Natural Language Processing in Python and Cython. It’s built on the very latest research, and was designed from day one to be used in real products. spaCy comes with pretrained statistical models and word vectors, and currently supports tokenization for 50+ languages. It features state-of-the-art speed, convolutional neural network models for tagging, parsing and named entity recognition and easy deep learning integration.
f-strings Concept The official name of f-strings is “formatted string literal.” An f-string is a “modern” way to put the values of variables into strings (so far, as of Feb. 2020). Before f-strings appeared, we used the format method. For me, f-strings are much more intuitive than the format method.
Formatted string literals - Python official document
format - Python official document
Simple example Both print lines in the following code print the string “It’s me, Mario”.
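The original code is not included in this excerpt, so here is a minimal sketch of what such a comparison might look like. The variable names (`name`, `with_format`, `with_fstring`) are assumptions chosen for illustration, not taken from the post.

```python
name = "Mario"

# format method: the value is passed as an argument and fills the {} placeholder.
with_format = "It's me, {}".format(name)

# f-string: the variable is written directly inside the braces.
with_fstring = f"It's me, {name}"

print(with_format)
print(with_fstring)
```

In the f-string version the variable sits at the place where its value will appear, which is probably why it reads more naturally than matching placeholders to arguments.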
Libraries I need a few libraries for NLP, and each of them is very powerful. I installed all of these libraries via pip, with pip install -U {package}. In the last section, I summarized the libraries so that I can install them all at once later.
spaCy: Open source NLP library.
NLTK: Natural Language ToolKit. It is older than spaCy (spaCy 2015~, NLTK 2001~).
gensim: NLP tools. I installed it for Doc2Vec.
TensorFlow: For custom machine learning models, including Keras.
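After installing, it can be handy to check which of these libraries are actually available in the current environment. The following is a rough sketch using only the standard library; it assumes each import name is the lowercase package name (`spacy`, `nltk`, `gensim`, `tensorflow`), and the helper `installed` is a hypothetical name introduced here.

```python
from importlib.util import find_spec

# Assumption: the import names match the lowercase package names.
LIBRARIES = ["spacy", "nltk", "gensim", "tensorflow"]


def installed(module_name):
    """Return True if the module can be located, without importing it."""
    return find_spec(module_name) is not None


# Map each library name to whether it was found in this environment.
status = {name: installed(name) for name in LIBRARIES}

for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```

Using find_spec avoids importing heavy packages such as TensorFlow just to see whether they are present.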
Installing stacks You may be confused at first by the differences between:
pyenv
virtualenv
pyenv-virtualenv
In particular, virtualenv sounds like a Linux virtual environment, but it isn’t at all. Here is a good answer about that.
Conclusion: pyenv-virtualenv is the best choice.
pyenv-virtualenv official: https://github.com/pyenv/pyenv-virtualenv
Install pyenv Install pyenv on macOS (or Linux environments).
# Download source under ~/.pyenv
git clone https://github.com/yyuu/pyenv.git ~/.pyenv
# Set PATH and another variable
echo -e '\n export PYENV_ROOT=$HOME/.