Beyond Python's core functionality, which already covers a wide range of computing tasks, a large number of packages (collections of modules and functions) can be downloaded and installed from PyPI, the Python Package Index.
Our installation of Python 3 includes pip, the package manager used to install packages from PyPI. With our Homebrew-installed Python 3, 'pip3' is the command to use when we want pip to add packages to the Python 3 installation. If you changed the aliases at the end of part 1 of this tutorial, 'pip' on your computer will always point to pip3 (which is good).
Note that packages are installed at the bash prompt ('$') and not in Python ('>>>'). A key package for doing data science in Python is pandas.
To install pandas, enter this at the bash prompt:
$ pip install pandas
Whenever we are working in Python and need a package that we don't have installed, we must install it like this, using pip (that is, pip3) in the bash shell. Every now and then it is also good to upgrade pip itself:
$ pip install --upgrade pip
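Since 'pip' and 'pip3' can point to different installations depending on your aliases, it can be useful to ask Python itself which pip it uses. The following is a minimal sketch, not part of the original setup; the function name pip_version_for_this_python is made up for illustration. It runs pip as a module of the currently running interpreter:

```python
import subprocess
import sys


def pip_version_for_this_python():
    """Return the output of 'python -m pip --version' for the running
    interpreter, or None if pip is not available for it."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


# Path of the interpreter that is running this code
print(sys.executable)
# The pip version line, which also names the Python it belongs to
print(pip_version_for_this_python())
```

If the path and version shown belong to the Homebrew Python 3, packages installed with that pip will be importable from this interpreter.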
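If you get error messages about not having sufficient access to install a package, first try 'pip install --user pandas', which installs into your home directory and needs no extra privileges. Failing that, use 'sudo' (superuser "do") and enter your macOS password when prompted: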
$ sudo pip install pandas
$ pip install jupyter
$ pip install gensim
$ pip install tweepy
$ pip install networkx
$ pip install fb_scrape_public
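Before installing a batch of packages, we may want to see which ones are already present. Here is a small sketch that can be run in Python, assuming Python 3.8 or later (where importlib.metadata is part of the standard library). The helper name missing_packages and the list wanted are made up for illustration:

```python
from importlib import metadata


def missing_packages(names):
    """Return the names from 'names' that are not installed for this Python."""
    missing = []
    for name in names:
        try:
            # Looks up the installed distribution by its PyPI name
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Package names used in this tutorial (illustrative list)
wanted = ["pandas", "jupyter", "gensim", "tweepy", "networkx",
          "fb_scrape_public", "nltk"]

for name in missing_packages(wanted):
    print("not installed yet; at the bash prompt run: pip install", name)
```

The lookup uses the name a package has on PyPI, which is not always the same as the name used in an import statement.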
An especially useful package for text analysis is NLTK (Natural Language Toolkit), which demands an extra setup step. First do the usual:
$ pip install nltk
Then, type 'python' in Terminal to launch Python. In Python, import nltk, then run its download function. In the graphical user interface that opens, choose to install the 'popular' collection. You can run the downloader again whenever you need to add something.
$ python
>>> import nltk
>>> nltk.download()
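If the graphical downloader does not open, or if we would rather skip it, nltk.download() also accepts the identifier of what to fetch. The sketch below wraps that call in a small helper (the name download_nltk_collection is made up for illustration) and assumes that 'popular' is the identifier of the collection offered in the downloader window:

```python
def download_nltk_collection(collection="popular"):
    """Download an NLTK data collection without opening the GUI.

    Returns True on success, False if nltk is missing or the download fails.
    """
    try:
        import nltk
    except ImportError:
        print("nltk is not installed; run 'pip install nltk' at the bash prompt first")
        return False
    # Passing an identifier makes the downloader fetch it directly
    return nltk.download(collection)
```

Calling download_nltk_collection() at the Python prompt fetches the 'popular' collection. Passing another identifier fetches that one instead.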