Summary and Setup
Python is a general-purpose programming language that is useful for writing scripts to work effectively and reproducibly with data.
This is an introduction to Python designed for participants with no programming experience. These lessons can be taught in a day (about 6 hours). They start with basic information about Python syntax and the JupyterLab interface, then move through importing CSV files, using the pandas package to work with data frames, calculating summary information from a data frame, a brief introduction to plotting, and working with databases directly from Python. Time permitting, the final episode encourages learners to apply these techniques to a real-world dataset that is meaningful to them.
Getting Started
Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow.
These lessons assume no prior knowledge of the skills or tools.
To get started, follow the directions in the “Setup” tab to download data to your computer and follow any installation instructions.
Data
The sample data used in this lesson is from the fiction holdings of an academic library along with available information about the ethnicity and gender identity for available years.
You can download these example files as a compressed zip file. Save it to a memorable location, and then unzip the data within. If you are not already comfortable with file paths, we recommend you store the data folder on your desktop to closely match the examples provided. If you are already comfortable with file paths, you are free to store it wherever you like.
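Once the archive is unzipped, a short check like the following can confirm that Python sees the folder where you expect it. This is only a sketch: the folder name `data` and the helper `data_folder` are hypothetical placeholders, so substitute the name of the folder you actually unzipped.

```python
from pathlib import Path


def data_folder(folder_name="data", base=None):
    """Build the path to the unzipped data folder.

    folder_name is a hypothetical placeholder; base defaults to the
    Desktop inside the current user's home directory.
    """
    base = Path.home() / "Desktop" if base is None else Path(base)
    return base / folder_name


folder = data_folder()
# Report whether the folder is present at the expected location.
print(folder, "exists:", folder.is_dir())
```

If the check prints `exists: False`, the folder is probably stored somewhere else or has a different name.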
Software
Python is a popular language for scientific computing, and great for general-purpose programming as well. Installing all of the scientific packages we use in this lesson individually can be cumbersome, so we recommend the all-in-one installer Anaconda.
Regardless of how you choose to install it, please make sure you get a recent version of Python 3. If you are not using the latest version, consult Status of Python versions to ensure your version is still getting security updates.
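If you are unsure which version you have, one way to find out is to ask the interpreter itself. The sketch below uses only the standard library; the helper name `is_python3` is ours, not part of the lesson.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the major version number is 3."""
    return version_info[0] == 3


# Print the full version string of the interpreter running this code.
print(sys.version)
print("Running Python 3:", is_python3())
```

Compare the version printed here against the Status of Python versions page to see whether it still receives security updates.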
Install Anaconda or Miniconda & Required Packages
Anaconda installation
Anaconda will install the workshop packages for you. Download and install Anaconda. Remember to download and run the installer for Python 3.x.
Miniconda installation
Skip this step if you used the Anaconda installer recommended above. Miniconda is a “light” version of Anaconda. If you install and use Miniconda, you will also need to install the packages for the workshop. They are listed below for informational purposes; the actual commands to install them are further down.
Download and install Miniconda
Download and install Miniconda following the instructions. Remember to download and run the installer for the Python 3 version.
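Whichever installer you used, a quick way to see whether a package is available is to ask Python whether it can find it. The sketch below checks only `pandas` and `jupyterlab`, the two packages named on this page; that list is an assumption and is not the full workshop package list. The helper `missing_packages` is hypothetical.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


# Assumed subset of the workshop packages; extend it with the real list.
wanted = ["pandas", "jupyterlab"]
missing = missing_packages(wanted)

if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed packages were found.")
```

Any name reported as not installed needs to be added with the installation commands for the workshop.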
Launch JupyterLab
After installing either Anaconda or Miniconda and the workshop packages, launch JupyterLab by typing this command from the terminal:
jupyter lab
The notebook should open automatically in your browser. If it does not or you wish to use a different browser, open this link: http://localhost:8888.
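If the page does not load, it can help to check whether anything is listening on that address at all. The following sketch tries to open a connection to port 8888 on your own machine. The helper `port_is_open` is hypothetical, and the check assumes the default port: JupyterLab may pick a different one if 8888 is already in use, in which case the terminal output shows the actual address.

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=1.0):
    """Return True if something accepts connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: nothing usable is listening.
        return False


print("Server reachable on port 8888:", port_is_open())
```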
Overview of JupyterLab (Optional)
How the Jupyter notebook works
After typing the command jupyter lab, the following happens:
- A JupyterLab server is automatically created on your local machine.
- The JupyterLab server runs locally on your machine only and does not use an internet connection.
- The JupyterLab server opens the JupyterLab client, also known as the notebook user interface, in your default web browser.
- The JupyterLab server depends on the terminal window from which it was launched. Leave that window open for as long as you want to use JupyterLab, as closing it will terminate the server and lead to errors in the browser client. Information will be logged to the terminal window as you use JupyterLab. This is expected behavior and can be ignored under normal circumstances.
- When you create a new notebook and type code into the browser, the web browser and the JupyterLab server communicate with each other.
- The JupyterLab server does the work and calculations, and the web browser renders the notebook.
The JupyterLab interface has several advantages:
- You can easily type, edit, and copy and paste blocks of code.
- Tab completion allows you to easily access the names of things you are using and learn more about them.
- It allows you to annotate your code with links, different sized text, bullets, etc. to make information more accessible to you and your collaborators.
- It allows you to display figures next to the code that produces them to tell a complete story of the analysis.
How the notebook is stored
- The notebook file is stored in a format called JSON and has the suffix .ipynb.
- Just like HTML for a webpage, what’s saved in a notebook file looks different from what you see in your browser.
- But this format allows Jupyter to mix software (in several languages) with documentation and graphics, all in one file.
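To get a feel for what such a file contains, the sketch below builds a tiny notebook-like structure by hand, writes it out as JSON text, and reads it back. The layout loosely follows version 4 of the notebook format; real notebooks saved by JupyterLab carry more metadata, and the helper `cell_types` is hypothetical.

```python
import json

# A hand-written, minimal notebook-like structure (illustrative only).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A title\n"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('hello')\n"],
        },
    ],
}


def cell_types(nb):
    """List the type of every cell, in order."""
    return [cell["cell_type"] for cell in nb["cells"]]


# Convert to the JSON text that would be stored in an .ipynb file.
text = json.dumps(notebook, indent=1)
reloaded = json.loads(text)
print(cell_types(reloaded))  # → ['markdown', 'code']
```

Printing `text` shows the raw JSON, which is what you would see if you opened an .ipynb file in a plain text editor.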
Notebook modes: Command and Edit
The notebook has two modes of operation: Command (sometimes called Control) and Edit. Command mode lets you edit notebook-level features, while Edit mode lets you change the contents of a notebook cell. Remember that a notebook is made up of a number of cells, which can contain code, Markdown, HTML, visualizations, and more.