Summary and Schedule
Python is a general-purpose programming language that is useful for writing scripts to work effectively and reproducibly with data.
This is an introduction to Python designed for participants with no programming experience. These lessons can be taught in a day (~ 6 hours). They start with some basic information about Python syntax and the JupyterLab interface, then move through importing CSV files, using the pandas package to work with data frames, calculating summary information from a data frame, a brief introduction to plotting, and working with databases directly from Python. Time permitting, the final episode encourages learners to apply these techniques to a real-world dataset that is meaningful to them.
Getting Started
Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow.
These lessons assume no prior knowledge of the skills or tools.
To get started, follow the directions in the “Setup” tab to download data to your computer and follow any installation instructions.
Setup Instructions | Download files required for the lesson |
Duration: 00h 00m | 1. Short Introduction to Programming in Python | What is Python? Why should I learn Python?
Duration: 00h 30m | 2. Starting With Data | How can I import data in Python? What is Pandas? Why should I use Pandas to work with data?
Duration: 01h 30m | 3. Indexing, Slicing and Subsetting DataFrames in Python | How can I access specific data within my data set? How can Python and Pandas help me to analyse my data?
Duration: 02h 30m | 4. Data Types and Formats | What types of data can be contained in a DataFrame? Why is the data type important?
Duration: 03h 15m | 5. Combining DataFrames with pandas | Can I work with data from multiple sources? How can I combine data from different data sets?
Duration: 04h 00m | 6. Data workflows and automation | Can I automate operations in Python? What are functions and why should I use them?
Duration: 05h 30m | 7. Plotting | Can I use Python to create plots? How can I customize plots generated in Python?
Duration: 06h 15m | 8. Accessing SQLite Databases Using Python & Pandas |
Duration: 07h 00m | 9. Putting It All Together | What common issues might be encountered with real-world data? How can plotting and other techniques help with exploratory analysis and getting to know my data?
Duration: 07h 45m | Finish |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Data
The sample data used in this lesson is from the fiction holdings of an academic library along with available information about the ethnicity and gender identity for available years.
You can download these example files as a compressed zip file. Save it to a memorable location, and then unzip the data within. If you are not already comfortable with file paths, we recommend you store the data folder on your desktop to closely match the examples provided. If you are already comfortable with file paths, you are free to store it wherever you prefer.
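If you want to confirm that Python can find the unzipped folder, a minimal sketch along these lines may help. The folder name `data` and the helper `desktop_data_path` are hypothetical, chosen only for illustration, and the sketch assumes your desktop lives at `Desktop` inside your home directory, which is not true on every system (for example, when the desktop is localized or synced to a cloud folder).

```python
from pathlib import Path

# Hypothetical folder name: replace it with the name of the folder you unzipped.
DATA_FOLDER_NAME = "data"


def desktop_data_path(folder_name: str = DATA_FOLDER_NAME) -> Path:
    """Build the path to a folder stored on the current user's desktop."""
    # Assumes the desktop is the "Desktop" folder inside the home directory.
    return Path.home() / "Desktop" / folder_name


data_dir = desktop_data_path()
print(f"Looking for data in: {data_dir}")

if data_dir.is_dir():
    # Show the first few entries so you can confirm the unzip worked.
    for entry in sorted(data_dir.iterdir())[:5]:
        print("  ", entry.name)
else:
    print("Folder not found; check where you unzipped the files.")
```

Building the path with `pathlib` avoids typing slashes or backslashes by hand, so the same code can work on Windows, macOS, and Linux.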
Software
Python is a popular language for scientific computing, and great for general-purpose programming as well. Installing all of the scientific packages we use in this lesson individually can be a bit cumbersome, so we recommend the all-in-one installer Anaconda.
Regardless of how you choose to install it, please make sure you get a recent version of Python 3. If you are not using the latest version, consult Status of Python versions to ensure your version is still getting security updates.
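One way to see which version you have is to ask Python itself. The sketch below only checks that the interpreter is Python 3. The threshold `MINIMUM_VERSION` and the helper `is_supported` are illustrative assumptions, since the lesson does not name an exact minimum version.

```python
import sys

# Assumed threshold for illustration: the lesson only asks for "a recent version of Python 3".
MINIMUM_VERSION = (3, 0)


def is_supported(version_info=sys.version_info, minimum=MINIMUM_VERSION) -> bool:
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


# sys.version starts with the version number, e.g. "3.x.y (...)".
print("Running Python", sys.version.split()[0])

if is_supported():
    print("This is a Python 3 interpreter.")
else:
    print("This looks like Python 2; please install a recent Python 3.")
```

Comparing tuples rather than strings avoids mistakes such as treating "3.10" as older than "3.9".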
Install Anaconda or Miniconda & Required Packages
Anaconda installation
Anaconda will install the workshop packages for you. Download and install Anaconda. Remember to choose the installer for Python 3.x.
Miniconda installation
Skip this step if you used the Anaconda installer recommended above. Miniconda is a “light” version of Anaconda. If you install and use Miniconda, you will also need to install the workshop packages yourself. They are listed below for informational purposes; the actual commands to install them are further down.
Download and install Miniconda
Download and install Miniconda following the instructions. Remember to download and run the installer for the Python 3 version.
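Once the packages are installed, you may want to check that Python can actually find them. The sketch below is one possible check. The list `PACKAGES` is a hypothetical example containing only pandas and JupyterLab, which are named in this lesson, so extend it to match the full workshop package list. The helper `missing_packages` is likewise an illustrative name.

```python
from importlib.util import find_spec

# Hypothetical list for illustration; extend it to match the workshop package list.
PACKAGES = ["pandas", "jupyterlab"]


def missing_packages(names):
    """Return the top-level package names that Python cannot locate."""
    # find_spec returns None when a top-level package is not installed.
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(PACKAGES)
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All listed packages were found.")
```

This looks packages up without importing them, so it runs quickly and has no side effects.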
Launch JupyterLab
After installing either Anaconda or Miniconda and the workshop packages, launch JupyterLab by typing this command in the terminal:
jupyter lab
JupyterLab should open automatically in your browser. If it does not, or if you wish to use a different browser, open this link: http://localhost:8888.
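If the page does not load, it can help to check whether anything is listening at that address. The sketch below is a rough diagnostic run from a separate Python session. The helper name `server_is_listening` is hypothetical, and the check assumes the default port 8888; JupyterLab may pick a different port if 8888 is already in use, in which case the terminal output shows the actual address.

```python
import socket


def server_is_listening(host: str = "localhost", port: int = 8888, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # A successful connection means some server is accepting requests there.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names.
        return False


if server_is_listening():
    print("Something is listening on http://localhost:8888")
else:
    print("Nothing is listening on port 8888; check the terminal where you ran jupyter lab.")
```

This only confirms that a server is accepting connections on the port. It does not verify that the server is JupyterLab.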
Overview of JupyterLab (Optional)
How the Jupyter notebook works
After typing the command jupyter lab, the following happens:
- A JupyterLab server is automatically created on your local machine.
- The JupyterLab server runs locally on your machine only and does not use an internet connection.
- The JupyterLab server opens the JupyterLab client, also known as the notebook user interface, in your default web browser.
- The JupyterLab server depends on the terminal window from which it was launched. Leave that window open for as long as you want to use JupyterLab, as closing it will terminate the server and lead to errors in the browser client. Information will be logged to the terminal window as you use JupyterLab. This is expected behavior and can be ignored under normal circumstances.
- When you create a new notebook and type code into the browser, the web browser and the JupyterLab server communicate with each other.
- The JupyterLab server does the work and calculations, and the web browser renders the notebook.
The JupyterLab interface has several advantages:
- You can easily type, edit, and copy and paste blocks of code.
- Tab completion allows you to easily access the names of things you are using and learn more about them.
- It allows you to annotate your code with links, different sized text, bullets, etc. to make information more accessible to you and your collaborators.
- It allows you to display figures next to the code that produces them to tell a complete story of the analysis.
How the notebook is stored
- The notebook file is stored in a format called JSON and has the suffix .ipynb.
- Just like HTML for a webpage, what’s saved in a notebook file looks different from what you see in your browser.
- But this format allows Jupyter to mix software (in several languages) with documentation and graphics, all in one file.
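To make the storage format concrete, here is a minimal sketch of what a small notebook looks like as JSON. It builds a two-cell notebook as a plain Python dictionary following the general layout of version 4 of the notebook format, then converts it to text and back. The cell contents are invented for illustration, and a real .ipynb file usually carries more metadata than shown here.

```python
import json

# A tiny notebook with one Markdown cell and one code cell (contents invented for illustration).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My analysis\n", "Notes go here."],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
}

# This text is what would be written to a file ending in .ipynb.
text = json.dumps(notebook, indent=1)

# Reading the text back recovers the same structure.
loaded = json.loads(text)
for cell in loaded["cells"]:
    print(cell["cell_type"], "->", "".join(cell["source"]))
```

Each cell records its type alongside its source text, which is how one file can hold documentation and code together.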
Notebook modes: Command and Edit
The notebook has two modes of operation: Command and Edit. Command mode (sometimes called Control mode) lets you edit notebook-level features, while Edit mode lets you change the contents of a notebook cell. Remember that a notebook is made up of a number of cells, which can contain code, Markdown, HTML, visualizations, and more.