All keyboard shortcuts listed below are for Mac. On Windows, substitute Ctrl for Cmd.
My preferred Python scripting environment for data analysis is Jupyter Lab, which is included with Anaconda. To start a Jupyter Lab session, simply open your terminal and enter the command jupyter lab.
That should launch a new tab in your web browser. This tab is where all of my coding will take place for the day(!). One of the advantages of Jupyter Lab is its built-in file browser, which can be toggled with Cmd + B and allows drag/drop for adding files(!). (Note: the root of this file browser is the directory that you launched Jupyter Lab from.)

Creating a new notebook is easy: just click the + icon in the upper left to open a Launcher tab and then select Python 3 under Notebooks. You can then name the new notebook by right-clicking its tab and selecting Rename Notebook...
From the Launcher you can also create a Terminal session under Other. I find this extremely useful because it centralizes all of my work: you can work on notebooks and run terminal operations from the same browser tab. (Terminal commands can of course be run directly in Jupyter cells with !<normal terminal command>, but many operations are more easily executed in a terminal session.)
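For reference, a ! line in a cell is IPython shorthand for handing a command to the shell. A rough plain-Python equivalent, which also works outside Jupyter, uses the standard library's subprocess.run. The helper name run_shell is hypothetical, and the example command assumes a POSIX shell where echo is available.

```python
import subprocess


def run_shell(command: str) -> str:
    """Run a shell command and return its standard output as text.

    Hypothetical helper: roughly what `!command` does in a Jupyter cell,
    except that the output is returned instead of printed.
    """
    result = subprocess.run(
        command,
        shell=True,           # let the shell parse the command string
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the captured bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


print(run_shell("echo hello"))
```

Returning the output as a string makes it easy to keep working with it in Python, which is the main thing the ! shorthand does not give you directly.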
Notebook format is, in my opinion, one of the most overlooked facets of scientific computing. An organized, LINEAR notebook is the main difference between reproducible and irreproducible code. Each cell should be runnable in sequence, from top to bottom. If you find that you have to run cells out of order, it may be best to split the work in that notebook into two.
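One rough way to spot out-of-order execution after the fact is to look at the execution counts Jupyter saves in the .ipynb file, which is plain JSON with a "cells" list. The sketch below (the names is_linear and check_file are hypothetical) flags a notebook whose executed code cells were not run top to bottom. It is only a heuristic: counts can be missing for harmless reasons, so the most reliable test is still Kernel > Restart Kernel and Run All Cells.

```python
import json


def is_linear(notebook: dict) -> bool:
    """Return True if executed code cells have strictly increasing execution counts.

    Heuristic sketch: cells that were never run (execution_count is None)
    are skipped, so only the cells that were actually executed are compared.
    """
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    executed = [c for c in counts if c is not None]
    # Each executed cell must have been run after the one above it.
    return all(a < b for a, b in zip(executed, executed[1:]))


def check_file(path: str) -> bool:
    """Load an .ipynb file (plain JSON) and run the linearity check."""
    with open(path, encoding="utf-8") as f:
        return is_linear(json.load(f))


# In-memory notebooks shaped like the .ipynb JSON
in_order = {"cells": [
    {"cell_type": "markdown", "source": "# Thinking about random things"},
    {"cell_type": "code", "execution_count": 1, "source": "import numpy as np"},
    {"cell_type": "code", "execution_count": 2, "source": "xs = np.random.rand(20)"},
]}
out_of_order = {"cells": [
    {"cell_type": "code", "execution_count": 5, "source": "import numpy as np"},
    {"cell_type": "code", "execution_count": 3, "source": "xs = np.random.rand(20)"},
]}
print(is_linear(in_order))      # True
print(is_linear(out_of_order))  # False
```

Re-running an earlier cell after a later one gives it a higher count than the cell below it, which is exactly the pattern this check catches.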
I always start my notebooks with the following cells:
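Some formatting tips:

- Use Markdown headers in their own cells: # for main tasks, ## for subtasks, and so on.
- Begin each code cell with a """Triple quoted description""". These will show up in red.
- Use # comments for each subsequent step within a cell.
- End a cell with a quick look at its result, such as df.head().

Here's how I start my notebooks: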
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from itertools import cycle
def randomThinger(x):
    """
    Doing some cool stuff!
    Scales x by a uniform random number in [0, 1).
    Inputs:
    | x <float>: The number of cool things
    Outputs:
    | res <float>: An interpretation of those cool things
    """
    return x * np.random.rand()
Thinking about random things
"""20 random things"""
# Generate random things
xs = np.random.rand(20)
labels = cycle('abc')
data = [{'x': x, 'y': randomThinger(x), 'label': next(labels)} for x in xs]
# Save
df = pd.DataFrame(data)
df.head()
|   | x | y | label |
|---|---|---|---|
| 0 | 0.127869 | 0.015900 | a |
| 1 | 0.151444 | 0.091064 | b |
| 2 | 0.762696 | 0.284927 | c |
| 3 | 0.255079 | 0.101684 | a |
| 4 | 0.204099 | 0.078531 | b |
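Because np.random.rand draws from NumPy's global random state, the numbers in this table will differ on every run, which works against the reproducibility goal. A minimal sketch of a seeded variant is below, assuming NumPy's default_rng Generator API (NumPy 1.17+) and an arbitrary seed; make_data is a hypothetical helper. The label column still comes from itertools.cycle, which repeats 'a', 'b', 'c' in order.

```python
from itertools import cycle

import numpy as np


def make_data(seed: int, n: int = 20) -> list:
    """Hypothetical seeded version of the data-generation cell."""
    rng = np.random.default_rng(seed)  # local generator, no global state
    xs = rng.random(n)                 # n uniform draws in [0, 1)
    labels = cycle("abc")              # yields 'a', 'b', 'c', 'a', 'b', ... forever
    # Same idea as randomThinger: scale each x by a uniform draw in [0, 1)
    return [
        {"x": float(x), "y": float(x * rng.random()), "label": next(labels)}
        for x in xs
    ]


first = make_data(seed=42)
second = make_data(seed=42)
print(first == second)                      # True: same seed, same data
print([row["label"] for row in first[:5]])  # ['a', 'b', 'c', 'a', 'b']
```

The resulting list can be passed to pd.DataFrame in the same way as the unseeded data.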
"""Visualize"""
# Plot data
sns.scatterplot(x='x', y='y', hue='label', data=df)
# Move legend outside the plot area (anchor its upper-left corner at the axes' top-right)
plt.legend(bbox_to_anchor=(1, 1), loc='upper left', title='Categories')
plt.show()
THESE WILL CHANGE YOUR LIFE! At least they did for me.
- Cmd + Enter runs the current cell. Use Shift + Enter to run a cell and advance to the next one.
- df.<Tab> will pull up a dropdown menu of available methods (also technically an IPython shortcut).
- In command mode (press Esc), press A to create a new cell above or B to create a new cell below.
- Press C to copy the selected cell or X to cut it, and then V to paste it.
- Press Z to undo the cutting of a cell.