The Ultimate Databricks dbutils Cheatsheet
You're switching between browser tabs again, hunting for that
one dbutils command you used last month. I've
distilled the most essential, high-impact commands into a single
screen - the 20% you'll use 80% of the time.
Bookmark this for future reference
1. Filesystem Operations (dbutils.fs)
Essential File Operations
List files and directories:
# List contents of a directory
dbutils.fs.ls("/mnt/data/")
dbutils.fs.ls("s3://bucket-name/folder/")
# Note: dbutils.fs.ls is not recursive - it takes only a path
# (see the helper sketch below for walking subdirectories)
File and directory management:
# Create directory
dbutils.fs.mkdirs("/mnt/data/new_folder/")
# Copy files/directories
dbutils.fs.cp("/mnt/source/file.csv", "/mnt/destination/file.csv")
dbutils.fs.cp("/mnt/source/", "/mnt/destination/", recurse=True)
# Move/rename files
dbutils.fs.mv("/mnt/old_location/file.csv", "/mnt/new_location/file.csv")
# Delete files/directories
dbutils.fs.rm("/mnt/data/unwanted_file.csv")
dbutils.fs.rm("/mnt/data/unwanted_folder/", recurse=True)
File content operations:
# Read file content (small files only)
content = dbutils.fs.head("/mnt/data/config.txt")
print(content)
# Write content to file
dbutils.fs.put("/mnt/data/output.txt", "Hello World", overwrite=True)
Storage Mounting (Legacy)
Note: mounts are a legacy pattern; Unity Catalog external locations are the recommended approach for new workspaces.
Mount Azure Data Lake Storage:
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("key-vault", "client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("key-vault", "client-secret"),
    # Replace {tenant-id} with your Azure AD tenant (directory) ID
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/{tenant-id}/oauth2/token"
}
dbutils.fs.mount(
    source="abfss://container@storage.dfs.core.windows.net/",
    mount_point="/mnt/storage",
    extra_configs=configs
)
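Mounting a point that is already mounted raises an error, so idempotent setup code usually checks first. A minimal sketch using dbutils.fs.mounts():
# Mount only if /mnt/storage is not already mounted
if not any(m.mountPoint == "/mnt/storage" for m in dbutils.fs.mounts()):
    dbutils.fs.mount(
        source="abfss://container@storage.dfs.core.windows.net/",
        mount_point="/mnt/storage",
        extra_configs=configs
    )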
Unmount storage:
dbutils.fs.unmount("/mnt/storage")
2. Secrets Management (dbutils.secrets)
Accessing Secrets
# List available secret scopes
dbutils.secrets.listScopes()
# List secrets in a scope
dbutils.secrets.list("my-secret-scope")
# Get secret value (will be redacted in output)
password = dbutils.secrets.get("my-secret-scope", "database-password")
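Secret values are always redacted in notebook output, but the metadata from list() is not; each entry exposes its key name. A small sketch for auditing a scope:
# Enumerate the key names available in a scope (values stay hidden)
for secret in dbutils.secrets.list("my-secret-scope"):
    print(secret.key)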
3. Notebook Workflows (dbutils.notebook)
Running Other Notebooks
# Run another notebook and get its exit value
result = dbutils.notebook.run("/Shared/data-processing", 60)
# Run with parameters
result = dbutils.notebook.run(
    "/Shared/etl-pipeline",
    timeout_seconds=1800,
    arguments={"date": "2024-01-15", "env": "prod"}
)
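dbutils.notebook.run raises an exception if the child notebook fails or exceeds its timeout, so pipelines often wrap it in a retry. A minimal sketch (the helper name and retry count are illustrative):
def run_with_retry(path, timeout_seconds, arguments=None, max_retries=3):
    # Re-run the child notebook on failure, up to max_retries attempts
    for attempt in range(1, max_retries + 1):
        try:
            return dbutils.notebook.run(path, timeout_seconds, arguments or {})
        except Exception as e:
            print(f"Attempt {attempt} failed: {e}")
    raise RuntimeError(f"{path} failed after {max_retries} attempts")
result = run_with_retry("/Shared/etl-pipeline", 1800, {"env": "prod"})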
Notebook Exit and Results
# Exit notebook with a custom result
dbutils.notebook.exit("Processing completed successfully")
# Exit with a JSON result for programmatic use
import json
result = {"status": "success", "records_processed": 1000}
dbutils.notebook.exit(json.dumps(result))
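On the calling side, whatever the child passes to exit() comes back as the return string of run(), so a JSON payload round-trips cleanly (path and timeout reuse the earlier example):
import json
# Parse the child notebook's JSON exit value
raw = dbutils.notebook.run("/Shared/data-processing", 600)
payload = json.loads(raw)
print(payload["status"], payload["records_processed"])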
4. Library Management (dbutils.library)
Installing Libraries
# Note: dbutils.library.installPyPI was removed in Databricks Runtime 7.0+;
# use the %pip magic for notebook-scoped libraries instead
%pip install requests
%pip install pandas==1.5.0
# Restart Python so newly installed libraries are importable
dbutils.library.restartPython()
5. Widgets & Parameters (dbutils.widgets)
Creating and Using Widgets
# Create various types of input widgets
dbutils.widgets.text("environment", "dev", "Environment")
dbutils.widgets.dropdown("region", "us-east-1", ["us-east-1", "us-west-2"])
dbutils.widgets.multiselect("tables", "customers", ["customers", "orders"])
# Get widget values in your code
env = dbutils.widgets.get("environment")
region = dbutils.widgets.get("region")
# Remove widgets
dbutils.widgets.remove("environment")
dbutils.widgets.removeAll()
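One quirk worth knowing: get() on a multiselect returns a single comma-separated string, not a list, so split it yourself:
# Multiselect values arrive as one comma-separated string
tables = dbutils.widgets.get("tables").split(",")
for table in tables:
    print(f"Processing {table}")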
6. Job & Cluster Information
Getting Runtime Information
# Get context about the notebook's environment
context = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
# Get current user, notebook path, etc.
user = context.userName().get()
path = context.notebookPath().get()
# Get cluster ID from Spark config
cluster_id = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
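A handy pattern is bundling this context into one dict for logging; spark.conf.get accepts a default for configs that may be absent (the dict keys below are illustrative):
run_info = {
    "user": context.userName().get(),
    "notebook": context.notebookPath().get(),
    "cluster_id": spark.conf.get("spark.databricks.clusterUsageTags.clusterId", "unknown"),
}
print(run_info)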
7. Quick Reference Commands
# File operations
dbutils.fs.ls("/mnt/data/")
dbutils.fs.cp("/source/", "/destination/", recurse=True)
dbutils.fs.rm("/path/to/delete/", recurse=True)
# Secrets
secret = dbutils.secrets.get("scope-name", "secret-key")
# Run notebook
result = dbutils.notebook.run("/path/to/notebook", 3600, {"param": "value"})
# Install package & restart
dbutils.library.installPyPI("package-name")
dbutils.library.restartPython()
# Widgets
dbutils.widgets.text("param_name", "default_value", "Label")
param_value = dbutils.widgets.get("param_name")
dbutils.widgets.removeAll()
8. Common Pitfalls & Best Practices
❌ Common Mistakes
The two most frequent mistakes: ignoring failures in file operations, and hardcoding secrets.
# BAD: Fire-and-forget deletion - rm returns a bool you shouldn't ignore,
# and it can still raise (e.g. on permission errors)
dbutils.fs.rm("/path/that/might/not/exist")
# GOOD: Handle exceptions and check the result
try:
    deleted = dbutils.fs.rm("/path/that/might/not/exist")
    if not deleted:
        print("Nothing was deleted")
except Exception as e:
    print(f"File deletion failed: {e}")
# BAD: Never hardcode secrets
password = "mypassword123"
# GOOD: Use secret management
password = dbutils.secrets.get("db-secrets", "password")
✅ Best Practices
Use descriptive widget labels, validate inputs, and use consistent naming conventions.
# Use clear, descriptive labels for widgets
dbutils.widgets.text("processing_date", "2024-01-01", "Data Processing Date (YYYY-MM-DD)")
# Validate inputs from widgets
import datetime
date_input = dbutils.widgets.get("processing_date")
try:
    datetime.datetime.strptime(date_input, "%Y-%m-%d")
except ValueError:
    raise ValueError("Date must be in YYYY-MM-DD format")