The Ultimate Databricks dbutils Cheatsheet

You're switching between browser tabs again, hunting for that one dbutils command you used last month. I've distilled the most essential, high-impact commands into a single page: the 20% you'll use 80% of the time.

Bookmark this page for future reference.


1. Filesystem Operations (dbutils.fs)

Essential File Operations

List files and directories:

# List contents of a directory
dbutils.fs.ls("/mnt/data/")
dbutils.fs.ls("s3://bucket-name/folder/")


File and directory management:

# Create directory
dbutils.fs.mkdirs("/mnt/data/new_folder/")

# Copy files/directories
dbutils.fs.cp("/mnt/source/file.csv", "/mnt/destination/file.csv")
dbutils.fs.cp("/mnt/source/", "/mnt/destination/", recurse=True)

# Move/rename files
dbutils.fs.mv("/mnt/old_location/file.csv", "/mnt/new_location/file.csv")

# Delete files/directories
dbutils.fs.rm("/mnt/data/unwanted_file.csv")
dbutils.fs.rm("/mnt/data/unwanted_folder/", recurse=True)

File content operations:

# Read file content (small files only)
content = dbutils.fs.head("/mnt/data/config.txt")
print(content)

# Write content to file
dbutils.fs.put("/mnt/data/output.txt", "Hello World", overwrite=True)
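
head returns at most the first 65,536 bytes by default; pass a byte count as the second argument to cap the read explicitly:

# Preview only the first 1 KB of a large file
preview = dbutils.fs.head("/mnt/data/large_log.txt", 1024)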

Storage Mounting (Legacy)

Note: Unity Catalog external locations (and volumes) are now the recommended way to access cloud storage; DBFS mounts are a legacy pattern.
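
With Unity Catalog, you read cloud paths directly and skip mounting entirely (a minimal sketch; it assumes an external location already grants access to this container):

# No mount needed: Unity Catalog governs access to the URI itself
df = spark.read.format("parquet").load(
    "abfss://container@storage.dfs.core.windows.net/data/"
)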

Mount Azure Data Lake Storage:

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("key-vault", "client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("key-vault", "client-secret"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/{tenant-id}/oauth2/token"
}

dbutils.fs.mount(
    source="abfss://container@storage.dfs.core.windows.net/",
    mount_point="/mnt/storage",
    extra_configs=configs
)

Unmount storage:

dbutils.fs.unmount("/mnt/storage")
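
To check what is currently mounted:

# List all active mount points and their sources
display(dbutils.fs.mounts())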

2. Secrets Management (dbutils.secrets)

Accessing Secrets

# List available secret scopes
dbutils.secrets.listScopes()

# List secrets in a scope
dbutils.secrets.list("my-secret-scope")

# Get secret value (will be redacted in output)
password = dbutils.secrets.get("my-secret-scope", "database-password")
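
A typical pattern is passing secrets straight into a connection so they never appear as literals (a sketch; the scope name, keys, and JDBC details are hypothetical):

# Credentials come from the secret scope, not from the notebook source
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://db.example.com:5432/analytics")  # hypothetical host
      .option("dbtable", "public.orders")
      .option("user", dbutils.secrets.get("db-secrets", "db-user"))
      .option("password", dbutils.secrets.get("db-secrets", "db-password"))
      .load())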

3. Notebook Workflows (dbutils.notebook)

Running Other Notebooks

# Run another notebook (timeout in seconds) and get its exit value
result = dbutils.notebook.run("/Shared/data-processing", 60)

# Run with parameters
result = dbutils.notebook.run(
    "/Shared/etl-pipeline", 
    timeout_seconds=1800,
    arguments={"date": "2024-01-15", "env": "prod"}
)

Notebook Exit and Results

# Exit notebook with a custom result
dbutils.notebook.exit("Processing completed successfully")

# Exit with a JSON result for programmatic use
import json
result = {"status": "success", "records_processed": 1000}
dbutils.notebook.exit(json.dumps(result))
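
On the calling side, dbutils.notebook.run returns that exit string, so a JSON payload can be parsed immediately (a sketch, reusing the notebook path from the example above):

import json

# run() hands back whatever the child passed to dbutils.notebook.exit()
raw = dbutils.notebook.run("/Shared/etl-pipeline", 1800)
result = json.loads(raw)
if result["status"] != "success":
    raise RuntimeError(f"Child notebook failed: {result}")
print(f"Processed {result['records_processed']} records")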

4. Library Management (dbutils.library)

Installing Libraries

Note: dbutils.library.installPyPI was removed in Databricks Runtime 11.0+. On modern runtimes, use %pip install instead (shown below).

# Install PyPI package
dbutils.library.installPyPI("requests")
dbutils.library.installPyPI("pandas", version="1.5.0")

# Install and restart Python to make the library importable
dbutils.library.installPyPI("new-package")
dbutils.library.restartPython()
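
On Databricks Runtime 11.0 and above, use the notebook-scoped %pip magic instead:

%pip install requests pandas==1.5.0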

5. Widgets & Parameters (dbutils.widgets)

Creating and Using Widgets

# Create various types of input widgets
dbutils.widgets.text("environment", "dev", "Environment")
dbutils.widgets.dropdown("region", "us-east-1", ["us-east-1", "us-west-2"])
dbutils.widgets.multiselect("tables", "customers", ["customers", "orders"])

# Get widget values in your code
env = dbutils.widgets.get("environment")
region = dbutils.widgets.get("region")

# Remove widgets
dbutils.widgets.remove("environment")
dbutils.widgets.removeAll()
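
Note that dbutils.widgets.get always returns a string; a multiselect hands back its selections as one comma-separated string, so split it before use:

# "customers,orders" -> ["customers", "orders"]
selected_tables = dbutils.widgets.get("tables").split(",")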

6. Job & Cluster Information

Getting Runtime Information

# Get context about the notebook's environment
context = dbutils.notebook.entry_point.getDbutils().notebook().getContext()

# Get current user, notebook path, etc.
user = context.userName().get()
path = context.notebookPath().get()

# Get cluster ID from Spark config
cluster_id = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
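
These entry_point accessors are internal and can shift between runtime versions; when in doubt, dump the whole context as JSON and inspect it (a sketch; the available keys vary by runtime):

import json

# Serialize the full notebook context; key names differ across runtimes
ctx = json.loads(
    dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
)
print(ctx.get("tags", {}).get("user"))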

7. Quick Reference Commands

# File operations
dbutils.fs.ls("/mnt/data/")
dbutils.fs.cp("/source/", "/destination/", recurse=True)
dbutils.fs.rm("/path/to/delete/", recurse=True)

# Secrets
secret = dbutils.secrets.get("scope-name", "secret-key")

# Run notebook
result = dbutils.notebook.run("/path/to/notebook", 3600, {"param": "value"})

# Install package & restart
dbutils.library.installPyPI("package-name")
dbutils.library.restartPython()

# Widgets
dbutils.widgets.text("param_name", "default_value", "Label")
param_value = dbutils.widgets.get("param_name")
dbutils.widgets.removeAll()

8. Common Pitfalls & Best Practices

❌ Common Mistakes

The two most frequent mistakes are skipping exception handling around file operations and hardcoding secrets.

# BAD: ls raises an exception if the path doesn't exist
files = dbutils.fs.ls("/path/that/might/not/exist")

# GOOD: Handle the exception
try:
    files = dbutils.fs.ls("/path/that/might/not/exist")
except Exception as e:
    print(f"Listing failed: {e}")

# BAD: Never hardcode secrets
password = "mypassword123"

# GOOD: Use secret management
password = dbutils.secrets.get("db-secrets", "password")

✅ Best Practices

Use descriptive widget labels, validate inputs, and use consistent naming conventions.

# Use clear, descriptive labels for widgets
dbutils.widgets.text("processing_date", "2024-01-01", "Data Processing Date (YYYY-MM-DD)")

# Validate inputs from widgets
import datetime
date_input = dbutils.widgets.get("processing_date")
try:
    datetime.datetime.strptime(date_input, "%Y-%m-%d")
except ValueError:
    raise ValueError(f"Date must be in YYYY-MM-DD format, got {date_input!r}")