Downloading Emails from Gmail with Python

Some weeks ago I wanted to analyze my spending habits using the data of one of the supermarket chains I shop at frequently. Each purchase at this particular chain sends a receipt to my email address. By a happy accident, I’ve been archiving these receipts for years. Therefore, I could download them from my Gmail account and run some code to analyze them.

There was a small problem: I had collected more than 300 receipts over the years. As much as I like splicing random data, one thing I knew for sure was that I did not want to download receipts manually. Fortunately, Gmail has an API that allows you to read emails programmatically. Instead of downloading the receipts manually, I could take twice the amount of time and write a script to do it for me. Perfect.

Accessing said API was not entirely straightforward. Given that I’m pretty sure I’ll think of other uses for this API in the near future, I wanted to document the process in detail. After all, I tend to forget things the moment I turn away from them, so some guidance the next time I get around to it would be nice. Hopefully, it will turn out useful for others, too.

In short, the process of downloading email messages consists of a few steps:

  • Setting up a Google Cloud project
  • Setting up OAuth and getting credentials for the project
  • Writing some code

Naturally, the first two steps are the most complicated.

Setting Up the Project

Right off the bat, I recommend going to Google’s official guide. It should be considered the source of truth. This post will eventually get outdated, whereas Google (presumably) will keep their guides up to date.

On the other hand, you’ll find a step-by-step guide here, which can be quicker than wading through reams of documentation.

That being said, let’s begin!

Setting Up Google Cloud Project

  1. The first thing you’ll likely need to do is create a Google Cloud Project. There are a couple of ways to do it:
    • Navigate to Google Cloud Console, click the hamburger menu, hover over IAM & Admin, and select Create a Project (near the bottom of the list).
    • Use the direct Create a Project link (the same one behind the Go to Create a Project button in the guide). This is my preferred option since finding anything on Google Cloud Console is a Herculean task.
  2. Enter the project name (e.g., gmail-example) on the next page. Location is not mandatory.
  3. Click Create and wait for everything to initialize.

If all goes well, you’ll see a bunch of lists, graphs, tables, and whatnot on your Dashboard. Congratulations! The Google Cloud Project is set up.

Enabling Gmail API

Now that we have a project to play with, we need to enable the Gmail API for it. If you like feeling your brain slowly oozing through your eyes, you can try to find it in the list of APIs on the API dashboard. For the rest of us who are too weak of spirit for such endeavors:

  1. Click APIs & Services > Library
  2. Search for gmail
  3. Click on it
  4. Click Enable and wait for a few seconds

Great success, we’re ready for the meat of the configuration.

Setting Up Credentials

Now that the API has been enabled for the project, you’ll be urged to create credentials.

  1. If you get a notification that you need to create credentials, click the Create credentials button. If you did not, navigate to Credentials in the menu on the left side and click Create credentials at the top of the page.
  2. We’ll be accessing user emails, so choose User data
  3. In scopes, click Add or remove scopes
  4. Go to the page that has Gmail scopes
  5. Select View your email messages and settings and click Update.
    • You can add more scopes. However, it’s a good idea to keep the permissions as restrictive as possible. This helps to prevent disasters, such as deleting emails when you thought you were just reading them. Or allowing people using your computer/malware to do so.
  6. The next step will ask you to set up an OAuth Client ID
    1. Choose Desktop app
    2. Enter the client’s name, for example, python-client. (Note that this name only identifies the client in the console; users will not see it.)
    3. Click Create
  7. If all goes well, the credentials should be generated and you’ll be presented with an option to download them. Do so by clicking Download. Credentials will be downloaded as a JSON file.

Now that you have the credentials generated, you can find them under the Credentials option on the menu. If you ever lose the file, you can download them again from there.

The final step is to set up the OAuth consent screen. This is what users will see when prompted to give access to their Gmail. For local projects it’s probably not as relevant since you’ll be the only one using it, but set it up we must nevertheless.

To do so:

  1. Select OAuth consent screen on the left side menu
  2. Depending on when you read this, you might get prompted to try the new experience. Or you might get it immediately. In the former case, click Go to new experience - no reason to learn the old UI.
  3. Click on Get Started
  4. Enter app information:
    1. App name: {your app's name}
    2. User support email: {your email}
  5. In the Audience step, select External. (Internal is only available when the project belongs to a Google Workspace organization.)
  6. In the Contact Information screen, enter your contact details
  7. In the Finish screen, agree to the usage policy

Once the creation process finishes, you can inspect the related information. The part we’re most interested in now is Audience, and more specifically, the Test users section at the bottom of the screen. Add a test user there: the Gmail account that you want to get email messages from.

With all that done, what remains is a small matter of programming the email download logic.

Invoking Gmail API

The code is based on Python 3.13, although it should run even on much older Python versions. Before starting, I recommend having a look at Python quickstart from Google itself, which has a basic setup outlined. A summary of how to make the first contact with Gmail is provided below.

First of all, we need to pull some libraries from Google:

pip install -U google-api-python-client google-auth-httplib2 google-auth-oauthlib

Then, create an auth.py file. Copy and paste the code from Google’s example:

# Taken verbatim from https://developers.google.com/gmail/api/quickstart/python
# Accessed on 2024-12-22

import os.path

from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

# If modifying these scopes, delete the file token.json.
SCOPES = ["https://www.googleapis.com/auth/gmail.readonly"]

def main():
  """Shows basic usage of the Gmail API.
  Lists the user's Gmail labels.
  """
  creds = None
  # The file token.json stores the user's access and refresh tokens, and is
  # created automatically when the authorization flow completes for the first
  # time.
  if os.path.exists("token.json"):
    creds = Credentials.from_authorized_user_file("token.json", SCOPES)
  # If there are no (valid) credentials available, let the user log in.
  if not creds or not creds.valid:
    if creds and creds.expired and creds.refresh_token:
      creds.refresh(Request())
    else:
      flow = InstalledAppFlow.from_client_secrets_file(
          "credentials.json", SCOPES
      )
      creds = flow.run_local_server(port=0)
    # Save the credentials for the next run
    with open("token.json", "w") as token:
      token.write(creds.to_json())

  try:
    # Call the Gmail API
    service = build("gmail", "v1", credentials=creds)
    results = service.users().labels().list(userId="me").execute()
    labels = results.get("labels", [])

    if not labels:
      print("No labels found.")
      return
    print("Labels:")
    for label in labels:
      print(label["name"])

  except HttpError as error:
    # TODO(developer) - Handle errors from gmail API.
    print(f"An error occurred: {error}")

if __name__ == "__main__":
  main()

Inspecting the code, you’ll see that it expects either a token.json or a credentials.json file. Remember the credentials file you downloaded a few minutes ago? This will be the credentials.json file. Find the downloaded file, rename it to credentials.json, and move it to the same directory as auth.py. If you’re wondering about token.json, it will be created automatically once we run the code and allow it to access our Gmail account.
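If you'd rather script the rename-and-move step, here is a minimal sketch. The helper name install_credentials and the example paths are my own inventions, not something from Google's guide. It assumes the file belongs to a Desktop app client, whose JSON keeps everything under a top-level "installed" key, and it refuses to copy anything else.

```python
import json
import shutil
from pathlib import Path


def install_credentials(downloaded: Path, project_dir: Path) -> Path:
    """Copy a downloaded OAuth client file to project_dir/credentials.json.

    Hypothetical helper: the name and layout are illustrative only.
    """
    with open(downloaded, encoding="utf-8") as f:
        data = json.load(f)

    # Desktop app clients (the type chosen earlier) are stored under "installed".
    if "installed" not in data:
        raise ValueError(f"{downloaded} does not look like a Desktop app client file")

    target = project_dir / "credentials.json"
    shutil.copyfile(downloaded, target)
    return target
```

Calling it would look something like install_credentials(Path.home() / "Downloads" / "client_secret.json", Path(".")), where the download file name is a placeholder for whatever your browser saved.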

Speaking of running the code, run the code:

python auth.py

This should open a browser window. Choose the account with the email address that you added to Test users when setting up the OAuth consent screen. Google will warn you that the app has not yet been verified. This is expected, as we indeed have not verified the app. We’re just using it for our own needs, so there’s no need to do so.

Click Continue. On the next page, permissions requested by the app will be detailed, urging you to either cancel or continue. Click Continue once more. You should see a message The authentication flow has completed. You may close this window. Feel free to close the window.

Note that you’ll see token.json created in the same folder where auth.py resides. It holds a short-lived access token plus a refresh token, which the script uses to renew access when the access token expires. As long as the refresh token stays valid, you won’t need to re-authenticate with Google. Obviously, this file should be kept secret, along with credentials.json.
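One cheap way to act on "kept secret" is to tighten the file permissions. The sketch below is only an illustration: restrict_to_owner is a name I made up, and it assumes a POSIX system (on Windows, os.chmod can do little more than toggle the read-only flag). Keeping both files out of version control is still on you.

```python
import os
import stat
from pathlib import Path


def restrict_to_owner(path: Path) -> None:
    """Make a secrets file readable and writable by its owner only (POSIX)."""
    # S_IRUSR | S_IWUSR is the same as `chmod 600`.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


if __name__ == "__main__":
    for name in ("token.json", "credentials.json"):
        secret = Path(name)
        if secret.exists():
            restrict_to_owner(secret)
```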

Aside from the generated token file, you should see Gmail labels printed out in the console. If that is so - congratulations are in order. You’ve just accessed your Gmail inbox!

Reading Emails

Now that we know the code works, we can improve it to read emails, not labels. To do so, we’ll first extract authentication code into a separate module. Then, we’ll code a basic Gmail client, and use it to read some emails.

Let’s start with authentication. First, let’s create an auth package. Then, let’s move token.json and credentials.json to it. Finally, create an auth.py file and paste the following code there:

import os.path

from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from pathlib import Path

# If modifying these scopes, delete the file token.json.
SCOPES = ["https://www.googleapis.com/auth/gmail.readonly"]

FILE_PATH = os.path.dirname(os.path.realpath(__file__))
TOKEN_PATH = str(Path(FILE_PATH, "token.json"))
CREDENTIALS_PATH = str(Path(FILE_PATH, "credentials.json"))

def authenticate():
    print("\n=============== Authenticating for Gmail: start ===============")
    creds = None
    # The file token.json stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists(TOKEN_PATH):
        creds = Credentials.from_authorized_user_file(TOKEN_PATH, SCOPES)

    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(CREDENTIALS_PATH, SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open(TOKEN_PATH, "w") as token:
            token.write(creds.to_json())

    print("\n=============== Authenticating for Gmail: end ===============")
    return creds

This will let us keep the authentication separate from the Gmail client. This way, if authentication is broken, we’ll know where to look for issues.

With this done, we can start implementing the Gmail client. For this demonstration, we’ll assume we want to find emails sent by a specific sender (for more options, see Gmail’s API).
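The search itself is driven by a query string that uses the same operators as the Gmail search box, such as from:, after: and has:attachment. As a rough sketch, a small builder for such strings could look like the one below. The function build_query and its parameters are hypothetical and only cover three operators; sender names with spaces would need extra quoting that I have left out.

```python
from datetime import date
from typing import Optional


def build_query(
    sender: str,
    after: Optional[date] = None,
    has_attachment: bool = False,
) -> str:
    """Assemble a Gmail search string from a few optional filters."""
    terms = [f"from:{sender}"]
    if after is not None:
        # Gmail's date operators take the YYYY/MM/DD form.
        terms.append(f"after:{after:%Y/%m/%d}")
    if has_attachment:
        terms.append("has:attachment")
    return " ".join(terms)


if __name__ == "__main__":
    print(build_query("receipts@example.com", after=date(2024, 1, 5)))
```

The resulting string would go into the q parameter of the list call.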

Create a file client.py in the top level directory. Paste the following code there:

from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

from auth.auth import authenticate

class GmailApi:
    def __init__(self):
        # alternatively, we could pass credentials to constructor
        # to decouple the code.
        creds = authenticate()
        self.service = build("gmail", "v1", credentials=creds)

    def find_emails(self, sender: str):
        print("\n=============== Find Emails: start ===============")
        request = (
            self.service.users()
            .messages()
            .list(userId="me", q=f"from:{sender}", maxResults=200)
        )

        result = self._execute_request(request)
        try:
            messages = result["messages"]
            print(f"Retrieved messages matching the '{sender}' query: {messages}")
        except KeyError:
            print(f"No messages found for the sender '{sender}'")
            messages = []

        print("=============== Find Emails: end ===============")

        return messages

    @staticmethod
    def _execute_request(request):
        try:
            return request.execute()
        except HttpError as e:
            print(f"An error occurred: {e}")
            raise RuntimeError(e)
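
One caveat about maxResults=200: the list call returns a single page, and when more messages match, the response carries a nextPageToken that has to be sent back as pageToken to get the next page. With 300+ receipts that matters. Below is a minimal sketch of the loop. To keep it independent of the Google libraries, it takes a hypothetical fetch_page callable (my own naming) that receives a page token, or None for the first page, and returns one response dict.

```python
from typing import Callable, Iterator, Optional


def iter_messages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield message stubs from every page of a messages.list-style response."""
    token = None
    while True:
        response = fetch_page(token)
        # A page with no matches has no "messages" key at all.
        yield from response.get("messages", [])
        token = response.get("nextPageToken")
        if not token:
            break


if __name__ == "__main__":
    # Fake two-page response standing in for the real API.
    pages = {
        None: {"messages": [{"id": "1"}], "nextPageToken": "p2"},
        "p2": {"messages": [{"id": "2"}]},
    }
    print([m["id"] for m in iter_messages(lambda token: pages[token])])
```

Inside GmailApi, fetch_page could be a small function that builds the same list request with pageToken=token added and passes it to _execute_request.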

Then, create a main.py file:

from client import GmailApi

def main():
    client = GmailApi()

    emails = client.find_emails(sender="your sender here: either email or sender name")

if __name__ == "__main__":
    main()

Now you can run it with some sender name/email that you have in your inbox:

python main.py

In your console you should see something like

Retrieved messages matching the '{sender}' query: [{'id': '{messageId}', 'threadId': '{threadId}'}]

Great success! But why do we see only some obscure ids? Well, that’s because we’re listing messages. To actually get their content, we’ll need to call a different API with a desired message id. This means we need to adjust our Gmail client a bit.

Replace the client.py with the following code:

import base64

from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

from auth.auth import authenticate

class GmailApi:
    def __init__(self):
        creds = authenticate()
        self.service = build("gmail", "v1", credentials=creds)

    def find_emails(self, sender: str):
        print("\n=============== Find Emails: start ===============")
        request = (
            self.service.users()
            .messages()
            .list(userId="me", q=f"from:{sender}", maxResults=200)
        )

        result = self._execute_request(request)

        print(f"Retrieved messages matching the '{sender}' query: {result}")
        print("=============== Find Emails: end ===============")

        # The "messages" key is absent when nothing matches the query.
        return result.get("messages", [])

    # >>>>> NEW BIT: START <<<<<
    def get_email(self, email_id: str):
        print("\n=============== Get Email: start ===============")

        request = self.service.users().messages().get(userId="me", id=email_id)
        result = self._execute_request(request)
        content = result["payload"]["parts"][0]["body"]["data"]
        content = content.replace("-", "+").replace("_", "/")
        decoded = base64.b64decode(content).decode("utf-8")

        print(f"Retrieved email with email_id={email_id}: {result}")
        print("=============== Get Email: end ===============")

        return decoded
    # >>>>> NEW BIT: END <<<<<

    @staticmethod
    def _execute_request(request):
        try:
            return request.execute()
        except HttpError as e:
            print(f"An error occurred: {e}")
            raise RuntimeError(e)

Note two things:

  • First, when retrieving an email message, you get a pretty complicated object. We need to reach quite far down into it to retrieve the email message. You may want to explore the API and data structure to understand what’s happening there.
  • Second, the content that we get is encoded in Base64, but not in the standard alphabet that base64.b64decode expects by default. Gmail uses the URL-safe variant, which writes - instead of + and _ instead of /. The fix is simple: replace - with + and _ with / before decoding. Python’s base64.urlsafe_b64decode does the same translation for you.
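
Both notes can be folded into one helper. The sketch below is an assumption-laden alternative to reaching for parts[0] directly: find_body is a name I made up, it walks the payload recursively until it finds a part with the wanted MIME type, and it assumes the body is UTF-8 text. It relies on the payload fields mimeType, body.data and parts, and uses base64.urlsafe_b64decode from the standard library instead of replacing characters by hand.

```python
import base64
from typing import Optional


def find_body(payload: dict, mime_type: str = "text/plain") -> Optional[str]:
    """Depth-first search of a message payload for the first part of mime_type."""
    data = payload.get("body", {}).get("data")
    if payload.get("mimeType") == mime_type and data:
        # Re-add "=" padding in case it was stripped; a no-op when already padded.
        padded = data + "=" * (-len(data) % 4)
        return base64.urlsafe_b64decode(padded).decode("utf-8")
    for part in payload.get("parts", []):
        found = find_body(part, mime_type)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    encoded = base64.urlsafe_b64encode("Total: 12.50".encode("utf-8")).decode("ascii")
    fake_payload = {
        "mimeType": "multipart/alternative",
        "parts": [{"mimeType": "text/plain", "body": {"data": encoded}}],
    }
    print(find_body(fake_payload))
```

In get_email, this would replace the parts[0] lookup and the manual decoding, at the cost of returning None when no matching part exists.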

Now that we have that out of the way, adjust the main.py:

from client import GmailApi

def main():
    client = GmailApi()

    sender = "your sender here: either email or sender name"
    emails = client.find_emails(sender)
    email_ids = [email["id"] for email in emails]
    contents = [client.get_email(email_id) for email_id in email_ids]

    print(f"Content of the emails matching sender '{sender}':")
    for content in contents:
        print(content)

if __name__ == "__main__":
    main()

And there you have it! You should now see the content of your emails printed to the console. Note that for more complicated emails you may need to use something like Beautiful Soup to parse the HTML and/or extract data from it.
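
To give a feel for that last step without installing anything, here is a rough stand-in that uses html.parser from the standard library rather than Beautiful Soup. The names TextExtractor and html_to_text are mine, and the sketch only collects visible text chunks while skipping script and style content. Real receipts will likely need something smarter.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect non-empty text chunks, ignoring <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html: str) -> list[str]:
    """Return the visible text chunks of an HTML document, in order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    print(html_to_text("<table><tr><td>Milk</td><td> 1.29 </td></tr></table>"))
```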

Source Code

One GitHub repo is worth a thousand words. With that in mind, you can find fully working code on GitHub. Note that you’ll need to provide your own credentials.json file.

Sources