Full Gmail API Guide: How to retrieve email metadata to use it in your app

Nathan Ganser
Published in Nat - Personal CRM, Jan 12, 2021


This is an article I would have loved to find when we started to work on our Personal CRM business: Nat.app. One of our key differentiating features is our tight integration with Gmail.

But instead of asking for full access to our users’ inboxes, we decided to access only their Gmail metadata. While this is a great decision from a privacy perspective, Gmail’s metadata scope is very restricted and poorly documented, which is why I’m writing this article.

I’ll cover the following points:

  • How to set up Google OAuth to obtain the refresh and access tokens.
  • How to use those tokens to set up a Gmail service for the metadata scope.
  • How to access user data and do a full inbox sync.
  • How to keep data up to date with ongoing Gmail syncs.

I’ll be using Python, but you can of course do this with any programming language. This article will be most useful if:

  • You need to access users’ email data on an ongoing basis (for a CRM or contact management app, for example).

I’ll share snippets of code throughout this post but you can always find the whole code here.

Keep in mind that the Gmail metadata scope does not let you use the Gmail API’s search feature: requests that pass the q parameter are rejected. This means the only way to search is to import all the metadata first and then search your own copy.
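Since search has to happen on your side, here is a minimal sketch of what that can look like once the metadata is imported. It assumes each imported message is stored as a dict with 'from', 'to' and 'subject' keys; the function name search_imported_messages and the sample data are made up for illustration.

```python
def search_imported_messages(messages, query):
    """Return the imported messages whose sender, recipient or subject contains query."""
    needle = query.lower()
    fields = ('from', 'to', 'subject')
    return [
        message for message in messages
        if any(needle in message.get(field, '').lower() for field in fields)
    ]


# Hypothetical sample of already-imported metadata
imported = [
    {'id': '1', 'from': 'Ada <ada@example.com>', 'to': 'me@example.com', 'subject': 'Lunch on Friday?'},
    {'id': '2', 'from': 'Bob <bob@example.com>', 'to': 'me@example.com', 'subject': 'Invoice'},
]
matches = search_imported_messages(imported, 'ada@')
print([message['id'] for message in matches])
```

In a real app you would more likely run this kind of query against your database, but the idea is the same: the filtering happens on data you already hold, not through the Gmail API.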

How to get user’s Gmail credentials

Here is a very basic implementation of an HTML page with a button that users can click to give your application access to their Gmail metadata:

<!DOCTYPE html>
<html lang="en">
<head>
  <title>Gmail</title>
  <link href="https://fonts.googleapis.com/css2?family=Roboto:wght@400;500&display=swap" rel="stylesheet">
  <link href="style.css" rel="stylesheet" />
  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <!-- BEGIN Pre-requisites -->
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js">
  </script>
  <script src="https://apis.google.com/js/client:platform.js?onload=start" async defer>
  </script>
  <!-- END Pre-requisites -->
  <script>
    // Shared by start() and the click handler at the bottom of the page.
    var auth2;

    // Called by platform.js once it has loaded (see ?onload=start above).
    function start() {
      gapi.load('auth2', function () {
        auth2 = gapi.auth2.init({
          client_id: 'YOUR_GOOGLE_CLIENT_ID',
          // Scopes to request in addition to 'profile' and 'email'
          scope: 'https://www.googleapis.com/auth/gmail.metadata'
        });
      });
    }
  </script>
  <script>
    function signInCallback(authResult) {
      if (authResult['code']) {
        var code = authResult['code'];
        // Hide the sign-in button now that the user is authorized
        $('#signinButton').attr('style', 'display: none');

        // Send the one-time code to the server
        $.ajax({
          type: 'POST',
          url: 'YOUR_API',
          // Always include an `X-Requested-With` header in every AJAX request,
          // to protect against CSRF attacks.
          headers: {
            'X-Requested-With': 'XMLHttpRequest',
            'Authorization': 'Basic SECRET'
          },
          contentType: 'application/json',
          success: function (result) {
            location.href = 'URL_TO_REDIRECT_USERS';
          },
          processData: false,
          dataType: 'json',
          data: JSON.stringify({ "code": code })
        });
      } else {
        // There was an error.
      }
    }
  </script>
</head>

<body>
  <!-- Add where you want your sign-in button to render -->
  <!-- Use an image that follows the branding guidelines in a real app -->
  <a href="#" id="signinButton"><img src="images/button.svg" /></a>
  <script>
    $('#signinButton').click(function (e) {
      if (e) e.preventDefault();

      // signInCallback is defined in the head of this page.
      auth2.grantOfflineAccess().then(signInCallback);
    });
  </script>
</body>
</html>

The key thing to notice is that Google does not give you the refresh and access tokens straight away, but rather a one-time code that your server can exchange for them, so let’s do that now!

But before we do, you’ll notice that you’re still missing something in order to implement the above index.html page: the client_id! So let’s do that quickly.

Create your Google project

You’ll need to create a project in Google Cloud Platform (GCP). Once you’ve created it, visiting this page should let you create the new credentials that you will need.

When asked what Application type you’ll be creating, select Web Application.

For Authorized JavaScript origins, enter the URL where you’ll host the index.html page. Do the same for Authorized redirect URIs.

You can now add your client ID to the HTML page. Besides the client ID, you will also be able to download a JSON file called client_secret_... You will need this as well, so store it where your server code can read it (and keep it out of any public repository).
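If you would rather not copy the client ID and secret by hand later on, you can read them from the downloaded file. This is a small sketch that assumes the file has the usual layout for a Web Application credential, with a top-level "web" object holding "client_id" and "client_secret"; the helper name load_client_config is hypothetical.

```python
import json


def load_client_config(path):
    """Read the client id and secret from a downloaded client secret file."""
    with open(path) as secret_file:
        config = json.load(secret_file)
    # Web Application credentials are nested under "web";
    # desktop credentials use "installed" instead.
    section = config.get('web') or config.get('installed')
    if section is None:
        raise ValueError('unrecognized client secret file layout')
    return section['client_id'], section['client_secret']
```

You would call it with the path to your own file, for example client_id, client_secret = load_client_config("your_client_secret.json").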

Retrieve your user’s refresh & access token

Here is a simple Python implementation:

from oauth2client import client


def create_gmail_data(auth_code):
    print('getting the credentials files')

    CLIENT_SECRET_FILE = "your_client_secret.json"

    # Exchange auth code for access token & refresh token
    credentials = client.credentials_from_clientsecrets_and_code(
        CLIENT_SECRET_FILE,
        ['https://www.googleapis.com/auth/gmail.metadata', 'profile',
         'email'],
        auth_code)

    # Get profile info from the ID token
    print('got the data from Google')
    refresh_token = credentials.refresh_token
    access_token = credentials.access_token
    email = credentials.id_token['email']
    name = credentials.id_token['name']

    # Hand everything back to the caller so it can be stored
    return refresh_token, access_token, email, name

Note that you will need the oauth2client library, which you can add to your requirements.txt file. You will also need the Google API Python client, so you may as well add it right now. Here they are:

  • google-api-python-client
  • oauth2client
  • google-auth-httplib2

You now have the user’s name, email, refresh & access token! You’re ready to access user data.

Perform a full Gmail metadata sync

Create a service

First, you need to create a service using the credentials. Here is a basic implementation in Python:

from oauth2client.client import Credentials
from googleapiclient.discovery import build
import json


def create_service():
    # metadata scope
    SCOPES = ['https://www.googleapis.com/auth/gmail.metadata']

    # fill in your own client data and the user's tokens
    json_credential_dict = {"token_expiry": None, "user_agent": None, "invalid": False,
                            "client_id": "YOUR_CLIENT_ID",
                            "token_uri": "https://oauth2.googleapis.com/token",
                            "client_secret": "YOUR_CLIENT_SECRET", "_module": "oauth2client.client",
                            "_class": "OAuth2Credentials", "scopes": SCOPES,
                            "refresh_token": "REFRESH_TOKEN",
                            "access_token": "ACCESS_TOKEN"}

    cred = Credentials.new_from_json(json.dumps(json_credential_dict))

    service = build("gmail", "v1", credentials=cred, cache_discovery=False)

    # try it! execute() raises if the request is rejected, and an empty
    # mailbox simply returns no 'messages' key, so nothing is indexed blindly
    service.users().messages().list(userId="me", maxResults=1).execute()
    print('Connection with API established successfully!')
    return service

You can now access a user’s data!

Full Gmail Sync

Here is a basic implementation of a full Gmail sync using some custom-built functions that you can find in the publicly accessible repo:

from get_clean_message_data import extract_data


def initial_gmail_sync(service):
    pageToken = None
    messages_left = True

    # Get messages, one page at a time
    while messages_left:
        response = service.users().messages().list(userId="me", pageToken=pageToken).execute()
        pageToken = response.get('nextPageToken')
        # do something with the messages! Importing them to your database for example
        # (the list call only returns ids, so each message is fetched separately)
        for message in response.get('messages', []):
            email_id = message['id']
            dirty_message = service.users().messages().get(userId="me", id=email_id, format='metadata').execute()
            clean_message = extract_data(dirty_message)
            # and now you can do what you want
            # clean_message contains 'from', 'subject', 'to', 'date', 'id' and 'history_id'

        if not pageToken:
            messages_left = False
            # you've reached the end of the inbox!

The key thing to note is that as long as Gmail returns a nextPageToken, there are more messages to process.

Also, messages().get() does not return the message as JSON that is directly usable (the headers come back as a list of name/value pairs), so you’ll need to do some data cleaning. That’s what the extract_data() function is for.
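To give an idea of what that cleaning involves, here is a minimal sketch of an extract_data() function. It is not the version from the repo; it assumes the response carries its headers under payload.headers as a list of name/value pairs, and it returns the six fields mentioned in the sync code above. The sample response is made up.

```python
def extract_data(message):
    """Flatten a format='metadata' message into a simple dict (illustrative sketch)."""
    # Header names are case-insensitive, so normalize them before looking them up
    headers = {
        header['name'].lower(): header['value']
        for header in message.get('payload', {}).get('headers', [])
    }
    return {
        'id': message.get('id'),
        'history_id': message.get('historyId'),
        'from': headers.get('from'),
        'to': headers.get('to'),
        'subject': headers.get('subject'),
        'date': headers.get('date'),
    }


# Hypothetical, trimmed-down response from messages().get(format='metadata')
sample = {
    'id': '17a',
    'historyId': '4242',
    'payload': {'headers': [
        {'name': 'From', 'value': 'Ada <ada@example.com>'},
        {'name': 'Subject', 'value': 'Hello'},
    ]},
}
print(extract_data(sample)['subject'])  # prints "Hello"
```

Headers that are missing from a message simply come back as None, which keeps the import loop from crashing on unusual emails.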

Ongoing Gmail sync

After a full sync, you will of course want to keep the data up to date, and that’s where the ongoing sync comes into play. Here is some starter code:

from get_clean_message_data import extract_data


def ongoing_gmail_sync(service):
    pageToken = None
    # replace with the most recent history id you stored during the last sync
    history_id = "some recent history id"

    history = service.users().history().list(userId="me", historyTypes='messageAdded',
                                             startHistoryId=history_id, pageToken=pageToken).execute()

    if history:
        current_history_id = history_id
        last_history_id = history.get('historyId')
        # history ids are numeric strings, so compare them as integers
        if int(last_history_id) > int(current_history_id):
            # the records live under the 'history' key (absent when nothing changed)
            for change in history.get('history', []):
                for added in change.get('messagesAdded', []):
                    email_id = added['message']['id']
                    thread_id = added['message']['threadId']
                    label_ids = added['message'].get('labelIds', [])
                    # `'INBOX' or 'SENT' in ...` would always be truthy, so test each label
                    import_me = 'INBOX' in label_ids or 'SENT' in label_ids
                    if import_me:
                        message = service.users().messages().get(userId="me", id=email_id, format='metadata').execute()
                        clean_message = extract_data(message)

You could also use Google Cloud’s Pub/Sub push notifications, but that is a lot more work than polling like this and, depending on your needs, may not be necessary.
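One thing the starter code above leaves out is pagination: it reads a single page of history. If a user receives a lot of mail between two syncs, the response will include a nextPageToken, just like messages().list() does. Here is a sketch of a generator that follows those tokens; the name iter_added_messages is hypothetical, and it takes the same service object as the functions above.

```python
def iter_added_messages(service, start_history_id):
    """Yield every message added since start_history_id, across all history pages."""
    page_token = None
    while True:
        response = service.users().history().list(
            userId="me",
            historyTypes='messageAdded',
            startHistoryId=start_history_id,
            pageToken=page_token,
        ).execute()
        # 'history' is missing from the response when nothing has changed
        for record in response.get('history', []):
            for added in record.get('messagesAdded', []):
                yield added['message']
        page_token = response.get('nextPageToken')
        if not page_token:
            break
```

Each yielded item is the short message entry from the history record, so you can check its labelIds before deciding whether to call messages().get() for the full metadata.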

I hope this was helpful! Feel free to ask any follow-up questions in the comments!
