Hands-on tutorial for managing Google Drive files with Python
Jun 3, 2020 00:00 · 831 words · 4 minute read
This is a tutorial on how to use Python to manage Google Drive files.
1. Introduction
Google Drive is awesome! Not only does it provide an easy way to upload, manage, and share files, it is also free within a certain storage limit. For some edu users, the storage is not only free but also unlimited. Have you wondered how to make full use of this free cloud storage service from a data science perspective? Actually, it’s not that difficult. One simple question to start with is how to access and manage Google Drive files using Python (the most popular data science programming language).
2. Get Authentication for Google Service API
First, we need to get the authentication file for the Google Drive API so that our Python code can access Google Drive. To do that, we need to:
1) Create a new project in the Google Developer Console by clicking “CREATE PROJECT”, as shown below.
You can give your project a name or leave it as default.
2) Enable APIs and services by clicking “ENABLE APIS AND SERVICES”, as indicated by the red circle in the following picture.
That will bring you to the API library, as shown below.
Search for “Google Drive” in the API library (indicated by the red circle in the picture above). You’ll get the following snapshot.
Click the “Google Drive API” icon, and it will bring you to the next step, shown below.
Then click “ENABLE”, which enables the Google Drive API service for your project. You’ll get to the next step, shown below.
3) Create credentials by clicking the “CREATE CREDENTIALS” button (indicated by the red circle in the snapshot above). Here’s what you’ll get.
In the snapshot above, we need to click “client ID”, as that’s what the Python program needs. Then click “CREATE” and download the JSON file, as shown in the following snapshots.
The downloaded JSON file is the one our Python code needs to access Google Drive.
3. Use PyDrive
Once we have the JSON file for accessing Google Drive, we can install the Python library PyDrive using pip install pydrive.
The following code authenticates and lists the files in the root folder of your Google Drive. Note that every time you run the program, it will open a web browser and ask you to sign in to your Google account and grant access.
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
# Rename the downloaded JSON file to client_secrets.json
# The client_secrets.json file needs to be in the same directory as the script.
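gauth = GoogleAuth()
gauth.LocalWebserverAuth()  # Start a local web server and open the browser for the OAuth consent flow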
drive = GoogleDrive(gauth)
# List files in Google Drive
file_list = drive.ListFile({'q': "'root' in parents and trashed=false"}).GetList()
for file1 in file_list:
    print('title: %s, id: %s' % (file1['title'], file1['id']))
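The same query pattern works for any folder, not only the root. Here is a minimal sketch that wraps the listing call in a reusable function. The names children_query and list_folder are my own, not part of PyDrive, and drive is assumed to be an authenticated GoogleDrive object like the one created above.

```python
def children_query(folder_id="root"):
    """Build a Drive search query for the non-trashed children of a folder."""
    return "'{}' in parents and trashed=false".format(folder_id)


def list_folder(drive, folder_id="root"):
    """Return (title, id) pairs for the files directly inside folder_id.

    `drive` is assumed to be an authenticated pydrive GoogleDrive object.
    """
    files = drive.ListFile({"q": children_query(folder_id)}).GetList()
    return [(f["title"], f["id"]) for f in files]
```

Calling list_folder(drive) should give the same titles and IDs as the loop above, and passing a folder ID instead of the default lists that folder's contents.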
To avoid entering the password every time, we can create a settings.yaml file to save the credentials. The details can be found in the official PyDrive documentation. The YAML file looks like the following.
client_config_backend: settings
client_config:
  client_id: your_client_id
  client_secret: your_client_secret
save_credentials: True
save_credentials_backend: file
save_credentials_file: credentials.json
get_refresh_token: True
oauth_scope:
  - https://www.googleapis.com/auth/drive.file
The client_id and client_secret can be found by clicking the edit icon in the following snapshot.
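With settings.yaml in the working directory, rerun the file-listing script. The program will ask you to sign in to your Google account once more, and then it will create a credentials.json file. Next time, PyDrive will pick up that file and finish authentication automatically, so you don’t need to type your password again. Note that with the drive.file scope used in this settings file, the listing only includes files that this application created or opened; use the broader https://www.googleapis.com/auth/drive scope if you need to see everything in your Drive.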
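The client_id and client_secret values are also stored in the downloaded JSON file, so they do not have to be copied by hand. The sketch below renders the settings.yaml text from the contents of that file. It assumes the JSON nests the values under an "installed" or "web" key, which is how Google's downloaded client secrets files are normally laid out. The function name render_settings is hypothetical.

```python
import json

# Same settings as the YAML file shown above, with placeholders for the two secrets.
SETTINGS_TEMPLATE = """\
client_config_backend: settings
client_config:
  client_id: {client_id}
  client_secret: {client_secret}
save_credentials: True
save_credentials_backend: file
save_credentials_file: credentials.json
get_refresh_token: True
oauth_scope:
  - https://www.googleapis.com/auth/drive.file
"""


def render_settings(client_secrets_text):
    """Return settings.yaml text built from the contents of client_secrets.json."""
    data = json.loads(client_secrets_text)
    # Assumption: the values sit under "installed" (desktop app) or "web" (web app).
    config = data.get("installed") or data.get("web")
    if config is None:
        raise ValueError("no 'installed' or 'web' section found")
    return SETTINGS_TEMPLATE.format(
        client_id=config["client_id"],
        client_secret=config["client_secret"],
    )
```

To use it, read client_secrets.json as text, pass it to render_settings, and write the result to settings.yaml next to your script.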
Now we can upload local files to a Google Drive folder, for example:
# Upload files to your Google Drive
upload_file_list = ['google_console1.png', 'google_console2.png']
for upload_file in upload_file_list:
    gfile = drive.CreateFile({'parents': [{'id': '1pzschX3uMbxU0lB5WZ6IlEEeAUE8MZ-t'}]})
    # Read file and set it as a content of this instance.
    gfile.SetContentFile(upload_file)
    gfile.Upload() # Upload the file.
The code above uploads my two local files, google_console1.png and google_console2.png, to my Google Drive folder test/. To do that, PyDrive creates two GoogleDriveFile objects, reads the local files into them, and uploads them to the corresponding folder. Note that we need to provide the id of the target Google Drive folder. In this example, the test folder’s ID is 1pzschX3uMbxU0lB5WZ6IlEEeAUE8MZ-t. You can get a Google Drive folder’s ID from the browser. For example, when I open the test folder in my Google Drive, the browser shows the address https://drive.google.com/drive/folders/1pzschX3uMbxU0lB5WZ6IlEEeAUE8MZ-t. The ID of the test folder is the part after the last / symbol, which is 1pzschX3uMbxU0lB5WZ6IlEEeAUE8MZ-t.
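The folder ID can also be pulled out of the browser address in code, which avoids copy-and-paste mistakes. The following is a rough sketch under the assumption that the URL has the form shown above, with the ID as the last path segment. The helper names folder_id_from_url, upload_metadata, and upload_files are hypothetical, and drive is assumed to be an authenticated GoogleDrive object.

```python
import os
from urllib.parse import urlparse


def folder_id_from_url(url):
    """Return the last path segment of a Drive folder URL (the folder ID)."""
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


def upload_metadata(local_path, folder_id):
    """Build the metadata for uploading local_path into the given folder.

    The title is set to the bare file name so that a local directory
    prefix does not end up in the Drive title.
    """
    return {
        "title": os.path.basename(local_path),
        "parents": [{"id": folder_id}],
    }


def upload_files(drive, local_paths, folder_url):
    """Upload each local file into the folder behind folder_url; return the new IDs."""
    folder_id = folder_id_from_url(folder_url)
    uploaded_ids = []
    for local_path in local_paths:
        gfile = drive.CreateFile(upload_metadata(local_path, folder_id))
        gfile.SetContentFile(local_path)  # Read the local file as the content.
        gfile.Upload()
        uploaded_ids.append(gfile["id"])
    return uploaded_ids
```

With these helpers, the upload above reduces to one call that takes the drive object, the list of local file names, and the folder address copied from the browser.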
Similarly, we can also write a file directly to Google Drive using the following code:
# Create GoogleDriveFile instance with title 'Hello.txt'.
file1 = drive.CreateFile({
    'parents': [{'id': '1pzschX3uMbxU0lB5WZ6IlEEeAUE8MZ-t'}],
    'title': 'Hello.txt'})
file1.SetContentString('Hello World!') # Set content of the file from given string.
file1.Upload()
Of course, we can also read the file directly from Google Drive.
file2 = drive.CreateFile({'id': file1['id']})
content = file2.GetContentString()  # Download the file content as a string.
print(content)
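Reading by ID requires that you already know the file's ID. When you only know the title, one possible approach is to search the folder first and then download the match with GetContentFile. The sketch below assumes the title query syntax of the Drive API version that PyDrive uses, escapes single quotes with a backslash, and simply takes the first match. The names title_query and download_by_title are my own.

```python
def title_query(title, folder_id):
    """Build a Drive search query for files with this exact title in a folder."""
    escaped = title.replace("'", "\\'")  # Single quotes must be escaped in queries.
    return "title = '{}' and '{}' in parents and trashed=false".format(
        escaped, folder_id
    )


def download_by_title(drive, title, folder_id, local_path):
    """Download the first file called `title` in the folder to local_path.

    Returns True if a file was found and downloaded, False otherwise.
    `drive` is assumed to be an authenticated pydrive GoogleDrive object.
    """
    matches = drive.ListFile({"q": title_query(title, folder_id)}).GetList()
    if not matches:
        return False
    matches[0].GetContentFile(local_path)  # Save the content to a local file.
    return True
```

Drive allows several files with the same title in one folder, so taking the first match is a simplification. Check the length of the result list if duplicates matter in your case.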
4. Summary
Today, we learned how to manage Google Drive files directly using PyDrive. Remember the major steps:
* Set up the Google Drive API and create credentials
* Install PyDrive and set up authentication
* Manage Google Drive files using Python (e.g. upload and read)
More file management functionality can be found on the official PyDrive website.