Unlock The Gemini 1.5 Pro API (+ File API)

Google introduced Gemini 1.5 Pro in Google AI Studio, with a context window of 1 million tokens and an API for interacting with the model programmatically. The new File API lets users upload images, video, audio, and other files for use in prompts. It provides methods for managing uploaded files and deletes them automatically after 48 hours to manage storage.

Google recently announced the availability of Gemini 1.5 Pro, which can be accessed in Google AI Studio by selecting the model. In AI Studio, users can upload images, videos, audio, files, and folders, and work with a context window of 1 million tokens. Previously, users were limited to manual drag-and-drop uploads; the new Gemini 1.5 Pro API lets them interact with the model programmatically.

To begin using Gemini 1.5 via the API, users install the google-generativeai package, import it as google.generativeai, configure an API key, and initialize the model with default settings. Text generation works as it did in earlier versions: users pass a prompt string to the model and receive the generated content. Responses come back as Markdown text, so the output can include formatting such as titles and bold text.
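The setup and text-generation steps described above might look roughly like the following. This is a minimal sketch, not the article's own code: the model identifier `gemini-1.5-pro-latest`, the `GOOGLE_API_KEY` environment variable, and the helper names `get_api_key` and `generate_text` are assumptions made for illustration.

```python
import os
from typing import Mapping

# Assumed model identifier; check the current model list before relying on it.
MODEL_NAME = "gemini-1.5-pro-latest"


def get_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from GOOGLE_API_KEY (an assumed variable name)."""
    key = env.get("GOOGLE_API_KEY", "")
    if not key:
        raise RuntimeError("Set GOOGLE_API_KEY before calling the API.")
    return key


def generate_text(prompt: str, model_name: str = MODEL_NAME) -> str:
    """Send a plain-text prompt and return the reply as Markdown text."""
    # Imported here so the module still loads where the SDK is not installed.
    import google.generativeai as genai

    genai.configure(api_key=get_api_key())
    model = genai.GenerativeModel(model_name)  # default settings
    response = model.generate_content(prompt)
    return response.text
```

With a valid key set, calling `generate_text("Explain what a context window is.")` should return a Markdown string that can be printed or rendered.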

The new File API in Gemini 1.5 Pro handles document uploads for use in prompts. Uploading a file returns a metadata object that identifies it, and the file stays available for the model to access during processing. The API provides methods for uploading files, retrieving their metadata, and deleting them; files are also deleted automatically after 48 hours to manage storage.
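The upload, metadata, and delete cycle could be sketched as below. The helper names `expected_expiry` and `upload_inspect_delete` and the `GOOGLE_API_KEY` variable are hypothetical; the 48-hour figure is taken from the text above, and the expiry calculation is only a local estimate, not a value returned by the service.

```python
import os
from datetime import datetime, timedelta, timezone

# Retention period as described in the text above.
FILE_TTL = timedelta(hours=48)


def expected_expiry(uploaded_at: datetime) -> datetime:
    """Estimate when an uploaded file will be deleted automatically."""
    return uploaded_at + FILE_TTL


def upload_inspect_delete(path: str, display_name: str) -> str:
    """Upload a file, read back its metadata, then delete it explicitly."""
    # Imported here so the module still loads where the SDK is not installed.
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    uploaded = genai.upload_file(path=path, display_name=display_name)
    fetched = genai.get_file(uploaded.name)  # look the file up by its name
    summary = f"{fetched.display_name} -> {fetched.uri}"
    genai.delete_file(uploaded.name)  # do not wait for automatic deletion
    return summary
```

Deleting explicitly is optional here; the sketch does it only to show all three methods in one place.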

Users can experiment with prompts that contain images, asking the model to generate descriptions of the uploaded files. The API accepts multiple images in one prompt, for comparisons or more detailed descriptions. Users can also check token counts to stay within the model’s limits; output is capped at about 8,000 tokens per request.
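A multi-image prompt with a token count taken first might be put together as follows. This is a sketch under assumptions: `build_contents` and `compare_images` are hypothetical helpers, and the model identifier and `GOOGLE_API_KEY` variable are illustrative rather than taken from the article.

```python
import os
from typing import Any, List, Sequence, Tuple


def build_contents(prompt: str, files: Sequence[Any]) -> List[Any]:
    """Put the text prompt first, followed by each uploaded file reference."""
    return [prompt, *files]


def compare_images(
    paths: Sequence[str],
    prompt: str,
    model_name: str = "gemini-1.5-pro-latest",  # assumed model identifier
) -> Tuple[int, str]:
    """Upload several images, count prompt tokens, and ask for a comparison."""
    # Imported here so the module still loads where the SDK is not installed.
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel(model_name)
    uploads = [genai.upload_file(path=p) for p in paths]
    contents = build_contents(prompt, uploads)
    # Count input tokens before generating, to see how much context is used.
    token_count = model.count_tokens(contents).total_tokens
    response = model.generate_content(contents)
    return token_count, response.text
```

Passing a single path gives a plain description; passing two or more lets the prompt ask for differences between the images.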

The API extends to audio files as well: users can upload audio and prompt the model for summaries and emotion detection. The model reports on the spoken content, the emotions it detects, and the corresponding time codes. Users are encouraged to explore the capabilities of Gemini 1.5 Pro now, before potential limitations take effect.
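An audio prompt along these lines could be sketched as below. The prompt wording, the `summarize_audio` and `extract_timecodes` helpers, and the assumption that the model writes time codes as MM:SS are all illustrative; the actual format of the reply may differ.

```python
import os
import re
from typing import List

# Assumes the reply writes time codes as MM:SS, which is not guaranteed.
TIMECODE = re.compile(r"\b(\d{1,2}):(\d{2})\b")


def extract_timecodes(text: str) -> List[int]:
    """Return every MM:SS time code found in the text, in seconds."""
    return [int(m) * 60 + int(s) for m, s in TIMECODE.findall(text)]


def summarize_audio(path: str, model_name: str = "gemini-1.5-pro-latest") -> str:
    """Upload an audio file and ask for a summary with emotions and times."""
    # Imported here so the module still loads where the SDK is not installed.
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel(model_name)
    audio = genai.upload_file(path=path)
    prompt = (
        "Summarize this recording. List the emotions you detect, "
        "each with an MM:SS time code."
    )
    response = model.generate_content([prompt, audio])
    return response.text
```

Running `extract_timecodes` on the reply turns any MM:SS markers into second offsets, which can be handy for jumping to the matching point in the recording.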