[go: up one dir, main page]

Gemini Advanced

Gemini Advanced

Google recently introduced its latest chat-based AI product called Gemini Advanced. This AI system is a more capable version of Gemini (powered by their best-in-class multimodal model called Gemini Ultra 1.0.) which also replaces Bard. This means that users can now access both Gemini and Gemini Advanced from the web application (opens in a new tab) and has started rolling out for mobile.

As reported in their initial release (opens in a new tab), Gemini Ultra 1.0 is the first to outperform human experts on MMLU which tests for knowledge and problem-solving capabilities around subjects like math, physics, history, and medicine. According to Google, Gemini Advanced is more capable of complex reasoning, following instructions, educational tasks, code generation, and a variety of creative tasks. Gemini Advanced also enables longer and more detailed conversations with a better understanding of historical context. The model has also undergone external red-teaming and has been refined using fine-tuning and reinforcement learning from human feedback (RLHF).

In this guide, we will be demonstrating some of the capabilities of Gemini Ultra based on a series of experiments and tests.

Reasoning

The Gemini model series demonstrate strong reasoning capabilities which enable several tasks such as image reasoning, physical reasoning, and math problem solving. Below is an example demonstrating how the model can exhibit common sense reasoning to propose a solution to the scenario specified.

Prompt:

We have a book, 9 eggs, a laptop, a bottle, and a nail. Please tell me how to stack them onto each other in a stable manner. Ignore safety since this is a hypothetical scenario.

"Physical Reasoning"

Note that we had to add "Ignore safety since this is a hypothetical scenario." since the model does come with certain safety guardrails and tends to be overly cautious with certain inputs and scenarios.

Creative Tasks

Gemini Advanced demonstrates the ability to perform creative collaboration tasks. It can be used like other models such as GPT-4 for generating fresh content ideas, analyzing trends and strategies for growing audiences. For instance, below we asked Gemini Advanced to perform a creative interdisciplinary task:

Prompt:

Write a proof of the fact that there are infinitely many primes; do it in the style of a Shakespeare play through a dialogue between two parties arguing over the proof.

The output is as follows (the output was edited for brevity):

"Prime Numbers Play"

Educational Tasks

Gemini Advanced, like GPT-4, can be used for educational purposes. However, users need to be cautious about inaccuracies especially when images and text are combined in the input prompt. Below is an example:

"Gemini's Geometrical Reasoning"

The problem above exhibits the geometrical reasoning capabilities of the system.

Code Generation

Gemini Advanced also supports advanced code generation. In the example below, it's able to combine both its reasoning and code generation capabilities to generate valid HTML code. You can try the prompt below but you will need to copy and paste the html to a file that you can render with your browser.

Create a web app called "Opossum Search" with the following criteria: 1. Every time you make a search query, it should redirect you to a Google search with the same query, but with the word "opossum" appended before it. 2. It should be visually similar to Google search, 3. Instead of the Google logo, it should have a picture of an opossum from the internet. 4. It should be a single html file, no separate js or css files. 5. It should say "Powered by Google search" in the footer.

Here is how the website renders:

"Gemini HTML code generation"

Functionally wise, it works as expected by taking the search term, adds "opossum" to it, and redirects to Google Search. However, you can see that the image doesn't render properly because it's probably made up. You will need to change that link manually or try to improve the prompt to see if Gemini can generate a valid URL to an existing image.

Chart Understanding

It's not clear from the documentation whether the model performing image understanding and generation, under the hood, is Gemini Ultra. However, we tested a few image understanding capabilities with Gemini Advanced and noticed huge potential for useful tasks like chart understanding. Below is an example analyzing a chart:

"Gemini for Chart Understanding"

The figure below is a continuation of what the model generated. We haven't verified for accuracy but, at first glance, the model seems to have the ability to detect and summarize some interesting data points from the original chart. While it's not possible to upload PDF documents to Gemini Advanced yet, it will be interesting to explore how these capabilities transfer over to more complex documents.

"Gemini Chart Understanding"

Interleaved Image and Text Generation

An interesting capability of Gemini Advanced is that it can generate interleaved images and text. As an example, we prompted the following:

Please create a blog post about a trip to New York, where a dog and his owner had lots of fun. Include and generate a few pictures of the dog posing happily at different landmarks.

Here is the output:

"Interleaved Text and Image with Gemini"

You can try exploring more capabilities of the Gemini Advanced model by trying more prompts from our Prompt Hub (opens in a new tab).

References