
13.12.2023


Google's Gemini: a New Type of Artificial Intelligence

Author at ApiX-Drive

Reading time: ~10 min

While experts hold heated debates about AI experiments, the accelerating pace of development, and the level of threat, Google decided not to waste time. In December 2023, it announced the launch of a new AI model, Gemini, which it had first mentioned in May at I/O 2023.

Content:

1. Gemini: a Revolutionary Innovation in AI

2. What is Unique About the New AI?

3. Training Models

4. Gemini Applications

5. Google Gemini Versions

6. Gemini VS GPT-4

7. Integrations

8. Problems and Disadvantages

9. Let's sum it up

***

Last year, Google lagged behind its main competitor, OpenAI, in the AI field. Now it has a good chance to prove that its product is not only the best in its category but can also fundamentally change the way we interact with AI. The developers plan to bring the new model's capabilities to almost every part of the search giant's business. They claim that Gemini has surpassed OpenAI's popular GPT-4 model, and even human experts, in numerous benchmark tests.

Gemini: a Revolutionary Innovation in AI

The Gemini AI model is an innovative product that can process information of different types: text, images, video, audio and program code. According to Google, it handles audio and video just as well as text.

Basic skills:

draws conclusions based on the studied data, translates texts, and conducts dialogue;
solves tasks using mathematical thinking;
generates program code and creates documentation;
recognizes and understands images, video and audio.

This artificial intelligence reasons in more complex ways, answers tough questions, and understands much more nuanced information than the earlier version of Bard. By multitasking, it can extract the most valuable and important data from hundreds of thousands of documents. Gemini 1.0 also understands, explains and generates high-quality program code in the most common languages: Java, C++, Python and Go. In addition, it serves as the basis for AlphaCode 2, an updated code-generation system that demonstrates excellent results in solving programming problems that go beyond simple coding and include elements of theoretical computer science and higher mathematics. All this gives Google good reason to believe that its model will help make breakthroughs in various fields, from science to economics and finance.

According to company representatives, Gemini was initially trained to work with different information formats. As an example of how the new product works, they presented a video where the Bard chatbot, based on Gemini, helps students complete their physics homework. As input, the student uploads photos of questions written on a piece of paper. After studying them, the AI gives step-by-step answers with equations. One of the competitive advantages of Gemini artificial intelligence is its high adaptability to any device. It can be used almost anywhere, from a simple smartphone to large data centers.

What is Unique About the New AI?

Google representatives claim that Gemini is an innovative AI model whose potential, as we have already mentioned, will allow it to outperform OpenAI's GPT-4 and human experts. Its entire range of capabilities rests on two main features: multimodality and humanity.

Creating multimodal AI that is truly effective and engaging for users requires merging different artificial intelligence capabilities. Language modeling, computer vision, image and sound processing, and programming all need to be integrated and properly coordinated to achieve complete synergy. This is a monumental task, and Google claims to have solved it with Gemini. Moreover, the corporation intends to take this concept even further.

We've sorted out multimodality; now let's talk about humanity. The success of almost any generative AI comes from the machine's ability to imitate what a human does. What exactly are we talking about? People do not split their activities into independent tasks such as communication, coding, report writing, and graphic design. They can switch between all of these seamlessly. For example, while creating a drawing, you call a colleague to clarify some details of the image, then message your manager and send a report on the month's work. The human brain can simultaneously perceive, interpret and understand data in different formats: text, speech, sounds and images. Thanks to this, we are aware of our surroundings, respond to stimuli, and find innovative, non-standard ways to solve problems. According to Google, Gemini has the same ability, which brings it one step closer to humans.

Training Models

To train Gemini, Google deployed enormous computing power built on its most advanced TPU v5 training chips. Its TPU v5p tensor processor system can reportedly run as many as 16,384 chips simultaneously. This AI accelerator is designed for data centers where large-scale generative models are trained and run. It allowed Google to give a model as massive as Gemini the broadest possible knowledge and skills.

Training any AI model depends not only on the power and number of chips but also on data; nothing works without it. In this area, Google has practically no equal. According to consulting firm SemiAnalysis, the corporation's code-only data collection is estimated at approximately 40 trillion tokens. Assuming roughly four bytes of text per token, that is on the order of hundreds of terabytes (for clarity, imagine the text of many millions of books). This one Google dataset is about 4 times the volume of all the data used to train GPT-4.
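The scale of these figures is easier to judge with a quick back-of-envelope calculation. The sketch below is only an illustration: the 40 trillion tokens and the "4 times" ratio come from the text, while the four bytes per token is our own rough assumption, not a number from Google or SemiAnalysis. Under that assumption the result comes out in the hundreds of terabytes, and it changes in direct proportion to the bytes-per-token value you choose.

```python
# Back-of-envelope sizing of the data collection described above.
TOKENS_IN_CORPUS = 40e12  # ~40 trillion tokens (figure quoted in the text)
RATIO_TO_GPT4 = 4         # "4 times" the GPT-4 training data (from the text)
BYTES_PER_TOKEN = 4       # ASSUMPTION: rough rule of thumb, not an official figure


def corpus_size_terabytes(tokens: float, bytes_per_token: float) -> float:
    """Convert a token count to terabytes (1 TB = 10**12 bytes)."""
    return tokens * bytes_per_token / 1e12


def implied_gpt4_tokens(tokens: float, ratio: float) -> float:
    """Token count implied for GPT-4 if the corpus is `ratio` times larger."""
    return tokens / ratio


size_tb = corpus_size_terabytes(TOKENS_IN_CORPUS, BYTES_PER_TOKEN)
gpt4_tokens = implied_gpt4_tokens(TOKENS_IN_CORPUS, RATIO_TO_GPT4)

print(f"Estimated corpus size: {size_tb:.0f} TB")
print(f"Implied GPT-4 training set: {gpt4_tokens / 1e12:.0f} trillion tokens")
```

In other words, if the "4 times" comparison holds, the GPT-4 training set would be roughly 10 trillion tokens.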

Sundar Pichai, Chief Executive Officer of Alphabet Inc. and Google, and Demis Hassabis, CEO of Google subsidiary DeepMind, consider the emergence of Gemini a huge leap in the development of AI, one that will affect almost all of the corporation's products.

Gemini Applications

Artificial intelligence today is being actively implemented in many areas: industry, technology, education, science, business. Gemini will find application in the following areas:

Computer vision (object and anomaly detection, processing and understanding of 3D scenes).
Geospatial data science (24/7 monitoring, combining information from multiple sources, analyzing and structuring it).
Health care (preventive medicine, personalization of the healthcare system, biosensors).
Computer-integrated and intelligent technologies (LLM, data synthesis, transfer of subject knowledge to systems, expanding the range of data-based decision-making capabilities).

Google Gemini Versions

We have already noted that Gemini is a flexible model that can run on any device, from a huge data center to a regular smartphone. To achieve this scalability, Google released it in three versions that differ in size and functionality:

Nano;
Pro;
Ultra.

Nano

Gemini Nano is the smallest model. It is best suited to tasks that require AI assistance directly on the device without connecting to an external server, such as summarizing a text or suggesting a reply in a chat application. Besides convenience, on-device processing helps keep user data confidential, because the data does not have to be sent to a server.

Nano is designed for smartphones and comes in two versions. One has 1.8 billion parameters and is intended for slower devices. The second has 3.25 billion parameters, so it can be used on more powerful phones.
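To see why models of this size can fit on a phone, it helps to estimate how much memory the weights alone would take. The sketch below is a rough illustration: the parameter counts come from the text, but the 4 bits per weight is our assumption about how such a model might be compressed for on-device use, and the estimate ignores everything except the weights themselves.

```python
# Rough memory footprint of the two Nano variants (weights only).
NANO_VARIANTS = {
    "Nano, smaller version": 1.8e9,   # parameters (from the text)
    "Nano, larger version": 3.25e9,   # parameters (from the text)
}
BITS_PER_WEIGHT = 4  # ASSUMPTION: low-bit quantization for on-device use


def weight_memory_gb(parameters: float, bits_per_weight: int) -> float:
    """Memory needed to store the weights, in gigabytes (1 GB = 10**9 bytes)."""
    return parameters * bits_per_weight / 8 / 1e9


footprints = {
    name: weight_memory_gb(params, BITS_PER_WEIGHT)
    for name, params in NANO_VARIANTS.items()
}

for name, gigabytes in footprints.items():
    print(f"{name}: about {gigabytes:.2f} GB of weights")
```

Under this assumption, the smaller version needs under 1 GB and the larger one under 2 GB, which is why the first can target slower devices.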

Pro

Gemini Pro is a medium-sized all-rounder model (reportedly about 100 billion parameters; Google has not published the figure) that can cope with a wide range of tasks. It understands complex queries and provides quick answers. Its main role is to power the latest version of the Bard chatbot. In addition, it is already used in Google's corporate data centers. Representatives of the corporation claim that it has surpassed a number of other generative AI models, including the well-known GPT-3.5 from OpenAI.

Developers and enterprise users can access Gemini Pro via API through Google AI Studio and Google Cloud Vertex AI services.
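For readers who want a feel for what API access looks like, here is a minimal sketch that only builds a request and does not send it. The endpoint, the "gemini-pro" model name and the payload shape are assumptions based on the public REST interface as documented around the launch, so check the current Google AI Studio documentation before relying on them. "YOUR_API_KEY" is a placeholder, and the helper name build_generate_request is our own.

```python
import json
import urllib.request

# ASSUMPTION: endpoint, model name and payload shape as documented at launch;
# verify against the current Google AI Studio documentation.
API_BASE = "https://generativelanguage.googleapis.com/v1beta"
MODEL_NAME = "gemini-pro"


def build_generate_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a text-generation request for Gemini Pro."""
    url = f"{API_BASE}/models/{MODEL_NAME}:generateContent?key={api_key}"
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request(
    "Summarize the MMLU benchmark in one sentence.", "YOUR_API_KEY"
)
print(request.get_method(), request.full_url)
```

With a real key, the request could be sent with urllib.request.urlopen(request). Building the request separately from sending it makes the sketch safe to run without credentials.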

Ultra

Gemini Ultra is the largest and most powerful model, designed to solve extremely complex problems. Its parameter count is reported to exceed 1 trillion, although Google has not confirmed a figure. According to Google, Ultra currently outperforms every other artificial intelligence model. It was the first model to outscore human experts on the standard MMLU test, with a result of 90%. You can find out more about this in the next section.

Only select security experts, testers and key business partners of the corporation currently have access to Ultra. Google plans to open it up to all of its developers and enterprise users in early 2024. The Bard Advanced AI assistant, which will gain all the capabilities of this version of Gemini, is planned to launch at the same time.

Gemini VS GPT-4

Tests conducted by Google showed that Gemini outperformed OpenAI's product. The corporation shared two tables comparing its own model with GPT-4. According to that data, Gemini leads on the vast majority of indicators. For example, in the MMLU test, 90% of its answers were correct, while GPT-4 scored 86.4%. Interestingly, it even beat human experts, who typically score 89.8% on this test.

Gemini managed to beat humans in the MMLU test

For reference: MMLU (Massive Multitask Language Understanding) is a standard benchmark that measures the abilities of artificial intelligence. It consists of a set of tasks across 57 subjects, which include mathematics, physics, geography, history, law, economics, medicine, ethics, as well as complex questions on logical fallacies, moral problems in everyday life, and so on.
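An MMLU result is simply the share of questions answered correctly, so the gaps between the scores quoted above are easy to work out. The short sketch below uses the three percentages from the text. The 9-out-of-10 example and the function names are ours, added only to show how an accuracy figure is calculated.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Share of correctly answered questions, as a percentage."""
    return 100.0 * correct / total


def margin_points(score_a: float, score_b: float) -> float:
    """Difference between two scores in percentage points, rounded to 0.1."""
    return round(score_a - score_b, 1)


# MMLU scores quoted in the article (percent of correct answers).
SCORES = {"Gemini Ultra": 90.0, "Human expert": 89.8, "GPT-4": 86.4}

# Hypothetical illustration: 9 correct answers out of 10 questions.
example_accuracy = accuracy_percent(9, 10)

lead_over_gpt4 = margin_points(SCORES["Gemini Ultra"], SCORES["GPT-4"])
lead_over_human = margin_points(SCORES["Gemini Ultra"], SCORES["Human expert"])

print(f"Lead over GPT-4: {lead_over_gpt4} points")
print(f"Lead over human experts: {lead_over_human} points")
```

The lead over GPT-4 is 3.6 percentage points, while the lead over human experts is only 0.2, so the "beats humans" headline rests on a very narrow margin.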

Gemini beat GPT-4 in 30 of the 32 tests conducted as part of Google's study of the LLM. Of the three tests of the ability to comprehend information and draw correct conclusions, it won two by a wide margin. It also came first in both the coding and math tests.

When working with images, video and audio, Gemini again showed itself to be better than GPT-4, beating its competitor in absolutely all tests.

Gemini is better than GPT-4 at working with images, video and audio

Integrations

Google developed Gemini not only to modernize its Bard chatbot. The corporation emphasized that the new product will be integrated into all of its most important products, in particular the search engine, the Chrome browser, Google Ads, and the Duet AI assistant. There is no information yet about when exactly this will happen. Google limited itself to the vague wording “in the coming months.”

Bard

Gemini Pro has already been integrated into the Bard chatbot. The developers are convinced that this new core will take it to the next level and hope that it will allow Bard to overtake ChatGPT. Before this integration, Bard performed poorly compared to the OpenAI product.

Although the current version of the Bard chatbot is multilingual, its Gemini-based version works only in English for now. Support for other languages is planned.

Those who want to use the most powerful version, Gemini Ultra, will have to pay. The paid version will be called Bard Advanced and will appear in early 2024, but its cost is still unknown. By the way, OpenAI was the first to use this approach, offering ChatGPT with GPT-3.5 for free and access to GPT-4 through a $20-per-month subscription.

Pixel Smartphones

The Pixel 8 Pro received built-in support for the Gemini Nano model with the December update. Its capabilities are still limited, though. It currently powers the Summarize feature in the Recorder app. In addition, this AI can drive the Smart Reply function, but only in Gboard, Google's keyboard, and only in the WhatsApp messenger. In 2024, Gemini will reach other instant messengers as well as other parts of the system on Pixel devices.

Problems and Disadvantages

The Gemini model truly represents a major leap in AI capabilities. However, it is not without drawbacks, which are common to any LLM. The main disadvantages are:

the risk of generating false information;
dependence on low-quality training data;
some limited understanding of the real world.

Google does not deny that their revolutionary new product can make mistakes and even present as facts information that contradicts common sense, that is, “hallucinate.” Representatives of the corporation believe that it needs additional testing, especially the Ultra version, which has not yet been fully explored. Currently, developers are very meticulously studying and evaluating the work of Gemini to minimize the risk of harm to the user.

Let's sum it up

If 2023 is considered the year when AI gained widespread popularity and went into mass use, then 2024 could well be Google Gemini's high point. This AI model will be used to write program code, improve and automate operations (both in the cloud and at the edge), increase sales, and integrate chatbots and AI assistants into applications, smartphones and more.

Gemini's superior performance compared to other artificial intelligence models and humans allows us to make a very optimistic, even borderline fantastic, forecast about the capabilities of AI in the future. And yet we should not forget about the need to conduct additional research in order to finally overcome the shortcomings. As for Gemini specifically, this model is expected to bring more useful and intelligent features to almost all Google products in the future.
