Generative AI (genAI) is the name for algorithms that can produce new content (rather than retrieving content that already exists, as a search engine does) in response to a written input, or prompt. The new content can be text, images, code or other media. As with any other resource, you need to evaluate the output to check its reliability, bias, timeliness, authority and accuracy. This guide will help you to do that.
With the increase in popularity and accessibility of genAI tools, you'll need to develop something called ‘AI literacy’. This will enable you to evaluate different AI technologies so that you can use genAI safely, ethically and responsibly. Using this guide is your first step!
Remember that new content does not mean new ideas.
Check out the short introductory video below (recorded March 2024) for more information.
Large Language Models (LLMs), like OpenAI's GPT-4 and its successors, have been given access to, and trained on, massive amounts of data from the Internet. They respond to your question (or 'prompt') by producing text based on probability: given the patterns in those data sets, how likely is it that certain words or phrases will follow on from your prompt? The most likely combination is assembled, word by word, into a response according to the parameters of your prompt.
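To picture this, here is a minimal sketch in Python of choosing a next word by probability. The words and the numbers are invented for illustration; a real LLM scores tens of thousands of possible words using billions of learned weights.

```python
import random

# Toy illustration (not a real LLM): probabilities a model might assign
# to possible next words after the prompt "The weather today is".
# These words and numbers are invented for demonstration.
next_word_probs = {
    "sunny": 0.45,
    "rainy": 0.30,
    "cloudy": 0.20,
    "purple": 0.05,  # unlikely, but never impossible
}

def pick_next_word(probs):
    """Sample one word, weighted by its probability."""
    words = list(probs.keys())
    weights = list(probs.values())
    return random.choices(words, weights=weights, k=1)[0]

print("The weather today is", pick_next_word(next_word_probs))
```

Notice that the sketch sometimes picks "purple": the output is plausible-sounding by construction, not checked for truth. That is exactly why the next point matters.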
Because their responses are predictions, you can't guarantee that they are correct, accurate, relevant or up to date. Think of LLMs as chatbots that tell you what they think you want to hear, instead of providing you with facts. This is fine if you want a poem about rain in the style of a Japanese poet, but not if you want an authoritative answer to an important question.
Remember that anything you ‘feed’ into LLMs in the form of prompts, sample answers, questions, etc., can be added to their training data. By using ChatGPT and tools like it, you’re giving them more information to learn from. Never post any personal information about yourself, other people, or your work/organisation into an LLM, unless you definitely know that data security is in place.
An algorithm is a set of instructions given to a computer to solve a problem or to perform calculations that transform data into useful information.
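For example, here is a small algorithm in Python: a fixed set of steps that turns raw data (invented exam scores) into useful information (an average and a maximum).

```python
# A tiny algorithm: a fixed set of instructions that transforms raw data
# (a list of exam scores, invented here) into useful information.
scores = [62, 74, 88, 55, 91]

total = 0
highest = scores[0]
for score in scores:      # step through every data point
    total += score        # accumulate the sum
    if score > highest:   # track the maximum seen so far
        highest = score

average = total / len(scores)
print(f"Average: {average}, highest: {highest}")
```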
The alignment problem refers to the discrepancy between our intended objectives for an AI system and the output it produces. A misaligned system can be advanced in performance, yet behave in a way that’s against human values. We saw an example of this in 2015 when an image-recognition algorithm used by Google Photos was found auto-tagging pictures of black people as “gorillas”.
Deep learning is a category within the machine-learning branch of AI. Deep-learning systems use advanced neural networks and can process large amounts of complex data to achieve higher accuracy.
These systems perform well on relatively complex tasks and can even exhibit human-like intelligent behaviour.
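As a rough illustration (with invented numbers, not a real trained network), the sketch below shows data passing through stacked layers; "deep" simply means there are many such layers.

```python
def layer(values, weight):
    """One drastically simplified 'layer': scale each value, then
    keep only positive results (a stand-in for a non-linearity)."""
    return [max(0.0, v * weight) for v in values]

data = [0.5, -1.2, 0.8]        # invented input values
hidden1 = layer(data, 0.7)     # first layer
hidden2 = layer(hidden1, 1.3)  # second layer: the network is 'deeper'
output = layer(hidden2, 0.9)
print(output)
```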
A diffusion model is an AI model that learns by adding random “noise” to a set of training data before removing it, and then assessing the differences. The objective is to learn about the underlying patterns or relationships in data that are not immediately obvious.
These models are designed to self-correct as they encounter new data and are therefore particularly useful in situations where there is uncertainty, or if the problem is very complex.
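Here is a toy sketch of the "noising" half of that idea, using invented pixel values rather than a real image; a real diffusion model learns to reverse the noise.

```python
import random

# Toy sketch of the "noising" step behind diffusion models.
# Real models work on images and learn to undo the noise; here we
# just add noise to made-up pixel values and measure the difference.
pixels = [0.2, 0.5, 0.9, 0.4]  # invented "image" data
noise_strength = 0.1

noisy = [p + random.gauss(0, noise_strength) for p in pixels]
difference = [n - p for n, p in zip(noisy, pixels)]

print("Original:", pixels)
print("Noisy:   ", [round(n, 3) for n in noisy])
print("Noise the model would learn to remove:",
      [round(d, 3) for d in difference])
```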
Explainable AI is an emerging, interdisciplinary field concerned with creating methods that will increase users’ trust in the processes of AI systems.
Due to the inherent complexity of certain AI models, their internal workings are often opaque, and we can’t say with certainty why they produce the outputs they do. Explainable AI aims to make these “black box” systems more transparent.
Data labelling is the process through which data points are categorised to help an AI model make sense of the data. This involves identifying data structures (such as image, text, audio or video) and adding labels (such as tags and classes) to the data.
Humans do the labelling before machine learning begins. The labelled data are split into distinct datasets for training, validation and testing.
The training set is fed to the system for learning. The validation set is used to verify whether the model is performing as expected and when parameter tuning and training can stop. The testing set is used to evaluate the finished model’s performance.
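As an illustration, here is a minimal sketch of such a split in Python, using invented (text, label) pairs and a common 60/20/20 ratio (the ratio is our assumption, not a fixed rule).

```python
import random

# Sketch of splitting labelled data into training, validation and
# test sets. The (text, label) pairs below are invented examples.
labelled_data = [
    ("great film", "positive"), ("boring plot", "negative"),
    ("loved it", "positive"), ("waste of time", "negative"),
    ("superb acting", "positive"), ("fell asleep", "negative"),
    ("a classic", "positive"), ("not for me", "negative"),
    ("would watch again", "positive"), ("dreadful", "negative"),
]

random.shuffle(labelled_data)          # avoid ordering bias
n = len(labelled_data)
train = labelled_data[: int(0.6 * n)]                    # 60% for learning
validation = labelled_data[int(0.6 * n): int(0.8 * n)]   # 20% for tuning
test = labelled_data[int(0.8 * n):]    # 20% held back for final evaluation

print(len(train), len(validation), len(test))  # 6 2 2
```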
Machine learning is a branch of AI that involves training AI systems to be able to analyse data, learn patterns and make predictions without specific human instruction.
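As a toy illustration of learning a pattern from examples rather than from instructions, the sketch below (with invented data) estimates the rule "y is roughly 2 times x" purely by nudging a number to reduce prediction error.

```python
# Toy sketch of "learning from data rather than instructions":
# estimate the relationship y = w * x from example pairs alone.
examples = [(1, 2.1), (2, 3.9), (3, 6.2)]  # invented (x, y) data

w = 0.0
learning_rate = 0.05
for _ in range(200):                     # repeatedly nudge w
    for x, y in examples:
        prediction = w * x
        error = prediction - y
        w -= learning_rate * error * x   # adjust toward less error

print(round(w, 2))  # close to 2.0, learned without being told the rule
```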
While large language models are a specific type of AI model used for language-related tasks, natural language processing is the broader AI field that focuses on machines’ ability to learn, understand and produce human language.
Parameters are the settings used to tune machine-learning models. You can think of them as the programmed weights and biases a model uses when making a prediction or performing a task.
Since parameters determine how the model will process and analyse data, they also determine how it will perform. An example is the number of neurons in a given layer of the neural network (strictly speaking, a setting the developer chooses, like this one, is a hyperparameter, while the weights and biases learned during training are the parameters proper). Increasing the number of neurons will allow the neural network to tackle more complex tasks, but the trade-off will be higher computation time and costs.
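A small illustration of why that trade-off arises (the sizes here are invented): each extra neuron adds more weights for the computer to calculate.

```python
# How the neuron count in one layer affects the number of weights.
inputs = 100  # features feeding into the layer (invented size)

for neurons in (10, 50, 200):
    weights = inputs * neurons + neurons  # one weight per connection,
                                          # plus one bias per neuron
    print(f"{neurons:>3} neurons -> {weights:,} values to compute")
```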
Training data are the (usually labelled) data used to teach AI systems how to make predictions. The accuracy and representativeness of training data have a major impact on a model’s effectiveness.
Reproduced from 'AI to Z: all the terms you need to know to keep up in the AI hype age' by Samar Fatima and Kok-Leong Ong in The Conversation, used under CC BY-ND 4.0.
Looking for other definitions? A more in-depth glossary can be found at AIPRM’s 'Ultimate Generative AI Glossary'.
We'd love to hear your feedback on this portal. If you think changes could be made, or more information added, please visit our feedback page.