Unleashing the Power of IBM Watsonx Multimodal Models with Python

By Charan H U | October 11, 2024

Hey Innovators!

Ready to take your AI game to the next level? Say hello to IBM Watsonx Multimodal Models — your new secret weapon for handling text, images, and everything in between. Whether you’re building smart assistants, analyzing complex visuals, or just geeking out over some cool tech, Watsonx has got your back.

What’s the Buzz About Watsonx Multimodal?

Imagine having an AI that doesn’t just understand what you say but also gets what you show. That’s Watsonx Multimodal for you. It’s IBM’s cutting-edge suite of models designed to process and integrate multiple data types — text and images — seamlessly. Think of it as the ultimate multitasker, handling diverse inputs to deliver smarter, more intuitive outputs.

Why Settle for Less?

Most AI models are one-trick ponies — great at text or awesome with images, but not both. Watsonx Multimodal breaks that barrier, offering versatility that opens doors to a myriad of applications. From enhancing customer support with image recognition to automating content generation with nuanced text understanding, the possibilities are endless.

Why Multimodal AI? Because One Dimension is So Last Year

1. Enhanced Comprehension

Humans naturally integrate information from various senses to understand the world. Why should AI be any different? Multimodal models combine text and image data, offering a deeper, more contextual understanding that single-modal models simply can’t match.

2. Superior Performance

By leveraging multiple data sources, multimodal AI can tackle complex tasks more efficiently. Whether it’s generating detailed reports or creating stunning visuals, Watsonx Multimodal models deliver top-notch performance across the board.

3. Diverse Applications

  • Healthcare: Analyze medical images alongside patient records for comprehensive diagnostics.
  • E-commerce: Boost product recommendations with integrated image and text analysis.
  • Education: Develop interactive learning tools that blend textual explanations with illustrative images.

4. Natural Interactions

Forget robotic responses. Multimodal AI enables more human-like interactions, understanding both what you say and what you show, making conversations with AI feel more natural and engaging.

Meet the Watsonx Multimodal Lineup

IBM isn’t playing around. Their Watsonx Multimodal suite features four powerhouse models, each tailored for specific tasks and scales:

  1. meta-llama/llama-3-2-90b-vision-instruct
  2. meta-llama/llama-3-2-11b-vision-instruct
  3. meta-llama/llama-3-1-70b-instruct
  4. meta-llama/llama-3-1-8b-instruct

Model Breakdown

Series 3-2:

  • Sizes: 90 billion (90b), 11 billion (11b)
  • Features: Vision-instruct capabilities for seamless image and text processing.

Series 3-1:

  • Sizes: 70 billion (70b), 8 billion (8b)
  • Features: Text-instruct capabilities, focusing on advanced text understanding and generation.

These models are designed to be the Swiss Army knives of AI — versatile, powerful, and ready to tackle whatever you throw their way.
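Once you've finished the setup in the next section, picking one of these models is just a matter of passing its ID to the ibm-watsonx-ai SDK. Here's a minimal sanity-check sketch (it assumes WATSONX_API_KEY and WATSONX_PROJECT_ID are already set in your environment; the prompt and the max_new_tokens value are placeholders, not required settings):

# quick_check.py -- minimal sketch, not a full example
import os
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(
    api_key=os.environ["WATSONX_API_KEY"],
    url=os.getenv("WATSONX_URL", "https://us-south.ml.cloud.ibm.com"),
)

model = ModelInference(
    model_id="meta-llama/llama-3-1-8b-instruct",  # smallest of the four models listed above
    params={"max_new_tokens": 200},               # illustrative value, tune as needed
    credentials=credentials,
    project_id=os.environ["WATSONX_PROJECT_ID"],
)

response = model.generate(prompt="Say hello in one sentence.")
print(response["results"][0]["generated_text"])

The same pattern scales to the larger models; only the model_id changes.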

Getting Started: Setting Up Watsonx Multimodal in Python

Alright, let’s get down to business. Here’s how you can set up and start using Watsonx Multimodal Models with Python.

Prerequisites:

  • Python 3.7+: Use a reasonably recent Python 3 release.
  • IBM Watsonx Account: Sign up and grab those API keys.
  • Environment Variables: Keep your API keys secure using environment variables.

Step 1: Install the Necessary Packages

Fire up your terminal and run:

pip install requests python-dotenv wolframalpha Pillow matplotlib pygments ibm-watsonx-ai nest_asyncio

Step 2: Configure Your Environment Variables

Create a .env file in your project directory and stash your keys there:

# .env
WATSONX_API_KEY=your_watsonx_api_key
WATSONX_PROJECT_ID=your_watsonx_project_id
WOLFRAM_ALPHA_KEY=your_wolfram_alpha_api_key
TAVILY_API_KEY=your_tavily_api_key

Pro Tip: Never hardcode your API keys. Keep them as secure as your most prized gadgets.
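If you haven't used python-dotenv before, loading those keys at runtime takes only a few lines. Here's a minimal sketch (the variable name matches the .env file above):

# sketch: read keys from .env instead of hardcoding them
import os
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())  # pulls the .env values into the process environment

api_key = os.getenv("WATSONX_API_KEY")
if not api_key:
    raise EnvironmentError("WATSONX_API_KEY is not set; check your .env file.")

The util.py module below wraps exactly this pattern in get_env_variable.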

Step 3: Directory Setup

Organize your project like a pro:

your_project/
├── images/
│   ├── Llama_Repo.jpeg
│   ├── tree.jpg
│   ├── ww1.jpg
│   ├── ww2.png
│   └── tire_pressure.png
├── util.py
├── test_llama_util.py
└── .env

The Power Core: util.py

This is your all-in-one toolkit, handling everything from environment variables to image processing and interacting with Watsonx Multimodal Models.

# util.py

import os
import json
import base64
from io import BytesIO
from typing import List, Dict, Union, Any

import requests
from dotenv import load_dotenv, find_dotenv
from wolframalpha import Client
from PIL import Image
import matplotlib.pyplot as plt
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference
from pygments import highlight, lexers, formatters
import nest_asyncio

# Apply nest_asyncio to allow nested event loops
nest_asyncio.apply()

# Load environment variables at module load
load_dotenv(find_dotenv())


def get_env_variable(var_name: str) -> str:
    """
    Retrieves the value of the specified environment variable.

    Parameters:
    - var_name (str): The name of the environment variable.

    Returns:
    - str: The value of the environment variable.

    Raises:
    - EnvironmentError: If the environment variable is not set.
    """
    value = os.getenv(var_name)
    if not value:
        raise EnvironmentError(f"Environment variable '{var_name}' is not set.")
    return value


def get_watsonx_credentials() -> Credentials:
    """
    Sets up the credentials for IBM Watsonx.

    Returns:
    - Credentials: The IBM Watsonx credentials.

    Raises:
    - EnvironmentError: If required environment variables are missing.
    """
    api_key = get_env_variable("WATSONX_API_KEY")
    url = os.getenv("WATSONX_URL", "https://us-south.ml.cloud.ibm.com")
    return Credentials(api_key=api_key, url=url)


def get_watsonx_project_id() -> str:
    """
    Retrieves the IBM Watsonx project ID.

    Returns:
    - str: The project ID.

    Raises:
    - EnvironmentError: If the project ID is not set.
    """
    return get_env_variable("WATSONX_PROJECT_ID")


def get_wolfram_alpha_api_key() -> str:
    """
    Retrieves the Wolfram Alpha API key.

    Returns:
    - str: The Wolfram Alpha API key.

    Raises:
    - EnvironmentError: If the API key is not set.
    """
    return get_env_variable("WOLFRAM_ALPHA_KEY")


def get_tavily_api_key() -> str:
    """
    Retrieves the Tavily API key.

    Returns:
    - str: The Tavily API key.

    Raises:
    - EnvironmentError: If the API key is not set.
    """
    return get_env_variable("TAVILY_API_KEY")


def cprint(response: Any) -> None:
    """
    Pretty prints a JSON response with syntax highlighting.

    Parameters:
    - response (Any): The JSON-serializable object to print.
    """
    formatted_json = json.dumps(response, indent=4)
    colorful_json = highlight(
        formatted_json, lexers.JsonLexer(), formatters.TerminalFormatter()
    )
    print(colorful_json)


def disp_image(address: str) -> None:
    """
    Displays an image from a URL or local file path.

    Parameters:
    - address (str): The URL of the image or the local file path.

    Raises:
    - Exception: If the image cannot be fetched or opened.
    """
    try:
        if address.startswith(("http://", "https://")):
            response = requests.get(address)
            response.raise_for_status()
            img = Image.open(BytesIO(response.content))
        else:
            img = Image.open(address)

        plt.imshow(img)
        plt.axis("off")
        plt.show()
    except Exception as e:
        raise Exception(f"Failed to display image: {e}")


def resize_image(
    img: Image.Image,
    max_dimension: int = 1120,
    save_path: str = "images/resized_image.jpg"
) -> Image.Image:
    """
    Resizes an image while maintaining aspect ratio.

    Parameters:
    - img (PIL.Image.Image): The image to resize.
    - max_dimension (int, optional): The maximum width or height. Default is 1120.
    - save_path (str, optional): Path to save the resized image. Default is "images/resized_image.jpg".

    Returns:
    - PIL.Image.Image: The resized image.
    """
    original_width, original_height = img.size
    scaling_factor = max_dimension / max(original_width, original_height)
    new_size = (int(original_width * scaling_factor), int(original_height * scaling_factor))
    # Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the equivalent high-quality filter.
    resized_img = img.resize(new_size, Image.LANCZOS)

    os.makedirs(os.path.dirname(save_path), exist_ok=True)
    resized_img.save(save_path)

    print(f"Original size: {original_width}x{original_height}")
    print(f"New size: {new_size[0]}x{new_size[1]}")

    return resized_img


def merge_images(
    image_paths: List[str],
    save_path: str = "images/merged_image_horizontal.jpg"
) -> Image.Image:
    """
    Merges multiple images horizontally.

    Parameters:
    - image_paths (List[str]): List of image file paths to merge.
    - save_path (str, optional): Path to save the merged image. Default is "images/merged_image_horizontal.jpg".

    Returns:
    - PIL.Image.Image: The merged image.

    Raises:
    - Exception: If any of the images cannot be opened.
    """
    try:
        images = [Image.open(path) for path in image_paths]
    except Exception as e:
        raise Exception(f"Failed to open one or more images: {e}")

    widths, heights = zip(*(img.size for img in images))

    total_width = sum(widths)
    max_height = max(heights)

    merged_image = Image.new("RGB", (total_width, max_height))

    x_offset = 0
    for img in images:
        merged_image.paste(img, (x_offset, 0))
        x_offset += img.width

    os.makedirs(os.path.dirname(save_path), exist_ok=True)
    merged_image.save(save_path)

    print(f"Merged image dimensions: {merged_image.size}")
    return merged_image


def encode_image(address: str) -> str:
    """
    Encodes an image from a remote URL or a local file path to a Base64 string.

    Parameters:
    - address (str): The URL of the image or the local file path.

    Returns:
    - str: The Base64 encoded string of the image.

    Raises:
    - Exception: If the image cannot be fetched, read, or is invalid.
    """
    try:
        if address.startswith(("http://", "https://")):
            response = requests.get(address)
            response.raise_for_status()
            image_data = response.content
        else:
            with open(address, "rb") as image_file:
                image_data = image_file.read()

        # Validate image data
        Image.open(BytesIO(image_data)).verify()

        # Encode to Base64
        return base64.b64encode(image_data).decode("utf-8")
    except requests.exceptions.RequestException as req_err:
        raise Exception(f"HTTP request failed: {req_err}")
    except FileNotFoundError:
        raise Exception(f"Local file not found: {address}")
    except Exception as e:
        raise Exception(f"Failed to encode image: {e}")


def wolfram_alpha(query: str) -> str:
    """
    Queries Wolfram Alpha and returns the result.

    Parameters:
    - query (str): The query string.

    Returns:
    - str: The result from Wolfram Alpha.

    Raises:
    - Exception: If the query fails or returns no results.
    """
    try:
        wolfram_alpha_key = get_wolfram_alpha_api_key()
        client = Client(wolfram_alpha_key)
        result = client.query(query)

        results = []
        for pod in result.pods:
            if pod.get("@title") in {"Result", "Results"}:
                for sub in pod.subpods:
                    results.append(sub.plaintext)

        if not results:
            raise Exception("No results found in Wolfram Alpha response.")

        return "\n".join(results)
    except Exception as e:
        raise Exception(f"Failed to query Wolfram Alpha: {e}")


def llama(
    input_data: Union[str, List[Dict[str, Any]]],
    model_size: int = 11,
    temperature: float = 0.0,
    raw: bool = False,
    debug: bool = False
) -> Union[str, Dict[str, Any]]:
    """
    Generates a completion using IBM Watsonx models, supporting both text and chat completions.

    Parameters:
    - input_data (str or List[Dict[str, Any]]): The prompt string or list of messages for the chat session.
    - model_size (int, optional): Specifies the model size. Supported sizes are:
        - 90, 11 for series "3-2"
        - 70, 8 for series "3-1"
      Defaults to 11.
    - temperature (float, optional): Sampling temperature. Default is 0.
    - raw (bool, optional): If True, returns the raw API response. Default is False.
    - debug (bool, optional): If True, prints the payload being sent. Default is False.

    Returns:
    - Union[str, Dict[str, Any]]: The generated response content or the raw response if `raw` is True.

    Raises:
    - Exception: If required environment variables are missing, if the model parameters are invalid, or if the API returns an error.
    - TypeError: If `input_data` is neither a string nor a list of messages.
    - ValueError: If an invalid `model_size` is provided.
    """
    # Define valid model sizes and their corresponding series
    model_series_mapping = {
        90: "3-2",
        11: "3-2",
        70: "3-1",
        8: "3-1"
    }

    if model_size not in model_series_mapping:
        raise ValueError(
            f"Invalid model_size '{model_size}'. Supported sizes are {list(model_series_mapping.keys())}."
        )

    model_series = model_series_mapping[model_size]

    # Construct the model_id based on series and size,
    # e.g. "meta-llama/llama-3-2-11b-vision-instruct" or "meta-llama/llama-3-1-70b-instruct"
    model_id = f"meta-llama/llama-{model_series}-{model_size}b"
    if model_series == "3-2":
        model_id += "-vision-instruct"
    else:  # "3-1"
        model_id += "-instruct"

    credentials = get_watsonx_credentials()
    project_id = get_watsonx_project_id()

    params = {
        "temperature": temperature,
        "max_new_tokens": 4096,
        "stop_sequences": ["<|eot_id|>", "<|eom_id|>"],
    }

    model_inference = ModelInference(
        model_id=model_id,
        params=params,
        credentials=credentials,
        project_id=project_id
    )

    # Determine the request type up front so it is always defined in error messages
    if isinstance(input_data, str):
        response_type = "generate"
    elif isinstance(input_data, list):
        response_type = "chat"
    else:
        raise TypeError("`input_data` must be either a string or a list of messages (dicts).")

    try:
        if response_type == "generate":
            if debug:
                print(f"Sending prompt: {input_data}")
            response = model_inference.generate(prompt=input_data)
        else:  # "chat"
            if debug:
                print(f"Sending messages: {input_data}")
            response = model_inference.chat(messages=input_data)
    except Exception as e:
        raise Exception(f"Failed to get response from Watsonx ({response_type}): {e}")

    if raw:
        return response

    # Extract and return the generated content
    try:
        if response_type == "generate":
            return response["results"][0]["generated_text"]
        else:  # "chat"
            return response["choices"][0]["message"]["content"]
    except (KeyError, IndexError) as e:
        raise Exception(f"Unexpected response structure: {e}")


def get_boiling_point(liquid_name: str, celsius: float) -> List[Any]:
    """
    Placeholder function to get the boiling point of a liquid.

    Parameters:
    - liquid_name (str): The name of the liquid.
    - celsius (float): Temperature in Celsius.

    Returns:
    - List[Any]: Empty list as a placeholder.

    TODO:
    - Implement the actual logic to retrieve boiling points.
    """
    # TODO: Implement this function
    return []

Testing the Beast: test_llama_util.py

This script showcases how to leverage the utility functions to interact with Watsonx Multimodal Models, perform image processing, and handle various query types. It’s like your own personal lab for experimenting with AI magic.

# test_llama_util.py

import warnings
import sys
from typing import List, Dict, Any

# Suppress warnings for cleaner output
warnings.filterwarnings("ignore")

# Import necessary functions from util.py
from util import encode_image, llama, disp_image

# Define a helper function to handle API calls with error handling
def llamapi(prompt: str, base64_image: str, model_size: int = 11) -> str:
    """
    Helper function to interact with the llama model.

    Parameters:
    - prompt (str): The text prompt to send to the model.
    - base64_image (str): The Base64 encoded image string.
    - model_size (int, optional): The size of the model to use. Defaults to 11.

    Returns:
    - str: The response from the model.

    Raises:
    - Exception: If the llama function fails.
    """
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        },
    ]
    try:
        result = llama(messages, model_size)
        return result
    except Exception as e:
        print(f"Error in llamapi: {e}", file=sys.stderr)
        return ""

# Test Cases
def main():
    # Test 1: Text Input Only
    print("=== Test 1: Text Input Only ===")
    messages: List[Dict[str, Any]] = [{"role": "user", "content": "Who wrote the book Charlotte's Web?"}]

    # Using model_size=90 (series "3-2")
    try:
        response_32 = llama(messages, model_size=90)
        print("Response (3-2, 90b):", response_32)
    except Exception as e:
        print(f"Failed to get response with model_size=90: {e}", file=sys.stderr)

    # Using model_size=70 (series "3-1")
    try:
        response_31 = llama(messages, model_size=70)
        print("Response (3-1, 70b):", response_31)
    except Exception as e:
        print(f"Failed to get response with model_size=70: {e}", file=sys.stderr)

    # Using default model_size=11 (series "3-2")
    response_32_d = ""  # default so Test 2 still runs if this call fails
    try:
        response_32_d = llama(messages)
        print("Response (3-2, default 11b):", response_32_d)
    except Exception as e:
        print(f"Failed to get response with default model_size=11: {e}", file=sys.stderr)

    # Test 2: Reprompting with New Question
    print("\n=== Test 2: Reprompting with New Question ===")
    messages_extended: List[Dict[str, Any]] = [
        {"role": "user", "content": "Who wrote the book Charlotte's Web?"},
        {"role": "assistant", "content": response_32_d},
        {"role": "user", "content": "3 of the best quotes"},
    ]

    # Using model_size=90
    try:
        response_32_extended = llama(messages_extended, model_size=90)
        print("Extended Response (3-2, 90b):", response_32_extended)
    except Exception as e:
        print(f"Failed to get extended response with model_size=90: {e}", file=sys.stderr)

    # Using model_size=70
    try:
        response_31_extended = llama(messages_extended, model_size=70)
        print("Extended Response (3-1, 70b):", response_31_extended)
    except Exception as e:
        print(f"Failed to get extended response with model_size=70: {e}", file=sys.stderr)

    # Test 3: Question About an Image (Local Image)
    print("\n=== Test 3: Question About an Image (Local Image) ===")
    local_image_path = "images/Llama_Repo.jpeg"
    try:
        disp_image(local_image_path)
    except Exception as e:
        print(f"Failed to display local image: {e}", file=sys.stderr)

    try:
        base64_image = encode_image(local_image_path)
    except Exception as e:
        print(f"Failed to encode local image: {e}", file=sys.stderr)
        base64_image = ""

    if base64_image:
        try:
            result = llamapi("describe the image in one sentence", base64_image, model_size=11)
            print("Image Description (11b):", result)
        except Exception as e:
            print(f"Failed to get image description with model_size=11: {e}", file=sys.stderr)

    # Test 4: Question About an Image (Remote Image)
    print("\n=== Test 4: Question About an Image (Remote Image) ===")
    image_url = "https://raw.githubusercontent.com/meta-llama/llama-models/refs/heads/main/Llama_Repo.jpeg"
    try:
        disp_image(image_url)
    except Exception as e:
        print(f"Failed to display remote image: {e}", file=sys.stderr)

    try:
        base64_image_remote = encode_image(image_url)
    except Exception as e:
        print(f"Failed to encode remote image: {e}", file=sys.stderr)
        base64_image_remote = ""

    result_remote = ""  # default so Test 5 still runs if this call fails or is skipped
    if base64_image_remote:
        try:
            result_remote = llamapi("describe the image in one sentence", base64_image_remote, model_size=90)
            print("Image Description (90b):", result_remote)
        except Exception as e:
            print(f"Failed to get image description with model_size=90: {e}", file=sys.stderr)

    # Test 5: Follow-up Question About an Image
    print("\n=== Test 5: Follow-up Question About an Image ===")
    messages_follow_up: List[Dict[str, Any]] = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "describe the image in one sentence"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image_remote}"},
                },
            ],
        },
        {"role": "assistant", "content": result_remote},
        {"role": "user", "content": "how many of them are purple? And where it is?"},
    ]

    try:
        follow_up_result = llama(messages_follow_up)
        print("Follow-up Response (default 11b):", follow_up_result)
    except Exception as e:
        print(f"Failed to get follow-up response: {e}", file=sys.stderr)

    # Test 6: Plant Recognition
    print("\n=== Test 6: Plant Recognition ===")
    plant_image_path = "images/tree.jpg"
    try:
        disp_image(plant_image_path)
    except Exception as e:
        print(f"Failed to display plant image: {e}", file=sys.stderr)

    plant_question = "What kind of plant is this in my garden? Describe it in a short paragraph."
    try:
        base64_plant_image = encode_image(plant_image_path)
    except Exception as e:
        print(f"Failed to encode plant image: {e}", file=sys.stderr)
        base64_plant_image = ""

    if base64_plant_image:
        try:
            plant_result = llamapi(plant_question, base64_plant_image)
            print("Plant Recognition Result:", plant_result)
        except Exception as e:
            print(f"Failed to get plant recognition result: {e}", file=sys.stderr)

    # Test 7: Dog Breed Recognition
    print("\n=== Test 7: Dog Breed Recognition ===")
    dog_image_paths = ["images/ww1.jpg", "images/ww2.png"]
    dog_question = "What dog breed is this? Describe in one paragraph, and 3-5 short bullet points."

    for dog_image_path in dog_image_paths:
        try:
            disp_image(dog_image_path)
        except Exception as e:
            print(f"Failed to display dog image ({dog_image_path}): {e}", file=sys.stderr)

        try:
            base64_dog_image = encode_image(dog_image_path)
        except Exception as e:
            print(f"Failed to encode dog image ({dog_image_path}): {e}", file=sys.stderr)
            base64_dog_image = ""

        if base64_dog_image:
            try:
                dog_result = llamapi(dog_question, base64_dog_image)
                print(f"Dog Breed Recognition Result ({dog_image_path}):", dog_result)
            except Exception as e:
                print(f"Failed to get dog breed recognition result ({dog_image_path}): {e}", file=sys.stderr)

    # Test 8: Tire Pressure Warning
    print("\n=== Test 8: Tire Pressure Warning ===")
    tire_image_path = "images/tire_pressure.png"
    try:
        disp_image(tire_image_path)
    except Exception as e:
        print(f"Failed to display tire pressure image: {e}", file=sys.stderr)

    tire_question = "What's the problem this is about? What should be good numbers?"
    try:
        base64_tire_image = encode_image(tire_image_path)
    except Exception as e:
        print(f"Failed to encode tire pressure image: {e}", file=sys.stderr)
        base64_tire_image = ""

    if base64_tire_image:
        try:
            tire_result = llamapi(tire_question, base64_tire_image)
            print("Tire Pressure Warning Result:", tire_result)
        except Exception as e:
            print(f"Failed to get tire pressure warning result: {e}", file=sys.stderr)


if __name__ == "__main__":
    main()

Showtime: Real-World Applications

Let’s dive into some real-world scenarios where Watsonx Multimodal Models shine.

1. Text-Based Queries

Example: Who Wrote “Charlotte’s Web”?

# Define the chat messages
messages = [{"role": "user", "content": "Who wrote the book Charlotte's Web?"}]

# Generate a chat completion using model_size=90 (series "3-2")
response_32 = llama(messages, model_size=90)
print("Response (3-2, 90b):", response_32)

# Generate a chat completion using model_size=70 (series "3-1")
response_31 = llama(messages, model_size=70)
print("Response (3-1, 70b):", response_31)

# Generate a chat completion using the default model_size=11 (series "3-2")
response_32_d = llama(messages)
print("Response (3-2, default 11b):", response_32_d)

Output:

Response (3-2, 90b): The book "Charlotte's Web" was written by E.B. White.
Response (3-1, 70b): The book "Charlotte's Web" was written by E.B. White.
Response (3-2, default 11b): The book "Charlotte's Web" was written by E.B. White. It was first published in 1952 and has since become a beloved children's classic. E.B. White was an American author, best known for his children's books, including "Charlotte's Web", "Stuart Little", and "The Trumpet of the Swan".

2. Image-Based Queries

Example: Describe an Image

# Local Image
local_image_path = "images/Llama_Repo.jpeg"
disp_image(local_image_path)

# Encode the image
base64_image = encode_image(local_image_path)

# Create the message with image
result = llamapi("describe the image in one sentence", base64_image, model_size=11)
print("Image Description (11b):", result)

Output:

Image Description (11b): The image depicts three alpacas in a barn, with one wearing a party hat and another with a purple body, sitting at a table with a glass of orange liquid and some greenery.

3. Combined Text and Image Queries

Example: Follow-up Question About an Image

messages_follow_up = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "describe the image in one sentence"},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{base64_image_remote}"},
            },
        ],
    },
    {"role": "assistant", "content": result_remote},
    {"role": "user", "content": "how many of them are purple? And where it is?"},
]

follow_up_result = llama(messages_follow_up)
print("Follow-up Response (default 11b):", follow_up_result)

Output:

Image Description (90b): The image depicts three llamas, one of which is purple and wearing a party hat, standing behind a table with a glass of beer on it.
Follow-up Response (default 11b): One of the llamas is purple, and it is in the middle of the image.

Best Practices: Level Up Your AI Game

1. Secure Your API Keys

  • Environment Variables: Always store your API keys in environment variables. Use .env files with python-dotenv to keep them safe and out of your codebase.
  • Never Hardcode: Hardcoding API keys is a rookie move. Keep them secure and manage access diligently.

2. Handle Errors Like a Pro

  • Use Try-Except Blocks: Wrap your API calls and image processing in try-except blocks to catch and handle errors gracefully; see the sketch after this list.
  • Informative Messages: Provide clear and concise error messages to make debugging easier.
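As a concrete illustration, here's a minimal sketch of that pattern around the llama helper from util.py (the fallback message is just an example of degrading gracefully):

# sketch: catch and report failures from a model call instead of crashing
from util import llama

try:
    answer = llama("Summarize the plot of Charlotte's Web in two sentences.")
except Exception as e:
    print(f"Watsonx call failed: {e}")
    answer = "Sorry, I couldn't generate a response right now."

print(answer)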

3. Optimize Image Handling

  • Resize for Efficiency: Large images can slow down processing. Use the resize_image function to maintain quality while reducing size; a short sketch follows this list.
  • Validate Images: Always verify that your images are not corrupted using libraries like PIL before processing.
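Putting both tips together, here's a short sketch using the helpers from util.py (the file paths are examples from the project layout above):

# sketch: resize and validate an image before sending it to the model
from PIL import Image
from util import resize_image, encode_image, llama

img = Image.open("images/tree.jpg")  # example image from the images/ folder
resize_image(img, max_dimension=1120, save_path="images/tree_resized.jpg")

base64_image = encode_image("images/tree_resized.jpg")  # encode_image also verifies the file
messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image in one sentence."},
        {"type": "image_url",
         "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}},
    ],
}]
print(llama(messages, model_size=11))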

4. Utilize Helper Functions

  • DRY Principle: Don’t Repeat Yourself. Use helper functions like llamapi to streamline repetitive tasks and keep your code clean.
  • Modular Design: Keep your utility functions organized in separate modules (util.py) for easy maintenance and scalability.

5. Monitor API Usage

  • Stay Within Limits: Keep an eye on your API usage to avoid hitting rate limits or incurring unexpected costs; a simple throttling sketch follows this list.
  • Efficient Requests: Batch your requests when possible and optimize your queries to make the most out of each API call.
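Rate limits vary by plan, so treat throttling as plan-specific. A naive client-side pause between calls is a reasonable starting point (the one-second delay is an arbitrary example, not an IBM-documented number):

# sketch: crude client-side throttling between successive calls
import time
from util import llama

prompts = ["First question?", "Second question?", "Third question?"]  # example prompts
for prompt in prompts:
    print(llama(prompt))
    time.sleep(1)  # adjust to your plan's limits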

6. Keep Everything Updated

  • Library Versions: Regularly update your libraries to benefit from the latest features and security patches; a one-line upgrade command follows this list.
  • Model Updates: Stay informed about new model releases or updates from IBM to leverage the best performance and capabilities.
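Upgrading the packages used in this post is a one-liner; pin versions in a requirements file if you need reproducible builds:

pip install --upgrade ibm-watsonx-ai requests python-dotenv Pillow matplotlib pygments wolframalpha nest_asyncio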

References

  • IBM Watsonx Documentation: https://www.ibm.com/docs/en/watsonx
  • python-dotenv Documentation: https://saurabh-kumar.com/python-dotenv/
  • Pillow (PIL) Documentation: https://pillow.readthedocs.io/en/stable/
  • Matplotlib Documentation: https://matplotlib.org/stable/contents.html
  • Wolfram Alpha API: https://products.wolframalpha.com/api/
  • Pygments Documentation: https://pygments.org/docs/
  • nest_asyncio: https://github.com/erdewit/nest_asyncio

Wrapping It Up: Your AI Journey Starts Now

IBM Watsonx Multimodal Models are a game-changer in the AI landscape, offering unparalleled versatility and power. Whether you’re a seasoned developer or just getting started, these models provide the tools you need to build intelligent, responsive, and innovative applications.

By integrating text and image processing, Watsonx Multimodal empowers you to create solutions that are not only smart but also intuitive and user-friendly. So, gear up, dive into the scripts, and start building the next big thing in AI. The future is multimodal, and it’s here to stay.

Stay sharp, stay innovative, and let your AI creations soar! 🚀

Written by Charan H U

Applied AI Engineer | Internet Content Creator
