Karlo

How to use

This document describes how to get better results from the Karlo API.

Version upgrade

As of February 22, 2024, the Karlo model used in the Karlo API has been upgraded to version 2.1 with enhanced image generation quality.

Fundamental

Prompt

To make your prompts easy for Karlo to understand, we recommend following the guidelines below.

| Principle | Good example | Bad example |
|---|---|---|
| Use objective and clear descriptions | man looking up at a clock, sunset, apocalypse landscape, in the style of surrealism, dark orange and amber | The day is about to arrive |
| Write in a combination of words | A teenage girl has dark black eyes, flowy hair | A girl |
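
The second principle, writing a prompt as a combination of descriptive words, can be sketched as a small helper. `build_prompt` is a hypothetical name used only for illustration and is not part of the Karlo API; it simply joins a subject and its descriptors into one comma-separated string.

```python
def build_prompt(subject: str, *descriptors: str) -> str:
    """Join a subject and its descriptors into one comma-separated prompt."""
    # Drop empty entries and surrounding whitespace so the prompt stays clean.
    parts = [p.strip() for p in (subject, *descriptors) if p and p.strip()]
    return ", ".join(parts)


# Rebuilds the "good example" from the table out of its individual descriptors.
prompt = build_prompt(
    "man looking up at a clock",
    "sunset",
    "apocalypse landscape",
    "in the style of surrealism",
    "dark orange and amber",
)
print(prompt)
```

Keeping the descriptors as separate arguments makes it easy to add or remove one word at a time and compare the results.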

Negative prompt

Enter anything you want to exclude from, or suppress in, the generated image. For example, negative prompts such as "Worst quality" or "Low quality" help prevent low-quality images.

High-quality images depend mainly on effective prompts that follow the principles above. We recommend using negative prompts only to exclude or prevent specific elements. An example is below.

| Parameter | Text |
|---|---|
| Prompt | a majestic reindeer, standing in snow forest at night, hyperrealistic, unreal engine 5, starstation, sharp focus, 8k |
| Negative prompt | low quality, low contrast, draft, amateur, cut off, cropped, frame |

[Image: Negative prompt sample]
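
As a minimal sketch, the prompt and negative prompt from the example above could be combined into a request body like this. The field names `prompt` and `negative_prompt` and the helper `build_generate_request` are assumptions made for illustration; check the API reference for the exact request format. No request is sent here.

```python
import json


def build_generate_request(prompt: str, negative_prompt: str = "") -> dict:
    """Build a request body (assumed field names) for a Generate image call."""
    body = {"prompt": prompt}
    # Only include the negative prompt when there is something to exclude.
    if negative_prompt:
        body["negative_prompt"] = negative_prompt
    return body


body = build_generate_request(
    prompt=(
        "a majestic reindeer, standing in snow forest at night, "
        "hyperrealistic, unreal engine 5, starstation, sharp focus, 8k"
    ),
    negative_prompt="low quality, low contrast, draft, amateur, cut off, cropped, frame",
)
print(json.dumps(body, indent=2))
```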

Caution

Prompt

In general, there are certain elements that AI models cannot naturally depict. We recommend you avoid the following prompts when using Karlo.

  • Descriptions of jointed body parts, such as hands or feet.
  • Scenes that require several people to make a specific facial expression.
  • Prompts that require complex composition or placement.

Image for Make variation

When requesting Make variation, we recommend using an image file that meets the Image file specification. In particular, keep the file size small so that the Karlo API can be used efficiently.

Also, the original image should make it easy to tell what is depicted. Karlo needs to be able to understand both the subject and the mood of the original image. We recommend using images with clearly identifiable components, such as people or objects.

Word choice

Noun

Use standard-language nouns in your prompts. If you use dialect or slang, Karlo may not understand it and may generate incorrect images. Karlo is also sensitive to the exact meaning of the words in a prompt, so avoid homonyms and choose nouns that convey your intent clearly. Below are examples of Generate image results for a tiger, a sleeping tiger, and a baby tiger.

[Image: Noun sample]

Time

You can include expressions for a time of day in your prompts, such as day, night, dawn, and evening, as well as expressions for a time of year, such as summer and Christmas. Below are examples of Generate image results for the same object with day, night, and autumn included in the prompt.

[Image: Time sample]

Color tone

You can include color descriptions in your prompts. Below are examples of Generate image results for the same object with the color descriptors bright, warm, and vivid, respectively.

[Image: Color tone sample]

Style

Including "by" followed by an artist's name in your prompt generates images with a mood similar to that artist's style (for example, by Renoir). Below are examples of Generate image results for a dog in the garden in the style of several artists.

[Image: Style sample]

Advanced: Parameter

version

version is a parameter that selects the model version used by the Karlo API. The configurable parameters differ by model version, as listed below.

  • Configurable in all versions: upscale, prior_num_inference_steps, prior_guidance_scale, num_inference_steps, guidance_scale, seed
  • Configurable in version 2.0 (v2.0) only: scheduler, face_refiner, bbox_size_threshold, bbox_filter_threshold
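
The two lists above can be turned into a simple pre-flight check. This is a sketch only: the helper name `unsupported_params` is hypothetical, and the version string "v2.1" is an assumption modeled on the "v2.0" spelling used above.

```python
# Parameter names copied from the two lists above.
COMMON_PARAMS = {
    "upscale", "prior_num_inference_steps", "prior_guidance_scale",
    "num_inference_steps", "guidance_scale", "seed",
}
V2_0_ONLY_PARAMS = {
    "scheduler", "face_refiner", "bbox_size_threshold", "bbox_filter_threshold",
}


def unsupported_params(version: str, params: dict) -> list[str]:
    """Return the advanced parameters in `params` not configurable for `version`."""
    allowed = set(COMMON_PARAMS)
    if version == "v2.0":
        allowed |= V2_0_ONLY_PARAMS
    # Pass only the advanced parameters here, not fields such as the prompt.
    return sorted(name for name in params if name not in allowed)


advanced = {"seed": 1, "scheduler": "decoder_ddim_v_prediction"}
print(unsupported_params("v2.0", advanced))
print(unsupported_params("v2.1", advanced))  # "v2.1" is an assumed version string
```

Running a check like this before sending a request surfaces a version mismatch early instead of relying on the API's error response.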

upscale

upscale is a parameter that specifies whether to enlarge the image. If you set upscale to true, you can generate an image of up to 2048 pixels in width and height.

Below are examples of Generate image requests with upscale set to true and to false (default). Each request uses the same prompt and seed.

[Image: upscale]

prior_num_inference_steps

prior_num_inference_steps is a parameter that sets the variety in the image generation process. The result depends on the value:

| Value | Advantage | Disadvantage |
|---|---|---|
| High | The generated image reflects the prompt strictly. The generated image can be more creative. | The desired content may not be included. The generated image can be abstract or low-quality. |
| Low | The generated images can be similar to one another because variety is low. | The generated images can be less creative. |

Below are examples of Generate image requests with prior_num_inference_steps set to 10 (minimum), 25 (default), and 100 (maximum). Each request uses the same prompt and seed.

[Image: prior_num_inference_steps]

prior_guidance_scale

prior_guidance_scale is a parameter that sets the scale of the variety determined by prior_num_inference_steps. If the value is excessively low, Karlo may generate content that differs from the prompt.

To improve the quality of the generated image, adjust either prior_guidance_scale or prior_num_inference_steps while keeping the other fixed.

Below are examples of Generate image requests with prior_guidance_scale set to 1.0 (minimum), 5.0 (default), and 20.0 (maximum). Each request uses the same prompt and seed.

[Image: prior_guidance_scale]
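
The advice to vary one parameter while holding the other fixed can be sketched as a small sweep. The helper `sweep`, the seed value, and the `prompt` field name are illustrative assumptions; the code only builds request bodies and sends nothing.

```python
def sweep(base: dict, name: str, values: list) -> list[dict]:
    """Return one request body per value, changing only the parameter `name`."""
    # {**base, name: value} copies the base body so it is never mutated.
    return [{**base, name: value} for value in values]


base = {
    "prompt": "a majestic reindeer, standing in snow forest at night",
    "seed": 1234,                      # arbitrary fixed seed for a fair comparison
    "prior_num_inference_steps": 25,   # held at its default
}
bodies = sweep(base, "prior_guidance_scale", [1.0, 5.0, 20.0])
for body in bodies:
    print(body["prior_guidance_scale"], body["prior_num_inference_steps"], body["seed"])
```

Because the prompt, seed, and prior_num_inference_steps are identical across the bodies, any difference in the results can be attributed to prior_guidance_scale alone.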

num_inference_steps

num_inference_steps is a parameter that sets the level of detail applied on top of the variety determined by prior_num_inference_steps. The result depends on the value:

| Value | Advantage | Disadvantage |
|---|---|---|
| High | The generated image becomes more detailed. The contents become more organized based on the prompt. | Increasing the value does not improve the result efficiently. |
| Low | The image can be kept less detailed when the use case calls for it. | The generated image can be low-quality or differ from the prompt. |

Below are examples of Generate image requests with num_inference_steps set to 10 (minimum), 50 (default), and 100 (maximum). Each request uses the same prompt and seed.

[Image: num_inference_steps]

guidance_scale

guidance_scale is a parameter that sets the guidance scale of the decoder denoising process. The higher the value, the more strictly the generated image reflects the prompt. However, Karlo may generate low-quality images if the value is excessively high.

Below are examples of Generate image requests with guidance_scale set to 5.0 (default) and 20.0 (maximum). Each request uses the same prompt and seed.

[Image: guidance_scale]
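
The minimum, default, and maximum values quoted in the parameter examples above can be collected into a local range check. This is a sketch under the assumption that those quoted values are the enforced limits; `check_ranges` is a hypothetical helper, and the minimum for guidance_scale is left open because this page does not state it.

```python
# (minimum, default, maximum) as quoted in the examples on this page.
# None means the bound is not stated here.
RANGES = {
    "prior_num_inference_steps": (10, 25, 100),
    "prior_guidance_scale": (1.0, 5.0, 20.0),
    "num_inference_steps": (10, 50, 100),
    "guidance_scale": (None, 5.0, 20.0),
}


def check_ranges(params: dict) -> list[str]:
    """Return a message for every parameter outside its quoted range."""
    problems = []
    for name, value in params.items():
        if name not in RANGES:
            continue  # parameters without a quoted range are not checked
        low, _default, high = RANGES[name]
        if low is not None and value < low:
            problems.append(f"{name}={value} is below the minimum {low}")
        if high is not None and value > high:
            problems.append(f"{name}={value} is above the maximum {high}")
    return problems


print(check_ranges({"num_inference_steps": 5, "guidance_scale": 25.0, "seed": 1}))
```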

scheduler

scheduler is a parameter that sets the scheduler used by the decoder denoising process. You can choose either decoder_ddim_v_prediction or decoder_ddpm_v_prediction. Even with the same prompt and seed value, the result may differ depending on the scheduler.

  • decoder_ddim_v_prediction: Tends to generate images with a sharper look.
  • decoder_ddpm_v_prediction: Tends to generate images with a softer, more blurred look.

[Image: scheduler]

seed

seed is a parameter that sets the seed value for each image. The same prompt and seed value generate the same image, so fixing the seed is useful when you want to refine a generated image by adjusting other parameters.

[Image: seed]
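
A controlled comparison built on a fixed seed might look like the sketch below: two request bodies share the prompt and seed and differ only in the scheduler. The `prompt` field name, the seed value, and the helper `differing_keys` are assumptions for illustration; the scheduler names and the v2.0 restriction come from this page.

```python
def differing_keys(a: dict, b: dict) -> list[str]:
    """Return the sorted names of the fields whose values differ between two bodies."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


base = {
    "version": "v2.0",                      # scheduler is configurable in v2.0 only
    "prompt": "a dog in the garden, by Renoir",
    "seed": 42,                             # arbitrary fixed seed
}
sharp = {**base, "scheduler": "decoder_ddim_v_prediction"}
soft = {**base, "scheduler": "decoder_ddpm_v_prediction"}

print(differing_keys(sharp, soft))  # ['scheduler']
```

Checking that exactly one field differs before sending both requests confirms that the comparison isolates the parameter under test.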

face_refiner

face_refiner is a parameter that controls the refinement of facial structure in the image. Refinement can be requested separately via Refine facial structure, or as a parameter in a Generate image, Make variation, or Modify image request. For details on each sub-parameter, see below.

bbox_size_threshold

bbox_size_threshold is a parameter that sets the maximum size of the facial area to which the face_refiner feature is applied, as a ratio of the overall image. Only areas smaller than the set size are recognized as faces.

Below are examples of Refine facial structure requests using the same source image with bbox_size_threshold set to 0.5 (face reshaping not applied) and 0.9 (face reshaping applied).

[Image: bbox_size_threshold]

bbox_filter_threshold

bbox_filter_threshold is a parameter that sets the threshold for deciding whether a detected area is a human face. The higher the value, the stricter the criterion and the more likely an area is judged not to be a human face.

Below are examples of Refine facial structure requests using the same source image with bbox_filter_threshold set to 0.95 (face reshaping not applied) and 0.8 (face reshaping applied).

[Image: bbox_filter_threshold]
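
How the two thresholds interact can be illustrated with a small decision function. This is only a model of the behavior described above, not Karlo's actual implementation: the function name, the notion of a per-face confidence score, and the exact comparison operators are all assumptions.

```python
def would_refine(
    face_size_ratio: float,
    face_score: float,
    bbox_size_threshold: float,
    bbox_filter_threshold: float,
) -> bool:
    """Model whether a detected area would be refined under both thresholds."""
    # Only areas smaller than bbox_size_threshold are treated as faces.
    small_enough = face_size_ratio < bbox_size_threshold
    # A higher bbox_filter_threshold is stricter about what counts as a face
    # (assumed here to compare against a detection confidence score).
    confident_enough = face_score >= bbox_filter_threshold
    return small_enough and confident_enough


# Hypothetical detection: the area covers 70% of the image with a score of 0.9.
print(would_refine(0.7, 0.9, bbox_size_threshold=0.5, bbox_filter_threshold=0.8))
print(would_refine(0.7, 0.9, bbox_size_threshold=0.9, bbox_filter_threshold=0.8))
print(would_refine(0.7, 0.9, bbox_size_threshold=0.9, bbox_filter_threshold=0.95))
```

Under these assumptions, raising bbox_size_threshold or lowering bbox_filter_threshold makes refinement more likely to be applied, which matches the direction of the examples above.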