This document explains how to make better use of the Karlo API.
As of February 22, 2024, the Karlo model used in the Karlo API has been upgraded to version 2.1 with enhanced image generation quality.
To make your prompts easy for Karlo to understand, we recommend following the guidelines below.
Principles | Good example | Bad example |
---|---|---|
Use objective and clear descriptions | man looking up at a clock, sunset, apocalypse landscape, in the style of surrealism, dark orange and amber | The day is about to arrive |
Write in a combination of words | A teenage girl has dark black eyes, flowy hair | A girl |
Use a negative prompt to specify anything you want to exclude from the generated image or keep under control. For example, negative prompts such as "Worst quality" or "Low quality" help prevent low-quality images.
To generate high-quality images, it is important to write effective prompts that follow the principles above. We recommend using negative prompts only to exclude or prevent specific elements. An example is below.
Parameter | Text |
---|---|
Prompt | a majestic reindeer, standing in snow forest at night, hyperrealistic, unreal engine 5, starstation, sharp focus, 8k |
Negative prompt | low quality, low contrast, draft, amateur, cut off, cropped, frame |
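As a rough illustration, the prompt and negative prompt from the table above could be packaged into a request body as follows. The key names `prompt` and `negative_prompt` and the helper `build_generate_body` are assumptions made for this sketch; the API reference defines the exact request format.

```python
import json


def build_generate_body(prompt, negative_prompt=None):
    """Assemble a JSON-serializable body for a Generate image request.

    The key names are assumed for illustration, not taken from the API reference.
    """
    body = {"prompt": prompt.strip()}
    # Only send a negative prompt when there is something specific to exclude.
    if negative_prompt and negative_prompt.strip():
        body["negative_prompt"] = negative_prompt.strip()
    return body


body = build_generate_body(
    prompt=(
        "a majestic reindeer, standing in snow forest at night, hyperrealistic, "
        "unreal engine 5, starstation, sharp focus, 8k"
    ),
    negative_prompt="low quality, low contrast, draft, amateur, cut off, cropped, frame",
)
print(json.dumps(body, indent=2))
```

Leaving the negative prompt out entirely when it is empty keeps the request focused on the main prompt, in line with the recommendation to use negative prompts only for specific exclusions.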
In general, there are certain elements that AI models cannot naturally depict. We recommend you avoid the following prompts when using Karlo.
When requesting Make variation, we recommend using an image file that meets the Image file specification. In particular, keep the file size small so that the Karlo API can process it efficiently.
The content of the original image should also be easy to interpret. Karlo needs to be able to recognize what is depicted in the original image and what its mood is. We recommend using images with clearly identifiable components, such as people or objects.
Use nouns from standard language in the prompts you send to Karlo. If you use dialect or slang, Karlo may not understand it and may generate incorrect images. Karlo is also sensitive to the exact meaning of the words in a prompt, so avoid homonyms and choose nouns that convey your intent precisely. Below are examples of Generate image results for a tiger, a sleeping tiger, and a baby tiger.
You can include expressions for a time of day in your prompts, such as day, night, dawn, and evening, as well as expressions for a time of year, such as summer and Christmas. Below are examples of Generate image results for the same object with day, night, and autumn included in the prompt.
You can include color descriptions in your prompts. Below are examples of Generate image results for the same object with the color descriptors bright, warm, and vivid, respectively.
Including "by" and the artist's name in your prompt generates images with a mood similar to that artist's style (for example, by Renoir). Below are examples of Generate image results for a dog in the garden in the style of several artists.
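The descriptors discussed above (subject, time, color, and artist) can be combined into a single comma-separated prompt. The following is a minimal sketch; the helper `compose_prompt` and its argument names are hypothetical, and the comma-separated layout simply mirrors the good example in the principles table.

```python
def compose_prompt(subject, time=None, color=None, artist=None):
    """Join the optional descriptors into one comma-separated prompt."""
    parts = [subject]
    if time:
        parts.append(time)
    if color:
        parts.append(color)
    if artist:
        # Follows the convention described above: "by" plus the artist's name.
        parts.append(f"by {artist}")
    return ", ".join(parts)


print(compose_prompt("a dog in the garden", time="autumn", color="warm", artist="Renoir"))
```

Keeping each descriptor as a separate argument makes it easy to change one aspect of the prompt at a time and compare the results.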
`version` is a parameter for selecting the model version to apply to the Karlo API. The configurable parameters differ depending on the applied model version. Refer to the list of configurable parameters for each model version below.
`upscale`, `prior_num_inference_steps`, `prior_guidance_scale`, `num_inference_steps`, `guidance_scale`, `seed`
`v2.0` only: `scheduler`, `face_refiner`, `bbox_size_threshold`, `bbox_filter_threshold`
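One way to catch version mismatches before sending a request is to check the parameters on the client side. The sketch below assumes that the first group of parameters applies to every version and that version strings look like `v2.0` and `v2.1`; the helper `unsupported_params` is hypothetical.

```python
# Parameters this document lists as available for v2.0 only.
V2_0_ONLY_PARAMS = {
    "scheduler",
    "face_refiner",
    "bbox_size_threshold",
    "bbox_filter_threshold",
}


def unsupported_params(version, params):
    """Return the sorted names in `params` that are v2.0-only but used with another version."""
    if version == "v2.0":
        return []
    return sorted(set(params) & V2_0_ONLY_PARAMS)


request = {"prompt": "a cat", "seed": 7, "scheduler": "decoder_ddim_v_prediction"}
print(unsupported_params("v2.1", request))  # names to drop before sending
print(unsupported_params("v2.0", request))
```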
`upscale` is a parameter that specifies whether to enlarge the image. If you set `upscale` to `true`, you can generate an image up to 2048 pixels in width and height.
Below is an example of a Generate image request with `upscale` as `true` and `false` (default). Each request has the same `prompt` and `seed`.
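Comparisons like the one above change a single parameter while holding `prompt` and `seed` fixed. A minimal sketch of preparing such a set of request bodies follows; the helper `sweep`, the sample prompt, and the integer seed are assumptions for illustration (the API reference defines the actual type of `seed`).

```python
def sweep(base_request, name, values):
    """Return one request per value, changing only `name` and keeping the rest fixed."""
    return [{**base_request, name: value} for value in values]


# Hypothetical prompt and seed; any fixed pair works for a comparison.
base = {"prompt": "a cat on a sofa", "seed": 42}
for request in sweep(base, "upscale", [True, False]):
    print(request)
```

The same helper works for the numeric parameters described in this document, for example `sweep(base, "prior_num_inference_steps", [10, 25, 100])`.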
`prior_num_inference_steps` is a parameter that sets the variety in the image generation process. The result depends on the value:
Value | Advantage | Disadvantage |
---|---|---|
High | The generated image reflects the prompt strictly. The generated image can be more creative. | The desired content may not be included. The generated image can be abstract or low-quality. |
Low | The generated images can be similar because of the low variety. | The generated images can be less creative. |
Below is an example of a Generate image request with `prior_num_inference_steps` as 10 (minimum), 25 (default), and 100 (maximum). Each request has the same `prompt` and `seed`.
`prior_guidance_scale` is a parameter that sets the scale of the variety that is set by `prior_num_inference_steps`. Karlo may generate content that differs from the prompt if the value is excessively low.
To improve the quality of the generated image, you can adjust either `prior_guidance_scale` or `prior_num_inference_steps` while keeping the other fixed at the same value.
Below is an example of a Generate image request with `prior_guidance_scale` as 1.0 (minimum), 5.0 (default), and 20.0 (maximum). Each request has the same `prompt` and `seed`.
`num_inference_steps` is a parameter that sets the level of detail, building on the variety set by `prior_num_inference_steps`. The result depends on the value:
Value | Advantage | Disadvantage |
---|---|---|
High | The generated image becomes more detailed. The contents become more organized based on the prompt. | Raising the value does not improve the result efficiently. |
Low | The image can be rendered with less detail to suit the use case. | The generated image can be low-quality or differ from the prompt. |
Below is an example of a Generate image request with `num_inference_steps` as 10 (minimum), 50 (default), and 100 (maximum). Each request has the same `prompt` and `seed`.
`guidance_scale` is a parameter that sets the guidance scale of the decoder denoising process. The higher the value, the more strictly the generated image reflects the prompt. However, Karlo may generate low-quality images if the value is excessively high.
Below is an example of a Generate image request with `guidance_scale` as 5.0 (default) and 20.0 (maximum). Each request has the same `prompt` and `seed`.
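The minimum, default, and maximum values quoted in the examples above can be checked on the client side before a request is sent. The following is a minimal sketch under that assumption; the helpers `check_range` and `with_defaults` are hypothetical, and the minimum of `guidance_scale` is left unset because this document does not state it.

```python
# (minimum, default, maximum) as stated in this document.
# The minimum of guidance_scale is not stated here, so it is left as None.
RANGES = {
    "prior_num_inference_steps": (10, 25, 100),
    "prior_guidance_scale": (1.0, 5.0, 20.0),
    "num_inference_steps": (10, 50, 100),
    "guidance_scale": (None, 5.0, 20.0),
}


def check_range(name, value):
    """Raise ValueError if `value` falls outside the documented bounds for `name`."""
    minimum, _default, maximum = RANGES[name]
    if minimum is not None and value < minimum:
        raise ValueError(f"{name}={value} is below the minimum {minimum}")
    if value > maximum:
        raise ValueError(f"{name}={value} is above the maximum {maximum}")
    return value


def with_defaults(overrides):
    """Fill in the documented defaults for the four parameters, then validate each value."""
    params = {name: bounds[1] for name, bounds in RANGES.items()}
    params.update(overrides)
    return {name: check_range(name, value) for name, value in params.items()}


print(with_defaults({"num_inference_steps": 100}))
```

Validating locally avoids spending a request on values the service would reject, although the service remains the authority on the accepted ranges.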
`scheduler` is a parameter that sets the scheduler used by the decoder denoising process. You can choose either `decoder_ddim_v_prediction` or `decoder_ddpm_v_prediction`. Even if you send requests with the same `prompt` and `seed` value, the result may differ depending on the `scheduler`.
`decoder_ddim_v_prediction`: Tends to generate sharper images.
`decoder_ddpm_v_prediction`: Tends to generate blurrier images.
`seed` is a parameter that sets the seed value for each image. You can generate the same image by using the same `prompt` and `seed` value. `seed` is useful when you want to improve a generated image by adjusting other parameters.
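To compare the two schedulers fairly, the `prompt` and `seed` can be held fixed so that the scheduler is the only difference between requests. A minimal sketch follows; the helper `with_scheduler`, the sample prompt, and the integer seed are assumptions for illustration.

```python
SCHEDULERS = ("decoder_ddim_v_prediction", "decoder_ddpm_v_prediction")


def with_scheduler(request, scheduler):
    """Return a copy of `request` that uses `scheduler`, rejecting unknown names."""
    if scheduler not in SCHEDULERS:
        raise ValueError(f"scheduler must be one of {SCHEDULERS}, got {scheduler!r}")
    return {**request, "scheduler": scheduler}


# Hypothetical prompt and seed, held fixed so only the scheduler differs.
base = {"prompt": "a lighthouse at dawn", "seed": 1234}
pair = [with_scheduler(base, name) for name in SCHEDULERS]
for request in pair:
    print(request)
```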
`face_refiner` is a parameter related to the refinement of facial structure in the image. Refinement can be requested separately via Refine facial structure, or as a parameter in a Generate image, Make variation, or Modify image request. For details on each sub-parameter, see below.
`bbox_size_threshold` is a parameter that sets the maximum size of the facial area to which the `face_refiner` feature is applied, as a ratio to the overall image. Only areas smaller than the set size are recognized as faces.
Below is an example of a Refine facial structure request using the same source image with `bbox_size_threshold` as `0.5` (face reshaping not applied) and `0.9` (face reshaping applied).
`bbox_filter_threshold` is a parameter that sets the threshold for determining whether an image is a human face. The higher the value, the stricter the criteria and the higher the probability of determining that it is not a human face.
Below is an example of a Refine facial structure request using the same source image with `bbox_filter_threshold` as `0.95` (face reshaping not applied) and `0.8` (face reshaping applied).
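The combined effect of the two thresholds can be pictured with a small model of the decision. This is only an illustration of the behavior described above, not Karlo's actual implementation: the function `should_refine`, its inputs `area_ratio` and `face_score`, and the choice of strict versus non-strict comparisons are all assumptions.

```python
def should_refine(area_ratio, face_score, bbox_size_threshold, bbox_filter_threshold):
    """Decide whether a detected region would be refined, per the description above.

    area_ratio: size of the region as a ratio to the overall image (0 to 1).
    face_score: detector confidence that the region is a human face (0 to 1).
    Both inputs and the comparison operators are assumptions for this sketch.
    """
    small_enough = area_ratio < bbox_size_threshold
    confident_enough = face_score >= bbox_filter_threshold
    return small_enough and confident_enough


# A hypothetical face that covers 70% of the image with a detection score of 0.9.
print(should_refine(0.7, 0.9, bbox_size_threshold=0.5, bbox_filter_threshold=0.8))
print(should_refine(0.7, 0.9, bbox_size_threshold=0.9, bbox_filter_threshold=0.8))
print(should_refine(0.7, 0.9, bbox_size_threshold=0.9, bbox_filter_threshold=0.95))
```

Under these assumptions, raising `bbox_size_threshold` lets larger faces qualify, while raising `bbox_filter_threshold` excludes regions the detector is less sure about, which matches the direction of the examples above.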