This document explains how to get better results from the Karlo API.
The Karlo model version used by the Karlo API was updated to 2.0 on July 6, 2023, and to 2.0.4.0 on November 10, 2023, with additional image-editing features. The new version offers enhanced features compared with previous versions.
When requesting image conversion or image editing, we recommend using an image file that meets the specification. In particular, keep the file size small so that the Karlo API can process the request efficiently.
The content of the original image should also be easy to interpret. Karlo needs to understand what the original image depicts and what mood it conveys, so we recommend using images with clearly identifiable components, such as people or objects.
To make your prompts easy for Karlo to understand, we recommend following the guidelines below.
Principle | Good example | Bad example |
---|---|---|
Use objective and clear descriptions | A cat, blue-eyed, white fur | A good white cat with blue eyes seems nice |
Write concisely | A wooden bench, old, along the street, sunny day, alone | A wooden old bench along the street, it's a sunny day, there is nobody on the bench and the street |
Write in a combination of words | A black mug, and many books, on a table | There are a black mug and many books on the table. |
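When prompts are assembled in code, the "combination of words" guideline can be applied mechanically. The sketch below uses a hypothetical helper, `build_prompt`, which is not part of the Karlo API; it only joins a subject and short descriptors with commas, mirroring the good examples in the table.

```python
def build_prompt(subject: str, *descriptors: str) -> str:
    """Join a subject and short descriptors into a comma-separated prompt.

    Hypothetical convenience helper; Karlo itself only receives the final string.
    """
    # Strip stray whitespace and drop empty fragments so the prompt stays concise.
    parts = [p.strip() for p in (subject, *descriptors) if p and p.strip()]
    return ", ".join(parts)


print(build_prompt("A cat", "blue-eyed", "white fur"))  # prints: A cat, blue-eyed, white fur
```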
Use a negative prompt to specify anything you want to exclude from or control in the generated image. For example, you can enter negative prompts such as "Worst quality" or "Low quality" to avoid low-quality images. An example follows.
Parameter | Text |
---|---|
Prompt | a majestic reindeer, standing in snow forest at night, hyperrealistic, unreal engine 5, starstation, sharp focus, 8k |
Negative prompt | low quality, low contrast, draft, amateur, cut off, cropped, frame |
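The sketch below shows one way to send the prompt and negative prompt from the table as a JSON request using only the Python standard library. The endpoint URL, the `KakaoAK` authorization header, and the `negative_prompt` field name are assumptions to verify against the Karlo API reference, and the real API may require additional fields (such as a model version) that are not shown here.

```python
import json
import urllib.request

# Assumed Generate image endpoint; confirm it in the Karlo API reference.
KARLO_T2I_URL = "https://api.kakaobrain.com/v2/inference/karlo/t2i"


def make_t2i_body(prompt: str, negative_prompt: str = "") -> dict:
    """Build a Generate image body; "negative_prompt" is an assumed field name."""
    body = {"prompt": prompt}
    if negative_prompt:
        body["negative_prompt"] = negative_prompt
    return body


def send_request(body: dict, rest_api_key: str, url: str = KARLO_T2I_URL) -> dict:
    """POST the body as JSON and return the decoded JSON response (network call)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"KakaoAK {rest_api_key}",  # assumed header format
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


body = make_t2i_body(
    "a majestic reindeer, standing in snow forest at night, hyperrealistic, "
    "unreal engine 5, starstation, sharp focus, 8k",
    "low quality, low contrast, draft, amateur, cut off, cropped, frame",
)
# send_request(body, "YOUR_REST_API_KEY")  # uncomment with a real REST API key
```

Building the body separately from sending it keeps the payload easy to inspect and reuse before any network call is made.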
You can set weights on keywords in the prompt to generate the intended image more easily. To increase a keyword's weight, wrap it in parentheses `( )`; to decrease it, wrap it in brackets `[ ]`. To exclude a subject from the results, use a negative prompt instead of weighting. Below is an example of setting a weight on the `eggs` keyword in the prompt `a photo of eggs on a frying pan` to generate the intended image.
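The following sketch applies the weighting syntax to a prompt string. `weight_keyword` is a hypothetical helper that wraps the first whole-word occurrence of a keyword in a single pair of parentheses or brackets; whether Karlo supports nested pairs for stronger weights is not stated in this guide, so the sketch does not attempt it.

```python
import re


def weight_keyword(prompt: str, keyword: str, increase: bool = True) -> str:
    """Wrap the first whole-word occurrence of keyword in ( ) or [ ].

    Parentheses raise the keyword's weight; brackets lower it.
    """
    open_, close = ("(", ")") if increase else ("[", "]")
    pattern = r"\b" + re.escape(keyword) + r"\b"
    # count=1 weights only the first occurrence; a missing keyword leaves the prompt unchanged.
    return re.sub(pattern, lambda m: f"{open_}{m.group(0)}{close}", prompt, count=1)


print(weight_keyword("a photo of eggs on a frying pan", "eggs"))
print(weight_keyword("a photo of eggs on a frying pan", "eggs", increase=False))
```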
In general, there are certain elements that AI models cannot naturally depict. We recommend you avoid the following prompts when using Karlo.
Use standard nouns in the prompts you send to Karlo. If you use dialect or slang, Karlo may not understand it and may generate incorrect images. Karlo also interprets the meaning of prompts very literally, so avoid homonyms and choose nouns that convey your intent clearly. Below are examples of Generate image results for a tiger, a baby tiger, and a sleeping tiger.
You can include expressions for a time of day, such as day, night, dawn, and evening, as well as expressions for a time of year, such as summer and Christmas. Below are examples of Generate image results for the same object with day, night, and autumn included in the prompt.
You can include color descriptions in your prompts. Below are examples of Generate image results for the same object with the color descriptors bright, warm, and vivid, respectively.
Including "by" followed by an artist's name in your prompt generates images with a mood similar to that artist's style (for example, by Renoir). Below are examples of Generate image results for a dog in the garden in the styles of several artists.
You can specify unwanted compositions with a negative prompt. Below is an example of a Generate image request with a negative prompt.
Parameter | Text |
---|---|
Prompt | hyper realistic photo of cyberpunk sports car driving away |
Negative prompt | object out of frame, out of frame, body out of frame |
You can also use a negative prompt to improve how people are depicted. Listing flaws such as bad anatomy or poorly drawn faces and hands in the negative prompt can make a person in the image look more natural. Below is an example of a Generate image request with such a negative prompt.
Parameter | Text |
---|---|
Prompt | a young woman with red hair in white shirt, sharp oil painting, intricate details, medium shot |
Negative prompt | body out of frame, out of frame, bad anatomy, distortion, disfigured, poorly drawn face, poorly drawn hands |
To keep unwanted text, signatures, and watermarks out of the image, use a negative prompt. Below is an example of a Generate image request with a negative prompt.
Parameter | Text |
---|---|
Prompt | note, bright, warm mood |
Negative prompt | text, letter, signature, watermark |
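The three examples above each target one problem. If you want to address several at once, a sketch like the one below can merge their negative-prompt terms while dropping duplicates. The grouping names are hypothetical, and this guide does not state whether combining the lists improves results, so treat it as a starting point for experimentation.

```python
# Negative-prompt terms copied from the three examples above, grouped by purpose.
NEGATIVE_PRESETS = {
    "composition": ["object out of frame", "out of frame", "body out of frame"],
    "character": ["body out of frame", "out of frame", "bad anatomy", "distortion",
                  "disfigured", "poorly drawn face", "poorly drawn hands"],
    "text": ["text", "letter", "signature", "watermark"],
}


def combine_negative(*groups: str) -> str:
    """Merge preset groups into one negative prompt, keeping first-seen order."""
    merged = []
    for group in groups:
        for term in NEGATIVE_PRESETS[group]:
            if term not in merged:  # skip terms already contributed by an earlier group
                merged.append(term)
    return ", ".join(merged)


print(combine_negative("composition", "character"))
```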
`upscale` is a parameter that sets whether to upscale the generated image by 2x or 4x. Karlo upscales the image when the request sets `upscale` to `true`. With Generate image, you can generate images up to 640 pixels in width and height. With `upscale`, you can upscale the image to up to 2048 pixels in width and height. The best image quality is obtained when `width` and `height` are set to the default of 512 pixels and `upscale` is used.
Below is an example of a Generate image request with `upscale` set to `true` and to `false` (default). Both requests use the same `prompt` and `seed`.
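A minimal sketch of the comparison described above: two request bodies that differ only in `upscale`, using the recommended default size of 512 pixels. The size check reflects the 640-pixel limit stated in this guide; passing `seed` as a single integer is an assumption, since the API may expect one seed per generated image.

```python
MAX_GENERATE_SIZE = 640  # largest width/height for Generate image, per this guide
DEFAULT_SIZE = 512       # default width/height, recommended together with upscale


def make_upscale_body(prompt: str, seed: int, upscale: bool = True,
                      width: int = DEFAULT_SIZE, height: int = DEFAULT_SIZE) -> dict:
    """Build a Generate image body; upscale=True asks Karlo to upscale the result."""
    for name, size in (("width", width), ("height", height)):
        if not 0 < size <= MAX_GENERATE_SIZE:
            raise ValueError(f"{name} must be between 1 and {MAX_GENERATE_SIZE} pixels")
    # A single integer seed is an assumption; the API may want one per image.
    return {"prompt": prompt, "seed": seed, "width": width,
            "height": height, "upscale": upscale}


# Same prompt and seed, differing only in upscale.
with_upscale, without_upscale = (
    make_upscale_body("a wooden bench, old, along the street, sunny day", 1234, flag)
    for flag in (True, False)
)
```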
`prior_num_inference_steps` is a parameter that sets the variety in the image generation process. The result depends on the value:
Value | Advantage | Disadvantage |
---|---|---|
High | The generated image reflects the prompt strictly. The generated image can be more creative. | The desired content may not be included. The generated image can be abstract or low-quality. |
Low | The generated images can be similar to one another because of the low variety. | The generated images can be less creative. |
Below is an example of a Generate image request with `prior_num_inference_steps` set to 10 (minimum), 25 (default), and 100 (maximum). Each request uses the same `prompt` and `seed`.
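To reproduce this kind of comparison yourself, you can build one request body per value while keeping `prompt` and `seed` fixed. The sketch below checks values against the 10 to 100 range stated above; the helper name `prior_steps_sweep` is hypothetical, and each body still has to be sent with your own request code.

```python
# Range and default for prior_num_inference_steps, as stated in this guide.
PRIOR_STEPS_RANGE = (10, 100)
PRIOR_STEPS_DEFAULT = 25


def prior_steps_sweep(prompt: str, seed: int,
                      values=(10, PRIOR_STEPS_DEFAULT, 100)) -> list:
    """Build one Generate image body per value, keeping prompt and seed fixed."""
    low, high = PRIOR_STEPS_RANGE
    bodies = []
    for steps in values:
        if not low <= steps <= high:
            raise ValueError(f"prior_num_inference_steps must be in [{low}, {high}]")
        bodies.append({"prompt": prompt, "seed": seed,
                       "prior_num_inference_steps": steps})
    return bodies


sweep = prior_steps_sweep("a photo of eggs on a frying pan", seed=42)
```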
`prior_guidance_scale` is a parameter that sets the scale of the variety set by `prior_num_inference_steps`. If the value is excessively low, Karlo may generate content that differs from the prompt.
To improve the quality of the generated image, adjust either `prior_guidance_scale` or `prior_num_inference_steps` while keeping the other fixed at the same value.
Below is an example of a Generate image request with `prior_guidance_scale` set to 1.0 (minimum), 5.0 (default), and 20.0 (maximum). Each request uses the same `prompt` and `seed`.
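The advice to adjust one prior parameter while fixing the other can be encoded directly. In this sketch, the hypothetical helper `vary_one_prior` resolves both prior parameters from the base body (falling back to the documented defaults) and then varies only the named one, so every trial makes the pinned value explicit.

```python
# Defaults stated in this guide; ranges are 10-100 and 1.0-20.0 respectively.
PRIOR_DEFAULTS = {"prior_num_inference_steps": 25, "prior_guidance_scale": 5.0}


def vary_one_prior(base: dict, name: str, values) -> list:
    """Vary one prior parameter and pin the other at its current (or default) value."""
    if name not in PRIOR_DEFAULTS:
        raise KeyError(f"{name} is not a prior parameter")
    # Resolve both prior parameters first so the untouched one is explicit in every body.
    pinned = {key: base.get(key, default) for key, default in PRIOR_DEFAULTS.items()}
    return [{**base, **pinned, name: value} for value in values]


base = {"prompt": "a black mug, and many books, on a table", "seed": 7}
scale_trials = vary_one_prior(base, "prior_guidance_scale", [1.0, 5.0, 20.0])
```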
`num_inference_steps` is a parameter that sets the level of detail based on the variety set by `prior_num_inference_steps`. The result depends on the value:
Value | Advantage | Disadvantage |
---|---|---|
High | The generated image becomes more detailed. The content becomes more organized according to the prompt. | Increasing the value further may not improve the result efficiently. |
Low | The image can be kept less detailed when your use case calls for it. | The generated image can be low-quality or differ in content from the prompt. |
Below is an example of a Generate image request with `num_inference_steps` set to 10 (minimum), 50 (default), and 100 (maximum). Each request uses the same `prompt` and `seed`.
`guidance_scale` is a parameter that sets the guidance scale of the decoder denoising process. The higher the value, the more strictly the generated image reflects the prompt. However, Karlo may generate low-quality images if the value is excessively high.
Below is an example of a Generate image request with `guidance_scale` set to 5.0 (default) and 20.0 (maximum). Each request uses the same `prompt` and `seed`.
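The two decoder parameters can be validated before a request is sent. This sketch fills in the defaults stated above and rejects out-of-range values; because this guide gives no minimum for `guidance_scale`, requiring a positive value is an assumption, and `decoder_params` is a hypothetical helper.

```python
# Values stated in this guide. No minimum for guidance_scale is given, so
# requiring a positive value is an assumption of this sketch.
NUM_STEPS_RANGE = (10, 100)
NUM_STEPS_DEFAULT = 50
GUIDANCE_DEFAULT = 5.0
GUIDANCE_MAX = 20.0


def decoder_params(num_inference_steps=None, guidance_scale=None) -> dict:
    """Return decoder settings, filling in defaults and rejecting out-of-range values."""
    steps = NUM_STEPS_DEFAULT if num_inference_steps is None else num_inference_steps
    scale = GUIDANCE_DEFAULT if guidance_scale is None else guidance_scale
    low, high = NUM_STEPS_RANGE
    if not low <= steps <= high:
        raise ValueError(f"num_inference_steps must be in [{low}, {high}]")
    if not 0 < scale <= GUIDANCE_MAX:
        raise ValueError(f"guidance_scale must be positive and at most {GUIDANCE_MAX}")
    return {"num_inference_steps": steps, "guidance_scale": scale}
```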
`scheduler` is a parameter that sets the scheduler used in the decoder denoising process. You can choose either `decoder_ddim_v_prediction` or `decoder_ddpm_v_prediction`. Even with the same `prompt` and `seed` values, the result may differ depending on `scheduler`.
- `decoder_ddim_v_prediction`: Tends to generate images with a sharper representation.
- `decoder_ddpm_v_prediction`: Tends to generate images with a blurrier representation.

`seed` is a parameter that sets the seed value for each image. Using the same `prompt` and `seed` values generates the same image, which makes `seed` useful when you want to refine a generated image by adjusting other parameters.
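A sketch of comparing the two schedulers fairly: draw a seed once, then reuse it with the same prompt for both bodies so that the scheduler is the only difference. The seed range used for the random draw and the hypothetical helper name `scheduler_pair` are assumptions, not values from this guide.

```python
import random

# The two scheduler values listed in this guide.
SCHEDULERS = ("decoder_ddim_v_prediction", "decoder_ddpm_v_prediction")


def scheduler_pair(prompt: str, seed=None) -> list:
    """Build one body per scheduler with an identical prompt and seed."""
    if seed is None:
        # Draw the seed once and reuse it; the 0..2**31-1 range is an assumption.
        seed = random.randint(0, 2**31 - 1)
    return [{"prompt": prompt, "seed": seed, "scheduler": name} for name in SCHEDULERS]


pair = scheduler_pair("a majestic reindeer, standing in snow forest at night")
# Record pair[0]["seed"] so the same prompt and seed can be resent later.
```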
`face_refiner` is a parameter for refining the facial structure in an image. You can request it separately via Refine facial structure, or include it as a parameter in a Generate image, Make variation, or Modify image request. For details on each sub-parameter, see below.
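When `face_refiner` is passed as a parameter of another request, it carries sub-parameters such as the two described below. The sketch attaches it to an existing Generate image body as a nested object; the nesting shape and the 0 to 1 value check are assumptions based on the parameter descriptions, so confirm both in the API reference. `with_face_refiner` is a hypothetical helper.

```python
def with_face_refiner(body: dict, bbox_size_threshold: float,
                      bbox_filter_threshold: float) -> dict:
    """Return a copy of a request body with a face_refiner object attached.

    Treating both sub-parameters as values in [0, 1] is an assumption based on
    their descriptions (a ratio to the image and a face-likelihood threshold).
    """
    for name, value in (("bbox_size_threshold", bbox_size_threshold),
                        ("bbox_filter_threshold", bbox_filter_threshold)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # Build a new dict so the caller's base body is left unchanged.
    return {**body, "face_refiner": {
        "bbox_size_threshold": bbox_size_threshold,
        "bbox_filter_threshold": bbox_filter_threshold,
    }}


base = {"prompt": "a young woman with red hair in white shirt, medium shot", "seed": 3}
refined = with_face_refiner(base, bbox_size_threshold=0.9, bbox_filter_threshold=0.8)
```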
`bbox_size_threshold` is a parameter that sets the maximum size of the facial area to which the `face_refiner` feature applies, as a ratio to the overall image. Only areas smaller than this size are recognized as faces.
Below is an example of a Refine facial structure request using the same source image with `bbox_size_threshold` set to `0.5` (face reshaping not applied) and `0.9` (face reshaping applied).
`bbox_filter_threshold` is a parameter that sets the threshold for determining whether an area of the image is a human face. The higher the value, the stricter the criteria, and the more likely an area is judged not to be a human face.
Below is an example of a Refine facial structure request using the same source image with `bbox_filter_threshold` set to `0.95` (face reshaping not applied) and `0.8` (face reshaping applied).
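To see why the examples above switch face reshaping on and off, it can help to model the two thresholds as filters over detected face boxes. The sketch below is a conceptual model only, not Karlo's implementation: it assumes the size ratio is measured by area and that each detection carries a confidence score, and it keeps a box only when the box is smaller than `bbox_size_threshold` and its score reaches `bbox_filter_threshold`.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    area_ratio: float  # assumed: face box area divided by whole image area
    score: float       # assumed: detector confidence that the box is a human face


def faces_to_refine(detections, bbox_size_threshold, bbox_filter_threshold):
    """Keep detections small enough and confident enough to be refined (conceptual)."""
    return [d for d in detections
            if d.area_ratio < bbox_size_threshold and d.score >= bbox_filter_threshold]


face = Detection(area_ratio=0.7, score=0.9)
print(len(faces_to_refine([face], 0.5, 0.8)))   # 0: face larger than the size threshold
print(len(faces_to_refine([face], 0.9, 0.8)))   # 1: small enough and confident enough
print(len(faces_to_refine([face], 0.9, 0.95)))  # 0: score below the stricter filter
```

Under this model, raising `bbox_size_threshold` or lowering `bbox_filter_threshold` both make refinement more likely, which matches the direction of the examples above.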