Karlo

How to use

This document explains how to use the Karlo APIs more effectively.

Version upgrade

The Karlo model used by the Karlo APIs was upgraded to version 2.0 on July 6, 2023, and to version 2.0.4.0 on November 10, 2023, which added features related to image editing. The new version offers enhanced features compared to previous versions.

Fundamental

Original image

When you request an image conversion or an image edit, we recommend using an image file that meets the specification. In particular, keep the file size small so that the Karlo API can process it efficiently.

Also, the original image should make it easy to tell what is depicted. Karlo needs to be able to recognize both the subject of the original image and its mood. We recommend using images with clearly identifiable components, such as people or objects.

Prompt

To ensure that your prompts are easy for Karlo to understand, we recommend following the guidelines below.

| Principle | Good example | Bad example |
| --- | --- | --- |
| Use objective and clear descriptions | A cat, blue-eyed, white fur | A good white cat with blue eyes seems nice |
| Write in a concise sentence | A wooden bench, old, along the street, sunny day, alone | A wooden old bench along the street, it's a sunny day, there is nobody on the bench and the street |
| Write in a combination of words | A black mug, and many books, on a table | There are a black mug and many books on the table. |
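The "combination of words" guideline can be sketched as a small helper that joins short descriptive phrases with commas. The function name build_prompt is hypothetical and is not part of the Karlo API; it only illustrates the prompt shape recommended in the table.

```python
def build_prompt(*phrases: str) -> str:
    """Join short descriptive phrases into one comma-separated prompt.

    Hypothetical helper for illustration; not part of the Karlo API.
    """
    # Drop empty phrases, trim whitespace, and remove trailing periods
    # so the result reads as a list of keywords rather than sentences.
    cleaned = [p.strip().rstrip(".") for p in phrases if p and p.strip()]
    return ", ".join(cleaned)


print(build_prompt("A cat", "blue-eyed", "white fur"))
# prints: A cat, blue-eyed, white fur
```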

Negative prompt

Enter anything you want to exclude from, or control in, the generated image. For example, negative prompts such as "Worst quality" or "Low quality" help prevent low-quality images. An example is below.

| Parameter | Text |
| --- | --- |
| Prompt | a majestic reindeer, standing in snow forest at night, hyperrealistic, unreal engine 5, starstation, sharp focus, 8k |
| Negative prompt | low quality, low contrast, draft, amateur, cut off, cropped, frame |
Negative prompt sample
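As a rough sketch, the prompt and negative prompt above could be assembled into a JSON request body as follows. The field names prompt and negative_prompt and the helper make_body are assumptions made for illustration; check the API reference for the exact request format before sending anything.

```python
import json


def make_body(prompt: str, negative_terms: list[str]) -> dict:
    """Build a request body. Field names are assumed, not confirmed here."""
    body = {"prompt": prompt}
    if negative_terms:
        # Negative terms are joined the same way as prompt keywords.
        body["negative_prompt"] = ", ".join(negative_terms)
    return body


body = make_body(
    "a majestic reindeer, standing in snow forest at night, hyperrealistic, "
    "unreal engine 5, starstation, sharp focus, 8k",
    ["low quality", "low contrast", "draft", "amateur", "cut off", "cropped", "frame"],
)
print(json.dumps(body, indent=2))
```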

Keyword weighting

You can set weights on keywords in the prompt to steer generation toward the intended image. To increase a keyword's weight, enclose it in parentheses ( ); to decrease it, enclose it in brackets [ ]. To exclude a subject from the results entirely, use a negative prompt instead of weighting. Below is an example of setting a weight on the keyword eggs in the prompt "a photo of eggs on a frying pan".

Keyword emphasis
Emphasis sample
Keyword de-emphasis
De-emphasis sample
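The weighting syntax can be applied programmatically with a small string helper. This is a minimal sketch; weight_keyword is a hypothetical name, and it only wraps the first occurrence of the keyword in the brackets described above.

```python
def weight_keyword(prompt: str, keyword: str, increase: bool = True) -> str:
    """Wrap a keyword in ( ) to emphasize it or in [ ] to de-emphasize it."""
    if keyword not in prompt:
        raise ValueError(f"keyword {keyword!r} not found in prompt")
    wrapped = f"({keyword})" if increase else f"[{keyword}]"
    # Only the first occurrence is wrapped.
    return prompt.replace(keyword, wrapped, 1)


base = "a photo of eggs on a frying pan"
print(weight_keyword(base, "eggs"))                  # a photo of (eggs) on a frying pan
print(weight_keyword(base, "eggs", increase=False))  # a photo of [eggs] on a frying pan
```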

Caution

In general, there are certain elements that AI models cannot depict naturally. We recommend avoiding the following kinds of prompts when using Karlo.

  • Descriptions of jointed parts, such as hands or feet.
  • Scenes that require multiple people to make a specific facial expression.
  • Prompts that require complex composition or placement.

Word choice

Noun

Use standard-language nouns in the prompts you give to Karlo. If you use dialect or slang, Karlo may not understand it and may not generate the correct image. Karlo also interprets prompts very literally, so it is important to avoid homonyms and choose nouns that convey your intent well. Below are examples of Generate image results for a tiger, a baby tiger, and a sleeping tiger.

Noun sample

Time

You can include expressions for a time of day in your prompts, such as day, night, dawn, and evening, as well as expressions for a time of year, such as summer and Christmas. Below are examples of Generate image results for the same object with day, night, and autumn included in the prompt.

Time sample

Color tone

You can include color descriptions in your prompts. Below are examples of Generate image results for the same object with the color descriptors bright, warm, and vivid, respectively.

Color tone sample

Style

Including "by" followed by an artist's name in your prompt generates images with a mood similar to that artist's style (for example, by Renoir). Below are examples of Generate image results for a dog in the garden in the style of several artists.

Style sample

Composition

You can specify unwanted compositions with a negative prompt. Below is an example of Generate image with a negative prompt.

| Parameter | Text |
| --- | --- |
| Prompt | hyper realistic photo of cyberpunk sports car driving away |
| Negative prompt | object out of frame, out of frame, body out of frame |
Composition sample

Character

You can improve character representation with a negative prompt. By adding the elements listed in Caution to the negative prompt, you can make a person in the image look more natural. Below is an example of Generate image with a negative prompt.

| Parameter | Text |
| --- | --- |
| Prompt | a young woman with red hair in white shirt, sharp oil painting, intricate details, medium shot |
| Negative prompt | body out of frame, out of frame, bad anatomy, distortion, disfigured, poorly drawn face, poorly drawn hands |
Character sample

Letter

To exclude unwanted text, signatures, and watermarks from the image, use a negative prompt. Below is an example of Generate image with a negative prompt.

| Parameter | Text |
| --- | --- |
| Prompt | note, bright, warm mood |
| Negative prompt | text, letter, signature, watermark |
Letter sample
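The negative prompts from the Composition, Character, and Letter examples overlap, so when combining them it can help to merge the term lists and drop duplicates. The sketch below uses the terms from those examples; the helper merge_negative_terms is hypothetical and not part of the Karlo API.

```python
COMPOSITION = ["object out of frame", "out of frame", "body out of frame"]
CHARACTER = [
    "body out of frame", "out of frame", "bad anatomy", "distortion",
    "disfigured", "poorly drawn face", "poorly drawn hands",
]
LETTER = ["text", "letter", "signature", "watermark"]


def merge_negative_terms(*groups: list[str]) -> str:
    """Merge term groups into one negative prompt, keeping first-seen order."""
    seen = set()
    merged = []
    for group in groups:
        for term in group:
            key = term.strip().lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(term.strip())
    return ", ".join(merged)


print(merge_negative_terms(COMPOSITION, CHARACTER, LETTER))
```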

Advanced: Parameter

upscale

upscale is a parameter that sets whether to upscale the generated image by 2x or 4x. Karlo performs upscaling when the request sets upscale to true. Generate image produces images of up to 640 pixels in width and height. With upscale, the image can be enlarged to up to 2048 pixels in width and height. The best image quality is obtained when the width and height are left at the default of 512 pixels and upscale is used.

Below is an example of a Generate image request with upscale as true and false (default). Each request has the same prompt and seed.

upscale
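The size limits above can be checked locally before sending a request. The following is a sketch under assumptions: the function output_size and its factor argument are hypothetical, and rejecting oversized results is only one interpretation of the 2048-pixel limit (the service may behave differently, for example by limiting the result itself).

```python
MAX_BASE = 640       # largest width/height for Generate image, per the text above
MAX_UPSCALED = 2048  # largest width/height after upscaling, per the text above


def output_size(width: int, height: int, upscale: bool = False, factor: int = 2):
    """Return the expected output size, or raise if a documented limit is exceeded."""
    if not (0 < width <= MAX_BASE and 0 < height <= MAX_BASE):
        raise ValueError(f"width and height must be between 1 and {MAX_BASE}")
    if not upscale:
        return (width, height)
    if factor not in (2, 4):
        raise ValueError("upscaling is described only for 2x and 4x")
    result = (width * factor, height * factor)
    if max(result) > MAX_UPSCALED:
        raise ValueError(f"upscaled size {result} exceeds {MAX_UPSCALED} pixels")
    return result


print(output_size(512, 512, upscale=True, factor=4))  # (2048, 2048)
```

Note that the default 512 x 512 size upscaled by 4x lands exactly on the 2048-pixel limit, which is consistent with the recommendation above.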

prior_num_inference_steps

prior_num_inference_steps is a parameter that sets the variety in the image generation process. The result depends on the value:

| Value | Advantage | Disadvantage |
| --- | --- | --- |
| High | The generated image reflects the prompt strictly. The generated image can be more creative. | The desired content may not be included. The generated image can be abstract or low-quality. |
| Low | The generated images can be similar because of the low variety. | The generated images can be less creative. |

Below is an example of a Generate image request with prior_num_inference_steps as 10 (minimum), 25 (default), and 100 (maximum). Each request has the same prompt and seed.

prior_num_inference_steps

prior_guidance_scale

prior_guidance_scale is a parameter that sets the scale of the variety that is set by prior_num_inference_steps. Karlo may generate content that differs from the prompt if the value is excessively low.

To improve the quality of the generated image, adjust either prior_guidance_scale or prior_num_inference_steps while keeping the other fixed at the same value.

Below is an example of a Generate image request with prior_guidance_scale as 1.0 (minimum), 5.0 (default), and 20.0 (maximum). Each request has the same prompt and seed.

prior_guidance_scale
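The advice to vary one parameter while holding the other fixed can be sketched as a simple sweep that produces one request body per value. The helper sweep and the placeholder prompt are hypothetical; the parameter names are the ones described in this section, and the seed value 42 is arbitrary.

```python
def sweep(base: dict, name: str, values) -> list[dict]:
    """Return one copy of the base body per value, changing only `name`."""
    return [{**base, name: value} for value in values]


base = {
    "prompt": "a dog in the garden",   # placeholder prompt for illustration
    "seed": 42,                        # same seed for every request
    "prior_num_inference_steps": 25,   # held fixed at the default
}

# Vary only prior_guidance_scale: minimum, default, and maximum.
bodies = sweep(base, "prior_guidance_scale", [1.0, 5.0, 20.0])
for body in bodies:
    print(body)
```

Because the prompt, seed, and prior_num_inference_steps are identical across the bodies, any difference between the resulting images can be attributed to prior_guidance_scale.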

num_inference_steps

num_inference_steps is a parameter that sets the level of detail applied on top of the variety set by prior_num_inference_steps. The result depends on the value:

| Value | Advantage | Disadvantage |
| --- | --- | --- |
| High | The generated image becomes more detailed. The contents become more organized based on the prompt. | Raising the value improves the result inefficiently. |
| Low | The image can be rendered with less detail when the use case calls for it. | The generated image can be low-quality or differ in content from the prompt. |

Below is an example of a Generate image request with num_inference_steps as 10 (minimum), 50 (default), and 100 (maximum). Each request has the same prompt and seed.

num_inference_steps

guidance_scale

guidance_scale is a parameter that sets the guidance scale of the decoder denoising process. The higher the value, the more strictly the generated image reflects the prompt. However, Karlo may generate low-quality images if the value is excessively high.

Below is an example of a Generate image request with guidance_scale as 5.0 (default) and 20.0 (maximum). Each request has the same prompt and seed.

guidance_scale
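The minimum, default, and maximum values quoted in the examples for these four parameters can be collected into a local range check. This is a minimal sketch: check_params is a hypothetical helper, the bounds are only those mentioned in the example sentences of this document, and no minimum is stated for guidance_scale, so none is enforced here.

```python
# (minimum, maximum) as quoted in the examples; None means "not stated".
RANGES = {
    "prior_num_inference_steps": (10, 100),
    "prior_guidance_scale": (1.0, 20.0),
    "num_inference_steps": (10, 100),
    "guidance_scale": (None, 20.0),
}

DEFAULTS = {
    "prior_num_inference_steps": 25,
    "prior_guidance_scale": 5.0,
    "num_inference_steps": 50,
    "guidance_scale": 5.0,
}


def check_params(params: dict) -> list[str]:
    """Return a list of human-readable problems; empty if all values are in range."""
    problems = []
    for name, value in params.items():
        if name not in RANGES:
            continue  # parameters without a known range are not checked
        low, high = RANGES[name]
        if low is not None and value < low:
            problems.append(f"{name}={value} is below the minimum {low}")
        if high is not None and value > high:
            problems.append(f"{name}={value} is above the maximum {high}")
    return problems


print(check_params(DEFAULTS))                      # []
print(check_params({"num_inference_steps": 5}))
```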

scheduler

scheduler is a parameter that sets the scheduler used by the decoder denoising process. You can choose either decoder_ddim_v_prediction or decoder_ddpm_v_prediction. Even if you send requests with the same prompt and seed value, the result may differ depending on the scheduler.

  • decoder_ddim_v_prediction: Tends to generate images with a sharper look.
  • decoder_ddpm_v_prediction: Tends to generate images with a blurrier look.
scheduler
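To compare the two schedulers, one option is to build a pair of otherwise identical request bodies. The helper with_scheduler and the placeholder prompt are hypothetical; only the two scheduler values come from the list above.

```python
SCHEDULERS = ("decoder_ddim_v_prediction", "decoder_ddpm_v_prediction")


def with_scheduler(body: dict, scheduler: str) -> dict:
    """Return a copy of the body with a validated scheduler value."""
    if scheduler not in SCHEDULERS:
        raise ValueError(f"scheduler must be one of {SCHEDULERS}")
    return {**body, "scheduler": scheduler}


base = {"prompt": "a dog in the garden", "seed": 42}  # placeholder values
pair = [with_scheduler(base, name) for name in SCHEDULERS]
for body in pair:
    print(body)
```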

seed

seed is a parameter that sets the seed value for each image. The same prompt and seed value generate the same image. seed is useful when you want to refine a generated image by adjusting other parameters while keeping everything else unchanged.

seed

face_refiner

face_refiner is a parameter that controls the refinement of facial structure in the image. Refinement can be requested separately via Refine facial structure, or as a parameter in a Generate image, Make variation, or Modify image request. The sub-parameters are described below.

bbox_size_threshold

bbox_size_threshold is a parameter that sets the maximum size of the facial area to apply the face_refiner feature, as a ratio to the overall image. Only areas smaller than the set size are recognized as faces.

Below is an example of a Refine facial structure request using the same source image with bbox_size_threshold as 0.5 (face reshaping not applied) and 0.9 (face reshaping applied).

bbox_size_threshold

bbox_filter_threshold

bbox_filter_threshold is a parameter that sets the threshold for determining whether an image is a human face. The higher the value, the stricter the criteria and the higher the probability of determining that it is not a human face.

Below is an example of a Refine facial structure request using the same source image with bbox_filter_threshold as 0.95 (face reshaping not applied) and 0.8 (face reshaping applied).

bbox_filter_threshold
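The way the two thresholds interact can be illustrated with a toy decision function. This is only a model of the behavior described above, not Karlo's implementation: the DetectedFace type, its fields, the use of area as the size ratio, and the exact comparison operators are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class DetectedFace:
    area_ratio: float  # face region size relative to the whole image (assumed)
    confidence: float  # detector's score that the region is a human face (assumed)


def should_refine(face: DetectedFace,
                  bbox_size_threshold: float,
                  bbox_filter_threshold: float) -> bool:
    """Refine only regions that are small enough and confidently a face."""
    small_enough = face.area_ratio < bbox_size_threshold
    confident_enough = face.confidence >= bbox_filter_threshold
    return small_enough and confident_enough


face = DetectedFace(area_ratio=0.7, confidence=0.9)
print(should_refine(face, bbox_size_threshold=0.5, bbox_filter_threshold=0.8))   # False
print(should_refine(face, bbox_size_threshold=0.9, bbox_filter_threshold=0.8))   # True
print(should_refine(face, bbox_size_threshold=0.9, bbox_filter_threshold=0.95))  # False
```

With these hypothetical values, the outcomes mirror the examples above: raising bbox_size_threshold from 0.5 to 0.9 lets a large face qualify, and lowering bbox_filter_threshold from 0.95 to 0.8 lets a less certain detection qualify.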