====== AutoComplete ======

AutoComplete is a Minecraft hack that generates auto-completions for the user's chat messages, using large language models like GPT-3, GPT-4 and LLaMA.

===== Settings =====

==== OpenAI model ====
{{template>:template:enum
|NAME=OpenAI model
|DESCRIPTION=""The model to use for OpenAI API calls.\\ \\ **GPT-4o-2024-08-06** is one of the smartest models at the time of writing and will often produce the best completions. However, it's meant to be an assistant rather than an auto-completion system, so you will see it produce some odd completions at times.\\ \\ **GPT-3.5-Turbo-Instruct** is an older, non-chat model based on GPT-3.5 that works well for auto-completion tasks.""
|DEFAULT=gpt-4o-2024-08-06
|VALUES=gpt-4o-2024-08-06, gpt-4o-2024-05-13, gpt-4o-mini-2024-07-18, gpt-4-turbo-2024-04-09, gpt-4-0125-preview, gpt-4-1106-preview, gpt-4-0613, gpt-3.5-turbo-0125, gpt-3.5-turbo-1106, gpt-3.5-turbo-instruct, davinci-002, babbage-002
}}

==== Max tokens ====
{{template>:template:slider
|NAME=Max tokens
|DESCRIPTION=""The maximum number of tokens that the model can generate.\\ \\ Higher values allow the model to predict longer chat messages, but also increase the time it takes to generate predictions.\\ \\ The default value of 16 is fine for most use cases.""
|DEFAULT=16 tokens
|MIN=1 token
|MAX=100 tokens
|INCREMENT=1 token
}}

==== Temperature ====
{{template>:template:slider
|NAME=Temperature
|DESCRIPTION=""Controls the model's creativity and randomness. A higher value will result in more creative and sometimes nonsensical completions, while a lower value will result in more boring completions.""
|DEFAULT=1
|MIN=0
|MAX=2
|INCREMENT=0.01
}}

Note: Temperature values above 1 will cause most language models to generate complete nonsense and should only be used for comedic effect.

==== Top P ====
{{template>:template:slider
|NAME=Top P
|DESCRIPTION=""An alternative to temperature. Makes the model less random by only letting it choose from the most likely tokens.\\ \\ A value of 100% disables this feature by letting the model choose from all tokens.""
|DEFAULT=100%
|MIN=0%
|MAX=100%
|INCREMENT=1%
}}

==== Presence penalty ====
{{template>:template:slider
|NAME=Presence penalty
|DESCRIPTION=""Penalty for choosing tokens that already appear in the chat history.\\ \\ Positive values encourage the model to use synonyms and talk about different topics. Negative values encourage the model to repeat the same word over and over again.""
|DEFAULT=0
|MIN=-2
|MAX=2
|INCREMENT=0.01
}}

==== Frequency penalty ====
{{template>:template:slider
|NAME=Frequency penalty
|DESCRIPTION=""Similar to presence penalty, but based on how often the token appears in the chat history.\\ \\ Positive values encourage the model to use synonyms and talk about different topics. Negative values encourage the model to repeat existing chat messages.""
|DEFAULT=0
|MIN=-2
|MAX=2
|INCREMENT=0.01
}}

==== Stop sequence ====
{{template>:template:enum
|NAME=Stop sequence
|DESCRIPTION=""Controls how AutoComplete detects the end of a chat message.\\ \\ **Line Break** is the default value and is recommended for most language models.\\ \\ **Next Message** works better with certain code-optimized language models, which have a tendency to insert line breaks in the middle of a chat message.""
|DEFAULT=Line Break
|VALUES=Line Break, Next Message
}}

Note: "certain code-optimized language models" is a reference to OpenAI's ''code-davinci-002'' model, which worked much better when using the "Next Message" option and is unfortunately no longer available. It's possible that open source code models like StarCoder will see a similar improvement when using the "Next Message" option.

==== Context length ====
{{template>:template:slider
|NAME=Context length
|DESCRIPTION=""Controls how many messages from the chat history are used to generate predictions.\\ \\ Higher values improve the quality of predictions, but also increase the time it takes to generate them, as well as cost (for APIs like OpenAI) or RAM usage (for self-hosted models).""
|DEFAULT=10 messages
|MIN=0 (unlimited)
|MAX=100 messages
|INCREMENT=1 message
}}

==== Filter server messages ====
{{template>:template:checkbox
|NAME=Filter server messages
|DESCRIPTION=""Only shows player-made chat messages to the model.\\ \\ This can help you save tokens and get more out of a low context length, but it also means that the model will have no idea about events like players joining, leaving, dying, etc.""
|DEFAULT=not checked
}}

==== Custom model ====
{{template>:template:textfield
|NAME=Custom model
|DESCRIPTION=""If set, this model will be used instead of the one specified in the \"OpenAI model\" setting.\\ \\ Use this if you have a fine-tuned OpenAI model or if you are using a custom endpoint that is OpenAI-compatible but offers different models.""
|DEFAULT=(empty)
}}

==== Custom model type ====
{{template>:template:enum
|NAME=Custom model type
|DESCRIPTION=""Whether the custom model should use the chat endpoint or the legacy endpoint.\\ \\ If \"Custom model\" is left blank, this setting is ignored.""
|DEFAULT=Chat
|VALUES=Chat, Legacy
}}

==== OpenAI chat endpoint ====
{{template>:template:textfield
|NAME=OpenAI chat endpoint
|DESCRIPTION=""Endpoint for OpenAI's chat completion API.""
|DEFAULT=''https://api.openai.com/v1/chat/completions''
}}

The "OpenAI chat endpoint" setting allows the user to use OpenAI's chat completion API through a proxy. This is necessary in some countries where OpenAI's APIs are banned.

It may also be useful for Microsoft Azure customers who have their own endpoint, but this has not been tested yet. There are subtle differences in the Azure version of the API, so it's possible that it won't work with AutoComplete.

==== OpenAI legacy endpoint ====
{{template>:template:textfield
|NAME=OpenAI legacy endpoint
|DESCRIPTION=""Endpoint for OpenAI's legacy completion API.""
|DEFAULT=''https://api.openai.com/v1/completions''
}}

==== Max suggestions per draft ====
{{template>:template:slider
|NAME=Max suggestions per draft
|DESCRIPTION=""How many suggestions the AI is allowed to generate for the same draft message.""
|DEFAULT=3
|MIN=1
|MAX=10
|INCREMENT=1
}}

The "Max suggestions per draft" setting controls how many different suggestions the AI will try to generate for the same draft message. Higher values will result in more suggestions, but will also use up more tokens and be more expensive for OpenAI API users. This setting can be useful for exploring different response options.

Setting "Max suggestions per draft" to a higher value than "Max suggestions shown" is usually not a good idea, as there will be no way to see the additional suggestions.

==== Max suggestions kept ====
{{template>:template:slider
|NAME=Max suggestions kept
|DESCRIPTION=""Maximum number of suggestions kept in memory.""
|DEFAULT=100 messages
|MIN=10 messages
|MAX=1000 messages
|INCREMENT=10 messages
}}

The "Max suggestions kept" setting only controls at what point old suggestions are deleted from memory. Higher values don't use any additional tokens and only consume a tiny amount of RAM. This is why the range of values is so much higher than for the other settings.

==== Max suggestions shown ====
{{template>:template:slider
|NAME=Max suggestions shown
|DESCRIPTION=""How many suggestions can be shown above the chat box.\\ \\ If this is set too high, the suggestions will obscure some of the existing chat messages. How high you can set this depends on your screen resolution and GUI scale.""
|DEFAULT=5
|MIN=1
|MAX=10
|INCREMENT=1
}}

The "Max suggestions shown" setting controls how many suggestions can be shown at once on the screen. Depending on the user's screen resolution and GUI scale, higher values may cause the suggestions to cover up other parts of the UI.

Setting "Max suggestions per draft" to a higher value than "Max suggestions shown" is usually not a good idea, as there will be no way to see the additional suggestions.

===== Changes =====
{{template>:template:changes|FEATURE=autocomplete}}

{{tag>client-side}}