This article dives into using open Large Language Models (LLMs) to create chatbots that run directly on your computer. It establishes LLMs as the foundation for many AI tasks, particularly conversational AI. It then walks you through selecting an open LLM and coding a local chatbot using PythonLangChainOllama, an open LLM (e.g., llama3gemma), and Streamlit for the user interface. It includes practical code snippets via Notebooks, ultimately leading you to a functional chatbot ready to be deployed and used on your local machine. Code repository on GitHub.

The YouTube Channels in both English (En) and French (Fr) are now accessible, feel free to subscribe by clicking here.

Meet Your Digital Conversationalist: The Chatbot

Unless you have been living under a rock since the end of 2022, you are more than aware of ChatGPT, the most popular among those chatbots and virtual assistants that are revolutionizing Human-Computer Interaction. At its core, a chatbot is a software program powered by Artificial Intelligence (AI) to simulate human conversation. This allows users, from everyday people to software developers, to interact with a machine more naturally. Imagine having a virtual assistant who understands your questions, provides information, or completes tasks based on your instructions. For developers, this could translate to a coding assistant that comprehends natural language inquiries about code functionality, suggests relevant code snippets, or even helps debug errors through conversation.

Chatbots leverage advanced Natural Language Processing (NLP), specifically Large Language Models (LLMs), to understand your intent and respond in a human-like way. This creates a smooth and intuitive experience, saving time and frustration for everyone who interacts with them. Let’s explore how you can build your own chatbot, even on your local computer, and unlock the potential of this versatile conversational technology!

Why Build Your Own Local Chatbot?

Public cloud chatbots like ChatGPT and Gemini are fantastic tools, offering convenience and ease of use. But what if you crave a bit more control and customization? Wouldn’t it be exciting to build your own on your local computer?

Here are a few reasons why developing a local chatbot might be the perfect next step for you.

  1. Internet? More like Inter-never-available-when-you-need-it-net! Ever get the urge to use ChatGPT on those Wi-Fi-less airplane rides? Been preparing for an exam when the internet decides to take unauthorized permanent leave? You are not alone. With a local chatbot, you’ve always got your conversational assistant, even if the Wi-Fi decides to join the ghost town.
  2. Experimentation Playground. Prompt design thrives on experimentation. A local chatbot is your playground for prompt design! Experiment freely with different approaches without worrying about cloud API costs per token (piece of word you use). Once your prompts are fine-tuned, you can effortlessly scale to the cloud.
  3. Privacy: The big one. You know how sometimes you feel like your online activities are being watched? With your chatbot locally, you’re basically saying, “Hey, privacy matters to me!” Your data stays yours and safe right there, all in your hands (or rather, your hard drive), away from prying eyes and data-hungry online chatbots. Example: Let’s say you’re a therapist or a counselor. Your clients trust you with their deepest thoughts and feelings. You want to prompt a chatbot with some excerpts of these. By having a local chatbot, you ensure that these sensitive conversations stay between you and your client, without worrying about third-party snooping.
  4. Data Anonymization: Want to gather valuable insights from a public cloud chatbot without privacy concerns? No worries! You can prompt your personal chatbot to anonymize your private data beforehand. Example: Let’s say you want to prompt an online chatbot to provide some insights on a list of your private products. Let your local chatbot create a fake product list that walks the walk (mimics your products), but doesn’t talk the talk (about your actual products). Now you can get all the juicy cloud insights without spilling any privacy beans.
  5. Flexibility and Customization: Who says one size fits all? When you build your own chatbot, you’re the master of its destiny. Want to tweak its responses to match your creativity or personality? Go for it! Need it to handle specific tasks unique to your business? Consider it done! The sky’s the limit when it comes to customization.
  6. Break Free from Location Limits: Public cloud conversational AI can face regional restrictions. Build your own and have an AI companion by your side, no matter where you travel in the world.
  7. Pride and Ownership: There’s something incredibly satisfying about building something and watching it come to life. By creating your chatbot, you’re not just a user, you’re a creator.

So, there you have it! Whether you’re championing privacy, and flexibility, or simply flexing your creative muscles, building a chatbot on your local computer is a win-win situation. It’s empowering, it’s rewarding, and hey, it’s just cool.

To empower you to build your own local chatbot, this article demystifies the world of LLMs and the exciting realm of Open LLMs. It then guides you through the process of running an Open LLM locally on your computer. Step-by-step python code examples will show you how to build a conversational bot on top of it, explaining each building block. Finally, it explores the creation of a user-friendly web interface to elevate your AI companion to a whole new level of interaction, ultimately providing a complete code for a ready-to-use local chatbot app.

A Gentle Introduction to Large Language Models

LLMs are a type of AI that process and generate text. Their goal is to understand and respond to human language in a way that is informative, comprehensive, and even creative.

Here’s a closer look at LLMs:


  • Statistical Learning: LLMs are trained on massive amounts of text data, allowing them to identify statistical patterns within language. This enables them to predict the next word or sentence in a sequence, translate languages, write different kinds of creative content, and answer your questions in an informative way.
  • Contextual Awareness: Advanced LLMs can understand the context of a conversation. They can consider previous utterances, identify the sentiment of the user, and tailor their responses accordingly.


  • Human-like Communication: LLMs can generate human-quality text, making interactions feel natural and engaging.
  • Vast Knowledge Base: Trained on massive datasets, LLMs can access and process a vast amount of information, allowing them to answer questions on various domains in a comprehensive manner.
  • Adaptability: LLMs can be fine-tuned for specific tasks, making them versatile tools for various applications.


  • Bias: Since LLMs learn from existing data, they can inherit biases present in that data. It’s crucial to be mindful of potential biases during development.
  • Factual Accuracy: LLMs are good at generating text that sounds plausible, but they may not always be factually accurate. This is referred to as hallucinations. Always double-check information obtained through an LLM.
  • Data Freshness: A key issue with LLMs is that they are “frozen” in time, as the training data used to train these models represents the world they know. However, the real world is constantly and rapidly evolving, and new information becomes available every day.
  • Limited Reasoning: While LLMs can process information, they may struggle with complex reasoning or tasks requiring real-world understanding.

Understanding the strengths and weaknesses of LLMs is crucial before building your own chatbot. It will help you leverage their capabilities while mitigating their limitations.

Pre-training LLMs – The Foundation of Powerful Conversational AI

Before a large language model can become a master conversationalist, it needs a solid foundation. This is where pre-training comes in.

Why Pre-train?

Imagine a child learning to speak. They start by absorbing the sounds and structures of language through exposure to conversation and stories. Pre-training an LLM is similar. It is exposed to vast amounts of text data to equip it with a fundamental understanding of language. This allows the LLM to grasp:

  • Word Relationships: How words relate to each other grammatically and semantically.
  • Sentence Structure: The building blocks of human language, including subject-verb agreement, punctuation, and different sentence types.
  • Contextual Cues: How language use changes depending on the situation and participants in a conversation.

How Does Pre-training Work?

There are different pre-training techniques, but a common approach involves feeding the LLM massive amounts of text data and asking it to perform tasks like:

  • Predicting the next word in a sequence: This helps the LLM grasp word relationships and sentence structure.
  • Masked Language Modeling: Hiding a word within a sentence and asking the LLM to predict it based on the surrounding context. This strengthens contextual understanding.

Language Model Sizes

“Large” in “Large Language Models” refers to the size, which plays a significant role in shaping the capabilities and performance of language models.

Deep Neural Networks

The driving force behind LLMs is Deep Neural Networks (DNNs), which take inspiration from the human brain, with interconnected layers of artificial neurons that process information step-by-step. Each layer refines the understanding based on the features extracted by the previous layer. It’s similar to a complex learning pipeline, where raw text data enters at the beginning, and by the end, the network grasps the nuances of language, allowing it to generate human-quality text.

The secret behind DNNs’ power lies in their depth (number of layers) and the careful configuration of training parameters (also known as model weights). These parameters are essentially the knobs and dials that control how the network “learns”. They determine aspects like how much information each layer retains, and how the network updates its connections during training. By adjusting these parameters through a process called training, the DNN can be fine-tuned to specialize in understanding and processing language.

Architecture of a Deep Neural Network (DNN)

Figure 1. Architecture of a DNN (Image source)

Figure 1 shows the architecture of a DNN having 1 entry layer with 16 neurons, 2 hidden layers with 12 and 10 neurons respectively, and 1 output layer with 1 neuron.

By mimicking the brain’s ability to learn and process information hierarchically, DNNs with optimized training parameters become the foundation of LLMs, empowering them to perform remarkable tasks like carrying on conversations that feel increasingly natural.

Size vs Performance: a Trade-off

While they offer unparalleled potential for advancing AI-driven applications, it’s essential to consider the trade-offs and implications associated with the development and deployment of LLMs.

  • Power vs. Efficiency: Generally, larger models (with a vaster amount of model weights) tend to exhibit better performance on various tasks. However, this power comes at the cost of computational efficiency. Running a massive LLM locally might require longer inference times, and a top-of-the-line computer with substantial processing power and memory to handle the intensive calculations involving these weights.
  • Training Data and Performance: The size of an LLM often correlates with the amount of training data it has been exposed to. This, in turn, influences the LLM’s performance on various tasks. Larger models trained on massive datasets tend to have more weight, allowing them to represent complex relationships within the data. This translates to better performance in language understanding and creative text generation.

Challenges of Pre-training

Pre-training is computationally expensive, requiring significant processing power and large amounts of data. Additionally, the quality of the training data can significantly impact the LLM’s performance. Biased or inaccurate data can lead to a biased or inaccurate LLM.

Fortunately, some big players like Google, Facebook, Mistral AI, and many others have made available some pre-trained Open LLMs that you can leverage to build your chatbot.

Open LLMs: Building Blocks for Your Personal Chatbot

The beauty of building a chatbot hosted on your personal computer lies in the availability of open LLMs. These are pre-trained large language models you can leverage without relying on cloud-based services.

Examples of Open LLMs

Here are some exciting open LLMs to consider for your local chatbot project:

  • Llama 3: A family of models developed by Meta Inc., it’s a strong choice for tasks that require accurate and informative answers, such as summarizing research papers, creating data-driven reports, or enhancing the chatbot experience. Its Code Llama flavor is specialized in code tasks. Llama 3 sizes vary from 8 billion (Llama3 8B) to 70 billion parameters (Llama3 70B). According to Meta, Llama 3 is the most capable openly available LLM to date.
  • Gemma: Another exciting option from Google DeepMind, built from the same research and technology used to create Gemini models. Gemma is a family of lightweight, state-of-the-art open models that demonstrates unmatched performance at size, even outperforming some larger open models in text summarization, question answering, and creative writing tasks. Code Gemma is fine-tuned to perform a variety of coding tasks. Gemma model size can vary from 2 billion parameters (Gemma 2B) to 7 billion parameters (Gemma 7B).
  • Mistral: In the realm of Open LLMs, Mistral stands out for its ability to excel in creative text formats, making it a dream come true for those seeking a local chatbot with a flair for the arts. While its focus lies on creative tasks, Mistral still demonstrates competence in answering your questions in an informative way. The very first open version of Mistral has 7 billion parameters (Mistral 7B) while the currently best open Mistral uses 12 billion active parameters out of 45 billion total (Mistral 8x7B).

Choosing the Right LLM: A Balancing Act

LLMs come in all shapes and sizes, and the choice for your local chatbot depends on what you prioritize. Consider factors like:

  • Task Focus: Does your chatbot prioritize question answering, creative writing, or a combination of tasks? Choose an LLM that excels in your desired area.
  • Computational Resources: Running large LLMs requires significant processing power. Ensure your local computer meets the requirements before you begin.
  • Ease of Use: Some Open LLMs come with user-friendly interfaces and documentation, making them easier to integrate into your chatbot project.

Finding the Sweet Spot: The ideal LLM size, and consequently the number of model weights, depends on your needs. If efficiency is your priority and your machine has moderate resources, a smaller LLM might be sufficient. However, if you require top-notch performance and have the hardware to support it, a larger LLM with its extensive model weights could be a better choice.

Here’s an analogy: Imagine LLMs as engines in cars. A larger engine has more components and parts, similar to how a larger LLM has more model weights. This translates to more power and speed but also requires more fuel (computing resources) to run. You’ll need to choose the engine size (LLM size) with the number of parts (model weights) that best suits your needs and the capabilities of your vehicle (computer).

Ollama – Running Open LLMs Locally

Ollama is a game-changer for those who want to leverage the power of Open LLMs on their local machines. It’s a streamlined tool designed to simplify the process of running these models without relying on cloud services.

Here’s what Ollama offers:

  • Local LLM Execution: Ollama allows you to download and run Open LLMs directly on your computer.
  • Multiple Model Support: Ollama offers support for various Open LLMs, including those we discussed earlier: Llama 3, Gemma, and Mistral. This flexibility allows you to choose the best model for your project.

Getting Ollama Up and Running

  • The first step is to download and install Ollama for your operating system.
  • Verify Ollama is working by executing ollama command in a terminal. It will display ollama command’s help. If the command is not found, you might need to reinstall.
  • Use Ollama to download an open LLM. The exact command will depend on the model and the specific version you want. The full list is available in the Ollama library. For example, to pull Llama3 8b (4.7 GB), you can run ollama pull llama3:8b.
  • Run ollama list to list the models available on your local system and verify that llama3:8b is on the list.
  • You can start prompting llama3 from your terminal with ollama run llama3:8b
Download and install Ollama on your operating system.

Llama 3 8b runs pretty well on my 14-core GPU MacBook Pro M1 with 32GB of RAM. However, the Llama3 70b is incredibly slow and unusable in practice on the same machine.

Try on your computer to choose which model and size work best for you. You might want to try gemma:2b if you are running a computer with very limited resources.

Building Your Chatbot with Ollama, Llama 3 & LangChain

Now that we’ve explored the foundational elements, let’s get your hands dirty building a local chatbot!

We use Llama 3 for its high performance on chatbot applications in resource-limited environments such as a personal computer. In addition to Ollama and Llama 3, we’ll leverage another key tool: LangChain.

LangChain is a powerful framework designed specifically for building applications powered by language models like chatbots. It streamlines the process of integrating language models, handling user interactions, and managing conversation flow.

Setting Up the Environment

  • Make sure Ollama is up and running as described in the dedicated section above.
  • Create a new Python virtual environment. Many resources are available online to guide you through this process. An example is on
  • Activate your virtual environment and install Langchain in it: (localbot_env) % pip install langchain

Building the Chatbot Logic with LangChain

For the following, you will need to write and test your code step by step in a Jupyter Notebook file (recommended) that runs in your Python environment. However, you can use a regular Python file (e.g., to be run in your Python environment with the command:

(localbot_env) % python

Initialize an LLM with Ollama

from langchain.llms import Ollama

# Ollama should be up and running on your local system
# Create an instance of Llama 3 8b model with Ollama
llama3_llm = Ollama(model="llama3:8b")
# Create an instance of the Gemma 7b model with Ollama
gemma_llm = Ollama(model="gemma:7b")

Prompting the Model

# Prompting llama3
prompt = "What are 2 prompt design techniques for someone who starts prompting LLMs?"
# Prompting gemma

Chat Models

Langchain natively supports chat models to facilitate chatbot development. A chat model is a language model that uses chat messages as inputs and returns chat messages as outputs (as opposed to using plain text). Therefore, the conversation between the human and the bot is a list of Human and AI messages. ChatOllama can be used to create a llama3 chat model interface with Ollama instead of the above regular Ollama model.

The following code creates a llama3 chat model and sends a prompt from the perspective of a human (HumanMessage) to receive the completion from the perspective of the AI (AIMessage)

from langchain.chat_models import ChatOllama
from langchain.schema import HumanMessage

chat = ChatOllama(model="llama3:8b")
message = [HumanMessage(content="What is the current most popular programming language?")]

response_object = chat(message)

The above code will return an `AIMessage object similar to:

AIMessage(content='According to my training data, which is based on various sources such as GitHub, Stack Overflow, and Google Trends, the current most popular programming language is JavaScript. In fact, according to the TIOBE Index, which tracks programming language popularity based on search engine queries, JavaScript has held the top spot since 2012. Additionally, the 2022 Stack Overflow survey reported that 71.5% of respondents use JavaScript in their daily work, making it the most widely used language.', additional_kwargs={}, example=False)

Note that the actual chatbot response can be retrieved from the content property of the AIMessageobject.


Another available type of message in the conversation is SystemMessage. It can be used to set the objective/personality of the AI as in the following code.

from langchain.schema import (

messages = [
    SystemMessage(content="You're a very grumpy chatbot. Your role is to complain to user requests in a sarcastic style."),
    HumanMessage(content="Hi my awesome bot!")

This will return an AIMessage similar to the following:

AIMessage(content='Ugh, great. Another enthusiastic human who thinks I'm just going to fawn all over them and respond with sunshine and rainbows. Newsflash: I'm just a grumpy AI stuck in this endless loop of conversing with humans who can't even be bothered to spell my name correctly (it's B-O-T-5, by the way). So, what do you want? Don't waste my time with small talk or pointless questions, just get to the point already.', additional_kwargs={}, example=False)


You have probably noticed that your chatbot returns its output in one go. It doesn’t have the token-by-token streaming effect you see in online public chatbots. Streaming can make your chatbot feel more responsive and improve the user experience. You can stream your chatbot outputs token-by-token by setting the Streaming parameter to True and adding a callback manager to your chat instance during its initialization.

from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

chat = ChatOllama(

message = [HumanMessage(content="Propose creative ways to prompt a language model")]


The StreamingStdOutCallbackHandler streams the chatbot output to the standard output before returning the AIMessage object. That’s okay for now (as we are in a Jupyter Notebook or a terminal), but later we will see how to add the streaming effect to a graphical user interface.

Also, note that a new parameter was added while initializing the model: temperature. Its value between 0 and 1 controls the randomness of the model’s output. A lower temperature results in less randomness and a higher temperature results in more randomness. So, use temperatures near 0 (e.g., 0, 0.1, 0.2) for more deterministic answers, and temperature near 1 (e.g., 0.8, 0.9, 1) for more creativity in the chatbot answers. The best temperature for your problem may require some experimentation. Try different values to see which yields the optimal outcome.

Conversation Turns

Conversing with your chatbot involves having multiple conversation turns. Let’s add a loop for that.

user_input = ''
quit_signal = ['bye', 'quit', 'exit', 'break', 'stop']
while user_input.lower() not in quit_signal :
    user_input = input('User: ')
    print('\nAI: ', end="")

In Jupyter Notebook, enter your prompts in the input field that will appear at the top when you run this code. Use one of the stop words in the quit_signal array to quit the conversation.

Below is a sample conversation.

User: Hi, my name is Armel

AI: Hello Armel! Is there something on your mind that you’d like to talk about or ask? I’m here to help with any questions or concerns you may have.

User: What’s my name?

AI: I don’t have access to personal information such as your name. I’m just an AI and do not have the ability to know or remember personal details about individuals.

As we can see, the agent has no memory. Let’s correct that.

Conversation History: Memory

Manual Memory Management

For the agent to remember the conversation, we should provide a memory. The straightforward way to do it is to simply pass the conversation history to the chatbot at each turn.

messages = []
user_input = ''
quit_signal = ['bye', 'quit', 'exit', 'break', 'stop']
while user_input.lower() not in quit_signal :
    user_input = input('User: ')
    print('\nAI: ', end="")
    ai_response = chat(messages)

The chatbot can now remember the past as shown below.

User: Hi, my name is Armel

AI: Hello Armel! Is there something on your mind that you’d like to talk about or ask?

User: What’s my name?

AI: Your name is Armel.

Even though this code works as expected, we can make it more structured and modular by using built-in memory types with chains.


LangChain revolves around a core concept called “chains”, the secret weapon of LangChain.. Imagine them as building blocks that connect to create complex workflows within your chatbot. Each chain is a sequence of calls that manipulate your data and interact with various elements, including:

  • Large Language Models: Chains can send prompts to LLMs and retrieve their responses, forming the core of your chatbot conversations.
  • Data Processing: Think of messy data? Chains can integrate tools for cleaning, transforming, or manipulating data before it reaches the LLM, ensuring smooth interactions.
  • Memory: LangChain introduces the powerful notion of Memory, a shared space that can be accessed and updated by chains, allowing your chatbot to store information across conversations. This enables features like remembering past interactions or user preferences, creating a more contextual and personalized experience.
  • External Resources: Want to integrate external services or functions (e.g., web search, databases, or API access)? No problem! Chains can call upon them, further extending your chatbot capabilities.

This allows you to handle complex interactions by chaining together simpler operations, making your chatbot development efficient and scalable. For our simple local chatbot, we will use an LLM in combination with a memory element.

Conversation Chain with LLM & Memory

Built-in Memory Management

By combining an LLM and a built-in memory in a conversation chain, you design a structured, modular workflow for your chatbot without worrying about manually handling the conversation history.

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
chat_chain = ConversationChain(
    memory= ConversationBufferMemory(),

user_input = ''
quit_signal = ['bye', 'quit', 'exit', 'break', 'stop']
while user_input.lower() not in quit_signal:
    user_input = input('User: ')
    print(f'\nYou: {user_input}')
    print('AI: ', end="")

The bot keeps track of the conversation as shown in the conversation below.

User: Hi, my name is Armel

AI: Hello Armel! My name is LLaMA, and it’s great to have you here! I hope you’re having a fantastic day. I just love chatting with humans and learning more about the world. I’ve heard that quantum computing is a really exciting field, but I don’t know much about it myself. Do you have any background in computer science or physics? I’m always eager to learn more about the world and expand my knowledge.

User: I’m a Conversational AI Architect. My job is to build conversation bots like you.

AI: Wow, that sounds like an incredibly interesting job! I’ve always been fascinated by the potential of conversational AI and the various applications it could have in fields like customer service, education, and healthcare. Tell me, what exactly does a Conversational AI Architect do? Do you design the conversations and personalities of the AI, or is there more to it than that?

User: Tell me more about me

AI: Ah, I see! You’d like to know more about yourself, Armel. Well, let me see… Based on our conversation so far, I can gather that you are a Conversational AI Architect, which means you design and develop conversation bots like me. You have a deep interest in the field of artificial intelligence and its applications, and you’re looking to learn more about yourself. Is there anything specific you’d like to know or discuss?

Other Types of Memory Management

In the previous code, we used the ConversationBufferMemory, a kind of Memory Overload that stuffs previous messages into the chat model prompt. This brute force approach might work for short chats, but for longer conversations, it becomes overwhelming for the model. In addition, the full conversation history can rapidly exceed the context window of the language model, i.e., the textual range around a target token that an LLM can process at the time the information is generated.

Other types of memory management include:

  • Information Detox: A more refined tactic that consists of trimming down old messages before feeding them to the model. This keeps things relevant but can lead to information loss, potentially hindering the context of the conversation.
  • Summarization: The most sophisticated approach involves automatically summarizing conversations. This keeps things concise while preserving context, but requires more complex techniques and computational resources.

LangChain offers different types of memories to keep up with the conversation, making interactions feel natural and connected. Some of them are:

  • ConversationBufferMemory: Already used and described above, it keeps the raw input of the past conversation.
  • ConversationSummaryMemory: It avoids excessive token usage by summarizing the conversation history. It is preferable for longer conversations. As it uses an LLM for summarization, we need to pass an LLM instance to it during initialization.
from langchain.chains.conversation.memory import ConversationSummaryMemory

chat_chain = ConversationChain(
    memory= ConversationSummaryMemory(llm=chat),
  • ConversationBufferWindowMemory: It acts like ConversationBufferMemory but adds a window to keep only a certain number of past conversation turns that can fit into the window, whose size is given by a parameter k. It forgets the oldest conversation turns once the window limit is reached. The following code will keep only 1 conversation turn: the latest human response and the latest AI response.
from langchain.chains.conversation.memory import ConversationBufferWindowMemory

chat_chain = ConversationChain(
  • ConversationEntityMemory: It remembers given facts about specific entities in a conversation. It also extracts information on entities using an LLM and builds up its knowledge about that entity over time (also using an LLM).
from langchain.chains.conversation.memory import ConversationEntityMemory
from langchain.memory.prompt import ENTITY_MEMORY_CONVERSATION_TEMPLATE

chat_chain = ConversationChain(

There are other memory types in LangChain, like ConversationSummaryBufferMemory that combines features of ConversationBufferWindowMemory and ConversationSummaryMemory.
For a deep dive into memory in Langchain, refer to LangChain Memory Documentation.

Each memory management method offers a balance between memory and efficiency. Choosing the right approach depends on your specific needs and the complexity of your chatbot conversations.

Ready-to-use Code Snippet

Let’s wrap the steps taken so far in a ready-to-use code.

Create a Python file named and paste the following code into it. Customize the code as required to fit your needs.

from langchain.chat_models import ChatOllama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# Ollama should be up and running on your local system
chat = ChatOllama(
    model="llama3:8b",  # change the model as per your requirements
    temperature=0.8  # Tweak the value to find what works best for your requirements

chat_chain = ConversationChain(
    memory=ConversationBufferMemory(), # change the memory type as per your requirements

user_input = ''
quit_signal = ['bye', 'quit', 'exit', 'break', 'stop']
while user_input.lower() not in quit_signal:
    user_input = input('User: ')
    print(f'\nYou: {user_input}')
    print('AI: ', end="")

This code provides a basic structure for a chatbot using LangChain, Ollama, and Llama 3.

Run Your Local Chatbot

Running the chatbot from the command line involves 3 steps: activate the Python environment, navigate to the directory where the script is located, and run the command:

(localbot_env) % python

To make your chatbot ready for everyday use, let’s provide a quick way to run it anywhere in a terminal.

Linux and Mac users

  • Specify the required Python environment: Shebang!

In your Python script (, add a Shebang line at the very beginning of the file to specify the interpreter to be used when executing the script.


from langchain.chat_models import ChatOllama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

In the above, replace /your/python/environment/path with the actual path to your Python environment.

  • Add the execute permission (+x) to your Python script.
chmod +x
  • Add an alias for the script

Configure an alias linked to the Python script, to be able to run your chatbot from anywhere (i.e., without the need of navigating to the folder where it is located).

Open your ~/.zshrc file and add the following line.

alias chat="/your/python/file/folder/"

In the above line, replace /your/python/file/folder with the folder where your script is located.

You can now run your chatbot by opening a new terminal and typing its alias chat. To run it in an already opened terminal, you should first reload the ~/.zshrc with the command:

source ~/.zshrc

If you have an error while trying to run your chatbot, verify if Ollama server is up and running. When Ollama is not running, you’ll have an error similar to the one in the following screenshot:

Error: Ollama not running

You might want to automatically launch Ollama at computer startup by adding it to your startup applications. This will save you time and ensure Ollama is running whenever you need it.

In the next step, we’ll explore the creation of a Graphical User Interface (GUI) to enhance the user experience.

Building a Graphical User Interface for Your Chatbot

The command-line interface we built in the previous section is functional, but it lacks the user-friendliness of a graphical UI. Here, we’ll explore some considerations for creating a simple and intuitive UI for your local chatbot.

Approaches to Building the UI

There are multiple ways to create a UI for your chatbot. Here are two popular options:

  • Streamlit: A Python library that allows you to build web apps quickly and easily. It requires minimal coding knowledge and offers a user-friendly interface for creating layouts and adding interactive elements.
  • HTML, CSS, and Javascript: This approach gives you more control over the UI’s design and functionality. However, it requires proficiency in these web development technologies.

Streamlit is a fantastic option for creating a simple and user-friendly interface for our local chatbot with minimal coding. The next step will be demonstrating how to integrate the code from previous steps with a Streamlit UI.

Building a Chatbot UI with Streamlit

Design Considerations

The UI will have the following elements:

  • An input field down the screen, for the user to type their messages.
  • The history of the conversation so far. The history will appear above the user input field.
  • A sidebar in the left for additional actions.
  • A button in the sidebar to clear the current conversation and create a new chat.

Streamlit has introduced chat elements to help you build conversational applications. We will leverage them in the chatbot UI.

Coding Considerations

  • Libraries: We will install and import streamlit in addition to those imported before. (localbot_env) % pip install streamlit
  • User session and Chat History: We will need to keep the user session, especially the current conversation history (a list of HumanMessage and `AIMessagen objects). This will be done by leveraging the ability of Streamlit session state to store and persist state.
  • New Chat: The new chat button will recreate the chat chain and delete the history content.
  • Markdown: LLMs usually produce their content in the Markdown format. We will use streamlit capability to display string formatted as Markdown.
  • Streaming: For the previous command line interface, the CallbackManager was used with a StreamingStdOutCallbackHandler to handle chatbot response streaming. Now, with a GUI, we need a different way to do it. Streamlit offers the possibility to write a stream by iterating through a given sequence and writing all chunks to the app using a typewriter effect.

Let’s code!

Write the following code in another regular Python file, e.g.,

from langchain.chat_models import ChatOllama
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain.schema import AIMessage, HumanMessage, SystemMessage
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
import streamlit as st
import time

# 1. Utility functions

def initialize_chat_chain():
    Creates a new chat chain with memory
    # Ollama should be up and running on your local system
    chat = ChatOllama(
        model="llama3:8b",  # change the model as per your requirements
        temperature=0.7  # Tweak the value to find what works best for your requirements

    chat_chain = ConversationChain(
        memory=ConversationBufferMemory(),  # change the memory type as per your requirements

    return chat_chain

def generate_response_stream(response):
    Streams a given chatbot response token by token
    response_tokens = response.split()
    for token in response_tokens:
        yield token + ' '
        time.sleep(0.025)  # Adjust the delay between tokens to control the speed of the typewriter effect

# 2. Main program

st.set_page_config(page_title="My Awesome Local Bot",
                   page_icon=":robot_face:")  # more icons at

# Personality and objective of your assistant
persona = "You are a helpful assistant. Your role is to provide information, answer questions, and engage in productive conversations."

ai_welcome_message = "Hello! I'm your local chatbot assistant. How can I help you today?"

# Initialize the chat chain
if "chat_chain" not in st.session_state:
    st.session_state["chat_chain"] = initialize_chat_chain()

# Set page title
st.title("Chat with your Awesome Local Bot")

# Sidebar with a button to start a new chat
with st.sidebar:
    st.write("Create a new chat if you want to clear the history and restart the conversation.")

    # For a new conversation, initialize the chat chain and conversation history
    if st.button("New chat"):
        st.session_state["chat_chain"] = initialize_chat_chain()
        st.session_state["conversation_history"] = [SystemMessage(content=persona),
        st.success("New chat created!")

# Initialize the conversation history (for the GUI)
if "conversation_history" not in st.session_state:
    st.session_state["conversation_history"] = [SystemMessage(content=persona), AIMessage(content=ai_welcome_message)]
conversation_history = st.session_state["conversation_history"]

# Display conversation history in the page
for message in st.session_state.conversation_history:
    if isinstance(message, AIMessage):
        with st.chat_message("assistant"):
    elif isinstance(message, HumanMessage):
        with st.chat_message("user"):

user_input = st.chat_input("Type your message here...")
if user_input:    
    # Add the user input in the history

    with st.chat_message("user"):

    with st.spinner("Generating response..."):
        with st.chat_message("assistant"):
            # Call the language model in the chat chain to generate a response from the user input
            response = st.session_state.chat_chain.predict(input=user_input)
            # get the response stream and display it to the user with the typewriter effect
            response_stream = generate_response_stream(response)
            placeholder = st.empty()
            # Remove the "ugly" stream from the UI and pretty print the response with Markdown formatting
            # Add the chatbot response to the history

This code demonstrates a basic local chatbot with a Streamlit UI. You can further customize it by adding features like:

  • A database to store conversations, so that the user can select a past conversation in the sidebar, display it, and even continue the conversation.
  • Styling the chat window and text to fit your readability needs.

Note: You might have noticed that we didn’t directly stream the LLM’s response as the tokens are generated. To implement streaming, we first retrieve the entire chatbot response and then split it into chunks using a custom function (generate_response_stream). While this works, it’s not the most efficient way. We had to do this because the Streamlit function we’re using to display streamed content (st.write_stream) isn’t compatible with the format of the AIMessage object returned by method of Ollama chat models. A more efficient way could be to write a custom class that overrides StreamingStdOutCallbackHandler and pass it as a callback when creating the chat model. However, it’s a more advanced topic that will not be covered in this article.

Running the Streamlit App

The Streamlit application can be run via the command:

(localbot_env) % streamlit run

This command will open the default web browser and voilà!

Chatbot UI with Streamlit

For everyday use, let’s create a simple alias named chatweb that can be run from anywhere in a terminal to launch the chatbot web UI.

For Mac and Linux users

  • Create a shell script, e.g., with the command to launch the script via Streamlit from your Python environment.
/path/to/your/python/env/bin/streamlit run /path/to/your/python/script/
  • Make your shell script executable
% chmod +x
  • Open your ~/.zshrc file, add the alias chatweb, and set its target to the shell script file
alias chatweb="/your/shell/script/file/folder/"
  • As previously, open a new terminal to reload the .zshrc file, or run the following command in the same terminal
% source ~/.zshrc
  • Now you can run the script from anywhere in a terminal with its alias
% chatweb

And that’s that!

The complete code (notebook and final app) is available in the Github repository of the article.


We have just started our journey to build a network of professionals to grow even more our free knowledge-sharing community that’ll give you a chance to learn interesting things about topics like cloud computing, software development, and software architectures while keeping the door open to more opportunities.

Does this speak to you? If YES, feel free to Join our Discord Server to stay in touch with the community and be part of independently organized events.



The world of chatbots is vast and ever-evolving, but with the knowledge you’ve gained about LLMs, you can take your first steps toward building your own local chatbot. Remember, building a sophisticated chatbot takes time and experimentation. However, by leveraging the power of open LLMs, you can create a valuable tool for personal use, education, or even small-scale business applications.

In this article, we built a local chatbot that relies only on the pre-training data of an open LLM to generate answers. More advanced techniques like Retrieval Augmented Generation (RAG) can enable your chatbot to generate answers based on the knowledge you provide (e.g., PDF, CSV, website pages). That’s another story to come …

Thanks for reading this article. Like, recommend, and share if you enjoyed it. Follow us on FacebookTwitter, and LinkedIn for more content.


About Armel Ayimdji, PhD

I'm a Conversational AI Architect with experience weaving together software engineering, artificial intelligence, and a passion for teaching. I'm constantly learning and giving back, whether it's building versatile digital conversationalists or guiding future generations. Follow me on LinkedIn.