Raw-fi Data

Deploying Streamlit with Docker and Routing Requests via Nginx

2024-10-13

Visualizing a Dataframe

So far, we have extracted table data embedded in PDF and Word documents with Python and converted it into pandas DataFrames. To display the records extracted from PDFs, we deployed a Streamlit application called Pdf to CSV.

In this post, we explain how to dockerize that application and deploy it to a server.

Streamlit

Streamlit is an open-source Python library that lets you quickly turn Python scripts into interactive web applications. With it, data scientists and developers can build data-driven web applications without web development expertise.

Streamlit has the following features:

  • Simple, Pythonic code: Build web applications with easy-to-read, concise code with just Python knowledge
  • Rapid prototyping: Rapidly bring your ideas to life with interactive widgets, charts, and layout tools
  • Live editing: Changes to your code instantly update your application for faster development cycles
  • Open source: Free to use and developed by an active community
  • Easy deployment: With Streamlit Community Cloud, you can deploy applications without worrying about server settings or infrastructure management
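
To make the features above concrete, here is a minimal sketch of a Streamlit script. The file name hello_app.py, the sample records, and the helper filter_rows are hypothetical and have nothing to do with Pdf to CSV; the Streamlit calls used are st.title, st.slider, and st.dataframe.

```python
# hello_app.py (hypothetical file name); start with: streamlit run hello_app.py
import importlib.util

# Sample records, made up for illustration only.
SAMPLE_ROWS = [
    {"item": "apple", "price": 120},
    {"item": "banana", "price": 80},
    {"item": "cherry", "price": 300},
]


def filter_rows(rows, max_price):
    """Keep the rows whose price does not exceed max_price."""
    return [row for row in rows if row["price"] <= max_price]


def main():
    # Imported here so filter_rows can be used without Streamlit installed.
    import pandas as pd
    import streamlit as st

    st.title("Price table")
    # An interactive widget: moving the slider reruns the script.
    max_price = st.slider("Maximum price", min_value=0, max_value=500, value=200)
    # Render the filtered records as an interactive table.
    st.dataframe(pd.DataFrame(filter_rows(SAMPLE_ROWS, max_price)))


# Build the UI only when Streamlit is installed, so the helper above can
# also be exercised in an environment without it.
if __name__ == "__main__" and importlib.util.find_spec("streamlit") is not None:
    main()
```

The whole user interface is three calls, which is what makes Streamlit convenient for quick prototypes.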

Dockerizing Streamlit

Streamlit has a dedicated deployment environment called "Streamlit Community Cloud", but you may prefer to deploy Streamlit applications on your own servers. In that case, running the application in a Docker container keeps the environment consistent and simplifies deployment.

For Pdf to CSV, we created the following requirements.txt and Dockerfile and deployed the resulting image to the server.

requirements.txt is here:

tabula-py==2.9.3
streamlit==1.38.0

tabula-py is used to extract table data from PDFs. It requires a Java runtime, which is why the Dockerfile installs one.
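
As a rough illustration of how tabula-py and Streamlit can fit together, the sketch below reads every table from an uploaded PDF and offers each one as a CSV download. This is not the actual app.py of Pdf to CSV; the file name app_sketch.py and the helpers csv_name and extract_tables are hypothetical.

```python
# app_sketch.py (hypothetical); start with: streamlit run app_sketch.py
import importlib.util
import os
import tempfile


def csv_name(pdf_name, index):
    """Build a CSV file name such as 'report_table1.csv' from 'report.pdf'."""
    stem = os.path.splitext(os.path.basename(pdf_name))[0]
    return f"{stem}_table{index + 1}.csv"


def extract_tables(pdf_bytes):
    """Return the tables found in a PDF as a list of pandas DataFrames."""
    import tabula  # provided by the tabula-py package

    # Write the bytes to a temporary file so tabula receives a plain path.
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(pdf_bytes)
        tmp_path = tmp.name
    try:
        return tabula.read_pdf(tmp_path, pages="all", multiple_tables=True)
    finally:
        os.remove(tmp_path)


def main():
    import streamlit as st

    st.title("Pdf to CSV (sketch)")
    uploaded = st.file_uploader("Upload a PDF", type="pdf")
    if uploaded is None:
        return  # nothing to show until a file is uploaded
    for i, table in enumerate(extract_tables(uploaded.getvalue())):
        st.dataframe(table)
        st.download_button(
            label=f"Download table {i + 1} as CSV",
            data=table.to_csv(index=False),
            file_name=csv_name(uploaded.name, i),
            mime="text/csv",
        )


# Build the UI only when Streamlit is installed.
if __name__ == "__main__" and importlib.util.find_spec("streamlit") is not None:
    main()
```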

Dockerfile is here:

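# Slim Python base image
FROM python:3.9-slim

WORKDIR /app

# System packages, including the Java runtime used by tabula-py
RUN apt-get update && apt-get install -y \
    build-essential \
    curl \
    software-properties-common \
    git \
    openjdk-17-jre \
    && rm -rf /var/lib/apt/lists/*

# Copy the application and its dependency list
COPY ./app.py .
COPY ./requirements.txt .

# Install the Python dependencies
RUN pip3 install -r requirements.txt

# Streamlit's default port
EXPOSE 8501

# Listen on all interfaces so the app is reachable from outside the container
ENTRYPOINT ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]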

The Dockerfile is fairly standard; app.py is the main Streamlit application. All that is left is to build a container image with docker build and deploy it. Here, the built image is named "pdf-to-csv".
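
The build step can also be scripted. The sketch below assembles the docker build command and runs it from Python. The tag 0.0.1 matches the image referenced in docker-compose.yml, while the function names and the assumption that the Dockerfile sits in the current directory are ours.

```python
import subprocess


def build_command(image="pdf-to-csv", tag="0.0.1", context="."):
    """Return the docker build command as an argument list."""
    return ["docker", "build", "-t", f"{image}:{tag}", context]


def build_image():
    """Run the build; requires the docker CLI on this machine."""
    # check=True raises CalledProcessError if the build fails.
    subprocess.run(build_command(), check=True)
```

Calling build_image() from the directory that contains the Dockerfile is equivalent to running "docker build -t pdf-to-csv:0.0.1 ." by hand.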

docker-compose.yml is here:

services:
  pdf-to-csv:
    image: pdf-to-csv:0.0.1
    container_name: 'pdf-to-csv'
    ports:
      - "8501:8501"  # publish the port so localhost:8501 is reachable from the host

Then start it with "docker compose up -d" and open localhost:8501 in your browser to use the deployed application.
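
If you want to confirm from a script that the container is serving requests, one option is to poll Streamlit's health endpoint. The sketch below assumes the app is reachable on localhost:8501 and uses the /_stcore/health path that recent Streamlit versions expose; the function names are hypothetical.

```python
import time
import urllib.request


def health_url(base="http://localhost:8501", base_path=""):
    """Build the health-check URL, with an optional base URL path."""
    path = base_path.strip("/")
    prefix = f"/{path}" if path else ""
    return f"{base.rstrip('/')}{prefix}/_stcore/health"


def wait_until_healthy(url, attempts=10, delay=1.0):
    """Return True once the URL answers with HTTP 200, else False."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if response.status == 200:
                    return True
        except OSError:
            pass  # connection refused or HTTP error: not ready yet
        time.sleep(delay)
    return False
```

For example, wait_until_healthy(health_url()) can be called right after starting the container, before opening the browser.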

Routing under a path

On raw-fi-data.com, this Streamlit application is deployed behind nginx, which also runs as a container, and requests to the application are routed under a dedicated path.

The nginx default.conf is as follows:

  (...snip...)

  location /pdf-to-csv {
    proxy_pass http://pdf-to-csv:8501;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
  }

You also need to start Streamlit with the same path as its base URL path, for example by adding this option to the ENTRYPOINT arguments in the Dockerfile:

streamlit run app.py --server.baseUrlPath=/pdf-to-csv
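
The two settings have to agree because the proxy_pass directive in the nginx configuration has no URI part after the host and port, so nginx forwards the request path unchanged and Streamlit receives /pdf-to-csv/... as is. The sketch below models this for a single location block. It is a simplification for illustration, not how nginx or Streamlit are implemented, and the function names are hypothetical.

```python
def forwarded_path(request_path, location="/pdf-to-csv"):
    """Path the upstream receives, or None if the location does not match."""
    # A prefix location matches any request path starting with the prefix.
    if not request_path.startswith(location):
        return None
    # proxy_pass without a URI part passes the path through unchanged.
    return request_path


def under_base_path(path, base_url_path="/pdf-to-csv"):
    """True if the path falls under a non-empty base URL path."""
    base = "/" + base_url_path.strip("/")
    return path == base or path.startswith(base + "/")


# The forwarded path still carries the prefix, so it only lines up with
# Streamlit when the base URL path is set to the same value.
path = forwarded_path("/pdf-to-csv/_stcore/health")
print(path, under_base_path(path))
```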

Conclusion

In this post, I explained how to deploy Streamlit on your own server with Docker and nginx. Streamlit lets you display and manipulate DataFrames with very little code. Please give it a try.