Deploying Streamlit with Docker and routing requests via Nginx
Visualizing a DataFrame
So far, we have used Python to extract table data embedded in PDF and Word documents and convert it into pandas DataFrames. To display the records extracted from PDFs, we built a Streamlit application called Pdf to CSV.
In this post, we explain how we dockerized that application and deployed it to a server.
Streamlit
Streamlit is an open source Python library that allows you to quickly transform Python scripts into interactive web applications. Streamlit allows data scientists and developers to easily create data-driven web applications without any web development expertise.
Streamlit has the following features:
- Simple, Pythonic code: Build web applications with easy-to-read, concise code with just Python knowledge
- Rapid prototyping: Rapidly bring your ideas to life with interactive widgets, charts, and layout tools
- Live editing: Changes to your code instantly update your application for faster development cycles
- Open source: Free to use and developed by an active community
- Simple deployment: With Streamlit Community Cloud, you can deploy applications without managing servers or infrastructure
Dockerizing Streamlit
Streamlit has a dedicated deployment environment called "Streamlit Community Cloud", but you may prefer to deploy Streamlit applications on your own servers. In that case, running the application in a Docker container keeps the environment consistent and simplifies deployment.
For Pdf to CSV, we created the following Dockerfile and deployed it to the server.
requirements.txt is here:
tabula-py==2.9.3
streamlit==1.38.0
tabula-py extracts table data from PDFs. It wraps the Java library tabula-java, which is why the Dockerfile installs a Java runtime.
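A missing Java runtime would only surface when the first PDF is processed, since tabula-py hands the extraction to tabula-java. A minimal sketch of an early check, assuming Java is invoked through a `java` command on PATH; the function name `java_available` is hypothetical and not part of tabula-py:

```python
import shutil


def java_available() -> bool:
    """Return True if a `java` executable is found on PATH."""
    return shutil.which("java") is not None


if __name__ == "__main__":
    # Report the presence of Java before the application starts serving requests.
    print("java found" if java_available() else "java missing: install a JRE")
```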
Dockerfile is here:
FROM python:3.9-slim
WORKDIR /app
RUN apt-get update && apt-get install -y \
    build-essential \
    curl \
    software-properties-common \
    git \
    openjdk-17-jre \
    && rm -rf /var/lib/apt/lists/*
COPY ./app.py .
COPY ./requirements.txt .
RUN pip3 install -r requirements.txt
EXPOSE 8501
ENTRYPOINT ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
The Dockerfile is fairly standard; app.py is the main Streamlit application. All that is left is to build a container image with docker build and deploy it. Here, the built image is named "pdf-to-csv".
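The build step can also be scripted. A minimal sketch, assuming Docker is installed and the Dockerfile sits in the current directory; the helper names are hypothetical, and the tag 0.0.1 is chosen to match the docker-compose.yml in this post:

```python
import subprocess


def docker_build_command(image: str, tag: str, context: str = ".") -> list[str]:
    """Assemble the `docker build` argument list for image:tag."""
    return ["docker", "build", "-t", f"{image}:{tag}", context]


def build(image: str = "pdf-to-csv", tag: str = "0.0.1") -> None:
    """Run the build and raise CalledProcessError if it fails."""
    subprocess.run(docker_build_command(image, tag), check=True)


if __name__ == "__main__":
    # Print the command instead of running it; call build() to execute it.
    print(" ".join(docker_build_command("pdf-to-csv", "0.0.1")))
```

Keeping the command as a list avoids shell quoting issues when the image name or build context comes from configuration.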
docker-compose.yml is here:
services:
  pdf-to-csv:
    image: pdf-to-csv:0.0.1
    container_name: 'pdf-to-csv'
Then start it with "docker compose up -d". Once port 8501 is published to the host (for example with a ports entry of "8501:8501" for this service), the application is available at localhost:8501 in your browser.
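After the container starts, Streamlit needs a moment before it answers requests. A minimal sketch of a readiness check, assuming port 8501 is reachable from where the script runs and that the `/_stcore/health` endpoint of recent Streamlit releases is available; the function names are hypothetical:

```python
import time
import urllib.request

HEALTH_URL = "http://localhost:8501/_stcore/health"


def is_healthy(url: str = HEALTH_URL, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


def wait_until_healthy(url: str = HEALTH_URL, attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll the URL up to `attempts` times, sleeping `delay` seconds after each failure."""
    for _ in range(attempts):
        if is_healthy(url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_until_healthy()` right after "docker compose up -d" gives a simple pass or fail signal for a deployment script.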
Routing under a path
On raw-fi-data.com, we deploy this Streamlit application behind nginx, which also runs as a container, and serve the application under a dedicated path.
The nginx default.conf is as follows:
(...snip...)
location /pdf-to-csv {
    proxy_pass http://pdf-to-csv:8501;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}
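Because proxy_pass has no URI part here, nginx forwards the request path unchanged, so Streamlit must serve the application under the same path. Specify it when starting Streamlit, for example by adding the flag to the ENTRYPOINT in the Dockerfile: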
streamlit run app.py --server.baseUrlPath=/pdf-to-csv
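With a base URL path set, Streamlit serves its internal endpoints under the same prefix, so they all match the nginx location above. A small sketch of the resulting paths, assuming the `_stcore/health` and `_stcore/stream` endpoint names used by recent Streamlit releases; the helper `prefixed` is hypothetical:

```python
def prefixed(base_url_path: str, endpoint: str = "") -> str:
    """Join a base URL path and an endpoint, tolerating stray slashes."""
    parts = [p.strip("/") for p in (base_url_path, endpoint)]
    return "/" + "/".join(p for p in parts if p)


if __name__ == "__main__":
    base = "/pdf-to-csv"
    print(prefixed(base))                    # application page
    print(prefixed(base, "_stcore/health"))  # health check
    print(prefixed(base, "_stcore/stream"))  # WebSocket used for live updates
```

The WebSocket endpoint is the reason the location block sets the Upgrade and Connection headers.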
Conclusion
This post explained how to deploy Streamlit on your own server with Docker and nginx. Streamlit lets you display and manipulate DataFrames with very little code. Please give it a try.