- 11
- 16 653
debugverse
Czechia
Приєднався 23 сер 2024
Programming knowledge and tutorials
Відео
Scrape ANY Website in JUST 5 Lines of Code
Переглядів 2 тис.2 місяці тому
Scraping is easy, see? Autoscraper project repo github.com/alirezamika/autoscraper
DIY OpenAI Voice Assistant UNDER 100 Lines of Code
Переглядів 2722 місяці тому
Self-hosted voice assistant with Ollama, Langchain, Python FastAPI and a little bit of that good JS. Source code: github.com/debugverse/debugverse-youtube
SUPERFAST Audio Transcription with OpenAI Whisper Turbo - Python Tutorial
Переглядів 8013 місяці тому
a New OpenAI Whisper Turbo model lets anybody transcribe audio files locally into text with high performance and high accuracy. Source code: github.com/debugverse/debugverse-youtube
AI Image Organizer with Python Tutorial
Переглядів 1144 місяці тому
short tutorial for AI powered context-aware image renaming program in Python using Google Gemini Flash model.
AI Summarize HUGE Documents Locally! (Langchain + Ollama + Python)
Переглядів 11 тис.4 місяці тому
Today we are looking at a way to efficiently summarize huge PDF (or any other text) documents using clustering method with HuggingFace embeddings, Langchain Python framework and Ollama Llama 3.1 model.
5 FastAPI ProTips For Writing Better API
Переглядів 574 місяці тому
Five quick tips to make your Fast API app work faster and better.
AI ChatBot with Gemini + Tool calling (Golang Tutorial)
Переглядів 4524 місяці тому
In this tutorial we will take a look at the genAI package and write a chat bot in Go with the ability to call tools using a Gemini Flash model.
AI Generated Blog with Langchain + FastAPI Python Tutorial (Part 2)
Переглядів 6784 місяці тому
In the second part we focus on Pydantic models for data serialization, Mongo DB for storing and retrieving data and implementing Jinja2 templating.
AI Generated Blog with Langchain + FastAPI Python Tutorial (Part 1)
Переглядів 2634 місяці тому
In this project we will be creating an API which will be able to automatically generate a blog post on any topic, complete with title, tags and content. We will store the blog to MongoDB and create endpoints for listing posts and generating them.
AI PDF Summarize and Sentiment analysis with Langchain + Ollama Python tutorial
Переглядів 6754 місяці тому
Summarize a PDF using 100% local LLama3.1 AI model and generate a sentiment of it using Langhcain framework and structured output.
Hi, I am working on a company project. Can this help me extract the required data from a PDF? I receive a monthly PDF that includes all our company clients' monthly statements. I need to extract the 'Brought Forward' and 'Realized Loss/Profit Amount' from the PDF, which is nearly a thousand pages long. I will need to perform this process monthly.
I have worked on a similar task with both vision LLM and pdfminer so I would recommend those tools.
Nice video - thanks for sharing that
Very cool! Do you mind providing an example of how to filter the data like you mention in closing?
I looked at this. Basically, you use the results to provide your source pages, and then use that as the context. For example: filter = EmbeddingsClusteringFilter(embeddings=embeddings, num_clusters=10, num_closest=3) result = filter.transform_documents(documents=texts) context="" for i in result: context += f"{i.page_content} " # convert your result pages into a single text blob by combining them prompt = " Ask your question here... use the context within triple backticks ``` {context}```" response = llm.invoke(prompt) print(response) However... this is not a replacement for RAG, because remember that much of the document has been discarded and so you're unlikely to find your answer. k-means is basically just collating similar pages, but not necessarily the one with the unique information you need. K-means is therefore great for summarisation, but not necessarily good for specific questions. So, if your specific question relates to something that is summary-like, then if should be more relevant. Maybe I've missed something here, but that's my conclusion from playing with it.
I think the latest vision models will make RAG obsolete
Using gemini vision to describe the video?? Nice technique
one of the best videos i have ever seen. I just want to tell you Thank you and good job
Thank you very informative!
Will this work for a procedurally generated file containing a conversation? Or should I look at another method?
I really love your UA-cam contents but until now I didn't a tutorial video like how to cluster clients feedback for example one says we need electricity and an other one says lack of electricity, I want to develop a python code to automatically cluster these comments which are similar to each other into one unique sentence but I didn't want to delete the feedback column in excel but I want to create another column next to the feedback column then to do these clustering so that I can see how accurate are they for doing it(note: I don't want to make summarise no but having 500 feedback and have each of them a cluster but a when I filter the cluster then I should have in total 5 to ten or more for summarise similar feedbacks) . If it is possible with python or any other program I would be happy and grateful.
LangChain has moved to Pydantic 2. To update this code change "from langchain_core.pydantic_v1 import BaseModel, Field" to "from pydantic import BaseModel, Field". This caused me to get some errors with score which I had to change to a float since the definition of a number between 0 and 1 implies float to the system.
Wonderful. Thank you so much for sharing this valuable tool.
not working, i'll just stick with selectolax
what if images of tables and equations are there in that case?
WHY THE AI VOICE 😩
trust me it's better this way
this is how the knowledge is shared.
Why do you use the HuggingFaceBgeEmbeddings and not OllamaEmbeddings?
😎
Nice demo. Quick question, do you know how PyPDFLoader will process the images and tables within the PDF file? THanks.
Hi, the images are not processed by default and tables, if possible, are (clumsily) converted to text. if you are looking for more advanced extraction, one way I tried is to convert a PDF page to PNG and give it to Vision LLM for evaluation, which can understand pictures and graphs better.
Can you do a full instal tutorial for windows? I want to use Whisper v3 turbo in my python programm but still did not figured out a proper install ^^#
Hi, on Windows you can use openai-whisper package. see pypi.org/project/openai-whisper/ for more details. Either way on Windows I recommend using WSL backend for better compatibility
Excellent, thank you! A very clever strategy for large documents. However, I am a little at a loss in the search of a good embedding model for texts in Spanish. I am not sure whether the BGE models are the best option for these. Can you suggest one that could be integrated seamlessly within your code?
Hi, for Spanish language take a look at jinaai/jina-embeddings-v2-base-es . In your code simply replace the model_name variable and everything should work.
@@DebugVerseTutorials Thank you very much for your kind answer. I'll do that 😊🤗🤗
@@DebugVerseTutorials Hi, if I would to use the Ollama model, how can I know the exact name necessary to put in the model_name?
@@igorcastilhosdo ollama list to see the model available and copy the name.
you can use latest jina embeddings v3 as it is multilinugal.
Source code github.com/debugverse/debugverse-youtube
Source code github.com/debugverse/debugverse-youtube/tree/main/summarize_huge_documents_kmeans
Source code: github.com/debugverse/debugverse-youtube/tree/main/go-genai-chatbot
Source code github.com/debugverse/debugverse-youtube/tree/main/ai_blog
Source code github.com/debugverse/debugverse-youtube/tree/main/ai_blog
Code source, please.
Here you go github.com/debugverse/debugverse-youtube/tree/main/pdf_summary_and_sentiment