๐๏ธ acreom
acreom is a dev-first knowledge base with tasks running on local markdown files.
๐๏ธ AirbyteLoader
Airbyte is a data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.
๐๏ธ Airbyte CDK (Deprecated)
Note: AirbyteCDKLoader is deprecated. Please use AirbyteLoader instead.
๐๏ธ Airbyte Gong (Deprecated)
Note: This connector-specific loader is deprecated. Please use AirbyteLoader instead.
๐๏ธ Airbyte Hubspot (Deprecated)
Note: AirbyteHubspotLoader is deprecated. Please use AirbyteLoader instead.
๐๏ธ Airbyte JSON (Deprecated)
Note: AirbyteJSONLoader is deprecated. Please use AirbyteLoader instead.
๐๏ธ Airbyte Salesforce (Deprecated)
Note: This connector-specific loader is deprecated. Please use AirbyteLoader instead.
๐๏ธ Airbyte Shopify (Deprecated)
Note: This connector-specific loader is deprecated. Please use AirbyteLoader instead.
๐๏ธ Airbyte Stripe (Deprecated)
Note: This connector-specific loader is deprecated. Please use AirbyteLoader instead.
๐๏ธ Airbyte Typeform (Deprecated)
Note: This connector-specific loader is deprecated. Please use AirbyteLoader instead.
๐๏ธ Airbyte Zendesk Support (Deprecated)
Note: This connector-specific loader is deprecated. Please use AirbyteLoader instead.
๐๏ธ Airtable
* Get your API key here.
๐๏ธ Alibaba Cloud MaxCompute
Alibaba Cloud MaxCompute (previously known as ODPS) is a general purpose, fully managed, multi-tenancy data processing platform for large-scale data warehousing. MaxCompute supports various data importing solutions and distributed computing models, enabling users to effectively query massive datasets, reduce production costs, and ensure data security.
๐๏ธ Amazon Textract
Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from scanned documents.
๐๏ธ Apify Dataset
Apify Dataset is a scalable append-only storage with sequential access built for storing structured web scraping results, such as a list of products or Google SERPs, and then exporting them to various formats like JSON, CSV, or Excel. Datasets are mainly used to save results of Apify Actors, serverless cloud programs for various web scraping, crawling, and data extraction use cases.
๐๏ธ ArcGIS
This notebook demonstrates the use of the langchain_community.document_loaders.ArcGISLoader class.
๐๏ธ Arxiv
arXiv is an open-access archive for 2 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.
๐๏ธ AssemblyAI Audio Transcripts
The AssemblyAIAudioTranscriptLoader transcribes audio files with the AssemblyAI API and loads the transcribed text into documents.
๐๏ธ AstraDB
DataStax Astra DB is a serverless vector-capable database built on Cassandra and made conveniently available through an easy-to-use JSON API.
๐๏ธ Async Chromium
Chromium is one of the browsers supported by Playwright, a library used to control browser automation.
๐๏ธ AsyncHtml
AsyncHtmlLoader loads raw HTML from a list of URLs concurrently.
๐๏ธ Athena
Amazon Athena is a serverless, interactive analytics service built
๐๏ธ AWS S3 Directory
Amazon Simple Storage Service (Amazon S3) is an object storage service
๐๏ธ AWS S3 File
Amazon Simple Storage Service (Amazon S3) is an object storage service.
๐๏ธ AZLyrics
AZLyrics is a large, legal collection of lyrics that grows every day.
๐๏ธ Azure AI Data
Azure AI Studio provides the capability to upload data assets to cloud storage and register existing data assets from the following sources:
๐๏ธ Azure Blob Storage Container
Azure Blob Storage is Microsoft's object storage solution for the cloud. Blob Storage is optimized for storing massive amounts of unstructured data. Unstructured data is data that doesn't adhere to a particular data model or definition, such as text or binary data.
๐๏ธ Azure Blob Storage File
Azure Files offers fully managed file shares in the cloud that are accessible via the industry standard Server Message Block (SMB) protocol, Network File System (NFS) protocol, and Azure Files REST API.
๐๏ธ Azure AI Document Intelligence
Azure AI Document Intelligence (formerly known as Azure Form Recognizer) is machine-learning
๐๏ธ BibTeX
BibTeX is a file format and reference management system commonly used in conjunction with LaTeX typesetting. It serves as a way to organize and store bibliographic information for academic and research documents.
๐๏ธ BiliBili
Bilibili is one of the most beloved long-form video sites in China.
๐๏ธ Blackboard
Blackboard Learn (previously the Blackboard Learning Management System) is a web-based virtual learning environment and learning management system developed by Blackboard Inc. The software features course management, customizable open architecture, and scalable design that allows integration with student information systems and authentication protocols. It may be installed on local servers, hosted by Blackboard ASP Solutions, or provided as Software as a Service hosted on Amazon Web Services. Its main purposes are stated to include the addition of online elements to courses traditionally delivered face-to-face and development of completely online courses with few or no face-to-face meetings.
๐๏ธ Blockchain
Overview
๐๏ธ Brave Search
Brave Search is a search engine developed by Brave Software.
๐๏ธ Browserbase
Browserbase is a developer platform to reliably run, manage, and monitor headless browsers.
๐๏ธ Browserless
Browserless is a service that allows you to run headless Chrome instances in the cloud. It's a great way to run browser-based automation at scale without having to worry about managing your own infrastructure.
๐๏ธ Cassandra
Cassandra is a NoSQL, row-oriented, highly scalable and highly available database. Starting with version 5.0, the database ships with vector search capabilities.
๐๏ธ ChatGPT Data
ChatGPT is an artificial intelligence (AI) chatbot developed by OpenAI.
๐๏ธ College Confidential
College Confidential gives information on 3,800+ colleges and universities.
๐๏ธ Concurrent Loader
Works just like the GenericLoader but concurrently for those who choose to optimize their workflow.
๐๏ธ Confluence
Confluence is a wiki collaboration platform that saves and organizes all of the project-related material. Confluence is a knowledge base that primarily handles content management activities.
๐๏ธ CoNLL-U
CoNLL-U is a revised version of the CoNLL-X format. Annotations are encoded in plain text files (UTF-8, normalized to NFC, using only the LF character as line break, including an LF character at the end of file) with three types of lines:
๐๏ธ Copy Paste
This notebook covers how to load a document object from something you just want to copy and paste. In this case, you don't even need to use a DocumentLoader, but rather can just construct the Document directly.
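A minimal sketch of constructing a Document directly, assuming the `langchain_core` package is installed. The fallback dataclass is a hypothetical stand-in with the same two fields, included only so the snippet runs without the package; the sample text and metadata are made up.

```python
try:
    # Real class, available when langchain_core is installed.
    from langchain_core.documents import Document
except ImportError:
    # Hypothetical stand-in with the same two fields, for illustration only.
    from dataclasses import dataclass, field

    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)

text = "Some text pasted straight from the clipboard."  # made-up sample
doc = Document(page_content=text, metadata={"source": "copy-paste"})

print(doc.page_content)
print(doc.metadata)
```

The metadata dictionary is optional; it is shown here because downstream steps often use a source field to trace where a document came from.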
๐๏ธ Couchbase
Couchbase is an award-winning distributed NoSQL cloud database that delivers unmatched versatility, performance, scalability, and financial value for all of your cloud, mobile, AI, and edge computing applications.
๐๏ธ CSV
A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas.
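The record and field structure described above can be illustrated with Python's standard `csv` module. This is only a sketch of the file format, not of the CSV loader itself; the sample data is made up, and its first line is assumed to be a header row.

```python
import csv
import io

# Made-up sample data; the first line is treated as the header row.
raw = "name,city\nAda,London\nGrace,New York\n"

# DictReader yields one dict per record, keyed by the header fields.
records = list(csv.DictReader(io.StringIO(raw)))

for row in records:
    print(row)
```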
๐๏ธ Cube Semantic Layer
This notebook demonstrates the process of retrieving Cube's data model metadata in a format suitable for passing to LLMs as embeddings, thereby enhancing contextual information.
๐๏ธ Datadog Logs
Datadog is a monitoring and analytics platform for cloud-scale applications.
๐๏ธ Diffbot
Diffbot is a suite of ML-based products that make it easy to structure web data.
๐๏ธ Discord
Discord is a VoIP and instant messaging social platform. Users have the ability to communicate with voice calls, video calls, text messaging, media and files in private chats or as part of communities called "servers". A server is a collection of persistent chat rooms and voice channels which can be accessed via invite links.
๐๏ธ Docugami
This notebook covers how to load documents from Docugami. It describes the advantages of using this system over alternative data loaders.
๐๏ธ Docusaurus
Docusaurus is a static-site generator which provides out-of-the-box documentation features.
๐๏ธ Dropbox
Dropbox is a file hosting service that brings everything (traditional files, cloud content, and web shortcuts) together in one place.
๐๏ธ DuckDB
DuckDB is an in-process SQL OLAP database management system.
๐๏ธ Email
This notebook shows how to load email (.eml) or Microsoft Outlook (.msg) files.
๐๏ธ EPub
EPUB is an e-book file format that uses the ".epub" file extension. The term is short for electronic publication and is sometimes styled ePub. EPUB is supported by many e-readers, and compatible software is available for most smartphones, tablets, and computers.
๐๏ธ Etherscan
Etherscan is the leading blockchain explorer, search, API and analytics platform for Ethereum,
๐๏ธ EverNote
EverNote is intended for archiving and creating notes in which photos, audio and saved web content can be embedded. Notes are stored in virtual "notebooks" and can be tagged, annotated, edited, searched, and exported.
๐๏ธ Facebook Chat
Messenger is an American proprietary instant messaging app and platform developed by Meta Platforms. Originally developed as Facebook Chat in 2008, the company revamped its messaging service in 2010.
๐๏ธ Fauna
Fauna is a Document Database.
๐๏ธ Figma
Figma is a collaborative web application for interface design.
๐๏ธ FireCrawl
FireCrawl crawls and converts any website into LLM-ready data. It crawls all accessible subpages and gives you clean markdown and metadata for each. No sitemap required.
๐๏ธ Geopandas
Geopandas is an open-source project to make working with geospatial data in Python easier.
๐๏ธ Git
Git is a distributed version control system that tracks changes in any set of computer files, usually used for coordinating work among programmers collaboratively developing source code during software development.
๐๏ธ GitBook
GitBook is a modern documentation platform where teams can document everything from products to internal knowledge bases and APIs.
๐๏ธ GitHub
This notebook shows how you can load issues and pull requests (PRs) for a given repository on GitHub. It also shows how you can load GitHub files for a given repository. We will use the LangChain Python repository as an example.
๐๏ธ Glue Catalog
The AWS Glue Data Catalog is a centralized metadata repository that allows you to manage, access, and share metadata about your data stored in AWS. It acts as a metadata store for your data assets, enabling various AWS services and your applications to query and connect to the data they need efficiently.
๐๏ธ Google AlloyDB for PostgreSQL
AlloyDB is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. AlloyDB is 100% compatible with PostgreSQL. Extend your database application to build AI-powered experiences leveraging AlloyDB's Langchain integrations.
๐๏ธ Google BigQuery
Google BigQuery is a serverless and cost-effective enterprise data warehouse that works across clouds and scales with your data.
๐๏ธ Google Bigtable
Bigtable is a key-value and wide-column store, ideal for fast access to structured, semi-structured, or unstructured data. Extend your database application to build AI-powered experiences leveraging Bigtable's Langchain integrations.
๐๏ธ Google Cloud SQL for SQL server
Cloud SQL is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. It offers MySQL, PostgreSQL, and SQL Server database engines. Extend your database application to build AI-powered experiences leveraging Cloud SQL's Langchain integrations.
๐๏ธ Google Cloud SQL for MySQL
Cloud SQL is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. It offers MySQL, PostgreSQL, and SQL Server database engines. Extend your database application to build AI-powered experiences leveraging Cloud SQL's Langchain integrations.
๐๏ธ Google Cloud SQL for PostgreSQL
Cloud SQL for PostgreSQL is a fully-managed database service that helps you set up, maintain, manage, and administer your PostgreSQL relational databases on Google Cloud Platform. Extend your database application to build AI-powered experiences leveraging Cloud SQL for PostgreSQL's Langchain integrations.
๐๏ธ Google Cloud Storage Directory
Google Cloud Storage is a managed service for storing unstructured data.
๐๏ธ Google Cloud Storage File
Google Cloud Storage is a managed service for storing unstructured data.
๐๏ธ Google Firestore in Datastore Mode
Firestore in Datastore Mode is a NoSQL document database built for automatic scaling, high performance and ease of application development. Extend your database application to build AI-powered experiences leveraging Datastore's Langchain integrations.
๐๏ธ Google Drive
Google Drive is a file storage and synchronization service developed by Google.
๐๏ธ Google El Carro for Oracle Workloads
Google El Carro Oracle Operator
๐๏ธ Google Firestore (Native Mode)
Firestore is a serverless document-oriented database that scales to meet any demand. Extend your database application to build AI-powered experiences leveraging Firestore's Langchain integrations.
๐๏ธ Google Memorystore for Redis
Google Memorystore for Redis is a fully-managed service that is powered by the Redis in-memory data store to build application caches that provide sub-millisecond data access. Extend your database application to build AI-powered experiences leveraging Memorystore for Redis's Langchain integrations.
๐๏ธ Google Spanner
Spanner is a highly scalable database that combines unlimited scalability with relational semantics, such as secondary indexes, strong consistency, schemas, and SQL providing 99.999% availability in one easy solution.
๐๏ธ Google Speech-to-Text Audio Transcripts
The GoogleSpeechToTextLoader transcribes audio files with the Google Cloud Speech-to-Text API and loads the transcribed text into documents.
๐๏ธ Grobid
GROBID is a machine learning library for extracting, parsing, and re-structuring raw documents.
๐๏ธ Gutenberg
Project Gutenberg is an online library of free eBooks.
๐๏ธ Hacker News
Hacker News (sometimes abbreviated as HN) is a social news website focusing on computer science and entrepreneurship. It is run by the investment fund and startup incubator Y Combinator. In general, content that can be submitted is defined as "anything that gratifies one's intellectual curiosity."
๐๏ธ Huawei OBS Directory
The following code demonstrates how to load objects from the Huawei OBS (Object Storage Service) as documents.
๐๏ธ Huawei OBS File
The following code demonstrates how to load an object from the Huawei OBS (Object Storage Service) as a document.
๐๏ธ HuggingFace dataset
The Hugging Face Hub is home to over 5,000 datasets in more than 100 languages that can be used for a broad range of tasks across NLP, Computer Vision, and Audio. They are used for a diverse range of tasks such as translation,
๐๏ธ iFixit
iFixit is the largest, open repair community on the web. The site contains nearly 100k repair manuals, 200k Questions & Answers on 42k devices, and all the data is licensed under CC-BY-NC-SA 3.0.
๐๏ธ Images
This covers how to load images such as JPG or PNG into a document format that we can use downstream.
๐๏ธ Image captions
By default, the loader utilizes the pre-trained Salesforce BLIP image captioning model.
๐๏ธ IMSDb
IMSDb is the Internet Movie Script Database.
๐๏ธ Iugu
Iugu is a Brazilian services and software as a service (SaaS) company. It offers payment-processing software and application programming interfaces for e-commerce websites and mobile applications.
๐๏ธ Joplin
Joplin is an open-source note-taking app. Capture your thoughts and securely access them from any device.
๐๏ธ Jupyter Notebook
Jupyter Notebook (formerly IPython Notebook) is a web-based interactive computational environment for creating notebook documents.
๐๏ธ Kinetica
This notebook goes over how to load documents from Kinetica.
๐๏ธ lakeFS
lakeFS provides scalable version control over the data lake, and uses Git-like semantics to create and access those versions.
๐๏ธ LarkSuite (FeiShu)
LarkSuite is an enterprise collaboration platform developed by ByteDance.
๐๏ธ LLM Sherpa
This notebook covers how to use LLM Sherpa to load files of many types. LLM Sherpa supports different file formats including DOCX, PPTX, HTML, TXT, and XML.
๐๏ธ Mastodon
Mastodon is a federated social media and social networking service.
๐๏ธ MediaWiki Dump
MediaWiki XML Dumps contain the content of a wiki (wiki pages with all their revisions), without the site-related data. An XML dump does not create a full backup of the wiki database; the dump does not contain user accounts, images, edit logs, etc.
๐๏ธ Merge Documents Loader
Merge the documents returned from a set of specified data loaders.
๐๏ธ mhtml
MHTML is used both for emails and for archived webpages. MHTML, sometimes referred to as MHT, stands for MIME HTML and is a single file in which an entire webpage is archived. When a webpage is saved in MHTML format, the file contains HTML code, images, audio files, Flash animation, etc.
๐๏ธ Microsoft Excel
The UnstructuredExcelLoader is used to load Microsoft Excel files. The loader works with both .xlsx and .xls files. The page content will be the raw text of the Excel file. If you use the loader in "elements" mode, an HTML representation of the Excel file will be available in the document metadata under the text_as_html key.
๐๏ธ Microsoft OneDrive
Microsoft OneDrive (formerly SkyDrive) is a file hosting service operated by Microsoft.
๐๏ธ Microsoft OneNote
This notebook covers how to load documents from OneNote.
๐๏ธ Microsoft PowerPoint
Microsoft PowerPoint is a presentation program by Microsoft.
๐๏ธ Microsoft SharePoint
Microsoft SharePoint is a website-based collaboration system, developed by Microsoft, that uses workflow applications, "list" databases, and other web parts and security features to empower business teams to work together.
๐๏ธ Microsoft Word
Microsoft Word is a word processor developed by Microsoft.
๐๏ธ Near Blockchain
Overview
๐๏ธ Modern Treasury
Modern Treasury simplifies complex payment operations. It is a unified platform to power products and processes that move money.
๐๏ธ MongoDB
MongoDB is a NoSQL, document-oriented database that supports JSON-like documents with a dynamic schema.
๐๏ธ News URL
This covers how to load HTML news articles from a list of URLs into a document format that we can use downstream.
๐๏ธ Notion DB 1/2
Notion is a collaboration platform with modified Markdown support that integrates kanban boards, tasks, wikis and databases. It is an all-in-one workspace for notetaking, knowledge and data management, and project and task management.
๐๏ธ Notion DB 2/2
Notion is a collaboration platform with modified Markdown support that integrates kanban boards, tasks, wikis and databases. It is an all-in-one workspace for notetaking, knowledge and data management, and project and task management.
๐๏ธ Nuclia
Nuclia automatically indexes your unstructured data from any internal and external source, providing optimized search results and generative answers. It can handle video and audio transcription, image content extraction, and document parsing.
๐๏ธ Obsidian
Obsidian is a powerful and extensible knowledge base
๐๏ธ Open Document Format (ODT)
The Open Document Format for Office Applications (ODF), also known as OpenDocument, is an open file format for word processing documents, spreadsheets, presentations, and graphics that uses ZIP-compressed XML files. It was developed with the aim of providing an open, XML-based file format specification for office applications.
๐๏ธ Open City Data
Socrata provides an API for city open data.
๐๏ธ Oracle Autonomous Database
Oracle Autonomous Database is a cloud database that uses machine learning to automate database tuning, security, backups, updates, and other routine management tasks traditionally performed by DBAs.
๐๏ธ Oracle AI Vector Search: Document Processing
Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads and allows you to query data based on semantics, rather than keywords.
๐๏ธ Org-mode
An Org Mode document is a document editing, formatting, and organizing mode, designed for notes, planning, and authoring within the free software text editor Emacs.
๐๏ธ Pandas DataFrame
This notebook goes over how to load data from a pandas DataFrame.
๐๏ธ Pebblo Safe DocumentLoader
Pebblo enables developers to safely load data and promote their Gen AI app to deployment without worrying about the organization's compliance and security requirements. The project identifies semantic topics and entities found in the loaded data and summarizes them on the UI or a PDF report.
๐๏ธ Polars DataFrame
This notebook goes over how to load data from a polars DataFrame.
๐๏ธ Psychic
This notebook covers how to load documents from Psychic. See here for more details.
๐๏ธ PubMed
PubMed® by The National Center for Biotechnology Information, National Library of Medicine comprises more than 35 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full text content from PubMed Central and publisher web sites.
๐๏ธ PySpark
This notebook goes over how to load data from a PySpark DataFrame.
๐๏ธ Quip
Quip is a collaborative productivity software suite for mobile and Web. It allows groups of people to create and edit documents and spreadsheets as a group, typically for business purposes.
๐๏ธ ReadTheDocs Documentation
Read the Docs is an open-sourced free software documentation hosting platform. It generates documentation written with the Sphinx documentation generator.
๐๏ธ Recursive URL
We may want to load all URLs under a root directory.
๐๏ธ Reddit
Reddit is an American social news aggregation, content rating, and discussion website.
๐๏ธ Roam
ROAM is a note-taking tool for networked thought, designed to create a personal knowledge base.
๐๏ธ Rockset
Rockset is a real-time analytics database which enables queries on massive, semi-structured data without operational burden. With Rockset, ingested data is queryable within one second and analytical queries against that data typically execute in milliseconds. Rockset is compute optimized, making it suitable for serving high concurrency applications in the sub-100TB range (or larger than 100s of TBs with rollups).
๐๏ธ rspace
This notebook shows how to use the RSpace document loader to import research notes and documents from RSpace Electronic
๐๏ธ RSS Feeds
This covers how to load HTML news articles from a list of RSS feed URLs into a document format that we can use downstream.
๐๏ธ RST
A reStructured Text (RST) file is a file format for textual data used primarily in the Python programming language community for technical documentation.
๐๏ธ scrapfly
ScrapFly
๐๏ธ Sitemap
Extending WebBaseLoader, SitemapLoader loads a sitemap from a given URL, then scrapes and loads all pages in the sitemap, returning each page as a Document.
๐๏ธ Slack
Slack is an instant messaging program.
๐๏ธ Snowflake
This notebook goes over how to load documents from Snowflake.
๐๏ธ Source Code
This notebook covers how to load source code files using a special approach with language parsing: each top-level function and class in the code is loaded into a separate document. Any remaining top-level code outside the already loaded functions and classes will be loaded into a separate document.
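The splitting idea can be sketched with Python's standard `ast` module. This is not the loader's own parser: it handles Python source only, and the helper name `split_top_level` and the sample source are hypothetical, chosen for illustration.

```python
import ast

def split_top_level(source: str) -> list[str]:
    """Return one chunk per top-level function/class, plus one for the rest."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    covered = set()  # 0-based indices of lines that belong to a def/class
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator, if any, so it stays with its definition.
            start = min([d.lineno for d in node.decorator_list] + [node.lineno])
            end = node.end_lineno
            chunks.append("\n".join(lines[start - 1:end]))
            covered.update(range(start - 1, end))
    # Everything outside the functions and classes goes into one final chunk.
    rest = [line for i, line in enumerate(lines) if i not in covered and line.strip()]
    if rest:
        chunks.append("\n".join(rest))
    return chunks

sample = '''import os

def greet(name):
    return f"Hello, {name}"

class Counter:
    def __init__(self):
        self.n = 0

print(greet(os.getcwd()))
'''

chunks = split_top_level(sample)
for chunk in chunks:
    print("---")
    print(chunk)
```

For the sample above this yields three chunks: the function, the class, and the leftover import and call.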
๐๏ธ Spider
Spider is the fastest and most affordable crawler and scraper that returns LLM-ready data.
๐๏ธ Spreedly
Spreedly is a service that allows you to securely store credit cards and use them to transact against any number of payment gateways and third party APIs. It does this by simultaneously providing a card tokenization/vault service as well as a gateway and receiver integration service. Payment methods tokenized by Spreedly are stored at Spreedly, allowing you to independently store a card and then pass that card to different end points based on your business requirements.
๐๏ธ Stripe
Stripe is an Irish-American financial services and software as a service (SaaS) company. It offers payment-processing software and application programming interfaces for e-commerce websites and mobile applications.
๐๏ธ Subtitle
The SubRip file format is described on the Matroska multimedia container format website as "perhaps the most basic of all subtitle formats." SubRip (SubRip Text) files are named with the extension .srt, and contain formatted lines of plain text in groups separated by a blank line. Subtitles are numbered sequentially, starting at 1. The timecode format used is hours:minutes:seconds,milliseconds with time units fixed to two zero-padded digits and fractions fixed to three zero-padded digits (00:00:00,000). The fractional separator used is the comma, since the program was written in France.
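As an illustration of the timecode layout (hours:minutes:seconds,milliseconds, with a comma before the fraction), here is a small sketch that converts one timecode to milliseconds using only the standard library. The function name `timecode_to_ms` is hypothetical and is not part of any loader.

```python
import re

# hours:minutes:seconds,milliseconds with a comma as the fractional separator
TIMECODE = re.compile(r"^(\d{2}):(\d{2}):(\d{2}),(\d{3})$")

def timecode_to_ms(timecode: str) -> int:
    """Convert one SubRip timecode to a total number of milliseconds."""
    match = TIMECODE.match(timecode.strip())
    if match is None:
        raise ValueError(f"not a SubRip timecode: {timecode!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis

print(timecode_to_ms("00:01:02,500"))
```

A timecode written with a period instead of a comma is rejected with a ValueError, which reflects the fixed separator described above.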
๐๏ธ SurrealDB
SurrealDB is an end-to-end cloud-native database designed for modern applications, including web, mobile, serverless, Jamstack, backend, and traditional applications. With SurrealDB, you can simplify your database and API infrastructure, reduce development time, and build secure, performant apps quickly and cost-effectively.
๐๏ธ Telegram
Telegram Messenger is a globally accessible freemium, cross-platform, encrypted, cloud-based and centralized instant messaging service. The application also provides optional end-to-end encrypted chats and video calling, VoIP, file sharing and several other features.
๐๏ธ Tencent COS Directory
Tencent Cloud Object Storage (COS) is a distributed
๐๏ธ Tencent COS File
Tencent Cloud Object Storage (COS) is a distributed
๐๏ธ TensorFlow Datasets
TensorFlow Datasets is a collection of datasets ready to use, with TensorFlow or other Python ML frameworks, such as Jax. All datasets are exposed as tf.data.Datasets, enabling easy-to-use and high-performance input pipelines. To get started see the guide and the list of datasets.
๐๏ธ TiDB
TiDB Cloud is a comprehensive Database-as-a-Service (DBaaS) solution that provides dedicated and serverless options. TiDB Serverless is now integrating a built-in vector search into the MySQL landscape. With this enhancement, you can seamlessly develop AI applications using TiDB Serverless without the need for a new database or additional technical stacks. Be among the first to experience it by joining the waitlist for the private beta at https://tidb.cloud/ai.
๐๏ธ 2Markdown
2markdown service transforms website content into structured markdown files.
๐๏ธ TOML
TOML is a file format for configuration files. It is intended to be easy to read and write, and is designed to map unambiguously to a dictionary. Its specification is open-source. TOML is implemented in many programming languages. The name TOML is an acronym for "Tom's Obvious, Minimal Language" referring to its creator, Tom Preston-Werner.
๐๏ธ Trello
Trello is a web-based project management and collaboration tool that allows individuals and teams to organize and track their tasks and projects. It provides a visual interface known as a "board" where users can create lists and cards to represent their tasks and activities.
๐๏ธ TSV
A tab-separated values (TSV) file is a simple, text-based file format for storing tabular data. Records are separated by newlines, and values within a record are separated by tab characters.
๐๏ธ Twitter
Twitter is an online social media and social networking service.
๐๏ธ Unstructured File
This notebook covers how to use the Unstructured package to load files of many types. Unstructured currently supports loading of text files, PowerPoint files, HTML, PDFs, images, and more.
๐๏ธ Upstage
This notebook covers how to get started with UpstageLayoutAnalysisLoader.
๐๏ธ URL
This example covers how to load HTML documents from a list of URLs into the Document format that we can use downstream.
๐๏ธ Vsdx
A Visio file (with extension .vsdx) is associated with Microsoft Visio, a diagram creation software. It stores information about the structure, layout, and graphical elements of a diagram. This format facilitates the creation and sharing of visualizations in areas such as business, engineering, and computer science.
๐๏ธ Weather
OpenWeatherMap is an open-source weather service provider
๐๏ธ WebBaseLoader
This covers how to use WebBaseLoader to load all text from HTML webpages into a document format that we can use downstream. For more custom logic for loading webpages look at some child class examples such as IMSDbLoader, AZLyricsLoader, and CollegeConfidentialLoader.
๐๏ธ WhatsApp Chat
WhatsApp (also called WhatsApp Messenger) is a freeware, cross-platform, centralized instant messaging (IM) and voice-over-IP (VoIP) service. It allows users to send text and voice messages, make voice and video calls, and share images, documents, user locations, and other content.
๐๏ธ Wikipedia
Wikipedia is a multilingual free online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and using a wiki-based editing system called MediaWiki. Wikipedia is the largest and most-read reference work in history.
๐๏ธ XML
The UnstructuredXMLLoader is used to load XML files. The loader works with .xml files. The page content will be the text extracted from the XML tags.
๐๏ธ Xorbits Pandas DataFrame
This notebook goes over how to load data from a xorbits.pandas DataFrame.
๐๏ธ YouTube audio
Building chat or QA applications on YouTube videos is a topic of high interest.
๐๏ธ YouTube transcripts
YouTube is an online video sharing and social media platform created by Google.
๐๏ธ Yuque
Yuque is a professional cloud-based knowledge base for team collaboration in documentation.