Stammer.ai Docs

Scrape Websites

Scrape one or more webpages to add their content to your knowledge base.


Last updated 1 year ago

Data Quality in AI Agents

When it comes to creating AI agents, especially with platforms like Stammer, the quality of data is paramount. Here's why:

  1. Bad Data Equals Bad Performance

    The performance of an AI agent depends directly on the quality of the data it is trained on. Simply scraping all the content from a website and feeding it to a bot won't yield good results, because many websites contain outdated, irrelevant, or inaccurate information that can hinder the bot's performance.

  2. The Structure of Data Matters

    Not all content on a website is useful. For instance, some blog posts might lack relevant information about a product or might be generated automatically by the application. Including such data can lead to the bot being misinformed.

  3. Strategies for Effective Data Collection

    • Selective Scraping: Instead of scraping everything, focus on the most relevant and accurate pages.

    • Utilizing FAQ Pages: These pages are goldmines as they often contain question-answer pairs. Platforms like Notifier use knowledge base matching to find text that matches customer queries, making FAQ pages extremely valuable.

    • Handling Unstructured Data: If a webpage contains unstructured data, such as paragraphs without clear headings, you can preprocess it with tools like ChatGPT and the WebPilot plugin. This structures the data in a more bot-friendly way.

    • Dealing with JavaScript-heavy Pages: Some web pages rely heavily on JavaScript to display content. In such cases, tools like the WebPilot plugin might not work effectively. However, using the Chrome extension 'Page Plain Text' can help extract all the text from such pages, ensuring the bot gets all the necessary information.

  4. Future Developments

    There are plans to introduce a feature that allows users to preprocess data directly from the user interface. Feedback from users is crucial in refining and introducing such features.

Scraping a Google Doc

The website scraper can also scrape all of the text from a public Google Doc.
