The Company News Collector is a scalable backend API built for gathering, scraping, and summarizing relevant business news from various online sources. It integrates the Google Serper API for news search, the Browserless API for web scraping, and OpenAI's GPT-3.5 model for summarization.
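The project wraps these services in its own helper functions; as a rough orientation, here is a hedged sketch of the typical request shape for the two external APIs, based on the public Serper and Browserless docs (the function names here are illustrative, not the project's own):

import json
import os

import requests

def search_news(query: str) -> list:
    # Serper's news endpoint returns a JSON payload with a "news" list of articles.
    response = requests.post(
        "https://google.serper.dev/news",
        headers={
            "X-API-KEY": os.getenv("SERP_API_KEY"),
            "Content-Type": "application/json",
        },
        data=json.dumps({"q": query}),
    )
    return response.json().get("news", [])

def fetch_page(url: str) -> str:
    # Browserless renders the page in headless Chrome and returns its HTML.
    token = os.getenv("BROWSERLESS_API_KEY")
    response = requests.post(
        f"https://chrome.browserless.io/content?token={token}",
        headers={"Content-Type": "application/json"},
        data=json.dumps({"url": url}),
    )
    return response.text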
The tool efficiently retrieves up-to-date news articles based on specific queries, extracts content from targeted websites, and generates concise summaries tailored to specific business objectives. Summarization uses a map-reduce method: the text is split into manageable chunks, each chunk is processed to extract key insights, and the partial results are combined to produce the final summary.
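As a minimal sketch of that map-reduce step, assuming LangChain's built-in summarize chain (the project's actual prompts and chunk sizes may differ):

from langchain.chains.summarize import load_summarize_chain
from langchain.chat_models import ChatOpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter

def summarize(content: str) -> str:
    llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")
    # Split the scraped text into chunks small enough for the model's context window.
    splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=200)
    docs = splitter.create_documents([content])
    # "map" summarizes each chunk independently; "reduce" merges the partial summaries.
    chain = load_summarize_chain(llm, chain_type="map_reduce")
    return chain.run(docs)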
By customizing the search queries and objectives, users can tailor the focus on aspects like financial health, corporate announcements, supply chain stability, market position, and regulatory compliance of supplier companies.
Flow chart of the News Collecting Agent:
- FastAPI: For creating the backend API.
- Supabase: As the backend PostgreSQL database and for authentication.
- LangChain: For integrating language models and building chains for summarization and analysis.
- OpenAI GPT-3.5: For generating summaries and extracting key business insights.
- Pydantic: For validating URLs and news objectives (see the sketch after this list).
- Render: For deployment and hosting.
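To illustrate the Pydantic piece, here is a minimal sketch of the kind of model that could validate a scrape request; the model name and fields are illustrative assumptions, not the project's actual schema:

from pydantic import BaseModel, HttpUrl

class ScrapeRequest(BaseModel):
    objective: str  # the business objective guiding the summary
    url: HttpUrl    # a malformed URL raises a ValidationError before any scraping

# Valid input constructs cleanly; url="not-a-url" would raise a ValidationError.
ScrapeRequest(objective="financial health", url="https://www.tesla.com/news")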
To get started with Supabase, follow these steps to set up your database and insert the necessary schema.
- Go to the Supabase website.
- Sign up for a new account or log in if you already have one.
- After logging in, click on the "New Project" button.
- Fill in the required details:
  - Project Name: Give your project a name.
  - Organization: Select your organization or create a new one.
  - Database Password: Set a strong password for your database.
- Click on "Create New Project".
- Once the project is created, navigate to the "Settings" tab.
- Click on "API" to find your `SUPABASE_URL` and `SUPABASE_KEY`.
- Copy these credentials and add them to your `.env` file as shown in the setup tutorial.
- Go to the "Table Editor" tab in your Supabase project.
- Click on "New Table" to create the tables required for your application.
Refer to the schema diagram above for the table structure. Use the following SQL commands to initialize the tables:
CREATE TABLE supplier (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255),
supplier_name VARCHAR(255),
profile_url VARCHAR(255)
);
CREATE TABLE summary (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
time TIMESTAMPTZ DEFAULT NOW(),
content TEXT,
news_url TEXT,
news_title TEXT,
supplier_id UUID REFERENCES supplier(id)
);
Use the following SQL command to insert dummy company information for testing purposes:
INSERT INTO supplier (email, supplier_name, profile_url) VALUES
('[email protected]', 'Tesla', 'https://www.tesla.com/profile'),
('[email protected]', 'Apple', 'https://www.apple.com/profile'),
('[email protected]', 'Nvidia', 'https://www.nvidia.com/profile');
To ensure that your Supabase setup is correct, you can test the connection by running a simple query from your application.
import os

from dotenv import load_dotenv
from supabase import create_client

load_dotenv()  # read the credentials from the .env file created above

SUPABASE_URL = os.getenv("SUPABASE_URL")
SUPABASE_KEY = os.getenv("SUPABASE_KEY")
supabase = create_client(SUPABASE_URL, SUPABASE_KEY)

# Fetch every row from the supplier table to confirm the connection works.
data = supabase.table('supplier').select('*').execute()
print(data)
After running the tool in practice, you might want to filter out irrelevant summaries. Use the following SQL command to delete entries whose content is marked "NOT RELEVANT" or is null:
DELETE FROM summary
WHERE content LIKE '%NOT RELEVANT%' OR content IS NULL;
This tutorial will guide you through the setup and usage of the Company News Collector tool. Follow these steps to get the tool up and running.
- Python 3.8+: Ensure you have Python installed on your machine.
- pip: Python package installer.
- Environment Variables: Create a `.env` file in the project root with the following variables:
  - `SUPABASE_URL`
  - `SUPABASE_KEY`
  - `BROWSERLESS_API_KEY`
  - `SERP_API_KEY`
  - `OPENAI_API_KEY`
Clone the repository from your version control system.
git clone <repository_url>
cd <repository_directory>
Install the necessary Python packages using pip.
pip install -r requirements.txt
Create a .env file in the root directory of your project and add the necessary environment variables.
touch .env
Add the following content to the .env file:
SUPABASE_URL=your_supabase_url
SUPABASE_KEY=your_supabase_key
BROWSERLESS_API_KEY=your_browserless_api_key
SERP_API_KEY=your_serp_api_key
OPENAI_API_KEY=your_openai_api_key
Start the FastAPI server to make the tool accessible.
uvicorn app:app --host 127.0.0.1 --port 8000 --reload
- `POST /scrape/`: Starts the scraping and summarizing process for all suppliers.
To start the scraping process, send a POST request to the `/scrape/` endpoint:
curl -X POST "http://127.0.0.1:8000/scrape/"
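Under the hood, the handler iterates over every supplier row and runs the search-scrape-summarize pipeline for each. The sketch below is an assumption-laden reconstruction based on the schema and the usage example in this README, not the project's actual handler:

import os

from fastapi import FastAPI
from supabase import create_client
from ai_agent import generate_news_queries, search_news, scrape_website

app = FastAPI()
supabase = create_client(os.getenv("SUPABASE_URL"), os.getenv("SUPABASE_KEY"))

@app.post("/scrape/")
def scrape_all_suppliers():
    suppliers = supabase.table("supplier").select("*").execute().data
    for supplier in suppliers:
        # One query/objective pair per business aspect (financial health, etc.).
        for query in generate_news_queries(supplier["supplier_name"]):
            for article in search_news(query["query"]):
                summary = scrape_website(query["objective"], article["link"])
                supabase.table("summary").insert({
                    "content": summary,
                    "news_url": article["link"],
                    "news_title": article.get("title"),
                    "supplier_id": supplier["id"],
                }).execute()
    return {"status": "completed", "suppliers_processed": len(suppliers)}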
Here's an example of how to test the functionality directly in a Python script:

from ai_agent import generate_news_queries, search_news, scrape_website

# Generate a list of query/objective pairs for the company.
agc_queries = generate_news_queries('AGC Incorporated')

# Search for news using the fifth generated query (index 4).
agc_financial_news = search_news(agc_queries[4]['query'])

# Scrape and summarize the second search result against that query's objective.
agc_f_news_1 = scrape_website(agc_queries[4]['objective'], agc_financial_news[1]['link'])
print(agc_f_news_1)
print(agc_financial_news[1]['link'])
To deploy the Company News Collector on Render, making it accessible for demonstration and testing purposes, follow these steps:
- Go to the Render website.
- Sign up for a new account or log in if you already have one.
- After logging in, click on the "New" button and select "Web Service".
- Connect your GitHub or GitLab account to Render and select the repository containing your project.
- Fill in the required details:
  - Name: Give your web service a name.
  - Region: Choose a region closest to your user base.
  - Branch: Select the branch you want to deploy (e.g., `main`).
  - Build Command: `pip install -r requirements.txt`
  - Start Command: `uvicorn app:app --host 0.0.0.0 --port $PORT`
- Add the environment variables required for your project in the "Advanced" section:
  - `SUPABASE_URL`
  - `SUPABASE_KEY`
  - `BROWSERLESS_API_KEY`
  - `SERP_API_KEY`
  - `OPENAI_API_KEY`
- Click on "Add Environment Variable" and input each variable with its corresponding value from your `.env` file.
- Review the settings and click on "Create Web Service".
- Render will start building and deploying your application. This process may take a few minutes.
- Once the deployment is complete, Render will provide you with a URL for your web service.
- You can now access your application through this URL.
To test your deployed application, send a POST request to the `/scrape/` endpoint using a tool like `curl` or Postman:
curl -X POST "https://<your-render-url>/scrape/"
Replace `<your-render-url>` with the URL provided by Render.
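If you prefer to trigger it from Python instead of curl, here is a small requests-based equivalent (again substituting your actual Render URL for the placeholder):

import requests

# Trigger the scraping pipeline on the deployed service.
response = requests.post("https://<your-render-url>/scrape/")
print(response.status_code)
print(response.json())  # assumes the endpoint returns a JSON body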
Here is an example of what your Render deployment configuration might look like:
- Name: company-news-collector
- Region: US East (Ohio)
- Branch: main
- Build Command: `pip install -r requirements.txt`
- Start Command: `uvicorn app:app --host 0.0.0.0 --port $PORT`
This project is licensed under the MIT License - see the LICENSE.md file for details.