This Bash script scrapes a website and converts its HTML content to Markdown or plain text. Command-line options control the output directory, the conversion performed, and whether the downloaded HTML files are kept.
-
Clone the Repository:
git clone https://github.com/yourusername/atviaroop.git
cd atviaroop
-
Run the Installation Script:
sudo bash install.sh
This script installs the ativaroop.sh script to /usr/local/bin/ and makes it executable.
-
Usage Instructions:
- Once installed, you can use the ativaroop.sh script from the command line.
- Run ativaroop.sh with the desired options and a website URL. For example:
ativaroop.sh -o output_directory -m -t -c https://example.com
  -o : Specify the output directory (default is "website").
  -m : Convert HTML to Markdown.
  -t : Convert plain text to Markdown.
  -c : Keep HTML files after conversion.
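The flags above follow the usual short-option pattern, so a script like this could parse them with the Bash builtin getopts. The following is a minimal sketch only: the function name parse_args and the variable names are hypothetical, and the real ativaroop.sh may handle its arguments differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of option parsing; not taken from ativaroop.sh itself.

parse_args() {
  # Defaults as documented: output directory "website", all flags off.
  output_dir="website"
  to_markdown=0
  text_to_markdown=0
  keep_html=0
  local OPTIND=1 opt

  # Leading ":" enables silent error handling; "o:" means -o takes a value.
  while getopts ":o:mtc" opt; do
    case "$opt" in
      o) output_dir="$OPTARG" ;;
      m) to_markdown=1 ;;
      t) text_to_markdown=1 ;;
      c) keep_html=1 ;;
      :) echo "Option -$OPTARG requires an argument" >&2; return 1 ;;
      \?) echo "Unknown option: -$OPTARG" >&2; return 1 ;;
    esac
  done
  shift $((OPTIND - 1))

  # The first remaining argument is treated as the website URL.
  url="${1:-}"
  if [ -z "$url" ]; then
    echo "Missing website URL" >&2
    return 1
  fi
}

parse_args -o output_directory -m -c https://example.com &&
  echo "dir=$output_dir markdown=$to_markdown text=$text_to_markdown keep=$keep_html url=$url"
```

Keeping the parsing in one function makes the defaults easy to see in one place and lets a missing URL or an unknown flag fail early, before anything is downloaded.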
-
View Logs (Optional):
- If needed, check the log file for details:
cat scrape_log_<timestamp>.txt
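Because the log name contains a timestamp, several log files can accumulate after repeated runs. As a convenience, the newest one could be located with a small helper like the sketch below. It assumes the logs are written to the directory you pass in (the current directory by default); latest_log is a hypothetical name and not part of the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified scrape_log_*.txt
# in the given directory (default: current directory).

latest_log() {
  local dir="${1:-.}" f newest=""
  for f in "$dir"/scrape_log_*.txt; do
    [ -e "$f" ] || continue          # the glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  # Return non-zero when no log file was found.
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

if log="$(latest_log)"; then
  cat "$log"
else
  echo "No scrape_log_*.txt files found." >&2
fi
```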
-
Uninstall (Optional):
- If you wish to uninstall the script, you can manually remove it from /usr/local/bin/:
sudo rm /usr/local/bin/ativaroop.sh
Note: Make sure you have the permissions needed to run the installation script and to execute the installed script. Required dependencies such as wget and pandoc must also be installed on the system.
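One way to confirm the dependencies before running the script is to check that each command is on your PATH. This is a minimal sketch; the function name missing_deps is hypothetical, and wget and pandoc are the only dependencies named here, so extend the list if your setup needs more.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of any given commands not found on PATH.

missing_deps() {
  local cmd missing=()
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || missing+=("$cmd")
  done
  # Space-separated list; empty output means every command was found.
  echo "${missing[*]}"
}

missing="$(missing_deps wget pandoc)"
if [ -n "$missing" ]; then
  echo "Missing dependencies: $missing" >&2
else
  echo "All dependencies found."
fi
```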
  -o <output_directory> : Specify the output directory (default: website)
  -m : Convert HTML to Markdown (default: plain text)
  -t : Convert plain text to Markdown (default: no conversion)
  -c : Keep HTML files after conversion (default: remove)
ar -o output -m https://example.com
For detailed usage information, run:
Download and convert a website:
ar -o my_website -m -c https://example.com
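To illustrate what the -m and -c options describe, the sketch below converts every downloaded .html file in a directory to Markdown with pandoc and then either keeps or removes the original. It is an assumption about how such a step could look, not the script's actual code: convert_html_dir and its parameters are hypothetical, and the converter command is a parameter only so it can be swapped out.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the HTML-to-Markdown step; ativaroop.sh may differ.
#   $1  directory containing downloaded .html files
#   $2  1 to keep the HTML files, 0 to remove them (default: 0)
#   $3  converter command (default: pandoc)

convert_html_dir() {
  local dir="$1" keep_html="${2:-0}" converter="${3:-pandoc}" f
  find "$dir" -type f -name '*.html' -print0 |
    while IFS= read -r -d '' f; do
      # Write page.md next to page.html; stdin is detached so the
      # converter cannot consume the file list being read by the loop.
      if "$converter" -f html -t markdown "$f" -o "${f%.html}.md" </dev/null; then
        # Remove the source only after a successful conversion.
        [ "$keep_html" -eq 1 ] || rm -- "$f"
      else
        echo "Conversion failed: $f" >&2
      fi
    done
}
```

For example, convert_html_dir my_website 1 would convert the pages under my_website and keep the HTML files, which corresponds to the -m -c combination shown above.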
Feel free to customize the script according to your specific needs.