What is the best way to take user input on a web page with Node.js and then use that input for web scraping?


I am working on a simple web scraping API that returns links to articles whose titles contain a keyword. It works fine for a single keyword and looks like this:

"use strict"
const PORT = process.env.PORT || 8000
const express = require('express')
const axios = require('axios')
const cheerio = require('cheerio')

const app = express()

const SEARCH_TERM = "Biden"

const news = [
    {
        name: 'google news',
        address: 'https://news.google.com/topstories?hl=en-US&gl=US&ceid=US:en',
        base:'https://news.google.com'
    },
    {
        name: 'yahoo news',
        address: 'https://news.yahoo.com/',
        base: 'https://news.yahoo.com'
    },
    {
        name: 'huffington post',
        address: 'https://www.huffpost.com/',
        base: 'https://www.huffpost.com'
    },
    {
        name: 'cnn',
        address: 'https://www.cnn.com/',
        base: 'https://www.cnn.com'
    },
    {
        name: 'the guardian',
        address: 'https://www.theguardian.com/us',
        base: 'https://www.theguardian.com'
    },
    {
        name: 'usa today',
        address: 'https://www.usatoday.com/',
        base: 'https://www.usatoday.com'
    },
    {
        name: 'business insider',
        address: 'https://www.businessinsider.com/',
        base: 'https://www.businessinsider.com'
    },
    {
        name: 'bbc',
        address: 'https://www.bbc.com/',
        base: 'https://www.bbc.com'
    },
    {
        name: 'vice',
        address: 'https://www.vice.com/',
        base: 'https://www.vice.com'
    },
    {
        name: 'the new york post',
        address: 'https://nypost.com/',
        base: 'https://nypost.com'
    },
    {
        name: 'vox',
        address: 'https://www.vox.com/',
        base: 'https://www.vox.com'
    },
    {
        name: 'the atlantic',
        address: 'https://www.theatlantic.com/',
        base: 'https://www.theatlantic.com'
    },
    {
        name: 'the times of india',
        address: 'https://timesofindia.indiatimes.com/us',
        base: 'https://timesofindia.indiatimes.com'
    },
    {
        name: 'china daily',
        address: 'http://global.chinadaily.com.cn/',
        base: 'http://global.chinadaily.com.cn'
    },
    {
        name: 'the hindu',
        address: 'https://www.thehindu.com/',
        base: 'https://www.thehindu.com'
    },
    {
        name: 'the south china morning post',
        address: 'https://www.scmp.com/',
        base: 'https://www.scmp.com'
    },
    {
        name: 'al jazeera',
        address: 'https://www.aljazeera.com',
        base: 'https://www.aljazeera.com'
    },

]

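// Collected results; declared before the scraping loop that fills it
const articles = []

news.forEach(newspaper => {
    axios.get(newspaper.address)
        .then(response => {
            const html = response.data
            const $ = cheerio.load(html)

            $(`a:contains(${SEARCH_TERM})`).each(function() {
                const title = $(this).text().trim()

                // Some anchors have no href; mark those explicitly
                let url = $(this).attr('href') || "NO_URL"

                // Prefix relative links with the site's base address
                if (url.startsWith("/") || url.startsWith(".")) {
                    url = newspaper.base + url
                }

                // Strip stray whitespace from the link
                url = url.replace(/\s+/g, '')

                articles.push({
                    title,
                    url: url,
                    source_website: newspaper.name
                })
            })
        })
        // Log failures so an unreachable site does not fail silently
        .catch(err => console.error(newspaper.name + ': ' + err.message))
})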

app.get('/news',(req,res) => {
    res.json(articles)
})

app.listen(PORT, () => console.log('server running on PORT ' + PORT))

When run with npm, it returns JSON objects with the title, link, and source of each matching article, in this case articles about Joe Biden. I want to add user input from the web page itself: a prompt would say "Enter your search term: " and, once something is entered, the matching articles would appear, i.e. the input would take the place of the SEARCH_TERM const. I started by trying to take input from the terminal with readline and prompt-sync, but after I pressed Enter it just printed the prompt again along with the first three characters of whatever I had typed. I should note I am using Node.js. Any advice on how to get user input through the web page would be much appreciated.
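
One common way to get input from the page is an HTML form that sends the term as a query parameter, which the server then reads on each request instead of relying on a fixed constant. The following is a minimal sketch of that idea, not a definitive implementation. The names `SEARCH_PAGE`, `getSearchTerm`, `registerSearchRoutes`, `scrapeAll`, and the query parameter `q` are all invented for this example; `scrapeAll(term)` is assumed to be a function that wraps the axios/cheerio loop and returns a Promise of an articles array.

```javascript
// Sketch only: every name here is hypothetical and not part of the original code.

// A minimal page with a GET form; submitting it requests /news?q=<term>.
const SEARCH_PAGE = `<!doctype html>
<html>
  <body>
    <form action="/news" method="get">
      <label>Enter your search term:
        <input name="q" required>
      </label>
      <button type="submit">Search</button>
    </form>
  </body>
</html>`

// Read and sanitize the search term from an Express-style query object.
function getSearchTerm(query) {
    // req.query.q can be an array or object for unusual URLs; accept strings only
    const raw = typeof query.q === 'string' ? query.q : ''
    // Drop characters that could break out of the a:contains(...) selector
    return raw.replace(/[()"'\\]/g, '').trim()
}

// `app` is an Express application. `scrapeAll(term)` is an assumed helper that
// runs the scraping loop for one term and resolves with the articles found.
function registerSearchRoutes(app, scrapeAll) {
    // Serve the form at the site root
    app.get('/', (req, res) => {
        res.send(SEARCH_PAGE)
    })

    // Scrape per request so each visitor gets results for their own term
    app.get('/news', async (req, res) => {
        const term = getSearchTerm(req.query)
        if (!term) {
            return res.status(400).json({ error: 'Missing search term "q"' })
        }
        try {
            res.json(await scrapeAll(term))
        } catch (err) {
            res.status(500).json({ error: err.message })
        }
    })
}
```

With the `app` from the code above, calling `registerSearchRoutes(app, scrapeAll)` would take the place of the existing `app.get('/news', ...)` handler. The `scrapeAll` function is not shown; it would need to move the `news.forEach` loop into a function that takes the term as an argument, collects results in a local array, and resolves once all requests have finished (for example with `Promise.all`).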

Source: JavaScript – Stack Overflow

November 7, 2021
Category : News
Tags: javascript | node.js | user-input
