Sunday 21 October 2018

[question] iconv-lite not decoding everything properly, even though I'm using proper decoding

Hi, I'm using this piece of code to download a webpage (using the request library) and decode it (using the iconv-lite library). The loader function finds some elements in the body of the page and returns them as a JavaScript object.

    request.get({url: url, encoding: null}, function(error, response, body) {
        // if webpage exists, process it, otherwise throw 'not found' error
        if (response.statusCode === 200) {
            body = iconv.decode(body, "iso-8859-1");
            const $ = cheerio.load(body);
            async function show() {
                var data = await loader.getDay($, date, html_tags, thumbs, res, image_thumbnail_size);
                res.send(JSON.stringify(data));
            }
            show();
        } else {
            res.status(404);
            res.send(JSON.stringify({"error":"No content for this date."}));
        }
    });

The pages are declared as ISO-8859-1, and in the browser the content looks normal, with no bad characters. When I wasn't using iconv-lite, some characters, e.g. ü, looked like this: �. Now that I'm using the library as in the code above, most of the characters look fine, but some, e.g. š, show up as an empty box, even though they're displayed without any problems on the website.

I'm sure it's not cheerio's issue, because when I printed the output using res.send(body); or res.send(JSON.stringify({"body":body}));, the empty box character was still there. Is there a way to fix that?

EDIT: I copied the empty box character into Google, and it changed to š. Maybe that's important.
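One possible explanation, not confirmed anywhere in this post: browsers treat a page labelled ISO-8859-1 as windows-1252, and š does not exist in strict ISO-8859-1. In windows-1252 it is byte 0x9A, which strict ISO-8859-1 maps to the invisible control character U+009A, a plausible source of the empty box. The sketch below uses only Node built-ins and a made-up sample byte sequence to show the difference. With iconv-lite, the corresponding change would presumably be iconv.decode(body, "win1252"), assuming the page bytes really are windows-1252.

```javascript
// Hypothetical sample: the bytes for "Miša" as a windows-1252 page would send them.
// 0x9A is "š" in windows-1252, but the C1 control U+009A in strict ISO-8859-1.
const bytes = Buffer.from([0x4d, 0x69, 0x9a, 0x61]);

// Strict ISO-8859-1: every byte maps to the code point with the same value.
const asLatin1 = bytes.toString("latin1");

// windows-1252, decoded with Node's built-in TextDecoder.
const asCp1252 = new TextDecoder("windows-1252").decode(bytes);

// Compare the code point produced for the third character.
const latin1Hex = asLatin1.codePointAt(2).toString(16);
const cp1252Hex = asCp1252.codePointAt(2).toString(16);

console.log("ISO-8859-1  -> U+" + latin1Hex.padStart(4, "0")); // U+009a (control char)
console.log("windows-1252 -> U+" + cp1252Hex.padStart(4, "0")); // U+0161 (š)
console.log(asCp1252); // Miša
```

If the second decoding gives the expected text for the real page, the server's ISO-8859-1 label is most likely being used loosely for windows-1252.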

Submitted October 22, 2018 at 06:39AM by hamstersztyk
