Wednesday 21 December 2016

How to properly parse URLs with Shift-JIS characters in them?

Take this page for example. Links on that page, for whatever reason, have what appear to be Shift-JIS-encoded characters in them. If I do something like this:

```javascript
var jsdom = require("jsdom");

// Older callback-style jsdom.env() API: load the page, then inspect it.
jsdom.env(
  "http://ift.tt/2ihNTMj",
  function (err, window) {
    if (err) throw err;
    // document.links holds every <a> and <area> element that has an href.
    var links = window.document.links;
    // link.href is the resolved, percent-encoded URL.
    console.log(Array.from(links, link => link.href));
  }
);
```

those characters get percent-encoded as if the page were UTF-8, even though it isn't. What should be:

- http://afo.wtrpg.com/omc_view.cgi?sort=crname&crname=%82%C2%82%A9PON
- http://ift.tt/2hc1CXj
- http://ift.tt/2ihC4pl
- http://ift.tt/2hc6vj1
- http://ift.tt/2ihNPvT

instead comes out as:

- http://afo.wtrpg.com/omc_view.cgi?sort=crname&crname=%E3%81%A4%E3%81%8BPON
- http://ift.tt/2ihMOUP
- http://ift.tt/2hcajAN
- http://ift.tt/2hc6vj1
- http://ift.tt/2ihK3D1

Submitted December 22, 2016 at 04:11AM by DoomTay
