2 Replies Latest reply on Jul 19, 2007 11:35 PM by mingfai

    Newsgroup gateway problems

      I have been trying the ClearspaceX newsgroup gateway for some time. In general, it is functional under certain conditions, but it doesn't seem to be usable in my case.


      1. The main problem in my case is that people post messages to my newsgroup in two different encodings, either utf-8 or big5 (Traditional Chinese). From my tests, it seems that if the ClearspaceX encoding is set to utf-8, then none of the big5 newsgroup messages can be displayed, and vice versa. (I haven't checked in detail. The messages should have been imported, but they are just not displayed.)


      Well, this should not be a problem with the ClearspaceX gateway itself, but it basically makes the newsgroup import unusable.


      2. Newsgroup user names that are not from ClearspaceX are all displayed as guest. I would prefer that the original name be used, with an indicator that the poster is a guest/non-registered user/"from newsgroup".


      By the way, I would like to have an option to display or mask imported user names. It would also be desirable for the gateway to automatically scan the message content for user names used in the thread.
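
      A minimal sketch of the naming idea in item 2, assuming the poster's name can be read from the message's From header. The function name `display_name`, the `registered_emails` set, and the "(from newsgroup)" label are hypothetical illustrations, not part of ClearspaceX.

```python
from email.utils import parseaddr


def display_name(from_header, registered_emails):
    """Return the name to show for a poster, marking non-registered users.

    registered_emails is a hypothetical set of lowercase addresses that
    belong to registered users of the site.
    """
    name, addr = parseaddr(from_header)
    # Prefer the real name, fall back to the address, then to "guest".
    original = name or addr or "guest"
    if addr.lower() in registered_emails:
        return original
    return f"{original} (from newsgroup)"
```

      For example, `display_name("Alice Chan <alice@example.com>", set())` keeps the original name and appends the marker instead of showing a bare "guest".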


      3. Line breaks: the content of newsgroup messages is wrapped at 80 columns per line. Sometimes, in a reply message, the lines will look like this:


      YYYYYYY wrote:





      i.e. the original line is 80 characters long, and when the reply prefix is added to it, the line becomes longer than 80 characters and a line break is inserted.


      The Jive discussion board interprets the new line as an "inline reply", and the message is displayed oddly.


      Once again, this is a problem, but I don't mean that the newsgroup gateway has a bug.
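
      One possible workaround for item 3 is to rejoin the orphaned fragments before the message is stored. The sketch below is only a heuristic under stated assumptions: quoted lines start with ">", the wrap width is 80, and a short unquoted line sitting between two quoted lines is treated as a wrapped tail when it would not have fit on the quoted line above it. The function name `rejoin_wrapped_quotes` is hypothetical.

```python
def rejoin_wrapped_quotes(text, wrap_width=80, quote_prefix=">"):
    """Merge fragments that were split off quoted lines by re-wrapping."""
    lines = text.split("\n")
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        fragment = lines[i + 1] if i + 1 < len(lines) else ""
        following = lines[i + 2] if i + 2 < len(lines) else ""
        words = fragment.split()
        looks_wrapped = (
            line.startswith(quote_prefix)
            and bool(words)
            and not fragment.startswith(quote_prefix)
            # The fragment's first word would not have fit on the quoted line.
            and len(line) + 1 + len(words[0]) > wrap_width
            # Be conservative: only merge when the quote continues afterwards.
            and following.startswith(quote_prefix)
        )
        if looks_wrapped:
            out.append(line + " " + fragment.strip())
            i += 2
        else:
            out.append(line)
            i += 1
    return "\n".join(out)
```

      The heuristic can misfire on a very short genuine inline reply that follows a nearly full quoted line, so it is a starting point rather than a complete fix.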


      4. It seems that for HTML messages, the content is sometimes turned into a link like the following:



      As the link isn't valid, it's preferable not to display the content like this. I remember there is an option


      5. Imported messages are not sorted by post time, and I'm not sure what the current sorting criterion is. It is odd because the "what's new" view shows the post time, yet a message posted one month ago can be listed as newer than a message posted 3 minutes ago.
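
      As an illustration of the ordering expected in item 5, the sketch below sorts raw messages by the time in their Date header. It assumes the original post time is available in that header. The function names and the choice to sort messages without a usable date first are assumptions, and nothing here describes how the gateway actually orders messages.

```python
from datetime import datetime, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime

# Placeholder time for messages whose Date header is missing or unparsable.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def post_time(raw_message):
    """Return the timezone-aware post time taken from the Date header."""
    header = message_from_string(raw_message).get("Date")
    if header is None:
        return EPOCH
    try:
        parsed = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return EPOCH
    if parsed.tzinfo is None:
        # Headers with a "-0000" zone parse as naive datetimes; treat as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def sort_by_post_time(raw_messages):
    """Return the messages ordered from oldest to newest post time."""
    return sorted(raw_messages, key=post_time)
```

      Comparing timezone-aware values matters here, because posters in different time zones would otherwise be ordered by their local clock rather than by the actual moment of posting.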


      Besides, I would like to ask if the Jive forum uses the same gateway, or another (better) implementation. If it's a different one, I may want to try that as well.




      Message was edited by: mingfai

        • Re: Newsgroup gateway problems

          With further live testing, I find that only newsgroup messages whose encoding exactly matches the gateway import encoding can be imported and displayed. In my case, the import encoding is big5, and all utf-8 and ISO (I think it is iso8859-1, English) messages cannot be displayed.


          If it's just a Chinese problem, then Jive may choose to ignore this "minority" group. But I think even in the US, people may post messages to newsgroups in either UTF-8 or iso8859-1, so maybe half the messages cannot be displayed. In my case it's even worse, because people use three different main encodings. (In fact, some may use Simplified Chinese, but they are a minority for me.)


          Is there a solution or workaround? Could any developer give me a hint on how to write some kind of custom filter to handle the different encodings? I suppose that when a newsgroup message is being imported, it is easy to detect the message encoding, and based on that encoding I could write a custom filter/adapter to do the conversion so the data is inserted into the database correctly.
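
          The per-message detection described above could look roughly like the sketch below. It is not the gateway's filter API. It uses Python's standard email package and assumes each message declares its charset in the Content-Type header, with a hypothetical fallback list (utf-8, big5, iso-8859-1) tried in order when the declaration is missing or wrong.

```python
from email import message_from_bytes

# Hypothetical fallback order; iso-8859-1 accepts any bytes, so it goes last.
FALLBACK_CHARSETS = ("utf-8", "big5", "iso-8859-1")


def _first_text_part(msg):
    """Return the first text/* part of the message, or None."""
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            return part
    return None


def decode_body(raw_bytes):
    """Return (text, charset_used) for the body of a raw newsgroup message."""
    part = _first_text_part(message_from_bytes(raw_bytes))
    if part is None:
        return "", None
    payload = part.get_payload(decode=True) or b""
    declared = part.get_content_charset()  # None when no charset is declared
    candidates = ([declared] if declared else []) + list(FALLBACK_CHARSETS)
    for charset in candidates:
        try:
            return payload.decode(charset), charset
        except (LookupError, UnicodeDecodeError):
            continue  # unknown charset name or bytes that do not fit
    return payload.decode("utf-8", errors="replace"), "utf-8"
```

          The returned charset is the one that actually decoded the bytes, so a message mislabelled as utf-8 but written in big5 still comes out readable. Text decoded this way can then be stored in a utf-8 database regardless of the original encoding.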


          Thank you.

            • Re: Newsgroup gateway problems

              Even if no one seems to care about this, I'll try to provide more information.


              the Clearspace locale is utf-8, database locale is utf-8, the newsgroup gateway is big5.


              As mentioned before, people use utf-8 or big5 to post newsgroup messages.


              and the result:

              - for message titles, utf-8 titles can be displayed, i.e. the newsgroup gateway encoding setting affects the message body only

              - for message body, only big5 content can be displayed. It makes sense because I set it as big5.


              and the most desirable enhancements are:

              - for every message, detect the encoding and use the detected encoding

              - add a field in the database to indicate the detected encoding

              - separate the imported source message from the actual JIVEMESSAGE fields so we could do some conversion at the database level if possible
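
              The last two enhancements can be sketched with an in-memory SQLite table. The table and column names (`imported_message`, `raw_source`, `detected_encoding`, `body`) are hypothetical and are not the real Jive schema. The point is only that keeping the untouched source bytes next to the decoded body makes a later re-conversion possible.

```python
import sqlite3

# Hypothetical schema: the raw bytes are stored alongside the decoded text.
SCHEMA = """
CREATE TABLE imported_message (
    id INTEGER PRIMARY KEY,
    raw_source BLOB NOT NULL,
    detected_encoding TEXT NOT NULL,
    body TEXT NOT NULL
)
"""


def store(conn, raw_source, detected_encoding):
    """Insert a message, decoding the body with the detected encoding."""
    body = raw_source.decode(detected_encoding, errors="replace")
    cur = conn.execute(
        "INSERT INTO imported_message (raw_source, detected_encoding, body) "
        "VALUES (?, ?, ?)",
        (raw_source, detected_encoding, body),
    )
    return cur.lastrowid


def reconvert(conn, message_id, new_encoding):
    """Re-decode the stored source bytes and update the displayed body."""
    row = conn.execute(
        "SELECT raw_source FROM imported_message WHERE id = ?", (message_id,)
    ).fetchone()
    body = row[0].decode(new_encoding)
    conn.execute(
        "UPDATE imported_message SET detected_encoding = ?, body = ? WHERE id = ?",
        (new_encoding, body, message_id),
    )
    return body
```

              With this layout, a message first imported with the wrong encoding can be repaired later by calling `reconvert` with the right one, without fetching it from the news server again.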