Ampersand causes some characters to change in the TSV export

Talk > Bug Collectors

1 bnielsen
Edited: Oct 5, 3:40 am

Hi

I've come across a very weird bug:

This might be related to https://www.librarything.com/topic/246690, but this one has nothing to do with less-than or greater-than signs.

I have a book (book id 272817366) with the following review:
" Størrelserne i Millimeter af Brown & Sharpe's Lære - amerikansk -",

When I export the book as TSV the review field is exported as
" Størrelserne i Millimeter af Brown & Sharpe's Lære - amerikansk -",

If I change the text to
" Størrelserne i Millimeter af Brown and Sharpe's Lære - amerikansk -",

it exports as
" Størrelserne i Millimeter af Brown and Sharpe's Lære - amerikansk -",

I'd really like to hear whether anyone else can reproduce this bug. (My guess is that it's enough to create a new minimal book record with the above-mentioned line as the entire review.)

I've boiled it down to this from a much larger example, but it can probably be reduced to something even smaller.

ETA: I've deleted the book, since I now understand the bug much better (thanks to lorax) and can work around it. It should still be fixed, though, because it slowly rots any field containing & and non-ASCII characters. That is, when you edit and save, everything looks fine, but the TSV export of that record is messed up. The "workaround" is to replace & with something else, like "and".

2 lorax
Oct 4, 11:03 am

I know character encodings are weird, but wow, that's a weird one.

3 bnielsen
Edited: Oct 4, 11:55 am

>2 lorax: Yes, I wonder exactly what it takes to trigger the bug, but something like æøå plus the ampersand is enough. Maybe the " plays a part too?

Also, it only affected 1 of the 9500 books in my catalog. At least once in my experiments I've also seen the bug affect the Publisher field in the TSV file, but not the Comment field.

The other bug with < and > is also weird, but I think these are two separate bugs. At least, I couldn't make the ampersand bug go away by using a hex-escaped value for the ampersand.

I'm not sure I'd like to see the program code for export-to-tsv :-)

4 lorax
Oct 4, 1:31 pm

Do you know if this is a relatively new bug? I'm wondering if it might be related to https://www.librarything.com/topic/363470 (which was solved but they may not have solved it for Reviews).

5 bnielsen
Edited: Oct 5, 3:16 am

>4 lorax: I've been wondering the same, because it hit a fairly old review. I use the TSV export a lot, so my scripts croaked when the format of one line changed to ...æ...

I still have no idea why the bug hit only one review. Maybe I just don't use & that often in reviews? (æøå are common in Danish, and I write all my reviews in Danish, so I'd expect this bug to hit a lot harder if æøå alone were enough to trigger it.) I'll gather some statistics later on.
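For the statistics, a small script over a TSV export could count how many reviews contain an ampersand, non-ASCII characters, both, or neither. This is a minimal sketch, not LibraryThing code. It assumes the export has a header row with a column named `Review`, tab-separated fields with no quoting, and one record per line. The function names are hypothetical.

```python
import csv
import io
from collections import Counter


def has_non_ascii(text: str) -> bool:
    """True if text contains any character outside 7-bit ASCII (such as æ, ø, å)."""
    return any(ord(ch) > 127 for ch in text)


def classify(text: str) -> str:
    """Bucket a field value by whether it contains '&', non-ASCII characters, both, or neither."""
    amp = "&" in text
    non_ascii = has_non_ascii(text)
    if amp and non_ascii:
        return "both"
    if amp:
        return "ampersand_only"
    if non_ascii:
        return "non_ascii_only"
    return "neither"


def field_statistics(tsv_text: str, field: str = "Review") -> Counter:
    """Count the non-empty values of one column of a tab-separated export, per bucket.

    Assumed (hypothetical) layout: header row, tab-separated, no quoting, one record per line.
    """
    # QUOTE_NONE so that a review starting with '"' is read literally
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        value = row.get(field) or ""
        if value:  # skip books with an empty field
            counts[classify(value)] += 1
    return counts
```

One approach is to read the export file into a string (use whatever encoding the export actually has) and pass it to `field_statistics`. If the `both` count is much larger than the number of books that actually broke, then & plus non-ASCII would not be sufficient on its own.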

BTW, it is really nice to be able to export just a few books. Otherwise it would have taken me ages to isolate this.

6 bnielsen
Oct 5, 3:22 am

>4 lorax: I think you're on to something. My theory is that when I now edit a review that meets the conditions for the bug, it gets saved as a mess. (My guess is that æøå and & in the same field trigger it.) And yes, it must be a new bug.

It is a bit annoying not being able to have a publisher like:
Kristiania og Kjøbenhavn, H. Aschehoug & Co. (W. Nygaard), 1921
but having to rewrite it as
Kristiania og Kjøbenhavn, H. Aschehoug og Co. (W. Nygaard), 1921

but I can live with it. The bad thing is that the bug hits me whenever I edit one of the many books with & and æøå in either Publisher or Review.
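If the theory holds that & and æøå in the same field trigger the bug, a script could list the books at risk before you edit them. Below is a minimal sketch with several assumptions. The column names `Book Id`, `Publication` and `Review` are guesses at the export header. "Non-ASCII" serves as a broader stand-in for æøå. `books_at_risk` is a hypothetical helper, not a LibraryThing feature.

```python
import csv
import io


def has_non_ascii(text: str) -> bool:
    """True if text contains any character outside 7-bit ASCII (such as æ, ø, å)."""
    return any(ord(ch) > 127 for ch in text)


def books_at_risk(tsv_text: str,
                  fields=("Publication", "Review"),
                  id_field: str = "Book Id") -> list:
    """Return IDs of books where any checked field holds both '&' and non-ASCII text.

    Column names are assumptions; adjust them to match the real export header.
    Assumes tab-separated fields with no quoting and one record per line.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    at_risk = []
    for row in reader:
        for field in fields:
            value = row.get(field) or ""  # missing or empty fields count as ""
            if "&" in value and has_non_ascii(value):
                at_risk.append(row.get(id_field) or "")
                break  # one matching field is enough to flag the book
    return at_risk
```

Running it on an export before touching those records would show which books need the & rewritten (as in the "og" example above) before any edit is saved.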

7 kristilabrie
Oct 9, 1:11 pm

I'm here once a week for a little while, so I'm starring this one to hopefully dive into next week! Unless someone beats me to it. A quirky bug indeed.

8 bnielsen
Oct 10, 2:47 am

>7 kristilabrie: Thanks. I've used the workaround on the book that was hit by this, but I think of the bug as some kind of sneaky slow rotting of my catalog, so I hope you find and fix it.

9 bnielsen
Oct 28, 1:50 am

It just hit me with another book. I corrected a typo and thereby triggered the bug. Any news on a fix?

10 kristilabrie
Nov 6, 10:21 am

>9 bnielsen: Thanks for the reminder, I started testing on this last week and will try to jump back into this today/Friday. Appreciate your patience.

11 bnielsen
Nov 6, 2:31 pm

>10 kristilabrie: Thanks. It's much appreciated!

12 kristilabrie
Edited: Nov 12, 10:00 am

Okay, while I'm not sure *why* the characters aren't encoded properly, I did find the & bug on both the Review and Publication fields.

A manually added book (https://www.librarything.com/work/33139686/book/274859967) with the 'buggy' phrase from >1 bnielsen: in all text fields:



After changing the "&" to "and" in the affected fields, the non-ASCII characters display properly:

13 bnielsen
Nov 8, 1:44 pm

>12 kristilabrie: Nice demonstration of the bug in action!

14 bnielsen
Dec 14, 3:17 pm

Bump. (I just ran into the bug on another book. I have a lot of books where the bug will hit me if I touch the review even in a minor way, say to correct a typo, so I'd really like this bug to be fixed.)

15 kristilabrie
Dec 17, 10:57 am

Thanks for the bump! Holiday craziness is upon us; give us another bump next month? Thanks for your patience.

16 bnielsen
Dec 17, 12:06 pm

>15 kristilabrie: No problem. Apparently I get bitten by the bug about once a month :-)

17 kristilabrie
Dec 17, 12:15 pm

>16 bnielsen: A lovely little character (bug) booster! Thanks. :)