Has anyone asked for a regex search?

Talk about LibraryThing

1eqx
Sep 9, 9:30 pm

It would be so great if I could do a regular expression search. Has there been any interest in this?

2bnielsen
Sep 10, 12:38 am

>1 eqx: I export my catalogue as a TSV file and do regex searches on that. I don't think LT is keen on adding features that might slow down the whole site. I think the current search possibilities are what Elasticsearch can provide without putting a harmful load on the rest of the system. But yes, there are regex searches I'd like to do on LT directly, such as finding reviews with an odd number of apostrophes. (I tend to do them offline and then convert the result to something like:
121775585 OR 121867307 OR 123404748 OR 123569442
i.e. something that is easy to search for in LT.)
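
The offline workflow described here could be sketched roughly as follows in Python. This is only an illustration: the column names `Book Id` and `Review` are assumptions about the export's header row, `or_query` is a hypothetical helper, and the sample rows are made up.

```python
import csv
import io
import re

# An odd number of apostrophes: one apostrophe, then any number of further pairs.
ODD_APOSTROPHES = re.compile(r"[^']*'(?:[^']*'[^']*')*[^']*")


def or_query(tsv_file, id_column="Book Id", text_column="Review"):
    """Return 'id OR id OR ...' for rows whose text has an odd apostrophe count."""
    # Column names are assumptions; adjust them to the real export header.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    ids = [row[id_column] for row in reader
           if ODD_APOSTROPHES.fullmatch(row[text_column] or "")]
    return " OR ".join(ids)


# Tiny in-memory stand-in for an exported file (made-up rows).
sample = io.StringIO(
    "Book Id\tReview\n"
    "101\tIt's fine\n"
    "102\t'Quoted' title\n"
    "103\tNo apostrophes\n"
)
print(or_query(sample))  # prints 101
```

For a real export, the `io.StringIO` sample would be replaced by `open(path, newline="", encoding="utf-8")`, with the encoding being another thing to check against the actual file.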

3waltzmn
Sep 10, 4:09 am

>2 bnielsen:

I know how to do basic regex searches, so I'd probably get some use out of the feature, but I think bnielsen's cautions are worth keeping in mind: Is it useful enough to justify the extra burden on the servers? For me, probably not.

In my particular case, a better mechanism for ensuring that Common Knowledge is consistent in how it names people, places, etc. would eliminate most of the need.

4eqx
Sep 13, 6:50 pm

>2 bnielsen: Yes, server load would be an issue...

5bnielsen
Sep 14, 4:39 am

>4 eqx:

Some of us also have great experience in making regexes that can kill performance quickly.
Like a regex that tests whether a text string contains exactly a prime number of 1's.

6Felagund
Sep 14, 4:43 am

>5 bnielsen:
Oooooooh, nice :-D

7waltzmn
Sep 14, 5:07 am

>5 bnielsen: Like a regex that tests whether a text string contains exactly a prime number of 1's.

All right, I'll bite. If I promise not to use it on any system you're using :-), can you tell me what the code for that one looked like? I've written some fairly complex regexes for some of my work, but that's out of my league!

8bnielsen
Edited: Sep 14, 5:57 am

Here it is, including the part that generates all the strings of 1's. (It is very simple: it just discards strings that are exactly a repeat of a smaller string of a fixed size (larger than 1).)
The ++ is a bit of Perl magic. It increments the default variable $_, which is then used to construct the next 1-string of length $_. 1 is by convention not a prime number, so that and the empty string are removed by the ^1?$ pattern.
So the vital part is just ^(11+?)\1+$, which is a pattern for a block of two or more 1's followed by one or more further copies of itself.

perl -l -e '(1 x $_) !~ /^1?$|^(11+?)\1+$/ && print while ++$_;'

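So an inefficient (but very short) relative of the sieve of Eratosthenes; strictly speaking it is closer to trial division, since each length is tested against every possible block size.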
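
For anyone who finds the Perl hard to read, the same idea can be sketched in Python. This is a translation for illustration, not the original one-liner; the names `NOT_PRIME` and `is_prime_unary` are made up here, and `fullmatch` stands in for the ^ and $ anchors.

```python
import re

# Matches unary strings whose length is NOT prime:
#   1?         -> length 0 or 1
#   (11+?)\1+  -> a block of two or more 1's repeated at least twice
NOT_PRIME = re.compile(r"1?|(11+?)\1+")


def is_prime_unary(n):
    """Test n for primality by matching a string of n 1's against the regex."""
    return NOT_PRIME.fullmatch("1" * n) is None


print([n for n in range(1, 40) if is_prime_unary(n)])
# prints [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
```

The backtracking cost grows quickly with n, which is exactly the performance hazard being discussed.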

You can also start the list from a more interesting point:

perl -l -e '$_=250000; (1 x $_) !~ /^1?$|^(11+?)\1+$/ && print while ++$_;'

will output

250007
250013
250027
...

and that 250007 is prime is probably a less well-known fact than that 7 is prime :-)

9waltzmn
Sep 14, 6:07 am

>8 bnielsen: perl -l -e '(1 x $_) !~ /^1?$|^(11+?)\1+$/ && print while ++$_;'

You've demonstrated why I often find perl code very confusing. :-)

Thank you. That is impressive.

10bnielsen
Sep 14, 10:48 am

Here's a similar one for recognizing numbers divisible by 7.
Regular expressions are fun, but I'd prefer LT searches to be simple and fast rather than allow stuff like this:

perl -l -e '$_ =~ /^(?!$|0.)([07]*(?:[18](?2)|[29](?3)|3(?4)|4(?5)|5(?7)|6(?9)|$))|(5*(?:[07](?4)|[18](?5)|[29](?7)|4(?1)|6(?3)|3(?9)))(3*(?:[18](?1)|[29](?2)|[07](?9)|4(?4)|5(?5)|6(?7)))([18]*(?:[07](?3)|[29](?5)|5(?1)|6(?2)|3(?7)|4(?9)))(6*([29](?1)|[07](?7)|[18](?9)|3(?2)|4(?3)|5(?4)))(4*([07](?2)|[18](?3)|[29](?4)|6(?1)|3(?5)|5(?9)))([29]*([07](?5)|[18](?7)|3(?1)|4(?2)|5(?3)|6(?4)))/ && print while ++$_;'
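
The pattern is easier to follow when read as a state machine: each recursive group appears to stand for one remainder modulo 7, and reading a digit d moves from remainder r to (10*r + d) mod 7. A minimal Python sketch of that reading, without any regex (the function name `divisible_by_7` is made up for this example):

```python
def divisible_by_7(digits):
    """Scan a decimal string left to right, tracking the remainder mod 7."""
    state = 0  # remainder of the digits read so far
    for ch in digits:
        state = (10 * state + int(ch)) % 7
    return state == 0


# Transitions out of remainder 1, for comparison with the group that starts
# with 5*: digit 5 stays at remainder 1, digit 4 leads back to remainder 0.
print({d: (10 * 1 + d) % 7 for d in range(10)})

print([n for n in range(1, 50) if divisible_by_7(str(n))])
# prints [7, 14, 21, 28, 35, 42, 49]
```

This runs in time linear in the number of digits, which is the simple and fast behaviour one would want from a site search.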