text-processing
Looking for an intermediate-strength hash function
I have a static set of ~35,000 unique ASCII text strings, each 20 to 60 bytes long. I want to assign each one a unique index. Simply numbering them would be undesirable for various reasons.
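One possible approach, sketched under the assumption that a 64-bit value counts as "intermediate strength": derive each index from a BLAKE2b digest truncated to 8 bytes, using Python's standard `hashlib`, and check the whole static set once for collisions. For ~35,000 strings, the birthday bound puts the chance of any 64-bit collision at about 3 × 10^-11. Unlike Python's built-in `hash()` for `str`, which is salted per process, this index stays the same across runs. The name `make_index` and the `digest_bytes` parameter are hypothetical.

```python
import hashlib


def make_index(strings, digest_bytes=8):
    """Map each string to an integer taken from a truncated BLAKE2b digest.

    digest_bytes=8 gives 64-bit indices (an assumed size; adjust as needed).
    Raises ValueError if two distinct strings share an index, so a static
    set only has to be checked once.
    """
    index = {}  # string -> integer index
    owner = {}  # integer index -> string that produced it
    for s in strings:
        # The strings are ASCII per the question; encode() raises otherwise.
        digest = hashlib.blake2b(s.encode("ascii"), digest_size=digest_bytes).digest()
        key = int.from_bytes(digest, "big")
        if owner.get(key, s) != s:
            raise ValueError(f"collision: {owner[key]!r} and {s!r}")
        owner[key] = s
        index[s] = key
    return index


if __name__ == "__main__":
    # Hypothetical sample strings standing in for the real set.
    sample = ["first-example-string-0001", "second-example-string-002"]
    for s, k in make_index(sample).items():
        print(f"{k:016x}  {s}")
```

If a collision is ever reported, raising `digest_bytes` (BLAKE2b allows up to 64) or salting and retrying are both simple ways out.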
2023-04-01 21:35 Category: Q&A
Python: loop through lines in a file; if a line equals a line in another file, return the original line
Text file 1 has the following format: 'WORD': 1 'MULTIPLE WORDS': 1 'WORD': 2 etc. That is, a quoted word or phrase, then a colon, then a number.
2023-03-31 11:43 Category: Q&A
Obj-C / iOS: Look through a document for any one of several thousand words?
As part of a document reader I'm writing for iPhone/iPad, I need the following functionality: search through a document of approximately 500 to 10,000 words for words and phrases that…
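The question targets Objective-C, but a common approach is language-neutral. You load the several thousand words and phrases into a hash set once. Then you test each short word sequence of the document against that set, so you avoid searching the document once per phrase. Below is a minimal Python sketch of that idea. It assumes case-insensitive, whole-word matching, since the truncated question does not specify either. The names `tokenize` and `find_phrases` are hypothetical. In Cocoa, an `NSSet` could play the role of the Python set.

```python
import re


def tokenize(text):
    """Lower-case text and split it into word tokens (letters, digits, _)."""
    return re.findall(r"\w+", text.lower())


def find_phrases(document, phrases):
    """Return the phrases from `phrases` that occur in `document`.

    Both sides go through the same tokenizer, so matching is case-insensitive
    and whole-word: "cat" does not match inside "concatenate".
    """
    wanted = {}  # token tuple -> original phrase
    for p in phrases:
        key = tuple(tokenize(p))
        if key:
            wanted[key] = p
    if not wanted:
        return set()
    max_len = max(len(k) for k in wanted)
    words = tokenize(document)
    found = set()
    # Check every n-gram up to the longest phrase length: the scan does
    # O(len(words) * max_len) set lookups however many phrases there are.
    for i in range(len(words)):
        for n in range(1, min(max_len, len(words) - i) + 1):
            gram = tuple(words[i:i + n])
            if gram in wanted:
                found.add(wanted[gram])
    return found


if __name__ == "__main__":
    print(sorted(find_phrases("The quick brown fox jumps.", ["quick brown", "fox", "cat"])))
```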
2023-03-31 07:04 Category: Q&A
Is there a tool for splitting German compound words in Java?
I am successfully splitting sentences into words with a StringTokenizer. Is there a tool that can split compound words such as Projektüberwachung into their parts, Projekt and überwachung…
2023-03-30 23:16 Category: Q&A
Python or command-line utility: sort and filter a file?
Given data of the form:
a b 1.1
c d 2.3
b a 1.1
is it possible to sort such a file based on the third column and remove lines where the entry in the third column is duplicated, such that the output…
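Here is a minimal Python sketch of the sort-and-filter step. The question is cut off before it says which duplicate to keep. This sketch assumes the first row seen for each third-column value is kept, and that the column should be compared numerically. `sort_and_dedupe` is a hypothetical name.

```python
def sort_and_dedupe(lines):
    """Sort whitespace-separated rows by their third column (compared as a
    number) and drop any row whose third-column value was already seen.

    The first row seen for each value is kept (an assumption).
    """
    first_seen = {}  # third-column value -> first row carrying it
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or short rows
        key = float(fields[2])
        if key not in first_seen:
            first_seen[key] = line.rstrip("\n")
    return [first_seen[k] for k in sorted(first_seen)]


if __name__ == "__main__":
    # Sample rows from the question; for a real file, pass open(path) instead.
    sample = ["a b 1.1", "c d 2.3", "b a 1.1"]
    for row in sort_and_dedupe(sample):
        print(row)
```

On the sample data this prints `a b 1.1` and `c d 2.3`. Because the key is parsed as a float, `1.1` and `1.10` count as duplicates. Using `fields[2]` directly as the key would compare the column as text instead.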
2023-03-30 21:50 Category: Q&A
How to find text files not containing text on Linux?
How do I find files not containing some text on Linux? Basically, I'm looking for the inverse of the following…
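On Linux, GNU grep's `-L` (`--files-without-match`) option lists the files that contain no match, so `grep -L 'text' *` (or `grep -rL 'text' DIR` for a directory tree) is the usual inverse. The following Python sketch shows the same idea when grep is not an option. It walks a directory and yields the files whose bytes do not contain the search text. It reads each file fully into memory, which suits text files but not very large ones. `files_without` is a hypothetical name.

```python
import os
import sys


def files_without(root, text):
    """Yield paths of files under root whose contents do not contain text.

    The search text is UTF-8 encoded and compared against raw file bytes,
    so no decoding of the files is needed. Unreadable files are skipped.
    """
    needle = text.encode("utf-8")
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    data = f.read()
            except OSError:
                continue  # permission denied, broken symlink, etc.
            if needle not in data:
                yield path


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python files_without.py ROOT TEXT  (script name is hypothetical)
    for path in files_without(sys.argv[1], sys.argv[2]):
        print(path)
```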
2023-03-30 19:32 Category: Q&A
Shift letter positions in a word
I want a command or function, preferably in bash, that takes a word (string) and a number and shifts the letter positions in the word by that number, rotating the overflow back to the beginning…
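The asker preferred bash, but the rotation logic is easy to show in Python. This minimal sketch assumes a positive number shifts letters to the right, so the characters that fall off the end wrap around to the front. A negative number shifts left. `rotate_word` is a hypothetical name.

```python
def rotate_word(word, n):
    """Shift every character of word n positions to the right, wrapping the
    characters that fall off the end back to the beginning. Negative n
    shifts left."""
    if not word:
        return word
    n %= len(word)  # normalise so n is in [0, len(word))
    cut = len(word) - n
    return word[cut:] + word[:cut]


if __name__ == "__main__":
    print(rotate_word("hello", 2))  # prints "lohel"
```

Slicing at `len(word) - n` rather than at `-n` matters: `word[-0:]` would be the whole string, so a shift of 0 would double the word.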
2023-03-30 03:57 Category: Q&A
Extract items from n-line chunks in a file and count the frequency of items for each chunk (Python)
I have a text file containing 5-line chunks of tab-delimited lines: 1 \t DESCRIPTION \t SENTENCE \t ITEMS…
2023-03-29 21:53 Category: Q&A
How can I extract email addresses from between '<' and '>'?
I've got a list of emails and names from Outlook, semicolon-delimited, like this: fname lname <email>; fname2 lname2 <email2>; ... ; fnameN lnameN <emailN>
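Here is a minimal sketch using Python's `re` module. It captures whatever sits between each `<` and the following `>`. It assumes addresses never contain angle brackets themselves, and it does not check that the captured text is a well-formed address. `extract_emails` is a hypothetical name.

```python
import re


def extract_emails(text):
    """Return whatever sits between each '<' and the next '>', in order.

    [^<>]* stops a match from spanning two addresses, and strip() removes
    stray spaces such as in "< a@b.c >".
    """
    return [m.strip() for m in re.findall(r"<([^<>]*)>", text)]


if __name__ == "__main__":
    # Hypothetical Outlook-style input in the format shown in the question.
    outlook = "Ann Lee <ann@example.com>; Bo Chan <bo@example.org>"
    print(extract_emails(outlook))
```

Run on the sample line, this prints `['ann@example.com', 'bo@example.org']`. Because the pattern keys only on the angle brackets, the semicolons and names need no separate parsing.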
2023-03-29 17:15 Category: Q&A
Open-source equivalent to OpenCalais (preferably PHP or Python)?
Is there an open-source equivalent of OpenCalais, preferably in PHP or Python? Conceptually, it's an interesting idea. It seems to be parsing basic text or HTML content, then wrapping…
2023-03-25 16:06 Category: Q&A