Monthly Archives: April 2012

Worth 100 words, in this case…

Words appearing in a random sample of 1,000 10-Q Part II, Item 1A sections

Statistically Improbable Trash

Found mere blocks from my house. What are the odds?

Posted in dumb

SEC disclosure text mining update

As noted in previous posts, I have been working to develop a means of automatically identifying those 10-Q reports with a “risk factors” section, and extracting that section to what I have been calling risk files. My initial very rough … Continue reading

Posted in SEC Project, Security

Super-quick 10-Q Breakdown

I did some Perl hacking (and I mean hacking of the ugly kind, not the “just another Perl hacker” kind). Of the 58,798 10-Qs I have on hand, 42,601 have a “risk factors” section. I am close to being able … Continue reading

Posted in dumb, Programming, SEC Project, Security

SEC disclosure text mining (minor) project update

I’ve been having fun dealing with the joys of unstructured text processing. The ambiguity in the previous sentence is deliberate: I mean both the joys of processing unstructured text and the joys of unstructured processing of text. Of … Continue reading

Posted in Programming, SEC Project, Security