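# Note: gensim.summarization was removed in gensim 4.0, so this needs an older release
from gensim.summarization import summarize
import logging

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

# Read the full text to be summarised
with open("telegraph.txt", "r") as f:
    text = f.read()

print(summarize(text))                  # default settings
print(summarize(text, word_count=100))  # limit the summary to about 100 words
print(summarize(text, ratio=0.5))       # keep about half of the original sentences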
Test output for Donoghue v Stevenson with the summary constrained to 100 words. I personally think the summariser has done an excellent job: it has calculated that the final paragraph of Lord Atkin's speech provides the best summary of the judgment.
My Lords, if your Lordships accept the view that this pleading discloses a relevant cause of action you will be affirming the proposition that by Scots and English law alike a manufacturer of products, which he sells in such a form as to show that he intends them to reach the ultimate consumer in the form in which they left him with no reasonable possibility of intermediate examination, and with the knowledge that the absence of reasonable care in the preparation or putting up of the products will result in an injury to the consumer's life or property, owes a duty to the consumer to take that reasonable care.
I then chained this summary into RAKE to run a quick keyword extraction over it. The RAKE parameters were as follows:
rake_object = rake.Rake("smartstoplist.txt", 5, 3, 4)
The output was a spot-on extraction:
[('reasonable care', 4.0), ('consumer', 1.3333333333333333), ('products', 1.0)]
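To see where scores like 4.0 and 1.33 come from, here is a minimal sketch of RAKE-style scoring in plain Python. It is not the code of the rake module used above: the tiny stop list, the sample sentence and the function names (candidate_phrases, rake_scores) are all made up for illustration, and it leaves out the length and frequency filters that the Rake(...) parameters control. It assumes the usual RAKE approach, where each word scores degree divided by frequency and a phrase scores the sum of its words.

```python
import re
from collections import defaultdict

# Tiny illustrative stop list (an assumption); the post uses smartstoplist.txt.
STOP_WORDS = {"a", "an", "and", "of", "the", "to", "that", "in", "with"}


def candidate_phrases(text, stop_words):
    """Split text into runs of consecutive non-stop words."""
    phrases = []
    # Punctuation always ends a phrase, so split on it first.
    for fragment in re.split(r"[.,;:!?()\"]", text.lower()):
        current = []
        for word in re.findall(r"[a-z']+", fragment):
            if word in stop_words:
                # A stop word closes the phrase built so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(phrases):
    """Score each distinct phrase as the sum of degree/frequency of its words."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts the words sharing a phrase with this word, itself included.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in set(phrases)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up sample sentence, not the judgment text.
sample = "reasonable care of the ultimate consumer, the consumer and the products of the consumer"
ranked = rake_scores(candidate_phrases(sample, STOP_WORDS))
print(ranked)
```

In this toy sentence "consumer" appears three times, once inside the two-word phrase "ultimate consumer", so its degree is 4 and its score is 4/3. Each word of "reasonable care" has degree 2 and frequency 1, which gives the phrase a score of 4.0.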