
[P] citracer: a small CLI tool to trace where a concept comes from in a citation graph

Hi all, I made a small tool that I've been using for my own literature reviews and figured I'd share in case it's useful to anyone else.

It takes a research PDF and a keyword, parses the bibliography with GROBID, finds the references that are cited near each occurrence of the keyword in the text, downloads those papers when they're on arXiv or OpenReview, and recursively walks the resulting graph. The output is an interactive HTML visualization.
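To make the pipeline concrete, here is a minimal sketch of the idea, not citracer's actual code. It assumes numeric bracket citations like `[12]`, treats "near" as "in the same sentence", and replaces the GROBID parsing and PDF downloads with a hypothetical in-memory `corpus` dict. The names `refs_near_keyword` and `trace` are made up for illustration.

```python
import re
from collections import deque

# Matches numeric citation markers such as [3] or [1, 2].
CITE = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def refs_near_keyword(text, keyword):
    """Return reference numbers cited in sentences that mention the keyword."""
    found = []
    # Naive sentence split; a structured parse would give better boundaries.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if keyword.lower() not in sentence.lower():
            continue
        for group in CITE.findall(sentence):
            for num in group.split(","):
                n = int(num)
                if n not in found:
                    found.append(n)
    return found


def trace(corpus, root, keyword, max_depth=2):
    """Breadth-first walk that follows only citations co-occurring with the keyword.

    corpus maps a paper id to {"text": str, "refs": {number: paper id}}.
    Papers missing from corpus (e.g. not downloadable) become leaf nodes.
    """
    edges = []
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        paper, depth = queue.popleft()
        if depth >= max_depth or paper not in corpus:
            continue
        entry = corpus[paper]
        for n in refs_near_keyword(entry["text"], keyword):
            target = entry["refs"].get(n)
            if target is None:
                continue
            edges.append((paper, target))
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return edges
```

The returned edge list is what a visualization layer would consume. The `max_depth` cutoff and the `seen` set keep the recursion from running away on dense or cyclic citation graphs.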

There's also a "reverse" mode that uses Semantic Scholar's citation contexts endpoint to find papers citing a given work specifically about a keyword, without downloading any PDFs.
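A rough sketch of what the reverse lookup boils down to, again not citracer's actual code. It assumes the Semantic Scholar Graph API citations endpoint with `fields=contexts,title` and a response shaped as `{"data": [{"contexts": [...], "citingPaper": {...}}]}`. The helper names are hypothetical, pagination is ignored, and the API key is optional.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.semanticscholar.org/graph/v1/paper/{}/citations"


def fetch_citations(paper_id, api_key=None, limit=100):
    """Fetch one page of citations with their contexts (network call, no pagination)."""
    query = urllib.parse.urlencode({"fields": "contexts,title", "limit": limit})
    url = API.format(urllib.parse.quote(paper_id, safe=":")) + "?" + query
    request = urllib.request.Request(url)
    if api_key:
        # Assumption: the key is passed in the x-api-key header.
        request.add_header("x-api-key", api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def citing_papers_about(payload, keyword):
    """Keep only citing papers whose citation contexts mention the keyword."""
    needle = keyword.lower()
    hits = []
    for item in payload.get("data", []):
        contexts = item.get("contexts") or []  # contexts can be missing or null
        matching = [c for c in contexts if needle in c.lower()]
        if matching:
            title = (item.get("citingPaper") or {}).get("title")
            hits.append((title, matching))
    return hits
```

Something like `citing_papers_about(fetch_citations("arXiv:1706.03762"), "positional encoding")` would then list the citing papers whose contexts mention the keyword. Papers with empty contexts are silently dropped, which is why coverage on the Semantic Scholar side matters so much.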

Short demo (2 min): https://youtu.be/0VxWgaKixSI

I built it because I was spending too much time clicking through Google Scholar to figure out which paper introduced a particular idea I'd seen mentioned in passing. It's not a replacement for tools like Connected Papers or Inspire HEP; those answer different questions. This one is narrowly focused on "show me which references this PDF cites where it mentions X".

Some honest caveats:

- It depends on GROBID for parsing, which works well on ML/CS papers but can struggle on other domains.
- The reverse mode relies entirely on Semantic Scholar's coverage and citation contexts, which aren't always complete.
- Without a free Semantic Scholar API key, things get noticeably slower due to rate limiting.
- It's a personal project, so expect rough edges.

The project is still very young and I'm pretty sure it'll only get more useful as it evolves. If anyone is interested in contributing — bug reports, edge cases, parser fixes, new features, doc improvements, anything — it would genuinely be welcome. PRs and issues open.

Repo: https://github.com/marcpinet/citracer

PyPI: https://pypi.org/project/citracer/

If you try it on a paper you care about, I'd love to hear whether the chains it produces make sense.

submitted by /u/Roux55