Comments on the submission version of Kuroda (2010) at PACLIC 24


Review 1

This paper presents only the idea of parallel distributed parsing. The content is rather preliminary: there is no parser implementation and there are no empirical results.


Review 2

The paper outlines parallel distributed parsing (PDP), an approach that integrates lexical and sublexical analyses. The paper sketches the approach and then offers brief comparisons with other approaches to parsing. The main claim of the paper is that the approach helps combat data sparseness, which is identified as one of the major problems in current NLP applications.

Unfortunately, Section 2 of the paper is very unclear and virtually incomprehensible: it fails to give the reader even an intuitive grasp of the proposed method and simply lists barely sketched algorithms, ideas about patterns, etc. Section 3 gives very sketchy comparisons with other frameworks, which do not really work because (i) Section 2 is very unclear and (ii) the comparisons are very short.

The paper argues that PDP helps with data sparseness in NLP. This is not substantiated. I suspect the authors may have confused "data sparseness" as used in NLP (the lack of data needed to estimate statistics or train machine-learning models for phenomena in the long tail of Zipfian distributions) with the somewhat limited ("sparse") output of a dependency parser, as in Table 4 of the paper.
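
To make the NLP sense of "data sparseness" concrete, here is a minimal sketch in Python. It assumes an idealized Zipf law (frequency proportional to 1/rank), and the vocabulary size, corpus size, and threshold are hypothetical values chosen only for illustration; none of this comes from the paper. It computes the share of word types whose expected count is too small to support reliable statistics.

```python
def zipf_expected_counts(vocab_size, corpus_size):
    """Expected count of each word type under an idealized Zipf law.

    Assumption: the frequency of the type at rank r is proportional to 1/r.
    """
    harmonic = sum(1.0 / r for r in range(1, vocab_size + 1))
    return [corpus_size / (r * harmonic) for r in range(1, vocab_size + 1)]


def rare_type_fraction(counts, threshold=5.0):
    """Fraction of word types whose expected count falls below the threshold."""
    rare = sum(1 for c in counts if c < threshold)
    return rare / len(counts)


# Hypothetical sizes, chosen only for illustration.
counts = zipf_expected_counts(vocab_size=50_000, corpus_size=1_000_000)
fraction = rare_type_fraction(counts)
print(f"Share of types expected fewer than 5 times: {fraction:.2f}")
```

Under these assumptions, most word types fall in the long tail even in a million-token corpus. That is the sense of sparseness the paper would need to address, and it is distinct from a parser producing few dependency links.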

Review 3

This paper presents a proposal for parallel distributed parsing (PDP), which integrates lexical and sublexical analyses. The authors themselves acknowledge that the paper is rather preliminary; it appears to be at a very early stage of work in progress, as the idea and its implications need to be worked out in more detail, with more support provided for each of the claims. As it is, the paper would be more appropriate for a workshop.

There are typos and missing words (e.g. "because we expect it give us", "of the two case", "like those like the one specified in", "Parallel Word Expert Parsing (WEP)"), so the paper should be thoroughly proofread by a native speaker of English.

In the introduction, the discussion of the role (or lack thereof) of "last March" in relation to "reimbursement" should address how it relates to attachment disambiguation ("last March" attaching to "offered"). It is not clear from the text how the examples provided support the claim that morphological analysis needs to be integrated with syntactic analysis. Nor does the text explicitly discuss how the proposal addresses the problem of data sparseness.

Examples would also help to make the discussion clearer. For instance, in Section 2.1 they would help explain what you mean by "the parsing is “distributed” because the resulting components are not only allowed but also required to have redundancies among them" (what kind of redundancies?) or how you find "morphemes embedded in a word" in example 1.

More detailed comments
- Footnote 1, referred to on page 1, appears on page 2.
- The contents of Table 1 should be explained more clearly in the text, as well as the notation adopted (what is encoded in the rows and columns, and in the diagonal, as well as the abbreviations used).
- The following explanation should be rephrased: "is “parallel” in that it gives a set of parses that are independently performed on all of the elementary units, say words or morphemes of the input, in parallel."